Commit message | Author | Age | Files | Lines
* S390: Optimize string, wcsmbs and memory functions. | Stefan Liebler | 2015-08-26 | 2 | -3/+10
    This patch set introduces optimized string, wcsmbs and memory functions for S390/S390x. The functions are accelerated by the new z13 vector instructions. The Principles of Operation manual for IBM z13 is publicly available: http://publibfi.boulder.ibm.com/epubs/pdf/dz9zr010.pdf

    Assembler support for these instructions was introduced by the commits:
    - "[Committed] S/390: Add support for IBM z13." (https://sourceware.org/ml/binutils/2015-01/msg00197.html)
    - "[Committed] S/390: Add more IBM z13 instructions" (https://sourceware.org/ml/binutils/2015-03/msg00088.html)

    The first patches prepare for the later optimization patches. The floating point exception handling (fetestexcept(), ...) is fixed, and the platform and hwcap strings are extended. The current ifunc routines memset, memcpy and memcmp are refactored, and the ifunc test framework is now enabled.

    An S390 specific configure check tests whether the binutils in use supports the new vector instructions. If it does, the optimized functions are provided via ifunc. Otherwise a message is written to the configure output and only the existing common code functions are available.

    The optimized functions are implemented in common for s390-32 and s390-64, and the few differences are handled via #ifdef.

    The ifunc resolvers are defined in the files sysdeps/s390/multiarch/<func>.c. They choose either the current implementation __<func>_c() or the vector implementation __<func>_vx(), depending on the HWCAP_S390_VX flag bit in the AT_HWCAP field. If the bit is set, both the hardware and the kernel support vector registers and instructions. If the binutils in use lacks vector support, the default implementation in the string or wcsmbs directory is included here instead.

    The file sysdeps/s390/multiarch/<func>-c.c includes the current implementation and defines the function name __<func>_c.

    The assembler files sysdeps/s390/multiarch/<func>-vx.S with the vector instructions use the directive '.machine "z13"' so that glibc can be built without the option '-march=z13'. Additionally, the directive '.machinemode "zarch_nohighgprs"' is needed for the 31bit glibc. This mode does not set the highgprs flag in the ELF header; setting it would lead to an unloadable libc on a 31bit kernel.

    Most of the optimized string functions are structured in the same way: The first 16 bytes of the string are loaded unaligned via vlbb (vector load to block boundary, e.g. 4k). This instruction loads 16 bytes if possible. In case of a page cross, it loads only the remaining bytes of the current page, without a segmentation fault. Afterwards this first part of the string is processed. If, e.g. for strlen, the end of the string is reached within this first part, the function returns. Otherwise the pointer is aligned to 16 bytes, so that a full vector register can be loaded with vl (vector load) without checking for a page cross. The remaining string is processed in a four times unrolled loop, because benchmarks showed improvements compared to a non-unrolled loop.

    The optimized wide string functions can only handle 4-byte aligned string pointers. Although a wchar_t pointer should always be 4-byte aligned, most of the current common code wide string functions can handle non-aligned strings. Thus the optimized functions fall back to the common code functions for a non-aligned wide string, so that they behave the same as before this patch.

    Some string tests can test both the string and the wide string version of a function. The remaining ones are extended and new wide string tests are added. The same applies to the benchtests.

    ChangeLog:
    * NEWS: New item for IBM z13 string optimizations.
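    The common structure described in this commit message (a first load that stops at the block boundary, then aligned 16-byte steps) can be modelled in portable C. The following is only a sketch under stated assumptions: the names strlen_chunked_sketch and scan_chunk are hypothetical, the byte loop stands in for one vector compare, the block size of 4096 is taken from the "4k" example, and the four times unrolling is left out. It is not the assembler implementation from the patches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 4096u  /* assumed block size, from the "4k" example */
#define VEC_SIZE   16u    /* bytes in one vector register */

/* Scan at most n bytes starting at p.  Return the index of the first
   NUL byte, or n if there is none.  Stands in for one vector compare.  */
static size_t
scan_chunk (const char *p, size_t n)
{
  size_t i = 0;
  while (i < n && p[i] != '\0')
    i++;
  return i;
}

/* Hypothetical model of the structure described above.  */
size_t
strlen_chunked_sketch (const char *s)
{
  assert (s != NULL);

  /* Step 1: like vlbb, look at up to 16 bytes, but never past the
     end of the current block.  */
  size_t to_boundary = BLOCK_SIZE - ((uintptr_t) s & (BLOCK_SIZE - 1));
  size_t first = to_boundary < VEC_SIZE ? to_boundary : VEC_SIZE;
  size_t idx = scan_chunk (s, first);
  if (idx < first)
    return idx;               /* End of string found in the first part.  */

  /* Step 2: continue at the next 16-byte boundary.  This offset is never
     larger than 'first', so no byte is skipped; a few bytes may be
     scanned twice.  */
  size_t pos = VEC_SIZE - ((uintptr_t) s & (VEC_SIZE - 1));

  /* Step 3: aligned 16-byte steps.  An aligned 16-byte chunk never
     crosses a block boundary.  */
  for (;;)
    {
      idx = scan_chunk (s + pos, VEC_SIZE);
      if (idx < VEC_SIZE)
        return pos + idx;
      pos += VEC_SIZE;
    }
}
```

    Because scan_chunk stops at the first NUL byte, this model never reads past the end of the string; the real functions rely on vlbb and on the alignment of vl for that guarantee instead.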
* S390: Optimize memrchr. | Stefan Liebler | 2015-08-26 | 6 | -1/+227
    This patch provides an optimized version of memrchr with the z13 vector instructions.

    ChangeLog:
    * sysdeps/s390/multiarch/memrchr-c.c: New File.
    * sysdeps/s390/multiarch/memrchr-vx.S: Likewise.
    * sysdeps/s390/multiarch/memrchr.c: Likewise.
    * sysdeps/s390/multiarch/Makefile (sysdep_routines): Add memrchr functions.
    * sysdeps/s390/multiarch/ifunc-impl-list-common.c (__libc_ifunc_impl_list_common): Add ifunc test for memrchr.
* S390: Optimize wmemcmp. | Stefan Liebler | 2015-08-26 | 8 | -2/+240
    This patch provides an optimized version of wmemcmp with the z13 vector instructions.

    ChangeLog:
    * sysdeps/s390/multiarch/wmemcmp-c.c: New File.
    * sysdeps/s390/multiarch/wmemcmp-vx.S: Likewise.
    * sysdeps/s390/multiarch/wmemcmp.c: Likewise.
    * sysdeps/s390/multiarch/Makefile (sysdep_routines): Add wmemcmp functions.
    * sysdeps/s390/multiarch/ifunc-impl-list-common.c (__libc_ifunc_impl_list_common): Add ifunc test for wmemcmp.
    * benchtests/bench-wmemcmp.c: New File.
    * benchtests/Makefile (wcsmbs-bench): Add wmemcmp.
* S390: Optimize wmemset. | Stefan Liebler | 2015-08-26 | 13 | -56/+373
    This patch provides an optimized version of wmemset with the z13 vector instructions.

    ChangeLog:
    * sysdeps/s390/multiarch/wmemset-c.c: New File.
    * sysdeps/s390/multiarch/wmemset-vx.S: Likewise.
    * sysdeps/s390/multiarch/wmemset.c: Likewise.
    * sysdeps/s390/multiarch/Makefile (sysdep_routines): Add wmemset functions.
    * sysdeps/s390/multiarch/ifunc-impl-list-common.c (__libc_ifunc_impl_list_common): Add ifunc test for wmemset.
    * wcsmbs/wmemset.c: Use WMEMSET if defined.
    * string/test-memset.c: Add wmemset support.
    * wcsmbs/test-wmemset.c: New File.
    * wcsmbs/Makefile (strop-tests): Add wmemset.
    * benchtests/bench-memset.c: Add wmemset support.
    * benchtests/bench-wmemset.c: New File.
    * benchtests/Makefile (wcsmbs-bench): Add wmemset.
* S390: Optimize memccpy. | Stefan Liebler | 2015-08-26 | 7 | -1/+228
    This patch provides an optimized version of memccpy with the z13 vector instructions.

    ChangeLog:
    * sysdeps/s390/multiarch/memccpy-c.c: New File.
    * sysdeps/s390/multiarch/memccpy-vx.S: Likewise.
    * sysdeps/s390/multiarch/memccpy.c: Likewise.
    * sysdeps/s390/multiarch/Makefile (sysdep_routines): Add memccpy functions.
    * sysdeps/s390/multiarch/ifunc-impl-list-common.c (__libc_ifunc_impl_list_common): Add ifunc test for memccpy.
    * string/memccpy.c: Use MEMCCPY if defined.
* S390: Optimize memchr, rawmemchr and wmemchr. | Stefan Liebler | 2015-08-26 | 20 | -61/+795
    This patch provides optimized versions of memchr, rawmemchr and wmemchr with the z13 vector instructions.

    ChangeLog:
    * sysdeps/s390/multiarch/memchr-vx.S: New File.
    * sysdeps/s390/multiarch/memchr.c: Likewise.
    * sysdeps/s390/multiarch/rawmemchr-c.c: Likewise.
    * sysdeps/s390/multiarch/rawmemchr-vx.S: Likewise.
    * sysdeps/s390/multiarch/rawmemchr.c: Likewise.
    * sysdeps/s390/multiarch/wmemchr-c.c: Likewise.
    * sysdeps/s390/multiarch/wmemchr-vx.S: Likewise.
    * sysdeps/s390/multiarch/wmemchr.c: Likewise.
    * sysdeps/s390/s390-32/multiarch/memchr.c: Likewise.
    * sysdeps/s390/s390-64/multiarch/memchr.c: Likewise.
    * sysdeps/s390/multiarch/Makefile (sysdep_routines): Add memchr, wmemchr and rawmemchr functions.
    * sysdeps/s390/multiarch/ifunc-impl-list-common.c (__libc_ifunc_impl_list_common): Add ifunc test for memchr, rawmemchr and wmemchr.
    * wcsmbs/wmemchr.c: Use WMEMCHR if defined.
    * string/test-memchr.c: Add wmemchr support.
    * wcsmbs/test-wmemchr.c: New File.
    * wcsmbs/Makefile (strop-tests): Add wmemchr.
    * benchtests/bench-memchr.c: Add wmemchr support.
    * benchtests/bench-wmemchr.c: New File.
    * benchtests/Makefile (wcsmbs-bench): Add wmemchr.
* S390: Optimize strcspn and wcscspn. | Stefan Liebler | 2015-08-26 | 16 | -29/+822
    This patch provides optimized versions of strcspn and wcscspn with the z13 vector instructions.

    ChangeLog:
    * sysdeps/s390/multiarch/strcspn-c.c: New File.
    * sysdeps/s390/multiarch/strcspn-vx.S: Likewise.
    * sysdeps/s390/multiarch/strcspn.c: Likewise.
    * sysdeps/s390/multiarch/wcscspn-c.c: Likewise.
    * sysdeps/s390/multiarch/wcscspn-vx.S: Likewise.
    * sysdeps/s390/multiarch/wcscspn.c: Likewise.
    * sysdeps/s390/multiarch/Makefile (sysdep_routines): Add strcspn and wcscspn functions.
    * sysdeps/s390/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Add ifunc test for strcspn, wcscspn.
    * wcsmbs/wcscspn.c: Use WCSCSPN if defined.
    * string/test-strcspn.c: Add wcscspn support.
    * wcsmbs/test-wcscspn.c: New File.
    * wcsmbs/Makefile (strop-tests): Add wcscspn.
    * benchtests/bench-strcspn.c: Add wcscspn support.
    * benchtests/bench-wcscspn.c: New File.
    * benchtests/Makefile (wcsmbs-bench): Add wcscspn.
* S390: Optimize strpbrk and wcspbrk. | Stefan Liebler | 2015-08-26 | 16 | -88/+948
    This patch provides optimized versions of strpbrk and wcspbrk with the z13 vector instructions.

    ChangeLog:
    * sysdeps/s390/multiarch/strpbrk-c.c: New File.
    * sysdeps/s390/multiarch/strpbrk-vx.S: Likewise.
    * sysdeps/s390/multiarch/strpbrk.c: Likewise.
    * sysdeps/s390/multiarch/wcspbrk-c.c: Likewise.
    * sysdeps/s390/multiarch/wcspbrk-vx.S: Likewise.
    * sysdeps/s390/multiarch/wcspbrk.c: Likewise.
    * sysdeps/s390/multiarch/Makefile (sysdep_routines): Add strpbrk and wcspbrk functions.
    * sysdeps/s390/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Add ifunc test for strpbrk, wcspbrk.
    * wcsmbs/wcspbrk.c: Use WCSPBRK if defined.
    * string/test-strpbrk.c: Add wcspbrk support.
    * wcsmbs/test-wcspbrk.c: New File.
    * wcsmbs/Makefile (strop-tests): Add wcspbrk.
    * benchtests/bench-strpbrk.c: Add wcspbrk support.
    * benchtests/bench-wcspbrk.c: New File.
    * benchtests/Makefile (wcsmbs-bench): Add wcspbrk.
* S390: Optimize strspn and wcsspn. | Stefan Liebler | 2015-08-26 | 16 | -64/+823
    This patch provides optimized versions of strspn and wcsspn with the z13 vector instructions.

    ChangeLog:
    * sysdeps/s390/multiarch/strspn-c.c: New File.
    * sysdeps/s390/multiarch/strspn-vx.S: Likewise.
    * sysdeps/s390/multiarch/strspn.c: Likewise.
    * sysdeps/s390/multiarch/wcsspn-c.c: Likewise.
    * sysdeps/s390/multiarch/wcsspn-vx.S: Likewise.
    * sysdeps/s390/multiarch/wcsspn.c: Likewise.
    * wcsmbs/wcsspn.c: Use WCSSPN if defined.
    * sysdeps/s390/multiarch/Makefile (sysdep_routines): Add strspn and wcsspn functions.
    * sysdeps/s390/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Add ifunc test for strspn, wcsspn.
    * string/test-strspn.c: Add wcsspn support.
    * wcsmbs/test-wcsspn.c: New File.
    * wcsmbs/Makefile (strop-tests): Add wcsspn.
    * benchtests/bench-strspn.c: Add wcsspn support.
    * benchtests/bench-wcsspn.c: New File.
    * benchtests/Makefile (wcsmbs-bench): Add wcsspn.
* S390: Optimize strrchr and wcsrchr. | Stefan Liebler | 2015-08-26 | 11 | -3/+522
    This patch provides optimized versions of strrchr and wcsrchr with the z13 vector instructions.

    ChangeLog:
    * sysdeps/s390/multiarch/strrchr-c.c: New File.
    * sysdeps/s390/multiarch/strrchr-vx.S: Likewise.
    * sysdeps/s390/multiarch/strrchr.c: Likewise.
    * sysdeps/s390/multiarch/wcsrchr-c.c: Likewise.
    * sysdeps/s390/multiarch/wcsrchr-vx.S: Likewise.
    * sysdeps/s390/multiarch/wcsrchr.c: Likewise.
    * sysdeps/s390/multiarch/Makefile (sysdep_routines): Add strrchr and wcsrchr functions.
    * sysdeps/s390/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Add ifunc test for strrchr, wcsrchr.
    * benchtests/bench-wcsrchr.c: New File.
    * benchtests/Makefile (wcsmbs-bench): Add wcsrchr.
* S390: Optimize strchrnul and wcschrnul. | Stefan Liebler | 2015-08-26 | 16 | -19/+406
    This patch provides optimized versions of strchrnul and wcschrnul with the z13 vector instructions.

    ChangeLog:
    * sysdeps/s390/multiarch/strchrnul-c.c: New File.
    * sysdeps/s390/multiarch/strchrnul-vx.S: Likewise.
    * sysdeps/s390/multiarch/strchrnul.c: Likewise.
    * sysdeps/s390/multiarch/wcschrnul-c.c: Likewise.
    * sysdeps/s390/multiarch/wcschrnul-vx.S: Likewise.
    * sysdeps/s390/multiarch/wcschrnul.c: Likewise.
    * sysdeps/s390/multiarch/Makefile (sysdep_routines): Add strchrnul and wcschrnul functions.
    * sysdeps/s390/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Add ifunc test for strchrnul, wcschrnul.
    * wcsmbs/wcschrnul.c: Use WCSCHRNUL if defined.
    * string/test-strchr.c: Add wcschrnul support.
    * wcsmbs/test-wcschrnul.c: New File.
    * wcsmbs/Makefile (strop-tests): Add wcschrnul.
    * benchtests/bench-strchr.c: Add wcschrnul support.
    * benchtests/bench-wcschrnul.c: New File.
    * benchtests/Makefile (wcsmbs-bench): Add wcschrnul.
* S390: Optimize strchr and wcschr. | Stefan Liebler | 2015-08-26 | 12 | -5/+376
    This patch provides optimized versions of strchr and wcschr with the z13 vector instructions.

    ChangeLog:
    * sysdeps/s390/multiarch/strchr-c.c: New File.
    * sysdeps/s390/multiarch/strchr-vx.S: Likewise.
    * sysdeps/s390/multiarch/strchr.c: Likewise.
    * sysdeps/s390/multiarch/wcschr-c.c: Likewise.
    * sysdeps/s390/multiarch/wcschr-vx.S: Likewise.
    * sysdeps/s390/multiarch/wcschr.c: Likewise.
    * sysdeps/s390/multiarch/Makefile (sysdep_routines): Add strchr and wcschr functions.
    * sysdeps/s390/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Add ifunc test for strchr, wcschr.
    * string/strchr.c (STRCHR): Define and use macro.
    * benchtests/bench-wcschr.c: New File.
    * benchtests/Makefile (wcsmbs-bench): Add wcschr.
* S390: Optimize strncmp and wcsncmp. | Stefan Liebler | 2015-08-26 | 13 | -29/+558
    This patch provides optimized versions of strncmp and wcsncmp with the z13 vector instructions.

    ChangeLog:
    * sysdeps/s390/multiarch/strncmp-c.c: New File.
    * sysdeps/s390/multiarch/strncmp-vx.S: Likewise.
    * sysdeps/s390/multiarch/strncmp.c: Likewise.
    * sysdeps/s390/multiarch/wcsncmp-c.c: Likewise.
    * sysdeps/s390/multiarch/wcsncmp-vx.S: Likewise.
    * sysdeps/s390/multiarch/wcsncmp.c: Likewise.
    * sysdeps/s390/multiarch/Makefile (sysdep_routines): Add strncmp and wcsncmp functions.
    * sysdeps/s390/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Add ifunc test for strncmp, wcsncmp.
    * wcsmbs/wcsncmp.c (WCSNCMP): Define and use macro.
    * benchtests/bench-strncmp.c: Add wcsncmp support.
    * benchtests/bench-wcsncmp.c: New File.
    * benchtests/Makefile (wcsmbs-bench): Add wcsncmp.
* S390: Optimize strcmp and wcscmp. | Stefan Liebler | 2015-08-26 | 14 | -6/+430
    This patch provides optimized versions of strcmp and wcscmp with the z13 vector instructions. The architecture specific string.h had a typo, which led to the inline version in this file being omitted if __USE_STRING_INLINES is defined. This inline version was tested by tweaking test-strcmp.c.

    ChangeLog:
    * sysdeps/s390/multiarch/strcmp-vx.S: New File.
    * sysdeps/s390/multiarch/strcmp.c: Likewise.
    * sysdeps/s390/multiarch/wcscmp-c.c: Likewise.
    * sysdeps/s390/multiarch/wcscmp-vx.S: Likewise.
    * sysdeps/s390/multiarch/wcscmp.c: Likewise.
    * sysdeps/s390/s390-32/multiarch/strcmp.c: Likewise.
    * sysdeps/s390/s390-64/multiarch/strcmp.c: Likewise.
    * sysdeps/s390/multiarch/Makefile (sysdep_routines): Add strcmp and wcscmp functions.
    * sysdeps/s390/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Add ifunc test for strcmp, wcscmp.
    * string/strcmp.c (STRCMP): Define and use macro.
    * benchtests/bench-wcscmp.c: New File.
    * benchtests/Makefile (wcsmbs-bench): Add wcscmp.
    * sysdeps/s390/bits/string.h: Fix typo: _HAVE_STRING_ARCH_strcmp instead of _HAVE_STRING_ARCH_memchr.
* S390: Optimize strncat and wcsncat. | Stefan Liebler | 2015-08-26 | 16 | -93/+826
    This patch provides optimized versions of strncat and wcsncat with the z13 vector instructions.

    ChangeLog:
    * sysdeps/s390/multiarch/strncat-c.c: New File.
    * sysdeps/s390/multiarch/strncat-vx.S: Likewise.
    * sysdeps/s390/multiarch/strncat.c: Likewise.
    * sysdeps/s390/multiarch/wcsncat-c.c: Likewise.
    * sysdeps/s390/multiarch/wcsncat-vx.S: Likewise.
    * sysdeps/s390/multiarch/wcsncat.c: Likewise.
    * sysdeps/s390/multiarch/Makefile (sysdep_routines): Add strncat and wcsncat functions.
    * sysdeps/s390/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Add ifunc test for strncat, wcsncat.
    * wcsmbs/wcsncat.c (WCSNCAT): Define and use macro.
    * string/test-strncat.c: Add wcsncat support.
    * wcsmbs/test-wcsncat.c: New File.
    * wcsmbs/Makefile (strop-tests): Add wcsncat.
    * benchtests/bench-strncat.c: Add wcsncat support.
    * benchtests/bench-wcsncat.c: New File.
    * benchtests/Makefile (wcsmbs-bench): Add wcsncat.
* S390: Optimize strcat and wcscat. | Stefan Liebler | 2015-08-26 | 17 | -82/+661
    This patch provides optimized versions of strcat and wcscat with the z13 vector instructions.

    ChangeLog:
    * sysdeps/s390/multiarch/strcat-c.c: New File.
    * sysdeps/s390/multiarch/strcat-vx.S: Likewise.
    * sysdeps/s390/multiarch/strcat.c: Likewise.
    * sysdeps/s390/multiarch/wcscat-c.c: Likewise.
    * sysdeps/s390/multiarch/wcscat-vx.S: Likewise.
    * sysdeps/s390/multiarch/wcscat.c: Likewise.
    * sysdeps/s390/multiarch/Makefile (sysdep_routines): Add strcat and wcscat functions.
    * sysdeps/s390/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Add ifunc test for strcat, wcscat.
    * string/strcat.c (STRCAT): Define and use macro.
    * wcsmbs/wcscat.c: Use WCSCAT if defined.
    * string/test-strcat.c: Add wcscat support.
    * wcsmbs/test-wcscat.c: New File.
    * wcsmbs/Makefile (strop-tests): Add wcscat.
    * benchtests/bench-strcat.c: Add wcscat support.
    * benchtests/bench-wcscat.c: New File.
    * benchtests/Makefile (wcsmbs-bench): Add wcscat.
* S390: Optimize stpncpy and wcpncpy. | Stefan Liebler | 2015-08-26 | 16 | -26/+665
    This patch provides optimized versions of stpncpy and wcpncpy with the z13 vector instructions.

    ChangeLog:
    * sysdeps/s390/multiarch/stpncpy-c.c: New File.
    * sysdeps/s390/multiarch/stpncpy-vx.S: Likewise.
    * sysdeps/s390/multiarch/stpncpy.c: Likewise.
    * sysdeps/s390/multiarch/wcpncpy-c.c: Likewise.
    * sysdeps/s390/multiarch/wcpncpy-vx.S: Likewise.
    * sysdeps/s390/multiarch/wcpncpy.c: Likewise.
    * sysdeps/s390/multiarch/Makefile (sysdep_routines): Add stpncpy and wcpncpy functions.
    * sysdeps/s390/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Add ifunc test for stpncpy, wcpncpy.
    * wcsmbs/wcpncpy.c: Use WCPNCPY if defined.
    * string/test-stpncpy.c: Add wcpncpy support.
    * wcsmbs/test-wcpncpy.c: New File.
    * wcsmbs/Makefile (strop-tests): Add wcpncpy.
    * benchtests/bench-stpncpy.c: Add wcpncpy support.
    * benchtests/bench-wcpncpy.c: New File.
    * benchtests/Makefile (wcsmbs-bench): Add wcpncpy.
* S390: Optimize strncpy and wcsncpy. | Stefan Liebler | 2015-08-26 | 17 | -87/+782
    This patch provides optimized versions of strncpy and wcsncpy with the z13 vector instructions.

    ChangeLog:
    * sysdeps/s390/multiarch/strncpy-vx.S: New File.
    * sysdeps/s390/multiarch/strncpy.c: Likewise.
    * sysdeps/s390/multiarch/wcsncpy-c.c: Likewise.
    * sysdeps/s390/multiarch/wcsncpy-vx.S: Likewise.
    * sysdeps/s390/multiarch/wcsncpy.c: Likewise.
    * sysdeps/s390/s390-32/multiarch/strncpy.c: Likewise.
    * sysdeps/s390/s390-64/multiarch/strncpy.c: Likewise.
    * sysdeps/s390/multiarch/Makefile (sysdep_routines): Add strncpy and wcsncpy functions.
    * wcsmbs/wcsncpy.c: Use WCSNCPY if defined.
    * sysdeps/s390/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Add ifunc test for strncpy, wcsncpy.
    * string/test-strncpy.c: Add wcsncpy support.
    * wcsmbs/test-wcsncpy.c: New File.
    * wcsmbs/Makefile (strop-tests): Add wcsncpy.
    * benchtests/bench-strncpy.c: Add wcsncpy support.
    * benchtests/bench-wcsncpy.c: New File.
    * benchtests/Makefile (wcsmbs-bench): Add wcsncpy.
* S390: Optimize stpcpy and wcpcpy. | Stefan Liebler | 2015-08-26 | 16 | -24/+461
    This patch provides optimized versions of stpcpy and wcpcpy with the z13 vector instructions.

    ChangeLog:
    * sysdeps/s390/multiarch/stpcpy-c.c: New File.
    * sysdeps/s390/multiarch/stpcpy-vx.S: Likewise.
    * sysdeps/s390/multiarch/stpcpy.c: Likewise.
    * sysdeps/s390/multiarch/wcpcpy-c.c: Likewise.
    * sysdeps/s390/multiarch/wcpcpy-vx.S: Likewise.
    * sysdeps/s390/multiarch/wcpcpy.c: Likewise.
    * sysdeps/s390/multiarch/Makefile (sysdep_routines): Add stpcpy and wcpcpy functions.
    * string/stpcpy.c: Use STPCPY if defined.
    * wcsmbs/wcpcpy.c: Use WCPCPY if defined.
    * sysdeps/s390/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Add ifunc test for stpcpy, wcpcpy.
    * string/test-stpcpy.c: Add wcpcpy support.
    * wcsmbs/test-wcpcpy.c: New File.
    * wcsmbs/Makefile (strop-tests): Add wcpcpy.
    * benchtests/bench-stpcpy.c: Add wcpcpy support.
    * benchtests/bench-wcpcpy.c: New File.
    * benchtests/Makefile (wcsmbs-bench): Add wcpcpy.
* S390: Optimize strcpy and wcscpy. | Stefan Liebler | 2015-08-26 | 12 | -3/+382
    This patch provides optimized versions of strcpy and wcscpy with the z13 vector instructions.

    ChangeLog:
    * sysdeps/s390/multiarch/strcpy-vx.S: New File.
    * sysdeps/s390/multiarch/strcpy.c: Likewise.
    * sysdeps/s390/multiarch/wcscpy-c.c: Likewise.
    * sysdeps/s390/multiarch/wcscpy-vx.S: Likewise.
    * sysdeps/s390/multiarch/wcscpy.c: Likewise.
    * sysdeps/s390/s390-32/multiarch/strcpy.c: Likewise.
    * sysdeps/s390/s390-64/multiarch/strcpy.c: Likewise.
    * sysdeps/s390/multiarch/Makefile (sysdep_routines): Add strcpy and wcscpy functions.
    * sysdeps/s390/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Add ifunc test for strcpy, wcscpy.
    * benchtests/bench-wcscpy.c: New File.
    * benchtests/Makefile (wcsmbs-bench): Add wcscpy.
* S390: Optimize strnlen and wcsnlen. | Stefan Liebler | 2015-08-26 | 16 | -63/+572
    This patch provides optimized versions of strnlen and wcsnlen with the z13 vector instructions.

    ChangeLog:
    * sysdeps/s390/multiarch/strnlen-c.c: New File.
    * sysdeps/s390/multiarch/strnlen-vx.S: Likewise.
    * sysdeps/s390/multiarch/strnlen.c: Likewise.
    * sysdeps/s390/multiarch/wcsnlen-c.c: Likewise.
    * sysdeps/s390/multiarch/wcsnlen-vx.S: Likewise.
    * sysdeps/s390/multiarch/wcsnlen.c: Likewise.
    * sysdeps/s390/multiarch/Makefile (sysdep_routines): Add strnlen and wcsnlen functions.
    * sysdeps/s390/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Add ifunc test for strnlen, wcsnlen.
    * wcsmbs/wcsnlen.c: Use WCSNLEN if defined.
    * string/test-strnlen.c: Add wcsnlen support.
    * wcsmbs/test-wcsnlen.c: New File.
    * wcsmbs/Makefile (strop-tests): Add wcsnlen.
    * benchtests/bench-strnlen.c: Add wcsnlen support.
    * benchtests/bench-wcsnlen.c: New File.
    * benchtests/Makefile (wcsmbs-bench): Add wcsnlen.
* S390: Optimize strlen and wcslen. | Stefan Liebler | 2015-08-26 | 12 | -2/+348
    This patch provides optimized versions of strlen and wcslen with the z13 vector instructions. The helper macro IFUNC_VX_IMPL is introduced and is used to register all __<func>_c() and __<func>_vx() functions within __libc_ifunc_impl_list() to the ifunc test framework.

    ChangeLog:
    * sysdeps/s390/multiarch/Makefile: New File.
    * sysdeps/s390/multiarch/strlen-c.c: Likewise.
    * sysdeps/s390/multiarch/strlen-vx.S: Likewise.
    * sysdeps/s390/multiarch/strlen.c: Likewise.
    * sysdeps/s390/multiarch/wcslen-c.c: Likewise.
    * sysdeps/s390/multiarch/wcslen-vx.S: Likewise.
    * sysdeps/s390/multiarch/wcslen.c: Likewise.
    * string/strlen.c (STRLEN): Define and use macro.
    * sysdeps/s390/multiarch/ifunc-impl-list.c (IFUNC_VX_IMPL): New macro function.
      (__libc_ifunc_impl_list): Add ifunc test for strlen, wcslen.
    * benchtests/Makefile (wcsmbs-bench): New variable.
      (string-bench-all): Added wcsmbs-bench.
    * benchtests/bench-wcslen.c: New File.
* S390: Ifunc resolver macro for vector instructions. | Stefan Liebler | 2015-08-26 | 2 | -0/+24
    This patch introduces an s390 specific ifunc resolver macro for 32/64bit. It chooses <func>_vx with vector instructions if the HWCAP_S390_VX flag in hwcaps is set, and <func>_c otherwise.

    ChangeLog:
    * sysdeps/s390/multiarch/ifunc-resolve.h (s390_vx_libc_ifunc, s390_vx_libc_ifunc2): New macro function.
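    The selection rule of the resolver macro can be illustrated with a plain C function that picks between two function pointers. This is only a sketch under stated assumptions: select_strlen_sketch, strlen_c_sketch and strlen_vx_sketch are hypothetical stand-ins, the fallback value 2048 for HWCAP_S390_VX is assumed here rather than taken from the patch, and the hwcap word is passed in as a parameter instead of being read from the auxiliary vector. It does not reproduce the s390_vx_libc_ifunc macro itself.

```c
#include <stddef.h>
#include <string.h>

#ifndef HWCAP_S390_VX
# define HWCAP_S390_VX 2048  /* assumed value of the vector facility bit */
#endif

typedef size_t (*strlen_fn) (const char *);

/* Hypothetical stand-ins for __strlen_c and __strlen_vx.  Both simply
   forward to strlen here; only the selection logic is of interest.  */
static size_t
strlen_c_sketch (const char *s)
{
  return strlen (s);
}

static size_t
strlen_vx_sketch (const char *s)
{
  return strlen (s);
}

/* Choose the vector variant if the vector facility bit is set in the
   given hwcap word, and the common code variant otherwise.  */
strlen_fn
select_strlen_sketch (unsigned long hwcap)
{
  return (hwcap & HWCAP_S390_VX) ? strlen_vx_sketch : strlen_c_sketch;
}
```

    In a standalone program the hwcap word could be obtained with getauxval (AT_HWCAP) from <sys/auxv.h>; an actual ifunc resolver runs during relocation and receives the information from the dynamic loader instead.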
* S390: configure check for vector instruction support in assembler. | Stefan Liebler | 2015-08-26 | 4 | -0/+80
    The S390 specific test checks whether the assembler supports the new z13 vector instructions by compiling a vector instruction. The .machine and .machinemode directives are needed to compile the vector instruction without the -march=z13 option on 31/64 bit. On success the macro HAVE_S390_VX_ASM_SUPPORT is defined. This macro is used to determine whether the optimized functions can be built without compile errors. If the assembler in use lacks vector support, a warning is written while configuring and only the common code functions are built.

    The z13 instruction support was introduced in "[Committed] S/390: Add support for IBM z13." (https://sourceware.org/ml/binutils/2015-01/msg00197.html)

    ChangeLog:
    * config.h.in (HAVE_S390_VX_ASM_SUPPORT): New macro undefine.
    * sysdeps/s390/configure.ac: Add test for S390 vector instruction assembler support.
    * sysdeps/s390/configure: Regenerated.
* S390: Add new s390 platform. | Stefan Liebler | 2015-08-26 | 3 | -3/+8
    The new IBM z13 is added to the platform string array. The macro _DL_PLATFORMS_COUNT is incremented to 8, because it was not incremented by commit "S/390: Sync AUXV capabilities and archs with kernel".

    ChangeLog:
    * sysdeps/s390/dl-procinfo.c (_dl_s390_cap_flags): Add z13.
    * sysdeps/s390/dl-procinfo.h (_DL_PLATFORMS_COUNT): Increased.
* S390: Add hwcaps value for vector facility. | Stefan Liebler | 2015-08-26 | 4 | -3/+11
    The HWCAP_S390_VX flag in the hwcap field of the auxiliary vector indicates whether the vector facility is available and the kernel is aware of it. This can be tested with LD_SHOW_AUXV=1 <prog>. Currently the output does not show "te", because _DL_HWCAP_COUNT was not incremented by commit "S/390: Add hwcap value for transactional execution.". Thus _DL_HWCAP_COUNT is incremented by two.

    ChangeLog:
    * sysdeps/s390/dl-procinfo.c (_dl_s390_platforms): Add vector flag.
    * sysdeps/s390/dl-procinfo.h: Add vector capability.
    * sysdeps/unix/sysv/linux/s390/bits/hwcap.h (HWCAP_S390_VX): Define.
* S390: Refactor ifunc implementations and enable ifunc-test-framework. | Stefan Liebler | 2015-08-26 | 24 | -484/+676
    On s390 all ifunc resolvers were implemented in multiarch/ifunc-resolve.c. The resulting single object file has undefined references to all ifunc functions. This patch introduces one multiarch/<func>.c file for each of memcpy, memcmp and memset, containing the function specific ifunc resolver. The different function implementations are now located in multiarch/<func>-s390x.S (moved from multiarch/<func>.S). The new multiarch/ifunc-resolve.h file contains the ifunc resolver macro and other helper macros. They are merged and are now used in common for 32/64bit. Therefore the __<func>_g5/__<func>_z900 functions were renamed to __<func>_default. This patch also enables testing of the ifunc implementations by implementing the function __libc_ifunc_impl_list. It uses the helper macros of ifunc-resolve.h.

    ChangeLog:
    * sysdeps/s390/s390-32/multiarch/Makefile (sysdep_routines): Remove ifunc-resolve, add memset-s390, memcpy-s390, memcmp-s390.
    * sysdeps/s390/s390-32/multiarch/ifunc-resolve.c: Delete File.
    * sysdeps/s390/s390-32/multiarch/memcmp.S: Move to ...
    * sysdeps/s390/s390-32/multiarch/memcmp-s390.S: ... here.
      (memcmp, bcmp): Use __memcmp_default as alias source.
    * sysdeps/s390/s390-32/multiarch/memcmp.c: New File.
    * sysdeps/s390/s390-32/memcmp.S (__memcmp_g5): Rename to __memcmp_default.
    * sysdeps/s390/s390-32/multiarch/memcpy.S: Move to ...
    * sysdeps/s390/s390-32/multiarch/memcpy-s390.S: ... here.
      (memcpy): Use __memcpy_default as alias source.
    * sysdeps/s390/s390-32/multiarch/memcpy.c: New File.
    * sysdeps/s390/s390-32/memcpy.S (__memcpy_g5): Rename to __memcpy_default.
    * sysdeps/s390/s390-32/multiarch/memset.S: Move to ...
    * sysdeps/s390/s390-32/multiarch/memset-s390.S: ... here.
      (memset): Use __memset_default as alias source.
    * sysdeps/s390/s390-32/multiarch/memset.c: New File.
    * sysdeps/s390/s390-32/memset.S (__memset_g5): Rename to __memset_default.
    * sysdeps/s390/s390-64/multiarch/Makefile (sysdep_routines): Remove ifunc-resolve, add memset-s390x, memcpy-s390x, memcmp-s390x.
    * sysdeps/s390/s390-64/multiarch/ifunc-resolve.c: Delete File.
    * sysdeps/s390/s390-64/multiarch/memcmp.S: Move to ...
    * sysdeps/s390/s390-64/multiarch/memcmp-s390x.S: ... here.
      (memcmp, bcmp): Use __memcmp_default as alias source.
    * sysdeps/s390/s390-64/multiarch/memcmp.c: New File.
    * sysdeps/s390/s390-64/memcmp.S (__memcmp_z900): Rename to __memcmp_default.
    * sysdeps/s390/s390-64/multiarch/memcpy.S: Move to ...
    * sysdeps/s390/s390-64/multiarch/memcpy-s390x.S: ... here.
      (memcpy): Use __memcpy_default as alias source.
    * sysdeps/s390/s390-64/multiarch/memcpy.c: New File.
    * sysdeps/s390/s390-64/memcpy.S (__memcpy_z900): Rename to __memcpy_default.
    * sysdeps/s390/s390-64/multiarch/memset.S: Move to ...
    * sysdeps/s390/s390-64/multiarch/memset-s390x.S: ... here.
      (memset): Use __memset_default as alias source.
    * sysdeps/s390/s390-64/multiarch/memset.c: New File.
    * sysdeps/s390/s390-64/memset.S (__memset_z900): Rename to __memset_default.
    * sysdeps/s390/multiarch/ifunc-resolve.h: New File.
    * sysdeps/s390/multiarch/ifunc-impl-list.c: New File.
* S390: Fix handling of DXC-byte in FPC-register. | Stefan Liebler | 2015-08-26 | 8 | -15/+55
    On s390, the DXC (data-exception-code) byte in the FPC (floating-point-control) register contains the code of the last exception that occurred. If bits 6 and 7 of the DXC byte are zero, bits 0-5 correspond to the IEEE exception flag bits. The current implementation always uses these bits as IEEE exception flag bits. As a result, fetestexcept() reports any exception after the first usage of a vector instruction in a process, because that usage raises a "vector instruction exception" with DXC code 0xFE.

    This patch fixes the handling of the DXC byte. The DXC byte is only interpreted as IEEE exception flags if bits 6 and 7 are zero.

    The #define _FPU_RESERVED is extended by the DXC byte. Otherwise the tests math/test-fpucw-static and math/test-fpucw-ieee-static fail, because the DXC byte contains the vector instruction exception when main() is reached. This exception was triggered by the strrchr() call in __init_misc(). __init_misc() is called after __setfpucw () in __libc_init_first().

    The field __ieee_instruction_pointer in struct fenv_t is renamed to __unused because it is a relic from commit "Remove PTRACE_PEEKUSER" (87b9b50f0d4b92248905e95a06a13c513dc45e59) and isn't used anymore.

    ChangeLog:
    [BZ #18610]
    * sysdeps/s390/fpu/bits/fenv.h (fenv_t): Rename __ieee_instruction_pointer to __unused.
    * sysdeps/s390/fpu/fesetenv.c (__fesetenv): Remove usage of __ieee_instruction_pointer.
    * sysdeps/s390/fpu/fclrexcpt.c (feclearexcept): Fix dxc-field handling.
    * sysdeps/s390/fpu/fgetexcptflg.c (fegetexceptflag): Likewise.
    * sysdeps/s390/fpu/fsetexcptflg.c (fesetexceptflag): Likewise.
    * sysdeps/s390/fpu/ftestexcept.c (fetestexcept): Likewise.
    * sysdeps/s390/fpu/fpu_control.h (_FPU_RESERVED): Mark dxc-field as reserved.
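    The DXC rule from this commit message can be written down as a small pure function on a 32-bit FPC value. This is a sketch under stated assumptions, not the code of the patch: the function name test_except_sketch and all macro names and values are chosen here for illustration. They assume that the flag byte sits in the second most significant byte of the FPC, that the DXC byte sits in the third, that DXC bits 6 and 7 (numbered from the most significant bit) are therefore the mask 0x300, and that the five IEEE flags occupy the mask 0xF8.

```c
#include <stdint.h>

/* Assumed layout of the 32-bit FPC value, for illustration only.  */
#define FPC_FLAGS_SHIFT       16      /* flag byte */
#define FPC_DXC_SHIFT         8       /* DXC byte */
#define FPC_NOT_FPU_EXCEPTION 0x300u  /* DXC bits 6 and 7 */
#define IEEE_FLAG_BITS        0xF8u   /* five IEEE exception flags */

/* Return the subset of 'excepts' that is currently raised according
   to 'fpc'.  The DXC byte only contributes if its bits 6 and 7 are
   zero, i.e. if it describes an IEEE exception at all.  */
unsigned int
test_except_sketch (uint32_t fpc, unsigned int excepts)
{
  unsigned int res = fpc >> FPC_FLAGS_SHIFT;

  if ((fpc & FPC_NOT_FPU_EXCEPTION) == 0)
    res |= fpc >> FPC_DXC_SHIFT;

  return res & excepts & IEEE_FLAG_BITS;
}
```

    With a DXC byte of 0xFE (the vector instruction exception mentioned above) the condition fails and no IEEE flag is reported, whereas treating the byte unconditionally as flags would report all of them.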
* NaCl: Call __nacl_main in preference to main. | Roland McGrath | 2015-08-25 | 2 | -1/+10
* Use SSE2 optimized strcmp in x86-64 ld.so | H.J. Lu | 2015-08-25 | 2 | -253/+220
    Since ld.so preserves vector registers now, we can use the same SSE2 optimized strcmp in x86-64 libc and ld.so.

    * sysdeps/x86_64/strcmp.S: Remove "#if !IS_IN (libc)".
* Don't run tst-getpid2 with LD_BIND_NOW=1 | H.J. Lu | 2015-08-25 | 2 | -5/+5
    Since _dl_x86_64_save_sse and _dl_x86_64_restore_sse are removed now, we don't need to run tst-getpid2 with LD_BIND_NOW=1.

    [BZ #11214]
    * sysdeps/unix/sysv/linux/Makefile (tst-getpid2-ENV): Removed.
* Call direct system calls for socket operations (Rajalakshmi Srinivasaraghavan, 2015-08-25, 19 files, -0/+146)
  Explicit system calls for the socket operations were added to the Linux kernel for powerpc in commit 86250b9d12ca. This patch makes use of those instead of calling socketcall, which saves a number of cycles on networking syscalls.

  2015-08-25 Rajalakshmi Srinivasaraghavan <raji@linux.vnet.ibm.com>

    * sysdeps/unix/sysv/linux/powerpc/kernel-features.h: Define new macros.
    * sysdeps/unix/sysv/linux/accept.c: Call direct system call.
    * sysdeps/unix/sysv/linux/bind.c: Call direct system call.
    * sysdeps/unix/sysv/linux/connect.c: Call direct system call.
    * sysdeps/unix/sysv/linux/getpeername.c: Call direct system call.
    * sysdeps/unix/sysv/linux/getsockname.c: Call direct system call.
    * sysdeps/unix/sysv/linux/getsockopt.c: Call direct system call.
    * sysdeps/unix/sysv/linux/listen.c: Call direct system call.
    * sysdeps/unix/sysv/linux/recv.c: Call direct system call.
    * sysdeps/unix/sysv/linux/recvfrom.c: Call direct system call.
    * sysdeps/unix/sysv/linux/recvmsg.c: Call direct system call.
    * sysdeps/unix/sysv/linux/send.c: Call direct system call.
    * sysdeps/unix/sysv/linux/sendmsg.c: Call direct system call.
    * sysdeps/unix/sysv/linux/sendto.c: Call direct system call.
    * sysdeps/unix/sysv/linux/setsockopt.c: Call direct system call.
    * sysdeps/unix/sysv/linux/shutdown.c: Call direct system call.
    * sysdeps/unix/sysv/linux/socket.c: Call direct system call.
    * sysdeps/unix/sysv/linux/socketpair.c: Call direct system call.
* powerpc: Fix tabort usage in syscalls (Paul E. Murphy, 2015-08-25, 4 files, -4/+13)
  Fix usage of tabort in generated syscalls. r0 has a special meaning when used with this instruction: with r0, tabort neither generates a persistent error nor returns an error code. Using a different register mitigates poor CPU usage when performing elided critical sections.

  Additionally, transactions should be aborted when entering a user-invoked syscall. Otherwise the results of the transaction may be undefined.

  2015-08-25 Paul E. Murphy <murphyp@linux.vnet.ibm.com>

    * sysdeps/powerpc/powerpc32/sysdep.h (ABORT_TRANSACTION): Use register other than r0 for tabort, it has special meaning.
    * sysdeps/powerpc/powerpc64/sysdep.h (ABORT_TRANSACTION): Likewise.
    * sysdeps/unix/sysv/linux/powerpc/syscall.S (syscall): Abort transaction before starting syscall.
* powerpc: Handle worstcase behavior in strstr() for POWER7 (Rajalakshmi Srinivasaraghavan, 2015-08-25, 2 files, -7/+19)
  Instead of checking the needle length, a constant number 'n' of comparisons is checked to decide when to fall back to the default implementation. This patch is tested on powerpc64 and powerpc64le.

  2015-08-25 Rajalakshmi Srinivasaraghavan <raji@linux.vnet.ibm.com>

    * sysdeps/powerpc/powerpc64/power7/strstr.S: Handle worst case.
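  The fallback idea in this commit can be sketched in portable C. This is only an illustration under stated assumptions, not the POWER7 assembly: the function name, the budget value COMPARE_BUDGET, and the simple scanning loop are all hypothetical, and the standard strstr() stands in for the default implementation.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical limit on successful character comparisons.  */
#define COMPARE_BUDGET 2048

/* Simple substring search that hands over to strstr() once it has
   spent COMPARE_BUDGET matching comparisons, which bounds the cost of
   pathological inputs such as long runs of the same character.  */
static char *
strstr_with_fallback (const char *hay, const char *needle)
{
  size_t budget = COMPARE_BUDGET;

  if (*needle == '\0')
    return (char *) hay;

  for (const char *h = hay; *h != '\0'; h++)
    {
      const char *a = h;
      const char *b = needle;

      while (*b != '\0' && *a == *b)
        {
          /* Every position before h was already rejected, so the
             fallback may safely resume the search at h.  */
          if (budget-- == 0)
            return strstr (h, needle);
          a++;
          b++;
        }
      if (*b == '\0')
        return (char *) h;
      if (*a == '\0')
        return NULL;  /* Remaining haystack is shorter than the needle.  */
    }
  return NULL;
}
```

  Counting comparisons rather than inspecting the needle length means the fast path is abandoned only when the input actually behaves badly.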
* Replace %xmm[8-12] with %xmm[0-4] (H.J. Lu, 2015-08-25, 2 files, -47/+51)
  Since ld.so preserves vector registers now, we can use %xmm[0-4] to avoid the REX prefix.

    * sysdeps/x86_64/strlen.S: Replace %xmm[8-12] with %xmm[0-4].
* Remove x86-64 rtld-xxx.c and rtld-xxx.S (H.J. Lu, 2015-08-25, 7 files, -464/+9)
  Since ld.so preserves vector registers now, we can use the regular, non-ifunc string and memory functions in ld.so.

    * sysdeps/x86_64/rtld-memcmp.c: Removed.
    * sysdeps/x86_64/rtld-memset.S: Likewise.
    * sysdeps/x86_64/rtld-strchr.S: Likewise.
    * sysdeps/x86_64/rtld-strlen.S: Likewise.
    * sysdeps/x86_64/multiarch/rtld-memcmp.c: Likewise.
    * sysdeps/x86_64/multiarch/rtld-memset.S: Likewise.
* Replace %xmm8 with %xmm0 (H.J. Lu, 2015-08-25, 2 files, -26/+30)
  Since ld.so preserves vector registers now, we can use %xmm0 to avoid the REX prefix.

    * sysdeps/x86_64/memset.S: Replace %xmm8 with %xmm0.
* add bug 18240 to news. (Ondřej Bílka, 2015-08-25, 1 file, -2/+3)
* Handle overflow in __hcreate_r (Ondřej Bílka, 2015-08-25, 2 files, -1/+18)
  Hi,

  As described in the Bugzilla entry, there is an overflow in hsearch when looking for a prime number, because SIZE_MAX - 1 is divisible by 5. We fix that by rejecting large inputs before looking for a prime.

    * misc/hsearch_r.c (__hcreate_r): Handle overflow.
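  The approach of rejecting large inputs before the prime search can be sketched as below. This is a minimal sketch, not the code in misc/hsearch_r.c: the helper names, the entry_size parameter, and the exact bound are assumptions made for illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Trial-division primality test; the divisor never exceeds n / d, so
   the loop condition itself cannot overflow.  */
static int
is_prime (size_t n)
{
  if (n < 2)
    return 0;
  if (n % 2 == 0)
    return n == 2;
  for (size_t d = 3; d <= n / d; d += 2)
    if (n % d == 0)
      return 0;
  return 1;
}

/* Hypothetical helper: round NEL up to a prime table size.  Requests
   whose table could not be allocated anyway are rejected first, so the
   "nel += 2" search below can never wrap around SIZE_MAX.  */
static int
table_size_for (size_t nel, size_t entry_size, size_t *size)
{
  if (entry_size == 0 || nel >= SIZE_MAX / entry_size)
    {
      errno = ENOMEM;
      return 0;
    }
  if (nel < 3)
    nel = 3;
  for (nel |= 1; !is_prime (nel); nel += 2)
    continue;
  *size = nel;
  return 1;
}
```

  Checking the bound up front keeps the search loop simple; without it, a request close to SIZE_MAX could step past the largest representable value while looking for the next prime.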
* Fix strcpy_chk and stpcpy_chk performance. (Ondřej Bílka, 2015-08-25, 4 files, -266/+11)
  Hi,

  As I wrote in previous patches, the performance of checked strcpy and stpcpy is terrible, as these don't use SSE2 and are now around four times slower than strcpy and stpcpy.

  As this bug shows that these functions are not performance sensitive, I decided just to improve the generic implementation instead, for easier maintenance.

    * debug/strcpy_chk.c: Improve performance.
    * debug/stpcpy_chk.c: Likewise.
    * sysdeps/x86_64/strcpy_chk.S: Remove.
    * sysdeps/x86_64/stpcpy_chk.S: Remove.
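  One way a generic checked copy can reuse the optimized string routines is sketched below. This is an assumption about the general shape of such a function, not the contents of debug/strcpy_chk.c: the name checked_strcpy is hypothetical, and abort() stands in for the library's failure handler.

```c
#include <stdlib.h>
#include <string.h>

/* Copy SRC into DEST, which has room for DESTLEN bytes.  The length is
   measured once with strlen() and the copy is done with memcpy(), so
   the work is delegated to whatever optimized versions the C library
   provides.  */
static char *
checked_strcpy (char *dest, const char *src, size_t destlen)
{
  size_t len = strlen (src);

  /* The terminating null byte needs one extra byte of space.  */
  if (len >= destlen)
    abort ();
  return memcpy (dest, src, len + 1);
}
```

  A byte-by-byte loop with a bounds test on every iteration is what makes a checked copy slow; splitting it into one length scan and one bulk copy avoids that.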
* Save and restore vector registers in x86-64 ld.so (H.J. Lu, 2015-08-25, 9 files, -501/+499)
  This patch adds SSE, AVX and AVX512 versions of _dl_runtime_resolve and _dl_runtime_profile, which save and restore the first 8 vector registers used for parameter passing. elf_machine_runtime_setup selects the proper _dl_runtime_resolve or _dl_runtime_profile based on _dl_x86_cpu_features. This avoids the race condition caused by the FOREIGN_CALL macros, which are only used for x86-64.

  The performance impact of saving and restoring 8 vector registers is negligible on Nehalem, Sandy Bridge, Ivy Bridge and Haswell when ld.so is optimized with SSE2.

    [BZ #15128]
    * sysdeps/x86_64/Makefile [$(subdir) == elf] (tests): Add ifuncmain8.
    (modules-names): Add ifuncmod8.
    ($(objpfx)ifuncmain8): New rule.
    * sysdeps/x86_64/dl-machine.h: Include <dl-procinfo.h> and <cpuid.h>.
    (elf_machine_runtime_setup): Use _dl_runtime_resolve_sse, _dl_runtime_resolve_avx, or _dl_runtime_resolve_avx512, _dl_runtime_profile_sse, _dl_runtime_profile_avx, or _dl_runtime_profile_avx512, based on HAS_ARCH_FEATURE.
    * sysdeps/x86_64/dl-trampoline.S: Rewrite.
    * sysdeps/x86_64/dl-trampoline.h: Likewise.
    * sysdeps/x86_64/ifuncmain8.c: New file.
    * sysdeps/x86_64/ifuncmod8.c: Likewise.
    * sysdeps/x86_64/nptl/tcb-offsets.sym (RTLD_SAVESPACE_SSE): Removed.
    * sysdeps/x86_64/nptl/tls.h (__128bits): Removed.
    (tcbhead_t): Change rtld_must_xmm_save to __glibc_unused1. Change rtld_savespace_sse to __glibc_unused2.
    (RTLD_CHECK_FOREIGN_CALL): Removed.
    (RTLD_ENABLE_FOREIGN_CALL): Likewise.
    (RTLD_PREPARE_FOREIGN_CALL): Likewise.
    (RTLD_FINALIZE_FOREIGN_CALL): Likewise.
* Note bug 10882 as having been fixed in 2.16. (Joseph Myers, 2015-08-24, 2 files, -18/+19)
* 2015-08-24 Wilco Dijkstra <wdijkstr@arm.com> (Wilco Dijkstra, 2015-08-24, 2 files, -27/+4)
    * sysdeps/aarch64/bzero.S (__bzero): Remove.
* 2015-08-24 Wilco Dijkstra <wdijkstr@arm.com> (Wilco Dijkstra, 2015-08-24, 2 files, -8/+10)
    * sysdeps/aarch64/fpu/math_private.h (libc_feholdsetround_aarch64_ctx): Unconditionally set __fpcr to avoid uninitialized warning.
    (libc_feholdsetround_noex_aarch64_ctx): Likewise.
* Don't use the main arena in retry path if it is corrupt (Siddhesh Poyarekar, 2015-08-24, 2 files, -0/+7)
  If allocation on a non-main arena fails, the main arena is used without checking to see if it is corrupt. Add a check that avoids the main arena if it is corrupt.

    * malloc/arena.c (arena_get_retry): Don't use main_arena if it is corrupt.
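  The added check can be illustrated with a small model. This is a sketch under stated assumptions, not the code in malloc/arena.c: struct arena, its corrupt field, and retry_arena are hypothetical stand-ins, and only the path where a non-main arena failed is modelled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the allocator's per-arena state.  */
struct arena
{
  bool corrupt;
};

/* Pick the arena to retry on after an allocation on FAILED did not
   succeed.  Returns NULL when no usable arena is left.  */
static struct arena *
retry_arena (struct arena *failed, struct arena *main_arena)
{
  /* Simplification: retrying after a main-arena failure is not
     modelled in this sketch.  */
  if (failed == main_arena)
    return NULL;

  /* The check described above: never fall back to a corrupt main
     arena.  */
  if (main_arena->corrupt)
    return NULL;

  return main_arena;
}
```

  Returning no arena makes the allocation fail cleanly instead of operating on state that is already known to be damaged.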
* Drop unused first argument from arena_get2 (Siddhesh Poyarekar, 2015-08-24, 2 files, -5/+9)
  The arena pointer in the first argument to arena_get2 was used in the old days before per-thread arenas. It is unused now and hence can be dropped.

  ChangeLog:
    * malloc/arena.c (arena_get2): Drop unused argument.
    (arena_lock): Adjust.
    (arena_get_retry): Likewise.
* Remove __ASSUME_IPC64 (Andreas Schwab, 2015-08-24, 24 files, -706/+29)
  PowerPC has always used __IPC_64 like most other architectures, which means that __ASSUME_IPC64 can be always true. Also, all other architecture implementations that use the ipc syscall are effectively identical to the generic version and can be removed.
* manual: skip build when perl is unavailable (Mike Frysinger, 2015-08-21, 2 files, -0/+7)
  Do not try to generate the manual when perl is unavailable. This matches the behavior when makeinfo is unavailable. Otherwise the install step fails when trying to generate the libm section since it runs a perl script.
* powerpc: Fix memchr for powerpc32. (Carlos Eduardo Seo, 2015-08-21, 2 files, -1/+6)
  Fix a wrong #undef in memchr.c.

    * sysdeps/powerpc/powerpc32/power4/multiarch/memchr.c: Replace '#undef memcpy' by '#undef memchr'.
* powerpc: make memchr use memchr-power7. (Carlos Eduardo Seo, 2015-08-21, 2 files, -1/+18)
  In powerpc64, memchr was always pointing to the internal __GI_memchr implementation. This patch fixes that and makes it use the optimized POWER7 version when appropriate.

    * sysdeps/powerpc/powerpc64/multiarch/memchr-ppc64.c: Make memchr not point to the internal __GI_memchr implementation.