about summary refs log tree commit diff
Commit message (Collapse)AuthorAgeFilesLines
* localedata: i18n: fix typos in tel_int_fmtMike FABIAN2016-04-082-1/+5
| | | | | Adding the %t avoids a double space if the area code %a happens to be empty. There are countries without area codes.
* Fix termios.h XCASE namespace (bug 19925).Joseph Myers2016-04-087-7/+19
| | | | | | | | | | | | | | | | | | | | | | | | bits/termios.h (various versions under sysdeps/unix/sysv/linux) defines XCASE if defined __USE_MISC || defined __USE_XOPEN. This macro was removed in the 2001 edition of POSIX, and is not otherwise reserved, so should not be defined for 2001 and later versions of POSIX. This patch fixes the conditions accordingly (leaving the macro defined for __USE_MISC, so still in the default namespace). Tested for x86_64 and x86 (testsuite, and that installed shared libraries are unchanged by the patch). [BZ #19925] * sysdeps/unix/sysv/linux/alpha/bits/termios.h (XCASE): Do not define if [!__USE_MISC && __USE_XOPEN2K]. * sysdeps/unix/sysv/linux/bits/termios.h (XCASE): Likewise. * sysdeps/unix/sysv/linux/mips/bits/termios.h (XCASE): Likewise. * sysdeps/unix/sysv/linux/powerpc/bits/termios.h (XCASE): Likewise. * sysdeps/unix/sysv/linux/sparc/bits/termios.h (XCASE): Likewise. * conform/Makefile (test-xfail-XOPEN2K/termios.h/conform): Remove variable. (test-xfail-XOPEN2K8/termios.h/conform): Likewise.
* powerpc: Add optimized P8 strspnPaul E. Murphy2016-04-077-1/+304
| | | | | | | This utilizes vectors and bitmasks. For small needle, large haystack, the performance improvement is upto 8x. For short strings (0-4B), the cost of computing the bitmask dominates, and is a tad slower.
* hsearch_r: Include <limits.h>Florian Weimer2016-04-072-0/+5
| | | | It is needed for UINT_MAX.
* scratch_buffer_set_array_size: Include <limits.h>Florian Weimer2016-04-072-0/+5
| | | | It is needed for CHAR_BIT.
* X86-64: Prepare memmove-vec-unaligned-erms.SH.J. Lu2016-04-062-54/+95
| | | | | | | | | | | | | | Prepare memmove-vec-unaligned-erms.S to make the SSE2 version as the default memcpy, mempcpy and memmove. * sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S (MEMCPY_SYMBOL): New. (MEMPCPY_SYMBOL): Likewise. (MEMMOVE_CHK_SYMBOL): Likewise. Replace MEMMOVE_SYMBOL with MEMMOVE_CHK_SYMBOL on __mempcpy_chk symbols. Replace MEMMOVE_SYMBOL with MEMPCPY_SYMBOL on __mempcpy symbols. Provide alias for __memcpy_chk in libc.a. Provide alias for memcpy in libc.a and ld.so.
* X86-64: Prepare memset-vec-unaligned-erms.SH.J. Lu2016-04-062-13/+28
| | | | | | | | | | | | Prepare memset-vec-unaligned-erms.S to make the SSE2 version as the default memset. * sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S (MEMSET_CHK_SYMBOL): New. Define if not defined. (__bzero): Check VEC_SIZE == 16 instead of USE_MULTIARCH. Disabled fro now. Replace MEMSET_SYMBOL with MEMSET_CHK_SYMBOL on __memset_chk symbols. Properly check USE_MULTIARCH on __memset symbols.
* Add memcpy/memmove/memset benchmarks with large dataH.J. Lu2016-04-066-2/+393
| | | | | | | | | | | Add memcpy, memmove and memset benchmarks with large data sizes. * benchtests/Makefile (string-benchset): Add memcpy-large, memmove-large and memset-large. * benchtests/bench-memcpy-large.c: New file. * benchtests/bench-memmove-large.c: Likewise. * benchtests/bench-memmove-large.c: Likewise. * benchtests/bench-string.h (TIMEOUT): Don't redefine.
* Mention Bug in ChangeLog for S390: Save and restore fprs/vrs while resolving ↵Stefan Liebler2016-04-061-0/+1
| | | | | | | symbols. The Bugzilla 19916 is added to the ChangeLog for commit 4603c51ef7989d7eb800cdd6f42aab206f891077.
* Force 32-bit displacement in memset-vec-unaligned-erms.SH.J. Lu2016-04-052-0/+18
| | | | | * sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: Force 32-bit displacement to avoid long nop between instructions.
* Add a comment in memset-sse2-unaligned-erms.SH.J. Lu2016-04-052-0/+7
| | | | | * sysdeps/x86_64/multiarch/memset-sse2-unaligned-erms.S: Add a comment on VMOVU and VMOVA.
* strfmon_l: Use specified locale for number formatting [BZ #19633]Florian Weimer2016-04-047-25/+301
|
* Don't put SSE2/AVX/AVX512 memmove/memset in ld.soH.J. Lu2016-04-037-32/+51
| | | | | | | | | | | | | | Since memmove and memset in ld.so don't use IFUNC, don't put SSE2, AVX and AVX512 memmove and memset in ld.so. * sysdeps/x86_64/multiarch/memmove-avx-unaligned-erms.S: Skip if not in libc. * sysdeps/x86_64/multiarch/memmove-avx512-unaligned-erms.S: Likewise. * sysdeps/x86_64/multiarch/memset-avx2-unaligned-erms.S: Likewise. * sysdeps/x86_64/multiarch/memset-avx512-unaligned-erms.S: Likewise.
* Fix memmove-vec-unaligned-erms.SH.J. Lu2016-04-032-24/+39
| | | | | | | | | | | | | | __mempcpy_erms and __memmove_erms can't be placed between __memmove_chk and __memmove it breaks __memmove_chk. Don't check source == destination first since it is less common. * sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S: (__mempcpy_erms, __memmove_erms): Moved before __mempcpy_chk with unaligned_erms. (__memmove_erms): Skip if source == destination. (__memmove_unaligned_erms): Don't check source == destination first.
* Remove Fast_Copy_Backward from Intel Core processorsH.J. Lu2016-04-012-5/+6
| | | | | | | | | Intel Core i3, i5 and i7 processors have fast unaligned copy and copy backward is ignored. Remove Fast_Copy_Backward from Intel Core processors to avoid confusion. * sysdeps/x86/cpu-features.c (init_cpu_features): Don't set bit_arch_Fast_Copy_Backward for Intel Core proessors.
* Use PTR_ALIGN_DOWN on strcspn and strspnAdhemerval Zanella2016-04-013-2/+10
| | | | | | | Tested on aarch64. * string/strcspn.c (strcspn): Use PTR_ALIGN_DOWN. * string/strspn.c (strspn): Likewise.
* Test 64-byte alignment in memset benchtestH.J. Lu2016-04-012-1/+12
| | | | | | | | | Add 64-byte alignment tests in memset benchtest for 64-byte vector registers. * benchtests/bench-memset.c (do_test): Support 64-byte alignment. (test_main): Test 64-byte alignment.
* Test 64-byte alignment in memmove benchtestH.J. Lu2016-04-012-0/+13
| | | | | | | | Add 64-byte alignment tests in memmove benchtest for 64-byte vector registers. * benchtests/bench-memmove.c (test_main): Test 64-byte alignment.
* Test 64-byte alignment in memcpy benchtestH.J. Lu2016-04-012-0/+12
| | | | | | | Add 64-byte alignment tests in memcpy benchtest for 64-byte vector registers. * benchtests/bench-memcpy.c (test_main): Test 64-byte alignment.
* Remove powerpc64 strspn, strcspn, and strpbrk implementationAdhemerval Zanella2016-04-014-406/+4
| | | | | | | | | | | | | This patch removes the powerpc64 optimized strspn, strcspn, and strpbrk assembly implementation now that the default C one implements the same strategy. On internal glibc benchtests current implementations shows similar performance with -O2. Tested on powerpc64le (POWER8). * sysdeps/powerpc/powerpc64/strcspn.S: Remove file. * sysdeps/powerpc/powerpc64/strpbrk.S: Remove file. * sysdeps/powerpc/powerpc64/strspn.S: Remove file.
* Improve generic strpbrk performanceAdhemerval Zanella2016-04-014-68/+36
| | | | | | | | | | | | | | | | | | | | | | | With now a faster strcspn implementation, it is faster to just use it with some return tests than reimplementing strpbrk itself. As for strcspn optimization, it is generally at least 10 times faster than the existing implementation on bench-strspn on a few AArch64 implementations. Also the string/bits/string2.h inlines make no longer sense, as current implementation will already implement most of the optimizations. Tested on x86_64, i386, and aarch64. * string/strpbrk.c (strpbrk): Rewrite function. * string/bits/string2.h (strpbrk): Use __builtin_strpbrk. (__strpbrk_c2): Likewise. (__strpbrk_c3): Likewise. * string/string-inlines.c [SHLIB_COMPAT(libc, GLIBC_2_1_1, GLIBC_2_24)] (__strpbrk_c2): Likewise. [SHLIB_COMPAT(libc, GLIBC_2_1_1, GLIBC_2_24)] (__strpbrk_c3): Likewise.
* Improve generic strspn performanceAdhemerval Zanella2016-04-014-86/+97
| | | | | | | | | | | | | | | | | | | | | | | | | | | As for strcspn, this patch improves strspn performance using a much faster algorithm. It first constructs a 256-entry table based on the accept string and then uses it as a lookup table for the input string. As for strcspn optimization, it is generally at least 10 times faster than the existing implementation on bench-strspn on a few AArch64 implementations. Also the string/bits/string2.h inlines make no longer sense, as current implementation will already implement most of the optimizations. Tested on x86_64, i686, and aarch64. * string/strspn.c (strcspn): Rewrite function. * string/bits/string2.h (strspn): Use __builtin_strcspn. (__strspn_c1): Remove inline function. (__strspn_c2): Likewise. (__strspn_c3): Likewise. * string/string-inlines.c [SHLIB_COMPAT(libc, GLIBC_2_1_1, GLIBC_2_24)] (__strspn_c1): Add compatibility symbol. [SHLIB_COMPAT(libc, GLIBC_2_1_1, GLIBC_2_24)] (__strspn_c2): Likewise. [SHLIB_COMPAT(libc, GLIBC_2_1_1, GLIBC_2_24)] (__strspn_c3): Likewise.
* Improve generic strcspn performanceWilco Dijkstra2016-04-016-96/+102
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Improve strcspn performance using a much faster algorithm. It is kept simple so it works well on most targets. It is generally at least 10 times faster than the existing implementation on bench-strcspn on a few AArch64 implementations, and for some tests 100 times as fast (repeatedly calling strchr on a small string is extremely slow...). In fact the string/bits/string2.h inlines make no longer sense, as GCC already uses strlen if reject is an empty string, strchrnul is 5 times as fast as __strcspn_c1, while __strcspn_c2 and __strcspn_c3 are slower than the strcspn main loop for large strings (though reject length 2-4 could be special cased in the future to gain even more performance). Tested on x86_64, i686, and aarch64. * string/Version (libc): Add GLIBC_2.24. * string/strcspn.c (strcspn): Rewrite function. * string/bits/string2.h (strcspn): Use __builtin_strcspn. (__strcspn_c1): Remove inline function. (__strcspn_c2): Likewise. (__strcspn_c3): Likewise. * string/string-inline.c [SHLIB_COMPAT(libc, GLIBC_2_1_1, GLIBC_2_24)] (__strcspn_c1): Add compatibility symbol. [SHLIB_COMPAT(libc, GLIBC_2_1_1, GLIBC_2_24)] (__strcspn_c2): Likewise. [SHLIB_COMPAT(libc, GLIBC_2_1_1, GLIBC_2_24)] (__strcspn_c3): Likewise. * sysdeps/i386/string-inlines.c: Include generic string-inlines.c.
* S390: Use ahi instead of aghi in 32bit _dl_runtime_resolve.Stefan Liebler2016-04-012-1/+6
| | | | | | | | | | | This patch uses ahi instead of aghi in 32bit _dl_runtime_resolve to adjust the stack pointer. This is no functional change, but a cosmetic one. ChangeLog: * sysdeps/s390/s390-32/dl-trampoline.h (_dl_runtime_resolve): Use ahi instead of aghi to adjust stack pointer.
* Increase internal precision of ldbl-128ibm decimal printf [BZ #19853]Paul E. Murphy2016-03-313-11/+42
| | | | | | | | | | | | | | When the signs differ, the precision of the conversion sometimes drops below 106 bits. This strategy is identical to the hexadecimal variant. I've refactored tst-sprintf3 to enable testing a value with more than 30 significant digits in order to demonstrate this failure and its solution. Additionally, this implicitly fixes a typo in the shift quantities when subtracting from the high mantissa to compute the difference.
* Add x86-64 memset with unaligned store and rep stosbH.J. Lu2016-03-317-1/+358
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Implement x86-64 memset with unaligned store and rep movsb. Support 16-byte, 32-byte and 64-byte vector register sizes. A single file provides 2 implementations of memset, one with rep stosb and the other without rep stosb. They share the same codes when size is between 2 times of vector register size and REP_STOSB_THRESHOLD which defaults to 2KB. Key features: 1. Use overlapping store to avoid branch. 2. For size <= 4 times of vector register size, fully unroll the loop. 3. For size > 4 times of vector register size, store 4 times of vector register size at a time. [BZ #19881] * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add memset-sse2-unaligned-erms, memset-avx2-unaligned-erms and memset-avx512-unaligned-erms. * sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Test __memset_chk_sse2_unaligned, __memset_chk_sse2_unaligned_erms, __memset_chk_avx2_unaligned, __memset_chk_avx2_unaligned_erms, __memset_chk_avx512_unaligned, __memset_chk_avx512_unaligned_erms, __memset_sse2_unaligned, __memset_sse2_unaligned_erms, __memset_erms, __memset_avx2_unaligned, __memset_avx2_unaligned_erms, __memset_avx512_unaligned_erms and __memset_avx512_unaligned. * sysdeps/x86_64/multiarch/memset-avx2-unaligned-erms.S: New file. * sysdeps/x86_64/multiarch/memset-avx512-unaligned-erms.S: Likewise. * sysdeps/x86_64/multiarch/memset-sse2-unaligned-erms.S: Likewise. * sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: Likewise.
* Add x86-64 memmove with unaligned load/store and rep movsbH.J. Lu2016-03-317-1/+634
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Implement x86-64 memmove with unaligned load/store and rep movsb. Support 16-byte, 32-byte and 64-byte vector register sizes. When size <= 8 times of vector register size, there is no check for address overlap bewteen source and destination. Since overhead for overlap check is small when size > 8 times of vector register size, memcpy is an alias of memmove. A single file provides 2 implementations of memmove, one with rep movsb and the other without rep movsb. They share the same codes when size is between 2 times of vector register size and REP_MOVSB_THRESHOLD which is 2KB for 16-byte vector register size and scaled up by large vector register size. Key features: 1. Use overlapping load and store to avoid branch. 2. For size <= 8 times of vector register size, load all sources into registers and store them together. 3. If there is no address overlap bewteen source and destination, copy from both ends with 4 times of vector register size at a time. 4. If address of destination > address of source, backward copy 8 times of vector register size at a time. 5. Otherwise, forward copy 8 times of vector register size at a time. 6. Use rep movsb only for forward copy. Avoid slow backward rep movsb by fallbacking to backward copy 8 times of vector register size at a time. 7. Skip when address of destination == address of source. [BZ #19776] * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add memmove-sse2-unaligned-erms, memmove-avx-unaligned-erms and memmove-avx512-unaligned-erms. * sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Test __memmove_chk_avx512_unaligned_2, __memmove_chk_avx512_unaligned_erms, __memmove_chk_avx_unaligned_2, __memmove_chk_avx_unaligned_erms, __memmove_chk_sse2_unaligned_2, __memmove_chk_sse2_unaligned_erms, __memmove_avx_unaligned_2, __memmove_avx_unaligned_erms, __memmove_avx512_unaligned_2, __memmove_avx512_unaligned_erms, __memmove_erms, __memmove_sse2_unaligned_2, __memmove_sse2_unaligned_erms, __memcpy_chk_avx512_unaligned_2, __memcpy_chk_avx512_unaligned_erms, __memcpy_chk_avx_unaligned_2, __memcpy_chk_avx_unaligned_erms, __memcpy_chk_sse2_unaligned_2, __memcpy_chk_sse2_unaligned_erms, __memcpy_avx_unaligned_2, __memcpy_avx_unaligned_erms, __memcpy_avx512_unaligned_2, __memcpy_avx512_unaligned_erms, __memcpy_sse2_unaligned_2, __memcpy_sse2_unaligned_erms, __memcpy_erms, __mempcpy_chk_avx512_unaligned_2, __mempcpy_chk_avx512_unaligned_erms, __mempcpy_chk_avx_unaligned_2, __mempcpy_chk_avx_unaligned_erms, __mempcpy_chk_sse2_unaligned_2, __mempcpy_chk_sse2_unaligned_erms, __mempcpy_avx512_unaligned_2, __mempcpy_avx512_unaligned_erms, __mempcpy_avx_unaligned_2, __mempcpy_avx_unaligned_erms, __mempcpy_sse2_unaligned_2, __mempcpy_sse2_unaligned_erms and __mempcpy_erms. * sysdeps/x86_64/multiarch/memmove-avx-unaligned-erms.S: New file. * sysdeps/x86_64/multiarch/memmove-avx512-unaligned-erms.S: Likwise. * sysdeps/x86_64/multiarch/memmove-sse2-unaligned-erms.S: Likwise. * sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S: Likwise.
* S390: Extend structs La_s390_regs / La_s390_retval with vector-registers.Stefan Liebler2016-03-314-65/+136
| | | | | | | | | | | | | | | | | | | | | Starting with z13, vector registers can also occur as argument registers. Thus the passed input/output register structs for la_s390_[32|64]_gnu_plt[enter|exit] functions should reflect those new registers. This patch extends these structs La_s390_regs and La_s390_retval and adjusts _dl_runtime_profile() to handle those fields in case of running on a z13 machine. ChangeLog: * sysdeps/s390/bits/link.h: (La_s390_vr) New typedef. (La_s390_32_regs): Append vector register lr_v24-lr_v31. (La_s390_64_regs): Likewise. (La_s390_32_retval): Append vector register lrv_v24. (La_s390_64_retval): Likeweise. * sysdeps/s390/s390-32/dl-trampoline.h (_dl_runtime_profile): Handle extended structs La_s390_32_regs and La_s390_32_retval. * sysdeps/s390/s390-64/dl-trampoline.h (_dl_runtime_profile): Handle extended structs La_s390_64_regs and La_s390_64_retval.
* S390: Save and restore fprs/vrs while resolving symbols.Stefan Liebler2016-03-317-248/+516
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On s390, no fpr/vrs were saved while resolving a symbol via _dl_runtime_resolve/_dl_runtime_profile. According to the abi, the fpr-arguments are defined as call clobbered. In leaf-functions, gcc 4.9 and newer can use fprs for saving/restoring gprs instead of saving them to the stack. If gcc do this in one of the resolver-functions, then the floating point arguments of a library-function are invalid for the first library-function-call. Thus, this patch saves/restores the fprs around the resolving code. The same could occur for vector registers. Furthermore an ifunc-resolver could also clobber the vector/floating point argument registers. Thus this patch provides the further variants _dl_runtime_resolve_vx/ _dl_runtime_profile_vx, which are used if the kernel claims, that we run on a machine with vector registers. Furthermore, if _dl_runtime_profile calls _dl_call_pltexit, the pointers to inregs-/outregs-structs were setup invalid. Now they point to the correct location in the stack-frame. Before branching back to the caller, the return values are now restored instead of containing the return values of the _dl_call_pltexit() call. On s390-32, an endless loop occurs if _dl_call_pltexit() should be called. Now, this code-path branches to this function instead of just after the preceding basr-instruction. ChangeLog: * sysdeps/s390/s390-32/dl-trampoline.S: Include dl-trampoline.h twice to create a non-vector/vector version for _dl_runtime_resolve and _dl_runtime_profile. Move implementation to ... * sysdeps/s390/s390-32/dl-trampoline.h: ... here. (_dl_runtime_resolve) Save and restore fpr/vrs. (_dl_runtime_profile) Save and restore vrs and fix some issues if _dl_call_pltexit is called. * sysdeps/s390/s390-32/dl-machine.h (elf_machine_runtime_setup): Choose the correct resolver function if running on a machine with vx. * sysdeps/s390/s390-64/dl-trampoline.S: Include dl-trampoline.h twice to create a non-vector/vector version for _dl_runtime_resolve and _dl_runtime_profile. Move implementation to ... * sysdeps/s390/s390-64/dl-trampoline.h: ... here. (_dl_runtime_resolve) Save and restore fpr/vrs. (_dl_runtime_profile) Save and restore vrs and fix some issues * sysdeps/s390/s390-64/dl-machine.h: (elf_machine_runtime_setup): Choose the correct resolver function if running on a machine with vx.
* Fix tst-dlsym-error buildAdhemerval Zanella2016-03-312-0/+5
| | | | | | | | This patch fixes the new test tst-dlsym-error build on aarch64 (and possible other architectures as well) due missing strchrnul definition. * elf/tst-dlsym-error.c: Include <string.h> for strchrnul.
* Report dlsym, dlvsym lookup errors using dlerror [BZ #19509]Florian Weimer2016-03-314-2/+125
| | | | | | | | * elf/dl-lookup.c (_dl_lookup_symbol_x): Report error even if skip_map != NULL. * elf/tst-dlsym-error.c: New file. * elf/Makefile (tests): Add tst-dlsym-error. (tst-dlsym-error): Link against libdl.
* [microblaze] Remove __ASSUME_FUTIMESAT.Joseph Myers2016-03-293-33/+6
| | | | | | | | | | | | | | | MicroBlaze has a special version of futimesat.c because it gained the futimesat syscall later than other non-asm-generic architectures. Now the minimum kernel is recent enough that this syscall can always be assumed to be present for MicroBlaze, so this patch removes the special version and the __ASSUME_FUTIMESAT macro, resulting in the sysdeps/unix/sysv/linux/futimesat.c version being used. Untested. * sysdeps/unix/sysv/linux/microblaze/kernel-features.h (__ASSUME_FUTIMESAT): Remove macro. * sysdeps/unix/sysv/linux/microblaze/futimesat.c: Remove file.
* CVE-2016-3075: Stack overflow in _nss_dns_getnetbyname_r [BZ #19879]Florian Weimer2016-03-292-4/+8
| | | | | The defensive copy is not needed because the name may not alias the output buffer.
* nss_db: Propagate ERANGE error if parse_line fails [BZ #19837]Florian Weimer2016-03-292-2/+8
| | | | | | | | | | | | | | | | | Reproducer (needs to run as root): perl -e \ 'print "large:x:999:" . join(",", map {"user$_"} (1 .. 135))."\n"' \ >> /etc/group cd /var/db make getent -s db group After the fix, the last command should list the "large" group. The magic number 135 has been chosen so that the line is shorter than 1024 bytes, but the pointers required to encode the member array will cross the threshold, triggering the bug.
* Initial Enhanced REP MOVSB/STOSB (ERMS) supportH.J. Lu2016-03-282-0/+10
| | | | | | | | | | The newer Intel processors support Enhanced REP MOVSB/STOSB (ERMS) which has a feature bit in CPUID. This patch adds the Enhanced REP MOVSB/STOSB (ERMS) bit to x86 cpu-features. * sysdeps/x86/cpu-features.h (bit_cpu_ERMS): New. (index_cpu_ERMS): Likewise. (reg_ERMS): Likewise.
* Synchronize <sys/personality.h> with kernel headersAurelien Jarno2016-03-282-0/+8
| | | | | | | | | <sys/personality.h> is out of sync with kernel headers, missing the UNAME26, FDPIC_FUNCPTRS and PER_LINUX_FDPIC entries. Fix that. Changelog: * sysdeps/unix/sysv/linux/sys/personality.h (UNAME26, FDPIC_FUNCPTRS, PER_LINUX_FDPIC): Add.
* Make __memcpy_avx512_no_vzeroupper an aliasH.J. Lu2016-03-284-430/+426
| | | | | | | | | | | | | | | | | | | | | | | | | | Since x86-64 memcpy-avx512-no-vzeroupper.S implements memmove, make __memcpy_avx512_no_vzeroupper an alias of __memmove_avx512_no_vzeroupper to reduce code size of libc.so. * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Remove memcpy-avx512-no-vzeroupper. * sysdeps/x86_64/multiarch/memcpy-avx512-no-vzeroupper.S: Renamed to ... * sysdeps/x86_64/multiarch/memmove-avx512-no-vzeroupper.S: This. (MEMCPY): Don't define. (MEMCPY_CHK): Likewise. (MEMPCPY): Likewise. (MEMPCPY_CHK): Likewise. (MEMPCPY_CHK): Renamed to ... (__mempcpy_chk_avx512_no_vzeroupper): This. (MEMPCPY_CHK): Renamed to ... (__mempcpy_chk_avx512_no_vzeroupper): This. (MEMCPY_CHK): Renamed to ... (__memmove_chk_avx512_no_vzeroupper): This. (MEMCPY): Renamed to ... (__memmove_avx512_no_vzeroupper): This. (__memcpy_avx512_no_vzeroupper): New alias. (__memcpy_chk_avx512_no_vzeroupper): Likewise.
* Implement x86-64 multiarch mempcpy in memcpyH.J. Lu2016-03-2810-57/+91
| | | | | | | | | | | | | | | | | | | | | | | | | Implement x86-64 multiarch mempcpy in memcpy to share most of code. It reduces code size of libc.so. [BZ #18858] * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Remove mempcpy-ssse3, mempcpy-ssse3-back, mempcpy-avx-unaligned and mempcpy-avx512-no-vzeroupper. * sysdeps/x86_64/multiarch/memcpy-avx-unaligned.S (MEMPCPY_CHK): New. (MEMPCPY): Likewise. * sysdeps/x86_64/multiarch/memcpy-avx512-no-vzeroupper.S (MEMPCPY_CHK): New. (MEMPCPY): Likewise. * sysdeps/x86_64/multiarch/memcpy-ssse3-back.S (MEMPCPY_CHK): New. (MEMPCPY): Likewise. * sysdeps/x86_64/multiarch/memcpy-ssse3.S (MEMPCPY_CHK): New. (MEMPCPY): Likewise. * sysdeps/x86_64/multiarch/mempcpy-avx-unaligned.S: Removed. * sysdeps/x86_64/multiarch/mempcpy-avx512-no-vzeroupper.S: Likewise. * sysdeps/x86_64/multiarch/mempcpy-ssse3-back.S: Likewise. * sysdeps/x86_64/multiarch/mempcpy-ssse3.S: Likewise.
* [x86] Add a feature bit: Fast_Unaligned_CopyH.J. Lu2016-03-284-2/+31
| | | | | | | | | | | | | | | | | | | On AMD processors, memcpy optimized with unaligned SSE load is slower than emcpy optimized with aligned SSSE3 while other string functions are faster with unaligned SSE load. A feature bit, Fast_Unaligned_Copy, is added to select memcpy optimized with unaligned SSE load. [BZ #19583] * sysdeps/x86/cpu-features.c (init_cpu_features): Set Fast_Unaligned_Copy with Fast_Unaligned_Load for Intel processors. Set Fast_Copy_Backward for AMD Excavator processors. * sysdeps/x86/cpu-features.h (bit_arch_Fast_Unaligned_Copy): New. (index_arch_Fast_Unaligned_Copy): Likewise. * sysdeps/x86_64/multiarch/memcpy.S (__new_memcpy): Check Fast_Unaligned_Copy instead of Fast_Unaligned_Load.
* resolv: Always set *resplen2 out parameter in send_dg [BZ #19791]Florian Weimer2016-03-252-23/+49
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since commit 44d20bca52ace85850012b0ead37b360e3ecd96e (Implement second fallback mode for DNS requests), there is a code path which returns early, before *resplen2 is initialized. This happens if the name server address is immediately recognized as invalid (because of lack of protocol support, or if it is a broadcast address such 255.255.255.255, or another invalid address). If this happens and *resplen2 was non-zero (which is the case if a previous query resulted in a failure), __libc_res_nquery would reuse an existing second answer buffer. This answer has been previously identified as unusable (for example, it could be an NXDOMAIN response). Due to the presence of a second answer, no name server switching will occur. The result is a name resolution failure, although a successful resolution would have been possible if name servers have been switched and queries had proceeded along the search path. The above paragraph still simplifies the situation. Before glibc 2.23, if the second answer needed malloc, the stub resolver would still attempt to reuse the second answer, but this is not possible because __libc_res_nsearch has freed it, after the unsuccessful call to __libc_res_nquerydomain, and set the buffer pointer to NULL. This eventually leads to an assertion failure in __libc_res_nquery: /* Make sure both hp and hp2 are defined */ assert((hp != NULL) && (hp2 != NULL)); If assertions are disabled, the consequence is a NULL pointer dereference on the next line. Starting with glibc 2.23, as a result of commit e9db92d3acfe1822d56d11abcea5bfc4c41cf6ca (CVE-2015-7547: getaddrinfo() stack-based buffer overflow (Bug 18665)), the second answer is always allocated with malloc. This means that the assertion failure happens with small responses as well because there is no buffer to reuse, as soon as there is a name resolution failure which triggers a search for an answer along the search path. This commit addresses the issue by ensuring that *resplen2 is initialized before the send_dg function returns. This commit also addresses a bug where an invalid second reply is incorrectly returned as a valid to the caller.
* tst-audit10: Fix compilation on compilers without bit_AVX512F [BZ #19860]Florian Weimer2016-03-252-1/+10
| | | | | | [BZ# 19860] * sysdeps/x86_64/tst-audit10.c (avx512_enabled): Always return zero if the compiler does not provide the AVX512F bit.
* Fix x86_64 / x86 powl inaccuracy for integer exponents (bug 19848).Joseph Myers2016-03-247-21/+2628
| | | | | | | | | | | | | | | | | | | | | | | Bug 19848 reports cases where powl on x86 / x86_64 has error accumulation, for small integer exponents, larger than permitted by glibc's accuracy goals, at least in some rounding modes. This patch further restricts the exponent range for which the small-integer-exponent logic is used to limit the possible error accumulation. Tested for x86_64 and x86 and ulps updated accordingly. [BZ #19848] * sysdeps/i386/fpu/e_powl.S (p3): Rename to p2 and change value from 8 to 4. (__ieee754_powl): Compare integer exponent against 4 not 8. * sysdeps/x86_64/fpu/e_powl.S (p3): Rename to p2 and change value from 8 to 4. (__ieee754_powl): Compare integer exponent against 4 not 8. * math/auto-libm-test-in: Add more tests of pow. * math/auto-libm-test-out: Regenerated. * sysdeps/i386/i686/fpu/multiarch/libm-test-ulps: Update. * sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
* Assume __NR_utimensat is always definedAurelien Jarno2016-03-234-22/+12
| | | | | | | | | | | | | | | | With the 2.6.32 minimum kernel on x86 and 3.2 on other architectures, __NR_utimensat is always defined. Changelog: * sysdeps/unix/sysv/linux/futimens.c (futimens) [__NR_utimensat]: Make code unconditional. [!__NR_utimensat]: Remove conditional code. * sysdeps/unix/sysv/linux/lutimes.c (lutimes) [__NR_utimensat]: Make code unconditional. [!__NR_utimensat]: Remove conditional code. * sysdeps/unix/sysv/linux/utimensat.c (utimensat) [__NR_utimensat]: Make code unconditional. [!__NR_utimensat]: Remove conditional code.
* Assume __NR_openat is always definedAurelien Jarno2016-03-233-4/+6
| | | | | | | | | With the 2.6.32 minimum kernel on x86 and 3.2 on other architectures, __NR_openat is always defined. Changelog: * sysdeps/unix/sysv/linux/dl-openat64.c (openat64) [__NR_openat]: Make code unconditional.
* x86, pthread_cond_*wait: Do not depend on %eax not being clobberedNick Alcock2016-03-233-0/+8
| | | | | | | | | | | | | | | | | | | | | | The x86-specific versions of both pthread_cond_wait and pthread_cond_timedwait have (in their fall-back-to-futex-wait slow paths) calls to __pthread_mutex_cond_lock_adjust followed by __pthread_mutex_unlock_usercnt, which load the parameters before the first call but then assume that the first parameter, in %eax, will survive unaffected. This happens to have been true before now, but %eax is a call-clobbered register, and this assumption is not safe: it could change at any time, at GCC's whim, and indeed the stack-protector canary checking code clobbers %eax while checking that the canary is uncorrupted. So reload %eax before calling __pthread_mutex_unlock_usercnt. (Do this unconditionally, even when stack-protection is not in use, because it's the right thing to do, it's a slow path, and anything else is dicing with death.) * sysdeps/unix/sysv/linux/i386/pthread_cond_timedwait.S: Reload call-clobbered %eax on retry path. * sysdeps/unix/sysv/linux/i386/pthread_cond_wait.S: Likewise.
* Don't set %rcx twice before "rep movsb"H.J. Lu2016-03-222-1/+5
| | | | | * sysdeps/x86_64/multiarch/memcpy-avx-unaligned.S (MEMCPY): Don't set %rcx twice before "rep movsb".
* Set index_arch_AVX_Fast_Unaligned_Load only for Intel processorsH.J. Lu2016-03-223-74/+105
| | | | | | | | | | | | | | | | | | | | | | | | | | Since only Intel processors with AVX2 have fast unaligned load, we should set index_arch_AVX_Fast_Unaligned_Load only for Intel processors. Move AVX, AVX2, AVX512, FMA and FMA4 detection into get_common_indeces and call get_common_indeces for other processors. Add CPU_FEATURES_CPU_P and CPU_FEATURES_ARCH_P to aoid loading GLRO(dl_x86_cpu_features) in cpu-features.c. [BZ #19583] * sysdeps/x86/cpu-features.c (get_common_indeces): Remove inline. Check family before setting family, model and extended_model. Set AVX, AVX2, AVX512, FMA and FMA4 usable bits here. (init_cpu_features): Replace HAS_CPU_FEATURE and HAS_ARCH_FEATURE with CPU_FEATURES_CPU_P and CPU_FEATURES_ARCH_P. Set index_arch_AVX_Fast_Unaligned_Load for Intel processors with usable AVX2. Call get_common_indeces for other processors with family == NULL. * sysdeps/x86/cpu-features.h (CPU_FEATURES_CPU_P): New macro. (CPU_FEATURES_ARCH_P): Likewise. (HAS_CPU_FEATURE): Use CPU_FEATURES_CPU_P. (HAS_ARCH_FEATURE): Use CPU_FEATURES_ARCH_P.
* Fix malloc threaded tests link on non-LinuxSamuel Thibault2016-03-222-6/+9
| | | | | | * malloc/Makefile ($(objpfx)tst-malloc-backtrace, $(objpfx)tst-malloc-thread-exit, $(objpfx)tst-malloc-thread-fail): Use $(shared-thread-library) instead of hardcoding the path to libpthread.
* Remove __ASSUME_GETDENTS64_SYSCALL.Joseph Myers2016-03-223-114/+90
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch removes the __ASSUME_GETDENTS64_SYSCALL macro, as its definition is constant given the new kernel version requirements (and was constant anyway before those requirements except for MIPS n32). Note that the "#ifdef __NR_getdents64" conditional *is* still needed, because MIPS n64 only has the getdents syscall (being a 64-bit ABI, that syscall is 64-bit; the difference between the two on 64-bit architectures is where d_type goes). If MIPS n64 were to gain the getdents64 syscall and we wanted to use it conditionally on the kernel version at runtime we'd have to revert this patch, but I think that's unlikely (and in any case, we could follow the simpler approach of undefining __NR_getdents64 if the syscall can't be assumed, just like we do for accept4 / recvmmsg / sendmmsg syscalls on architectures where socketcall support came first). Most of the getdents.c changes are reindentation. Tested for x86_64 and x86 that installed stripped shared libraries are unchanged by the patch. * sysdeps/unix/sysv/linux/kernel-features.h (__ASSUME_GETDENTS64_SYSCALL): Remove macro. * sysdeps/unix/sysv/linux/getdents.c [!__ASSUME_GETDENTS64_SYSCALL]: Remove conditional code. [!have_no_getdents64_defined]: Likewise. (__GETDENTS): Remove __have_no_getdents64 conditional.
* Remove __ASSUME_SIGNALFD4.Joseph Myers2016-03-213-26/+10
| | | | | | | | | | | | | | | | Current Linux kernel version requirements mean the signalfd4 syscall can always be assumed to be available. This patch removes __ASSUME_SIGNALFD4 and associated conditionals. Tested for x86_64 and x86 that installed stripped shared libraries are unchanged by the patch. * sysdeps/unix/sysv/linux/kernel-features.h (__ASSUME_SIGNALFD4): Remove macro. * sysdeps/unix/sysv/linux/signalfd.c: Do not include <kernel-features.h>. (signalfd) [__NR_signalfd4]: Make code unconditional. (signalfd) [!__ASSUME_SIGNALFD4]: Remove conditional code.