mirror/glibc - mirror of git://sourceware.org/git/glibc.git

	Commit message	Author	Age	Files	Lines
*	x86-64: Implement memcmp family IFUNC selectors in C	H.J. Lu	2017-06-15	7	-113/+126
	Implement memcmp family IFUNC selectors in C.
	All internal calls within libc.so can use IFUNC on x86-64 since, unlike x86, x86-64 supports PC-relative addressing to access the GOT entry, so it can call via the PLT without using an extra register. For libc.a, we can't use IFUNC for functions which are called before IFUNC has been initialized. Using IFUNC internally reduces the icache footprint, since libc.so and other code in the process use the same implementations. This patch uses IFUNC for memcmp family functions within libc.
	* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add memcmp-sse2.
	* sysdeps/x86_64/multiarch/ifunc-memcmp.h: New file.
	* sysdeps/x86_64/multiarch/memcmp-sse2.S: Likewise.
	* sysdeps/x86_64/multiarch/memcmp.c: Likewise.
	* sysdeps/x86_64/multiarch/wmemcmp.c: Likewise.
	* sysdeps/x86_64/multiarch/memcmp.S: Removed.
	* sysdeps/x86_64/multiarch/wmemcmp.S: Likewise.
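The selector idea described above can be sketched in portable C. This is only an illustration: `struct cpu_flags`, the `*_demo` variants, `select_memcmp` and the preference order are all hypothetical names and assumptions, not the contents of ifunc-memcmp.h, and a real IFUNC resolver is run by the dynamic loader rather than called by hand.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical feature flags; glibc keeps the real ones in an
   internal structure that is not modelled here.  */
struct cpu_flags
{
  int sse4_1;
  int ssse3;
};

typedef int (*memcmp_fn) (const void *, const void *, size_t);

/* Stand-ins for per-ISA variants.  They all defer to memcmp so the
   sketch stays portable.  */
int memcmp_sse2_demo (const void *a, const void *b, size_t n)
{ return memcmp (a, b, n); }
int memcmp_ssse3_demo (const void *a, const void *b, size_t n)
{ return memcmp (a, b, n); }
int memcmp_sse4_1_demo (const void *a, const void *b, size_t n)
{ return memcmp (a, b, n); }

/* Selector: return the most capable variant the CPU supports and
   fall back to the baseline.  The ordering is an assumption.  */
memcmp_fn
select_memcmp (const struct cpu_flags *f)
{
  if (f->sse4_1)
    return memcmp_sse4_1_demo;
  if (f->ssse3)
    return memcmp_ssse3_demo;
  return memcmp_sse2_demo;
}
```

Writing the choice as an ordinary C function keeps the decision logic in one readable place, which appears to be the motivation for moving the selectors out of assembly.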
*	x86-64: Implement memset family IFUNC selectors in C	H.J. Lu	2017-06-15	12	-147/+218
	Implement memset family IFUNC selectors in C.
	All internal calls within libc.so can use IFUNC on x86-64 since, unlike x86, x86-64 supports PC-relative addressing to access the GOT entry, so it can call via the PLT without using an extra register. For libc.a, we can't use IFUNC for functions which are called before IFUNC has been initialized. Using IFUNC internally reduces the icache footprint, since libc.so and other code in the process use the same implementations. This patch uses IFUNC for memset functions within libc.
	2017-06-07 H.J. Lu <hongjiu.lu@intel.com> Erich Elsen <eriche@google.com>
	* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add memset-sse2-unaligned-erms, and memset_chk-nonshared.
	* sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Add test for __memset_chk_erms. Update comments.
	* sysdeps/x86_64/multiarch/ifunc-memset.h: New file.
	* sysdeps/x86_64/multiarch/memset-sse2-unaligned-erms.S: Likewise.
	* sysdeps/x86_64/multiarch/memset.c: Likewise.
	* sysdeps/x86_64/multiarch/memset_chk-nonshared.S: Likewise.
	* sysdeps/x86_64/multiarch/memset_chk.c: Likewise.
	* sysdeps/x86_64/multiarch/memset.S: Removed.
	* sysdeps/x86_64/multiarch/memset_chk.S: Likewise.
	* sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S (__memset_chk_erms): New function.
*	x86-64: Implement memmove family IFUNC selectors in C	H.J. Lu	2017-06-14	20	-474/+418
	Implement memmove family IFUNC selectors in C.
	All internal calls within libc.so can use IFUNC on x86-64 since, unlike x86, x86-64 supports PC-relative addressing to access the GOT entry, so it can call via the PLT without using an extra register. For libc.a, we can't use IFUNC for functions which are called before IFUNC has been initialized. Using IFUNC internally reduces the icache footprint, since libc.so and other code in the process use the same implementations. This patch uses IFUNC for memmove family functions within libc.
	* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add memmove-sse2-unaligned-erms, memcpy_chk-nonshared, mempcpy_chk-nonshared and memmove_chk-nonshared.
	* sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Add tests for __memmove_chk_erms, __memcpy_chk_erms and __mempcpy_chk_erms. Update comments.
	* sysdeps/x86_64/multiarch/ifunc-memmove.h: New file.
	* sysdeps/x86_64/multiarch/memcpy.c: Likewise.
	* sysdeps/x86_64/multiarch/memcpy_chk-nonshared.S: Likewise.
	* sysdeps/x86_64/multiarch/memcpy_chk.c: Likewise.
	* sysdeps/x86_64/multiarch/memmove-sse2-unaligned-erms.S: Likewise.
	* sysdeps/x86_64/multiarch/memmove.c: Likewise.
	* sysdeps/x86_64/multiarch/memmove_chk-nonshared.S: Likewise.
	* sysdeps/x86_64/multiarch/memmove_chk.c: Likewise.
	* sysdeps/x86_64/multiarch/mempcpy.c: Likewise.
	* sysdeps/x86_64/multiarch/mempcpy_chk-nonshared.S: Likewise.
	* sysdeps/x86_64/multiarch/mempcpy_chk.c: Likewise.
	* sysdeps/x86_64/multiarch/memcpy.S: Removed.
	* sysdeps/x86_64/multiarch/memcpy_chk.S: Likewise.
	* sysdeps/x86_64/multiarch/memmove.S: Likewise.
	* sysdeps/x86_64/multiarch/memmove_chk.S: Likewise.
	* sysdeps/x86_64/multiarch/mempcpy.S: Likewise.
	* sysdeps/x86_64/multiarch/mempcpy_chk.S: Likewise.
	* sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S (__mempcpy_chk_erms): New function.
(__memmove_chk_erms): Likewise. (__memcpy_chk_erms): New alias.
*	i686: Add missing IS_IN (libc) guards to vectorized strcspn	Florian Weimer	2017-06-14	2	-3/+7
	Since commit d957c4d3fa48d685ff2726c605c988127ef99395 (i386: Compile rtld-*.os with -mno-sse -mno-mmx -mfpmath=387), vector intrinsics can no longer be used in ld.so, even if the compiled code never makes it into the final ld.so link. This commit adds the missing IS_IN (libc) guard to the SSE 4.2 strcspn implementation, so that it can be used from ld.so in the future.
*	Remove __need macros from errno.h (__need_Emath, __need_error_t).	Zack Weinberg	2017-06-14	13	-532/+606
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is fairly complicated, not because the users of __need_Emath and __need_error_t have complicated requirements, but because the core changes had a lot of fallout. __need_error_t exists for gnulib compatibility in argz.h and argp.h. error_t itself is a Hurdism, an enum containing all the E-constants, so you can do 'p (error_t) errno' in gdb and get a symbolic value. argz.h and argp.h use it for function return values, and they want to fall back to 'int' when that's not available. There is no reason why these nonstandard headers cannot just go ahead and include all of errno.h; so we do that. __need_Emath is defined only by .S files; what they _really_ need is for errno.h to avoid declaring anything other than the E-constants (e.g. 'extern int __errno_location(void);' is a syntax error in assembly language). This is replaced with a check for __ASSEMBLER__ in errno.h, plus a carefully documented requirement for bits/errno.h not to define anything other than macros. That in turn has the consequence that bits/errno.h must not define errno - fortunately, all live ports use the same definition of errno, so I've moved it to errno.h. The Hurd bits/errno.h must also take care not to define error_t when __ASSEMBLER__ is defined, which involves repeating all of the definitions twice, but it's a generated file so that's okay. * stdlib/errno.h: Remove __need_Emath and __need_error_t logic. Reorganize file. Declare errno here. When __ASSEMBLER__ is defined, don't declare anything other than the E-constants. * include/errno.h: Change conditional for exposing internal declarations to (not _ISOMAC and not __ASSEMBLER__). * bits/errno.h: Remove logic for __need_Emath. Document requirements for a port-specific bits/errno.h. 
* sysdeps/unix/sysv/linux/bits/errno.h * sysdeps/unix/sysv/linux/alpha/bits/errno.h * sysdeps/unix/sysv/linux/hppa/bits/errno.h * sysdeps/unix/sysv/linux/mips/bits/errno.h * sysdeps/unix/sysv/linux/sparc/bits/errno.h: Add multiple-include guard and check against improper inclusion. Remove __need_Emath logic. Don't declare errno here. Ensure all constants are defined as simple integer literals. Consistent formatting. * sysdeps/mach/hurd/errnos.awk: Likewise. Only define error_t and enum __error_t_codes if __ASSEMBLER__ is not defined. * sysdeps/mach/hurd/bits/errno.h: Regenerate. * argp/argp.h, string/argz.h: Don't define __need_error_t before including errno.h. * sysdeps/i386/i686/fpu/multiarch/s_cosf-sse2.S * sysdeps/i386/i686/fpu/multiarch/s_sincosf-sse2.S * sysdeps/i386/i686/fpu/multiarch/s_sinf-sse2.S * sysdeps/x86_64/fpu/s_cosf.S * sysdeps/x86_64/fpu/s_sincosf.S * sysdeps/x86_64/fpu/s_sinf.S: Just include errno.h; don't define __need_Emath or include bits/errno.h directly.
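The `__ASSEMBLER__` arrangement described in this commit can be illustrated with a small header-style sketch. Every name here (`DEMO_ERRNO_H`, `DEMO_EDOM`, `DEMO_ERANGE`, `demo_last_error`, `demo_is_math_error`) and both numeric values are hypothetical; the only real facility used is the `__ASSEMBLER__` macro, which GCC predefines when preprocessing assembly.

```c
/* Hypothetical header usable from both C and .S files.  */
#ifndef DEMO_ERRNO_H
#define DEMO_ERRNO_H 1

/* Plain integer-literal macros are safe for the assembler too.  */
#define DEMO_EDOM   33
#define DEMO_ERANGE 34

#ifndef __ASSEMBLER__
/* C-only declarations; an assembler would reject these lines.  */
extern int demo_last_error;
int demo_is_math_error (int code);
#endif

#endif /* DEMO_ERRNO_H */

#ifndef __ASSEMBLER__
/* Definitions, placed here only to keep the sketch in one file.  */
int demo_last_error;

int
demo_is_math_error (int code)
{
  return code == DEMO_EDOM || code == DEMO_ERANGE;
}
#endif
```

An assembly file that includes such a header sees only the two macros, which is the property the commit requires of bits/errno.h.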
*	Remove __need_IOV_MAX and __need_FOPEN_MAX.	Zack Weinberg	2017-06-14	3	-61/+67
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	__need_FOPEN_MAX wasn't being used anywhere. __need_IOV_MAX was more complicated; the basic deal is that sys/uio.h wants to define a constant named UIO_MAXIOV and bits/xopen_lim.h wants to define a constant named IOV_MAX, with the same meaning. For no apparent reason this was being handled via bits/stdio_lim.h -- stdio.h is NOT supposed to define IOV_MAX -- and some mess in Makerules. Also, bits/uio.h on Linux was being used as a dumping ground for extension functions. So now we have bits/uio_lim.h, which defines __IOV_MAX. bits/xopen_lim.h and sys/uio.h use that to define their respective constants. We also now have bits/uio-ext.h, which is the official Proper Home for extensions to sys/uio.h. bits/uio.h is removed, and stdio_lim.h doesn't define IOV_MAX at all. * bits/uio_lim.h, sysdeps/unix/sysv/linux/bits/uio_lim.h * bits/uio-ext.h, sysdeps/unix/sysv/linux/bits/uio-ext.h: New file. * bits/uio.h, sysdeps/unix/sysv/linux/bits/uio.h: Delete file. * include/bits/xopen_lim.h: Use bits/uio_lim.h to get the value for IOV_MAX. * misc/Makefile: Install bits/uio-ext.h and bits/uio_lim.h. Don't install bits/uio.h. * misc/sys/uio.h: Don't include bits/uio.h. Do include bits/types/struct_iovec.h and bits/uio_lim.h. Set UIO_MAXIOV based on __IOV_MAX. Under __USE_GNU, also include bits/uio-ext.h. * stdio-common/stdio_lim.h.in: Remove logic for __need_FOPEN_MAX and __need_IOV_MAX. Don't define IOV_MAX at all. * Makerules (stdio_lim.h): Remove logic for setting IOV_MAX. * sysdeps/unix/sysv/linux/bits/fcntl-linux.h: Include bits/types/struct_iovec.h, not bits/uio.h. Use __ssize_t, not ssize_t, in function prototypes. Don't use hard TAB for double space after period in comments.
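From the caller's side, the limit that IOV_MAX and UIO_MAXIOV both describe can be obtained as in the following sketch. The function name `iovec_limit` is hypothetical, and the fallback of 16 is the POSIX minimum (_XOPEN_IOV_MAX) rather than anything taken from this commit.

```c
#include <limits.h>
#include <unistd.h>

/* Maximum number of iovec entries accepted by readv/writev.  Uses
   the compile-time constant when the headers expose it and asks
   sysconf otherwise.  */
long
iovec_limit (void)
{
#ifdef IOV_MAX
  return IOV_MAX;
#else
  long n = sysconf (_SC_IOV_MAX);
  return n > 0 ? n : 16;	/* 16 is the POSIX minimum.  */
#endif
}
```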
*	PowerPC64 ELFv2 PPC64_OPT_LOCALENTRY	Alan Modra	2017-06-14	21	-35/+80
	ELFv2 functions with localentry:0 are those with a single entry point, i.e. global entry == local entry, that have no requirement on r2 or r12 and guarantee r2 is unchanged on return. Such an external function can be called via the PLT without saving r2 or restoring it on return, avoiding a common load-hit-store for small functions. This patch implements the ld.so changes necessary for this optimization. ld.so needs to check that an optimized PLT call sequence is in fact calling a function implemented with localentry:0, and emit a fatal error otherwise. The elf/testobj6.c change is to stop "error while loading shared libraries: expected localentry:0 `preload'" when running elf/preloadtest, which we'd get otherwise.
	* elf/elf.h (PPC64_OPT_LOCALENTRY): Define.
	* sysdeps/alpha/dl-machine.h (elf_machine_fixup_plt): Add refsym and sym parameters. Adjust callers.
	* sysdeps/aarch64/dl-machine.h (elf_machine_fixup_plt): Likewise.
	* sysdeps/arm/dl-machine.h (elf_machine_fixup_plt): Likewise.
	* sysdeps/generic/dl-machine.h (elf_machine_fixup_plt): Likewise.
	* sysdeps/hppa/dl-machine.h (elf_machine_fixup_plt): Likewise.
	* sysdeps/i386/dl-machine.h (elf_machine_fixup_plt): Likewise.
	* sysdeps/ia64/dl-machine.h (elf_machine_fixup_plt): Likewise.
	* sysdeps/m68k/dl-machine.h (elf_machine_fixup_plt): Likewise.
	* sysdeps/microblaze/dl-machine.h (elf_machine_fixup_plt): Likewise.
	* sysdeps/mips/dl-machine.h (elf_machine_fixup_plt): Likewise.
	* sysdeps/nios2/dl-machine.h (elf_machine_fixup_plt): Likewise.
	* sysdeps/powerpc/powerpc32/dl-machine.h (elf_machine_fixup_plt): Likewise.
	* sysdeps/s390/s390-32/dl-machine.h (elf_machine_fixup_plt): Likewise.
	* sysdeps/s390/s390-64/dl-machine.h (elf_machine_fixup_plt): Likewise.
	* sysdeps/sh/dl-machine.h (elf_machine_fixup_plt): Likewise.
	* sysdeps/sparc/sparc32/dl-machine.h (elf_machine_fixup_plt): Likewise.
	* sysdeps/sparc/sparc64/dl-machine.h (elf_machine_fixup_plt): Likewise.
	* sysdeps/tile/dl-machine.h (elf_machine_fixup_plt): Likewise.
	* sysdeps/x86_64/dl-machine.h (elf_machine_fixup_plt): Likewise.
	* sysdeps/powerpc/powerpc64/dl-machine.c (_dl_error_localentry): New. (_dl_reloc_overflow): Increase buffer size. Formatting.
	* sysdeps/powerpc/powerpc64/dl-machine.h (ppc64_local_entry_offset): Delete reloc param, add refsym and sym. Check optimized plt call stubs for localentry:0 functions. Adjust callers. (elf_machine_fixup_plt, elf_machine_plt_conflict): Add refsym and sym parameters. Adjust callers. (_dl_reloc_overflow): Move attribute. (_dl_error_localentry): Declare.
	* elf/dl-runtime.c (_dl_fixup): Save original sym. Pass refsym and sym to elf_machine_fixup_plt.
	* elf/testobj6.c (preload): Call printf.
*	PowerPC64 ENTRY_TOCLESS	Alan Modra	2017-06-14	102	-138/+172
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	A number of functions in the sysdeps/powerpc/powerpc64/ tree don't use or change r2, yet declare a global entry that sets up r2. This patch fixes that problem, and consolidates the ENTRY and EALIGN macros. * sysdeps/powerpc/powerpc64/sysdep.h: Formatting. (NOPS, ENTRY_3): New macros. (ENTRY): Rewrite. (ENTRY_TOCLESS): Define. (EALIGN, EALIGN_W_0, EALIGN_W_1, EALIGN_W_2, EALIGN_W_4, EALIGN_W_5, EALIGN_W_6, EALIGN_W_7, EALIGN_W_8): Delete. * sysdeps/powerpc/powerpc64/a2/memcpy.S: Replace EALIGN with ENTRY. * sysdeps/powerpc/powerpc64/dl-trampoline.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_ceil.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_ceilf.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_floor.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_floorf.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_nearbyint.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_nearbyintf.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_rint.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_rintf.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_round.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_roundf.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_trunc.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_truncf.S: Likewise. * sysdeps/powerpc/powerpc64/memset.S: Likewise. * sysdeps/powerpc/powerpc64/power7/fpu/s_finite.S: Likewise. * sysdeps/powerpc/powerpc64/power7/fpu/s_isinf.S: Likewise. * sysdeps/powerpc/powerpc64/power7/fpu/s_isnan.S: Likewise. * sysdeps/powerpc/powerpc64/power7/strstr.S: Likewise. * sysdeps/powerpc/powerpc64/power8/fpu/e_expf.S: Likewise. * sysdeps/powerpc/powerpc64/power8/fpu/s_cosf.S: Likewise. 
* sysdeps/powerpc/powerpc64/power8/fpu/s_sinf.S: Likewise. * sysdeps/powerpc/powerpc64/power8/strcasestr.S: Likewise. * sysdeps/powerpc/powerpc64/addmul_1.S: Use ENTRY_TOCLESS. * sysdeps/powerpc/powerpc64/cell/memcpy.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_copysign.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_copysignl.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_fabsl.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_isnan.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_llrint.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_llrintf.S: Likewise. * sysdeps/powerpc/powerpc64/lshift.S: Likewise. * sysdeps/powerpc/powerpc64/memcpy.S: Likewise. * sysdeps/powerpc/powerpc64/mul_1.S: Likewise. * sysdeps/powerpc/powerpc64/power4/memcmp.S: Likewise. * sysdeps/powerpc/powerpc64/power4/memcpy.S: Likewise. * sysdeps/powerpc/powerpc64/power4/memset.S: Likewise. * sysdeps/powerpc/powerpc64/power4/strncmp.S: Likewise. * sysdeps/powerpc/powerpc64/power5+/fpu/s_ceil.S: Likewise. * sysdeps/powerpc/powerpc64/power5+/fpu/s_ceilf.S: Likewise. * sysdeps/powerpc/powerpc64/power5+/fpu/s_floor.S: Likewise. * sysdeps/powerpc/powerpc64/power5+/fpu/s_floorf.S: Likewise. * sysdeps/powerpc/powerpc64/power5+/fpu/s_llround.S: Likewise. * sysdeps/powerpc/powerpc64/power5+/fpu/s_round.S: Likewise. * sysdeps/powerpc/powerpc64/power5+/fpu/s_roundf.S: Likewise. * sysdeps/powerpc/powerpc64/power5+/fpu/s_trunc.S: Likewise. * sysdeps/powerpc/powerpc64/power5+/fpu/s_truncf.S: Likewise. * sysdeps/powerpc/powerpc64/power5/fpu/s_isnan.S: Likewise. * sysdeps/powerpc/powerpc64/power6/fpu/s_copysign.S: Likewise. * sysdeps/powerpc/powerpc64/power6/fpu/s_isnan.S: Likewise. * sysdeps/powerpc/powerpc64/power6/memcpy.S: Likewise. * sysdeps/powerpc/powerpc64/power6/memset.S: Likewise. * sysdeps/powerpc/powerpc64/power6x/fpu/s_isnan.S: Likewise. * sysdeps/powerpc/powerpc64/power6x/fpu/s_llrint.S: Likewise. * sysdeps/powerpc/powerpc64/power6x/fpu/s_llround.S: Likewise. 
* sysdeps/powerpc/powerpc64/power7/add_n.S: Likewise. * sysdeps/powerpc/powerpc64/power7/memchr.S: Likewise. * sysdeps/powerpc/powerpc64/power7/memcmp.S: Likewise. * sysdeps/powerpc/powerpc64/power7/memcpy.S: Likewise. * sysdeps/powerpc/powerpc64/power7/memmove.S: Likewise. * sysdeps/powerpc/powerpc64/power7/mempcpy.S: Likewise. * sysdeps/powerpc/powerpc64/power7/memrchr.S: Likewise. * sysdeps/powerpc/powerpc64/power7/memset.S: Likewise. * sysdeps/powerpc/powerpc64/power7/rawmemchr.S: Likewise. * sysdeps/powerpc/powerpc64/power7/strcasecmp.S (strcasecmp_l): Likewise. * sysdeps/powerpc/powerpc64/power7/strchr.S: Likewise. * sysdeps/powerpc/powerpc64/power7/strchrnul.S: Likewise. * sysdeps/powerpc/powerpc64/power7/strcmp.S: Likewise. * sysdeps/powerpc/powerpc64/power7/strlen.S: Likewise. * sysdeps/powerpc/powerpc64/power7/strncmp.S: Likewise. * sysdeps/powerpc/powerpc64/power7/strncpy.S: Likewise. * sysdeps/powerpc/powerpc64/power7/strnlen.S: Likewise. * sysdeps/powerpc/powerpc64/power7/strrchr.S: Likewise. * sysdeps/powerpc/powerpc64/power8/fpu/s_finite.S: Likewise. * sysdeps/powerpc/powerpc64/power8/fpu/s_isinf.S: Likewise. * sysdeps/powerpc/powerpc64/power8/fpu/s_isnan.S: Likewise. * sysdeps/powerpc/powerpc64/power8/fpu/s_llrint.S: Likewise. * sysdeps/powerpc/powerpc64/power8/fpu/s_llround.S: Likewise. * sysdeps/powerpc/powerpc64/power8/memcmp.S: Likewise. * sysdeps/powerpc/powerpc64/power8/memset.S: Likewise. * sysdeps/powerpc/powerpc64/power8/strchr.S: Likewise. * sysdeps/powerpc/powerpc64/power8/strcmp.S: Likewise. * sysdeps/powerpc/powerpc64/power8/strcpy.S: Likewise. * sysdeps/powerpc/powerpc64/power8/strlen.S: Likewise. * sysdeps/powerpc/powerpc64/power8/strncmp.S: Likewise. * sysdeps/powerpc/powerpc64/power8/strncpy.S: Likewise. * sysdeps/powerpc/powerpc64/power8/strnlen.S: Likewise. * sysdeps/powerpc/powerpc64/power8/strrchr.S: Likewise. * sysdeps/powerpc/powerpc64/power8/strspn.S: Likewise. * sysdeps/powerpc/powerpc64/power9/strcmp.S: Likewise. 
* sysdeps/powerpc/powerpc64/power9/strncmp.S: Likewise. * sysdeps/powerpc/powerpc64/strchr.S: Likewise. * sysdeps/powerpc/powerpc64/strcmp.S: Likewise. * sysdeps/powerpc/powerpc64/strlen.S: Likewise. * sysdeps/powerpc/powerpc64/strncmp.S: Likewise. * sysdeps/powerpc/powerpc64/ppc-mcount.S: Store LR earlier. Don't add nop when SHARED. * sysdeps/powerpc/powerpc64/start.S: Fix comment. * sysdeps/powerpc/powerpc64/multiarch/strrchr-power8.S (ENTRY): Don't define. (ENTRY_TOCLESS): Define. * sysdeps/powerpc/powerpc32/sysdep.h (ENTRY_TOCLESS): Define. * sysdeps/powerpc/fpu/s_fma.S: Use ENTRY_TOCLESS. * sysdeps/powerpc/fpu/s_fmaf.S: Likewise.
*	PowerPC64 strncpy, stpncpy and strstr fixes	Alan Modra	2017-06-14	8	-0/+39
	Makes __stpncpy_power8 call __memset_power8 directly rather than via an IFUNC. Fixes a missing _mcount, and removes some redundant NOPS. The _is_local defines are also used in a followup patch.
	* sysdeps/powerpc/powerpc64/multiarch/strncpy-power7.S: Define MEMSET_is_local.
	* sysdeps/powerpc/powerpc64/multiarch/strncpy-power8.S: Likewise.
	* sysdeps/powerpc/powerpc64/multiarch/stpncpy-power7.S: Likewise.
	* sysdeps/powerpc/powerpc64/multiarch/stpncpy-power8.S: Likewise. Define MEMSET.
	* sysdeps/powerpc/powerpc64/multiarch/strstr-power7.S: Define STRLEN_is_local, STRNLEN_is_local, and STRCHR_is_local.
	* sysdeps/powerpc/powerpc64/power7/strstr.S: Likewise. Don't add nop after local calls.
	* sysdeps/powerpc/powerpc64/power7/strncpy.S: Define MEMSET_is_local. Don't add nop after local call.
	* sysdeps/powerpc/powerpc64/power8/strncpy.S: Likewise. Add missing CALL_MCOUNT.
*	PowerPC64 sysdep.h tidy	Alan Modra	2017-06-14	2	-59/+59
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	.align on some targets takes a byte alignment, on others like powerpc, log2 of the byte alignment. It's a good idea to avoid .align, particularly since x86 and powerpc are different. This patch fixes the occurrences of .align in powerpc64/sysdep.h, renames DOT_LABEL since the macro doesn't have anything to do with adding dots, removes extraneous semicolons, and fixes some formatting. * sysdeps/powerpc/powerpc64/sysdep.h: Formatting. (FUNC_LABEL): Rename from DOT_LABEL. (ENTRY_1): Use FUNC_LABEL and remove leading space from label. Use .p2align rather than .align. (TRACEBACK, TRACEBACK_MASK): Use .p2align rather than .align. (ABORT_TRANSACTION): Likewise. (ENTRY_1, ENTRY_2, END_2, LOCALENTRY): Remove unnecessary semicolons, particularly at end. Add semicolon at invocation as necessary. (TRACEBACK, TRACEBACK_MASK, PSEUDO, PSEUDO_NOERRNO): Likewise. (PSEUDO_ERRVAL, PPC64_LOAD_FUNCPTR, OPD_ENT): Likewise. * sysdeps/powerpc/powerpc64/multiarch/strrchr-power8.S (ENTRY, END): Adjust to suit.
*	PowerPC64 FRAME_PARM_SAVE	Alan Modra	2017-06-14	2	-37/+16
\| \| \| \| \| \| \| \| \| \| \| \| \|	I think FRAME_PARM[1-9]_SAVE confuse the code, particularly FRAME_PARM9_SAVE. There are only 8 parameter save slots! * sysdeps/powerpc/powerpc64/sysdep.h: (FRAME_BACKCHAIN, FRAME_CR_SAVE, FRAME_LR_SAVE): Move out of conditional. (FRAME_PARM1_SAVE, FRAME_PARM2_SAVE, FRAME_PARM3_SAVE, FRAME_PARM4_SAVE, FRAME_PARM5_SAVE, FRAME_PARM6_SAVE, FRAME_PARM7_SAVE, FRAME_PARM8_SAVE, FRAME_PARM9_SAVE): Delete. * sysdeps/unix/sysv/linux/powerpc/powerpc64/makecontext.S: Replace uses of FRAME_PARM[1-9]_SAVE with FRAME_PARM_SAVE plus offset.
*	PowerPC64, fix calls to _mcount	Alan Modra	2017-06-14	1	-8/+3
	The macros used in assembly were broken on powerpc64 ELFv1.
	* sysdeps/powerpc/powerpc64/sysdep.h: (call_mcount_parm_offset): Delete. (SAVE_ARG, REST_ARG, CFI_SAVE_ARG): Correct.
*	mips: Fix store/load gp registers to/from ucontext_t	Gordana Cmiljanovic	2017-06-13	6	-81/+180
	General purpose registers in the mcontext_t structure are 8 bytes long for both MIPS32 and MIPS64. The get/set/make/swap context implementations for MIPS O32 incorrectly assume that general purpose registers in this structure are 4 bytes long. This patch fixes that. Tested for MIPS O32 LE and BE. Compared objdump of modified functions for mips n32 and mips n64.
	[BZ #21548]
	* sysdeps/unix/sysv/linux/mips/getcontext.S: Define MCONTEXT_SZGREG as 8 and use it when copying general purpose registers.
	* sysdeps/unix/sysv/linux/mips/makecontext.S: Likewise.
	* sysdeps/unix/sysv/linux/mips/mips32/Makefile: Include new test for mips o32.
	* sysdeps/unix/sysv/linux/mips/mips32/bug-getcontext-mips-gp.c: Added new test for mips o32.
	* sysdeps/unix/sysv/linux/mips/setcontext.S: Define MCONTEXT_SZGREG as 8 and use it when copying general purpose registers.
	* sysdeps/unix/sysv/linux/mips/swapcontext.S: Likewise.
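The effect of the slot size can be modelled in C. In this sketch only the name and value of MCONTEXT_SZGREG come from the commit; `DEMO_NGREG`, `struct demo_mcontext` and both helper functions are hypothetical, and the zero-extending store is a simplification that ignores whatever extension and endianness rules the real assembly has to follow.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MCONTEXT_SZGREG 8	/* Slot size named in the commit.  */
#define DEMO_NGREG 32		/* Hypothetical register count.  */

/* Hypothetical stand-in for the register area of mcontext_t.  */
struct demo_mcontext
{
  unsigned char gregs[DEMO_NGREG * MCONTEXT_SZGREG];
};

/* Store a 32-bit register value into its 8-byte slot.  Indexing with
   a 4-byte stride instead would put register I in the wrong slot.  */
void
demo_save_greg (struct demo_mcontext *mc, unsigned i, uint32_t value)
{
  uint64_t slot = value;	/* Simplified: zero-extend.  */
  memcpy (mc->gregs + (size_t) i * MCONTEXT_SZGREG, &slot, sizeof slot);
}

/* Load the value back from the same 8-byte slot.  */
uint32_t
demo_load_greg (const struct demo_mcontext *mc, unsigned i)
{
  uint64_t slot;
  memcpy (&slot, mc->gregs + (size_t) i * MCONTEXT_SZGREG, sizeof slot);
  return (uint32_t) slot;
}
```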
*	Remove __need_schedparam and __cpu_set_t_defined.	Zack Weinberg	2017-06-12	1	-121/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	bits/sched.h has logic to expose only an impl-namespace variant of struct sched_param (i.e. struct __sched_param), but nothing uses it, and the only header that includes bits/sched.h is sched.h. The __need_schedparam logic can therefore be removed. bits/sched.h also has a great deal of code relating to cpu_set_t objects that was almost the same between the two versions of bits/sched.h in the tree; a little spelunking indicated that this is because some bug fixes got applied to the Linux-specific bits/sched.h but not the generic one. Introduce a new header, bits/cpu-set.h, containing the version of that code with the bugfixes, have sched.h include it directly, and delete all of the code from both versions of bits/sched.h. Also remove the unnecessary name mangling in the definition of struct sched_param -- POSIX specifies a field 'sched_priority', so there is no reason to define it as '__sched_priority' and then paper over that with a macro. (Just in case someone was using the internal name, 'sched_priority' remains a macro defined to expand to itself, and '__sched_priority' now expands to 'sched_priority'.) Finally, as long as I'm touching these files anyway, merge new constants from linux/sched.h into the Linux bits/sched.h. * bits/sched.h: Remove __need_schedparam logic and replace with a normal multiple-include guard. Change field name in struct sched_param from __sched_priority to sched_priority. Delete everything under #ifndef __cpu_set_t_defined. * sysdeps/unix/sysv/linux/bits/sched.h: Likewise. Also sync with kernel sched.h, adding SCHED_ISO and SCHED_DEADLINE constants. * posix/sched.h: Include bits/cpu-set.h as well as bits/sched.h. For compatibility, #define sched_priority to itself, and #define __sched_priority as sched_priority. 
* posix/bits/cpu-set.h: New file containing, verbatim, the code that was under #ifndef __cpu_set_t_defined in sysdeps/unix/sysv/linux/bits/sched.h. * include/bits/cpu-set.h: New wrapper. * posix/Makefile: Install bits/cpu-set.h.
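For application code, the visible consequence is that the POSIX field name is the real member name. A minimal sketch, with `demo_min_fifo_param` as a hypothetical helper:

```c
#include <sched.h>

/* Build a sched_param holding the lowest valid SCHED_FIFO priority,
   using the POSIX member name directly.  */
struct sched_param
demo_min_fifo_param (void)
{
  struct sched_param p = { 0 };
  p.sched_priority = sched_get_priority_min (SCHED_FIFO);
  return p;
}
```

The result could then be passed to `sched_setscheduler` or `pthread_attr_setschedparam`; both usually need elevated privileges for real-time policies, so the sketch stops short of calling them.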
*	float128: Add strtof128, wcstof128, and related functions.	Paul E. Murphy	2017-06-12	10	-0/+291
	The implementations are contained within sysdeps/ieee754/float128 as they are only built when _Float128 is enabled within libc/m.
	* include/gmp.h (__mpn_construct_float128): New declaration.
	* include/stdlib.h: Include bits/floatn.h for _Float128 tests. (__strtof128_l): New declaration. (__strtof128_nan): Likewise. (__wcstof128_nan): Likewise. (__strtof128_internal): Likewise. (____strtof128_l_internal): Likewise.
	* include/wchar.h: Include bits/floatn.h for _Float128 tests. (__wcstof128_l): New declaration. (__wcstof128_internal): Likewise.
	* stdlib/Makefile (bug-strtod2): Link libm too.
	* stdlib/stdlib.h (strtof128): New declaration. (strtof128_l): Likewise.
	* stdlib/tst-strtod-nan-locale-main.c: Updated to use tst-strtod.h macros to ensure float128 gets tested too.
	* stdlib/tst-strtod-round-skeleton.c (CHOOSE_f128): New macro.
	* stdlib/tst-strtod.h: Include bits/floatn.h for _Float128 tests. (IF_FLOAT128): New macro. (GEN_TEST_STRTOD): Update to optionally include _Float128 in the tests. (STRTOD_TEST_FOREACH): Likewise.
	* sysdeps/ieee754/float128/Makefile: Insert new strtof128 and wcstof128 functions into libc.
	* sysdeps/ieee754/float128/Versions: Add exports for the above new functions.
	* sysdeps/ieee754/float128/mpn2float128.c: New file.
	* sysdeps/ieee754/float128/strtod_nan_float128.h: New file.
	* sysdeps/ieee754/float128/strtof128.c: New file.
	* sysdeps/ieee754/float128/strtof128_l.c: New file.
	* sysdeps/ieee754/float128/strtof128_nan.c: New file.
	* sysdeps/ieee754/float128/wcstof128.c: New file.
	* sysdeps/ieee754/float128/wcstof128_l.c: New file.
	* sysdeps/ieee754/float128/wcstof128_nan.c: New file.
	* wcsmbs/Makefile (CFLAGS-wcstof128.c): Append strtox-CFLAGS. (CFLAGS-wcstof128_l): Likewise.
	* wcsmbs/wchar.h: Include bits/floatn.h for _Float128 tests. (wcstof128): New declaration.
(wcstof128_l): Likewise.
*	x86-64: Implement strcpy family IFUNC selectors in C	H.J. Lu	2017-06-12	14	-131/+258
	Implement strcpy family IFUNC selectors in C.
	All internal calls within libc.so can use IFUNC on x86-64 since, unlike x86, x86-64 supports PC-relative addressing to access the GOT entry, so it can call via the PLT without using an extra register. For libc.a, we can't use IFUNC for functions which are called before IFUNC has been initialized. Using IFUNC internally reduces the icache footprint, since libc.so and other code in the process use the same implementations. This patch uses IFUNC for strcpy family functions within libc.
	* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add strcpy-sse2 and stpcpy-sse2.
	* sysdeps/x86_64/multiarch/ifunc-unaligned-ssse3.h: New file.
	* sysdeps/x86_64/multiarch/stpcpy-sse2.S: Likewise.
	* sysdeps/x86_64/multiarch/stpcpy.c: Likewise.
	* sysdeps/x86_64/multiarch/stpncpy.c: Likewise.
	* sysdeps/x86_64/multiarch/strcpy-sse2.S: Likewise.
	* sysdeps/x86_64/multiarch/strcpy.c: Likewise.
	* sysdeps/x86_64/multiarch/strncpy.c: Likewise.
	* sysdeps/x86_64/multiarch/stpcpy.S: Removed.
	* sysdeps/x86_64/multiarch/stpncpy.S: Likewise.
	* sysdeps/x86_64/multiarch/strcpy.S: Likewise.
	* sysdeps/x86_64/multiarch/strncpy.S: Likewise.
	* sysdeps/x86_64/multiarch/stpncpy-c.c (weak_alias): New. (libc_hidden_def): Always defined as empty.
	* sysdeps/x86_64/multiarch/strncpy-c.c (libc_hidden_builtin_def): Always defined as empty.
*	Replace all internal uses of __bzero with memset.	Wilco Dijkstra	2017-06-12	1	-3/+3
	This removes the need to redirect it to a builtin and means memset is inlined whenever possible, including with -Os.
	* sunrpc/bindrsvprt.c (bindresvport): Change __bzero to memset.
	* sunrpc/clnt_gen.c (clnt_create): Likewise.
	* sunrpc/des_impl.c (_des_crypt): Likewise.
	* sunrpc/key_call.c (key_gendes): Likewise.
	* sunrpc/pmap_rmt.c (clnt_broadcast): Likewise.
	* sunrpc/svc_simple.c (universal): Likewise.
	* sunrpc/svc_tcp.c (svctcp_create): Likewise.
	* sunrpc/svc_udp.c (svcudp_bufcreate): Likewise.
	* sysdeps/arm/aeabi_memclr.c (__aeabi_memclr): Likewise.
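The replacement pattern is the usual one for zeroing an object. In this sketch `struct demo_request` and `demo_reset` are hypothetical and do not correspond to any of the sunrpc call sites listed above.

```c
#include <string.h>

/* Hypothetical structure standing in for the objects being cleared.  */
struct demo_request
{
  int port;
  char host[16];
};

/* memset with a constant size is a candidate for inline expansion by
   the compiler, which is the benefit the commit message points to.  */
void
demo_reset (struct demo_request *r)
{
  memset (r, 0, sizeof *r);
}
```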
*	powerpc: add sysconf support for cache geometries	Paul Clarke	2017-06-09	3	-0/+170
	There is currently no "cross-platform" (x86 and POWER) support for determining the cacheline size. This patch adds support to sysconf() to correctly report cacheline sizes based on the information in the auxiliary vector. Thus, using sysconf() is a cross-platform (x86 and POWER) solution for determining cacheline sizes.
	Support is added (on powerpc) for: _SC_LEVEL1_ICACHE_SIZE _SC_LEVEL1_ICACHE_ASSOC _SC_LEVEL1_ICACHE_LINESIZE _SC_LEVEL1_DCACHE_SIZE _SC_LEVEL1_DCACHE_ASSOC _SC_LEVEL1_DCACHE_LINESIZE _SC_LEVEL2_CACHE_SIZE _SC_LEVEL2_CACHE_ASSOC _SC_LEVEL2_CACHE_LINESIZE _SC_LEVEL3_CACHE_SIZE _SC_LEVEL3_CACHE_ASSOC _SC_LEVEL3_CACHE_LINESIZE
	* sysdeps/unix/sysv/linux/powerpc/sysconf.c: New file. Add powerpc-specific overrides for L1, L2, L3 CACHE_SIZEs, CACHE_ASSOCs, and CACHE_LINESIZEs, retrieving from auxv.
	* sysdeps/unix/sysv/linux/powerpc/test-powerpc-linux-sysconf.c: New file. Invoke newly supported sysconf values for powerpc, and report results. If none are supported, report so.
	* sysdeps/unix/sysv/linux/powerpc/Makefile (tests): Add new test, tst-sysconf.
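A caller might query one of these values as sketched below. The helper name `demo_l1d_line_size` is hypothetical; the `_SC_LEVEL1_DCACHE_LINESIZE` constant is a glibc extension, so the sketch guards it with `#ifdef`, and callers should expect 0 or -1 on systems where the value is not known.

```c
#include <unistd.h>

/* Return the L1 data cache line size in bytes as reported by
   sysconf, or -1 when the constant is unavailable at build time.  */
long
demo_l1d_line_size (void)
{
#ifdef _SC_LEVEL1_DCACHE_LINESIZE
  return sysconf (_SC_LEVEL1_DCACHE_LINESIZE);
#else
  return -1;
#endif
}
```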
*	Fix waitid namespace (bug 21561).	Joseph Myers	2017-06-09	1	-4/+6
	In sys/wait.h, waitid and associated constants and types are UX-shaded in XPG4.2 (so not in XPG4), and XSI-shaded in POSIX before 2008, so should be appropriately conditional in the headers. This patch fixes the conditionals accordingly. (WCONTINUED is actually still XSI-shaded in POSIX.1:2008, but W* is also reserved there without XSI-shading, so nothing special needs to be done about the conditionals on WCONTINUED to conform to POSIX.1:2008 namespace rules.) Tested for x86_64.
	[BZ #21561]
	* posix/sys/wait.h (idtype_t): Change [__USE_XOPEN] condition to [__USE_XOPEN_EXTENDED]. (id_t): Likewise. (include of <bits/types/siginfo_t.h>): Likewise. (waitid): Likewise.
	* sysdeps/unix/sysv/linux/bits/waitflags.h (WSTOPPED): Condition on [__USE_XOPEN_EXTENDED \|\| __USE_XOPEN2K8]. (WEXITED): Likewise. (WCONTINUED): Likewise. (WNOWAIT): Likewise.
	* conform/Makefile (test-xfail-XPG4/stdlib.h/conform): Remove. (test-xfail-XPG4/sys/wait.h/conform): Likewise. (test-xfail-POSIX/sys/wait.h/conform): Likewise.
*	Update nios2, sparc32 localplt.data files for recent GCC change.	Joseph Myers	2017-06-09	2	-4/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	A recent GCC change to expand floating-point classification built-in functions inline using integer rather than floating-point arithmetic in some cases resulted in localplt test failures for nios2 and sparc32 <https://sourceware.org/ml/libc-testresults/2017-q2/msg00320.html>. This patch updates the localplt.data files in question to mark the relevant symbols as optional / add a new optional symbol. (The GCC patch has been reverted because of other problems it caused, but one can assume it will be applied again, without changes that would affect the PLT entries generated, once those issues have been resolved.) Tested with build-many-glibcs.py. * sysdeps/unix/sysv/linux/nios2/localplt.data (__gtdf2): Mark libc.so PLT entry optional. (__gtsf2): Likewise. (__unorddf2): Likewise. (__unordsf2): Likewise. * sysdeps/unix/sysv/linux/sparc/sparc32/localplt.data (_Q_fgt): New optional libc.so PLT entry.
*	x86-64: Correct comments in ifunc-impl-list.c	H.J. Lu	2017-06-09	1	-6/+6
\| \| \| \|	* sysdeps/x86_64/multiarch/ifunc-impl-list.c: Correct comments.
*	x86-64: Optimize strrchr/wcsrchr with AVX2	H.J. Lu	2017-06-09	8	-0/+368
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Optimize strrchr/wcsrchr with AVX2 to check 32 bytes with vector instructions. It is as fast as SSE2 version for small data sizes and up to 1X faster for large data sizes on Haswell. Select AVX2 version on AVX2 machines where vzeroupper is preferred and AVX unaligned load is fast. * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add strrchr-sse2, strrchr-avx2, wcsrchr-sse2 and wcsrchr-avx2. * sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Add tests for __strrchr_avx2, __strrchr_sse2, __wcsrchr_avx2 and __wcsrchr_sse2. * sysdeps/x86_64/multiarch/strrchr-avx2.S: New file. * sysdeps/x86_64/multiarch/strrchr-sse2.S: Likewise. * sysdeps/x86_64/multiarch/strrchr.c: Likewise. * sysdeps/x86_64/multiarch/wcsrchr-avx2.S: Likewise. * sysdeps/x86_64/multiarch/wcsrchr-sse2.S: Likewise. * sysdeps/x86_64/multiarch/wcsrchr.c: Likewise.
*	x86-64: Optimize memrchr with AVX2	H.J. Lu	2017-06-09	5	-0/+424
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Optimize memrchr with AVX2 to search 32 bytes with a single vector compare instruction. It is as fast as SSE2 memrchr for small data sizes and up to 1X faster for large data sizes on Haswell. Select AVX2 memrchr on AVX2 machines where vzeroupper is preferred and AVX unaligned load is fast. * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add memrchr-sse2 and memrchr-avx2. * sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Add tests for __memrchr_avx2 and __memrchr_sse2. * sysdeps/x86_64/multiarch/memrchr-avx2.S: New file. * sysdeps/x86_64/multiarch/memrchr-sse2.S: Likewise. * sysdeps/x86_64/multiarch/memrchr.c: Likewise.
*	x86-64: Optimize strchr/strchrnul/wcschr with AVX2	H.J. Lu	2017-06-09	12	-58/+492
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Optimize strchr/strchrnul/wcschr with AVX2 to search 32 bytes with vector instructions. It is as fast as SSE2 versions for size <= 16 bytes and up to 1X faster for size > 16 bytes on Haswell. Select AVX2 version on AVX2 machines where vzeroupper is preferred and AVX unaligned load is fast. NB: It uses TZCNT instead of BSF since TZCNT produces the same result as BSF for non-zero input. TZCNT is faster than BSF and is executed as BSF if machine doesn't support TZCNT. * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add strchr-sse2, strchrnul-sse2, strchr-avx2, strchrnul-avx2, wcschr-sse2 and wcschr-avx2. * sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Add tests for __strchr_avx2, __strchrnul_avx2, __strchrnul_sse2, __wcschr_avx2 and __wcschr_sse2. * sysdeps/x86_64/multiarch/strchr-avx2.S: New file. * sysdeps/x86_64/multiarch/strchr-sse2.S: Likewise. * sysdeps/x86_64/multiarch/strchr.c: Likewise. * sysdeps/x86_64/multiarch/strchrnul-avx2.S: Likewise. * sysdeps/x86_64/multiarch/strchrnul-sse2.S: Likewise. * sysdeps/x86_64/multiarch/strchrnul.c: Likewise. * sysdeps/x86_64/multiarch/wcschr-avx2.S: Likewise. * sysdeps/x86_64/multiarch/wcschr-sse2.S: Likewise. * sysdeps/x86_64/multiarch/wcschr.c: Likewise. * sysdeps/x86_64/multiarch/strchr.S: Removed.
*	x86-64: Optimize strlen/strnlen/wcslen/wcsnlen with AVX2	H.J. Lu	2017-06-09	13	-1/+621
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Optimize strlen/strnlen/wcslen/wcsnlen with AVX2 to check 32 bytes with a single vector compare instruction. It is as fast as SSE2 versions for size <= 16 bytes and up to 1X faster for size > 16 bytes on Haswell. Select AVX2 version on AVX2 machines where vzeroupper is preferred and AVX unaligned load is fast. NB: It uses TZCNT instead of BSF since TZCNT produces the same result as BSF for non-zero input. TZCNT is faster than BSF and is executed as BSF if machine doesn't support TZCNT. * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add strlen-sse2, strnlen-sse2, strlen-avx2, strnlen-avx2, wcslen-sse2, wcslen-avx2 and wcsnlen-avx2. * sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Add tests for __strlen_avx2, __strlen_sse2, __strnlen_avx2, __strnlen_sse2, __wcslen_avx2, __wcslen_sse2 and __wcsnlen_avx2. * sysdeps/x86_64/multiarch/strlen-avx2.S: New file. * sysdeps/x86_64/multiarch/strlen-sse2.S: Likewise. * sysdeps/x86_64/multiarch/strlen.c: Likewise. * sysdeps/x86_64/multiarch/strnlen-avx2.S: Likewise. * sysdeps/x86_64/multiarch/strnlen-sse2.S: Likewise. * sysdeps/x86_64/multiarch/strnlen.c: Likewise. * sysdeps/x86_64/multiarch/wcslen-avx2.S: Likewise. * sysdeps/x86_64/multiarch/wcslen-sse2.S: Likewise. * sysdeps/x86_64/multiarch/wcslen.c: Likewise. * sysdeps/x86_64/multiarch/wcsnlen-avx2.S: Likewise. * sysdeps/x86_64/multiarch/wcsnlen.c (OPTIMIZE (avx2)): New. (IFUNC_SELECTOR): Return OPTIMIZE (avx2) on AVX2 machines where vzeroupper is preferred and AVX unaligned load is fast.
*	x86-64: Optimize memchr/rawmemchr/wmemchr with SSE2/AVX2	H.J. Lu	2017-06-09	13	-25/+620
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	SSE2 memchr is extended to support wmemchr. AVX2 memchr/rawmemchr/wmemchr are added to search 32 bytes with a single vector compare instruction. AVX2 memchr/rawmemchr/wmemchr are as fast as SSE2 memchr/rawmemchr/wmemchr for small sizes and up to 1.5X faster for larger sizes on Haswell and Skylake. Select AVX2 memchr/rawmemchr/wmemchr on AVX2 machines where vzeroupper is preferred and AVX unaligned load is fast. NB: It uses TZCNT instead of BSF since TZCNT produces the same result as BSF for non-zero input. TZCNT is faster than BSF and is executed as BSF if machine doesn't support TZCNT. * sysdeps/x86_64/memchr.S (MEMCHR): New. Depending on if USE_AS_WMEMCHR is defined. (PCMPEQ): Likewise. (memchr): Renamed to ... (MEMCHR): This. Support wmemchr if USE_AS_WMEMCHR is defined. Replace pcmpeqb with PCMPEQ. * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add memchr-sse2, rawmemchr-sse2, memchr-avx2, rawmemchr-avx2, wmemchr-sse4_1, wmemchr-avx2 and wmemchr-c. * sysdeps/x86_64/multiarch/ifunc-avx2.h: New file. * sysdeps/x86_64/multiarch/memchr-avx2.S: Likewise. * sysdeps/x86_64/multiarch/memchr-sse2.S: Likewise. * sysdeps/x86_64/multiarch/memchr.c: Likewise. * sysdeps/x86_64/multiarch/rawmemchr-avx2.S: Likewise. * sysdeps/x86_64/multiarch/rawmemchr-sse2.S: Likewise. * sysdeps/x86_64/multiarch/rawmemchr.c: Likewise. * sysdeps/x86_64/multiarch/wmemchr-avx2.S: Likewise. * sysdeps/x86_64/multiarch/wmemchr-sse2.S: Likewise. * sysdeps/x86_64/multiarch/wmemchr.c: Likewise. * sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Test __memchr_avx2, __memchr_sse2, __rawmemchr_avx2, __rawmemchr_sse2, __wmemchr_avx2 and __wmemchr_sse2.
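The "compare 32 bytes, then count trailing zeros" idea in the commit above can be illustrated in portable C. This is only a scalar sketch, not the AVX2 assembly. The function name `sketch_memchr` is hypothetical, the inner loop stands in for a vector compare plus movemask, and `__builtin_ctz` (a GCC/Clang builtin, assumed available) plays the role of TZCNT/BSF on a non-zero mask.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of a chunked memchr: build a bitmask of
   matching byte positions for up to 32 bytes, then use a trailing-zero
   count to locate the first match.  */
static const void *
sketch_memchr (const void *s, int c, size_t n)
{
  const unsigned char *p = s;
  const unsigned char needle = (unsigned char) c;

  while (n > 0)
    {
      size_t chunk = n < 32 ? n : 32;
      uint32_t mask = 0;

      /* Stand-in for one vector compare + movemask.  */
      for (size_t i = 0; i < chunk; i++)
        mask |= (uint32_t) (p[i] == needle) << i;

      /* The mask is non-zero here, so the bit scan is well defined.  */
      if (mask != 0)
        return p + __builtin_ctz (mask);

      p += chunk;
      n -= chunk;
    }
  return NULL;
}
```

The bit scan only runs on a non-zero mask, which is the case the commit message cites when it says TZCNT and BSF produce the same result.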
*	aarch64: Fix undefined behavior in _dl_procinfo	Siddhesh Poyarekar	2017-06-09	1	-3/+3
\| \| \| \| \| \| \| \| \|	1 << 31 is undefined, so replace it with a cleaner check. Also remove magic numbers in comments. * sysdeps/unix/sysv/linux/aarch64/dl-procinfo.h: Remove mention of magic numbers in comments. (_dl_procinfo): Fix undefined behavior
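With a 32-bit `int`, `1 << 31` shifts a set bit into the sign position, which the C standard leaves undefined for signed operands. The commit does not show its replacement check, so the following is only an assumed illustration of the usual remedy: do the shift on an unsigned type. The helper name `hwcap_bit_set` is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: test bit `idx` (0..63) of a hwcap-style word.
   The shift is done on an unsigned 64-bit constant, so bit 31 (and
   above) is well defined.  */
static int
hwcap_bit_set (uint64_t hwcap, unsigned int idx)
{
  return (hwcap & (UINT64_C (1) << idx)) != 0;
}
```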
*	ld.so: Consolidate 2 strtouls into _dl_strtoul [BZ #21528]	H.J. Lu	2017-06-08	2	-1/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There are 2 minimal strtoul implementations in ld.so: 1. __strtoul_internal in elf/dl-minimal.c. 2. tunables_strtoul in elf/dl-tunables.c. This patch adds _dl_strtoul to replace them. Tested builds with and without --enable-tunables. [BZ #21528] * elf/dl-minimal.c (__strtoul_internal): Removed. (strtoul): Likewise. * elf/dl-misc.c (_dl_strtoul): New function. * elf/dl-tunables.c (tunables_strtoul): Removed. (tunable_initialize): Replace tunables_strtoul with _dl_strtoul. * elf/rtld.c (process_envvars): Likewise. * sysdeps/unix/sysv/linux/dl-librecon.h (_dl_osversion_init): Likewise. * sysdeps/generic/ldsodefs.h (_dl_strtoul): New prototype.
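For readers unfamiliar with what a "minimal strtoul" involves, here is a rough sketch of one. It is not the `_dl_strtoul` source. The name `mini_strtoul`, the base auto-detection rules and the saturate-on-overflow behaviour are all assumptions made for illustration, and corner cases such as a bare "0x" prefix are not handled the way a full strtoul would handle them.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical minimal unsigned parser: skips blanks, detects a
   0x/0 prefix for hex/octal, otherwise parses decimal.  Saturates
   to ULONG_MAX on overflow.  */
static unsigned long
mini_strtoul (const char *s, char **endptr)
{
  unsigned long result = 0;
  unsigned int base = 10;
  int overflow = 0;

  while (*s == ' ' || *s == '\t')
    s++;

  if (s[0] == '0')
    {
      if (s[1] == 'x' || s[1] == 'X')
        {
          base = 16;
          s += 2;
        }
      else
        base = 8;
    }

  for (;; s++)
    {
      unsigned int digit;

      if (*s >= '0' && *s <= '9')
        digit = *s - '0';
      else if (*s >= 'a' && *s <= 'f')
        digit = *s - 'a' + 10;
      else if (*s >= 'A' && *s <= 'F')
        digit = *s - 'A' + 10;
      else
        break;
      if (digit >= base)
        break;

      /* result * base + digit would exceed ULONG_MAX.  */
      if (result > (ULONG_MAX - digit) / base)
        overflow = 1;
      else
        result = result * base + digit;
    }

  if (endptr != NULL)
    *endptr = (char *) s;
  return overflow ? ULONG_MAX : result;
}
```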
*	Remove __need macros from stdio.h and wchar.h.	Zack Weinberg	2017-06-08	2	-6/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	wint_t is a little finicky because it might be defined by stddef.h, which belongs to the compiler. In addition to the _types_, a bunch of other declarations shared between wctype.h and wchar.h are factored out to their own header. * libio/bits/types/FILE.h, libio/bits/types/__FILE.h * wcsmbs/bits/types/mbstate_t.h, wcsmbs/bits/types/__mbstate_t.h * wcsmbs/bits/types/wint_t.h: New single-type definition files. * wctype/bits/wctype-wchar.h: New file holding declarations shared between wctype.h and wchar.h. * libio/Makefile, wcsmbs/Makefile, wctype/Makefile: Install them. * include/bits/types/FILE.h, include/bits/types/__FILE.h * include/bits/types/mbstate_t.h, include/bits/types/__mbstate_t.h * include/bits/types/wint_t.h, include/bits/wcsmbs-wchar.h: New wrappers. * include/stdio.h, include/wchar.h, include/wctype.h: No need to handle __need macros. * grp/grp.h, gshadow/gshadow.h, hurd/hurd.h, iconv/gconv.h * libio/stdio.h, mach/mach.h, misc/mntent.h, pwd/pwd.h * shadow/shadow.h, stdio-common/printf.h, wcsmbs/uchar.h * wcsmbs/wchar.h, wctype/wctype.h * sysdeps/generic/_G_config.h, sysdeps/unix/sysv/linux/_G_config.h Use the new files instead of __need macros.
*	x86-64: Rename wmemset.h to ifunc-wmemset.h	H.J. Lu	2017-06-07	4	-4/+4
\| \| \| \| \| \| \| \| \| \|	No code changes. * sysdeps/x86_64/multiarch/wmemset.c: Include ifunc-wmemset.h instead of wmemset.h. * sysdeps/x86_64/multiarch/wmemset_chk.c: Likewise. * sysdeps/x86_64/multiarch/wmemset.h: Renamed to ... * sysdeps/x86_64/multiarch/ifunc-wmemset.h: This.
*	float128: Add strfromf128	Gabriel F. T. Gomes	2017-06-07	3	-1/+31
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add strfromf128 to stdlib when _Float128 support is enabled. * stdio-common/printf-parsemb.c (__parse_one_specmb): Initialize spec->info.is_binary128 to zero. * stdio-common/printf.h (printf_info): Add new member is_binary128 to indicate that the number being converted to string is compatible with the IEC 60559 binary128 format. * stdio-common/printf_fp.c (__printf_fp_l): Add code to deal with _Float128 numbers. * stdio-common/printf_fphex.c: Include ieee754_float128.h and ldbl-128/printf_fphex_macros.h (__printf_fphex): Add code to deal with _Float128 numbers. * stdio-common/printf_size.c (__printf_size): Likewise. * stdio-common/vfprintf.c (process_arg): Initialize member info.is_binary128 to zero. * stdlib/fpioconst.h (FLT128_MAX_10_EXP_LOG): New macro. * stdlib/stdlib.h: Include bits/floatn.h for _Float128 support. (strfromf128): New declaration. * stdlib/strfrom-skeleton.c (STRFROM): Set member info.is_binary128 to one. * sysdeps/ieee754/float128/Makefile: Add strfromf128. * sysdeps/ieee754/float128/Versions: Likewise. * sysdeps/ieee754/float128/strfromf128.c: New file.
*	Refactor PRINT_FPHEX_LONG_DOUBLE into a reusable macro	Gabriel F. T. Gomes	2017-06-07	2	-85/+107
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch refactors the macro PRINT_FPHEX_LONG_DOUBLE from the file sysdeps/ieee754/ldbl-128/printf_fphex.c into a function-like macro to enable its use for both long double and _Float128, when they are ABI-distinct. * sysdeps/ieee754/ldbl-128/printf_fphex.c: Include ldbl-128/printf_fphex_macros.h for the definition of PRINT_FPHEX. (PRINT_FPHEX_LONG_DOUBLE): Define based on PRINT_FPHEX. * sysdeps/ieee754/ldbl-128/printf_fphex_macros.h (PRINT_FPHEX): New function-like macro that can be used for long double, as well as for _Float128
*	float128: Add conversion from float128 to mpn	Gabriel F. T. Gomes	2017-06-07	4	-1/+27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Reuse the code for __mpn_extract_long_double to implement __mpn_extract_float128. * include/gmp.h: Include bits/floatn.h (__mpn_extract_float128): Declare when __HAVE_DISTINCT_FLOAT128 is 1. * stdlib/gmp-impl.h: Also check if alloca is not defined before including stack-alloc.h. It could have been defined by other header which not necessarily defines HAVE_ALLOCA. * sysdeps/ieee754/float128/Makefile: New file. * sysdeps/ieee754/float128/float1282mpn.c: New file. * sysdeps/ieee754/float128/float128_private.h: Include gmp.h before redefining __mpn_extract_long_double to __mpn_extract_float128, then redefine __mpn_extract_long_double to __mpn_extract_float128. * sysdeps/ieee754/ldbl-128/ldbl2mpn.c: Replace long double with _Float128 to allow float128_private.h overrides.
*	x86-64: Fold ifunc-sse4_1.h into wcsnlen.c	H.J. Lu	2017-06-07	2	-35/+15
\| \| \| \| \| \| \| \| \| \| \| \|	Since ifunc-sse4_1.h is included only by wcsnlen.c, we can fold it into wcsnlen.c. No code changes in wcsnlen.o. 2017-06-07 H.J. Lu <hongjiu.lu@intel.com> * sysdeps/x86_64/multiarch/ifunc-sse4_1.h: Removed and folded into ... * sysdeps/x86_64/multiarch/wcsnlen.c: Here. Don't include ifunc-sse4_1.h.
*	Remove check for NULL buffer passed to `ptsname_r'	Arjun Shankar	2017-06-07	1	-6/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	`ptsname_r' is declared in stdlib.h to only accept a `nonnull' second argument and therefore GCC may choose to make optimizations based on the assumption that this argument is not NULL. This means that potentially, GCC can optimize away the NULL check at some point in the future. Since this is a programming interface, we might as well remove the NULL check ourselves. This also warrants a change to the `ptsname_r' manual page that must be submitted to the corresponding mailing list. In addition, remove the NULL buffer test in login/tst-ptsname.c.
*	Use test-driver in sysdeps/unix/sysv/linux/tst-clone2.c	Arjun Shankar	2017-06-07	1	-4/+3
\|
*	aarch64: Add hwcap string routines	Siddhesh Poyarekar	2017-06-07	2	-8/+65
\| \| \| \| \| \| \| \| \| \| \| \| \|	Add support for routines in dl-procinfo.h to show string versions of HWCAP entries when a program is invoked with the LD_SHOW_AUXV environment variable set and also to aid in path resolution for ldconfig. * sysdeps/unix/sysv/linux/aarch64/dl-procinfo.c (_dl_aarch64_cap_flags): New array. * sysdeps/unix/sysv/linux/aarch64/dl-procinfo.h (_dl_hwcap_string, _dl_string_hwcap, _dl_procinfo): Implement functions.
*	Make LD_HWCAP_MASK usable for static binaries	Siddhesh Poyarekar	2017-06-07	2	-6/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The LD_HWCAP_MASK environment variable was ignored in static binaries, which is inconsistent with the behaviour of dynamically linked binaries. This seems to have been because of the inability of ld_hwcap_mask being read early enough to influence anything but now that it is in tunables, the mask is usable in static binaries as well. This feature is important for aarch64, which relies on HWCAP_CPUID being masked out to disable multiarch. A sanity test on x86_64 shows that there are no failures. Likewise for aarch64. * elf/dl-hwcaps.h [HAVE_TUNABLES]: Always read hwcap_mask. * sysdeps/sparc/sparc32/dl-machine.h [HAVE_TUNABLES]: Likewise. * sysdeps/x86/cpu-features.c (init_cpu_features): Always set up hwcap and hwcap_mask.
*	aarch64: Allow overriding HWCAP_CPUID feature check using HWCAP_MASK	Siddhesh Poyarekar	2017-06-07	2	-4/+50
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Now that LD_HWCAP_MASK (or glibc.tune.hwcap_mask) is read early enough to influence cpu feature check in aarch64, use it to influence multiarch selection. Setting LD_HWCAP_MASK such that it clears HWCAP_CPUID will now disable multiarch for the binary. HWCAP_CPUID is also now set in HWCAP_IMPORTANT so that it is set by default. With this patch, this feature is only usable with dynamically linked binaries because LD_HWCAP_MASK is not read for static binaries. A future patch fixes that. * sysdeps/unix/sysv/linux/aarch64/cpu-features.c (init_cpu_features): Use glibc.tune.hwcap_mask. * sysdeps/unix/sysv/linux/aarch64/dl-procinfo.h: New file.
*	tunables: Use glibc.tune.hwcap_mask tunable instead of _dl_hwcap_mask	Siddhesh Poyarekar	2017-06-07	3	-1/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Drop _dl_hwcap_mask when building with tunables. This completes the transition of hwcap_mask reading from _dl_hwcap_mask to tunables. * elf/dl-hwcaps.h: New file. * elf/dl-hwcaps.c: Include it. (_dl_important_hwcaps)[HAVE_TUNABLES]: Read and update glibc.tune.hwcap_mask. * elf/dl-cache.c: Include dl-hwcaps.h. (_dl_load_cache_lookup)[HAVE_TUNABLES]: Read glibc.tune.hwcap_mask. * sysdeps/sparc/sparc32/dl-machine.h: Likewise. * elf/dl-support.c (_dl_hwcap2)[HAVE_TUNABLES]: Drop _dl_hwcap_mask. * elf/rtld.c (rtld_global_ro)[HAVE_TUNABLES]: Drop _dl_hwcap_mask. (process_envvars)[HAVE_TUNABLES]: Likewise. * sysdeps/generic/ldsodefs.h (rtld_global_ro)[HAVE_TUNABLES]: Likewise. * sysdeps/x86/cpu-features.c (init_cpu_features): Don't initialize dl_hwcap_mask when tunables are enabled.
*	Add include guards to dl-procinfo.h	Siddhesh Poyarekar	2017-06-07	2	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The dl-procinfo.h for linux/s390 and linux/i386 don't have include guards, which causes them to fail since addition of LD_HWCAP_MASK to tunables. Add _DL_I386_PROCINFO_H guard to avoid redefining _dl_procinfo on multiple includes and also allow the subsequent include of another dl-procinfo.h to work. Verified with a build test on i686. * sysdeps/unix/sysv/linux/i386/dl-procinfo.h: Add include guard. * sysdeps/unix/sysv/linux/s390/dl-procinfo.h: Likewise.
*	x86-64: Move wcsnlen.S to multiarch/wcsnlen-sse4_1.S	H.J. Lu	2017-06-06	7	-8/+88
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since wcsnlen.S uses pminud, which is part of SSE4.1, move wcsnlen.S to multiarch/wcsnlen-sse4_1.S. * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add wcsnlen-sse4_1 and wcsnlen-c. * sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Test __wcsnlen_sse4_1 and __wcsnlen_sse2. * sysdeps/x86_64/multiarch/ifunc-sse4_1.h: New file. * sysdeps/x86_64/multiarch/wcsnlen-c.c: Likewise. * sysdeps/x86_64/multiarch/wcsnlen-sse4_1.S: Likewise. * sysdeps/x86_64/multiarch/wcsnlen.c: Likewise. * sysdeps/x86_64/wcsnlen.S: Removed.
*	S390: Use generic spinlock code.	Stefan Liebler	2017-06-06	4	-115/+0
\| \| \| \| \| \| \| \| \| \| \| \|	This patch removes the s390 specific implementation of spinlock code and is now using the generic one. ChangeLog: * sysdeps/s390/nptl/pthread_spin_init.c: Delete File. * sysdeps/s390/nptl/pthread_spin_lock.c: Likewise. * sysdeps/s390/nptl/pthread_spin_trylock.c: Likewise. * sysdeps/s390/nptl/pthread_spin_unlock.c: Likewise.
*	Optimize generic spinlock code and use C11 like atomic macros.	Stefan Liebler	2017-06-06	29	-167/+57
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch optimizes the generic spinlock code. The type pthread_spinlock_t is a typedef to volatile int on all archs. Passing a volatile pointer to the atomic macros which are not mapped to the C11 atomic builtins can lead to extra stores and loads to stack if such a macro creates a temporary variable by using "__typeof ((mem)) tmp;". Thus, those macros which are used by spinlock code - atomic_exchange_acquire, atomic_load_relaxed, atomic_compare_exchange_weak - have to be adjusted. According to the comment from Szabolcs Nagy, the type of a cast expression is unqualified (see http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_423.htm): __typeof ((__typeof ((mem)) (mem)) tmp; Thus from spinlock perspective the variable tmp is of type int instead of type volatile int. This patch adjusts those macros in include/atomic.h. With this construct GCC >= 5 omits the extra stores and loads. The atomic macros are replaced by the C11 like atomic macros and thus the code is aligned to it. The pthread_spin_unlock implementation is now using release memory order instead of sequentially consistent memory order. The issue with passed volatile int pointers applies to the C11 like atomic macros as well as the ones used before. I've added a glibc_likely hint to the first atomic exchange in pthread_spin_lock in order to return immediately to the caller if the lock is free. Without the hint, there is an additional jump if the lock is free. I've added the atomic_spin_nop macro within the loop of plain reads. The plain reads are also realized by C11 like atomic_load_relaxed macro. 
The new define ATOMIC_EXCHANGE_USES_CAS determines if the first try to acquire the spinlock in pthread_spin_lock or pthread_spin_trylock is an exchange or a CAS. This is defined in atomic-machine.h for all architectures. The define SPIN_LOCK_READS_BETWEEN_CMPXCHG is now removed. There is no technical reason for throwing in a CAS every now and then, and so far we have no evidence that it can improve performance. If that would be the case, we have to adjust other spin-waiting loops elsewhere, too! Using a CAS loop without plain reads is not a good idea on many targets and wasn't used by one. Thus there is now no option to do so. Architectures are now using the generic spinlock automatically if they do not provide an own implementation. Thus the pthread_spin_lock.c files in sysdeps folder are deleted. ChangeLog: NEWS: Mention new spinlock implementation. * include/atomic.h: (__atomic_val_bysize): Cast type to omit volatile qualifier. (atomic_exchange_acq): Likewise. (atomic_load_relaxed): Likewise. (ATOMIC_EXCHANGE_USES_CAS): Check definition. * nptl/pthread_spin_init.c (pthread_spin_init): Use atomic_store_relaxed. * nptl/pthread_spin_lock.c (pthread_spin_lock): Use C11-like atomic macros. * nptl/pthread_spin_trylock.c (pthread_spin_trylock): Likewise. * nptl/pthread_spin_unlock.c (pthread_spin_unlock): Use atomic_store_release. * sysdeps/aarch64/nptl/pthread_spin_lock.c: Delete File. * sysdeps/arm/nptl/pthread_spin_lock.c: Likewise. * sysdeps/hppa/nptl/pthread_spin_lock.c: Likewise. * sysdeps/m68k/nptl/pthread_spin_lock.c: Likewise. * sysdeps/microblaze/nptl/pthread_spin_lock.c: Likewise. * sysdeps/mips/nptl/pthread_spin_lock.c: Likewise. * sysdeps/nios2/nptl/pthread_spin_lock.c: Likewise. * sysdeps/aarch64/atomic-machine.h (ATOMIC_EXCHANGE_USES_CAS): Define. * sysdeps/alpha/atomic-machine.h: Likewise. * sysdeps/arm/atomic-machine.h: Likewise. * sysdeps/i386/atomic-machine.h: Likewise. * sysdeps/ia64/atomic-machine.h: Likewise. 
* sysdeps/m68k/coldfire/atomic-machine.h: Likewise. * sysdeps/m68k/m680x0/m68020/atomic-machine.h: Likewise. * sysdeps/microblaze/atomic-machine.h: Likewise. * sysdeps/mips/atomic-machine.h: Likewise. * sysdeps/powerpc/powerpc32/atomic-machine.h: Likewise. * sysdeps/powerpc/powerpc64/atomic-machine.h: Likewise. * sysdeps/s390/atomic-machine.h: Likewise. * sysdeps/sparc/sparc32/atomic-machine.h: Likewise. * sysdeps/sparc/sparc32/sparcv9/atomic-machine.h: Likewise. * sysdeps/sparc/sparc64/atomic-machine.h: Likewise. * sysdeps/tile/tilegx/atomic-machine.h: Likewise. * sysdeps/tile/tilepro/atomic-machine.h: Likewise. * sysdeps/unix/sysv/linux/hppa/atomic-machine.h: Likewise. * sysdeps/unix/sysv/linux/m68k/coldfire/atomic-machine.h: Likewise. * sysdeps/unix/sysv/linux/nios2/atomic-machine.h: Likewise. * sysdeps/unix/sysv/linux/sh/atomic-machine.h: Likewise. * sysdeps/x86_64/atomic-machine.h: Likewise.
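The locking scheme described in the spinlock commit above (an acquire exchange on the fast path, a loop of relaxed plain reads while the lock is held, and a release store to unlock) can be sketched with standard C11 atomics. This is not the glibc source. The `sketch_` names are hypothetical, and the glibc-specific pieces mentioned in the message (`atomic_spin_nop`, `ATOMIC_EXCHANGE_USES_CAS`, the `glibc_likely` hint) are left out.

```c
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical lock type: 0 means free, 1 means held.  */
typedef atomic_int sketch_spinlock_t;

static void
sketch_spin_lock (sketch_spinlock_t *lock)
{
  /* Fast path: a single exchange succeeds if the lock was free.  */
  if (atomic_exchange_explicit (lock, 1, memory_order_acquire) == 0)
    return;

  do
    {
      /* Wait with relaxed reads so the cache line is not repeatedly
         taken in exclusive state while the lock is held.  */
      while (atomic_load_explicit (lock, memory_order_relaxed) != 0)
        ;
    }
  while (atomic_exchange_explicit (lock, 1, memory_order_acquire) != 0);
}

static int
sketch_spin_trylock (sketch_spinlock_t *lock)
{
  return atomic_exchange_explicit (lock, 1, memory_order_acquire) == 0
         ? 0 : EBUSY;
}

static void
sketch_spin_unlock (sketch_spinlock_t *lock)
{
  /* Release ordering is enough to publish the critical section.  */
  atomic_store_explicit (lock, 0, memory_order_release);
}
```

The unlock path uses a release store rather than a sequentially consistent one, matching the change the commit message describes for pthread_spin_unlock.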
*	x86: Don't use dl_x86_cpu_features in cacheinfo.c	H.J. Lu	2017-06-05	1	-15/+22
\| \| \| \| \| \| \| \| \| \| \| \|	Since cpu_features is available, use it instead of dl_x86_cpu_features. * sysdeps/x86/cacheinfo.c (intel_check_word): Accept cpu_features and use it instead of dl_x86_cpu_features. (handle_intel): Replace maxidx with cpu_features. Pass cpu_features to intel_check_word. (__cache_sysconf): Pass cpu_features to handle_intel. (init_cacheinfo): Likewise. Use cpu_features instead of dl_x86_cpu_features.
*	x86-64: Optimize memcmp/wmemcmp with AVX2 and MOVBE	H.J. Lu	2017-06-05	7	-3/+466
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Optimize x86-64 memcmp/wmemcmp with AVX2. It uses vector compare as much as possible. It is as fast as SSE4 memcmp for size <= 16 bytes and up to 2X faster for size > 16 bytes on Haswell and Skylake. Select AVX2 memcmp/wmemcmp on AVX2 machines where vzeroupper is preferred and AVX unaligned load is fast. NB: It uses TZCNT instead of BSF since TZCNT produces the same result as BSF for non-zero input. TZCNT is faster than BSF and is executed as BSF if machine doesn't support TZCNT. Key features: 1. For size from 2 to 7 bytes, load as big endian with movbe and bswap to avoid branches. 2. Use overlapping compare to avoid branch. 3. Use vector compare when size >= 4 bytes for memcmp or size >= 8 bytes for wmemcmp. 4. If size is 8 * VEC_SIZE or less, unroll the loop. 5. Compare 4 * VEC_SIZE at a time with the aligned first memory area. 6. Use 2 vector compares when size is 2 * VEC_SIZE or less. 7. Use 4 vector compares when size is 4 * VEC_SIZE or less. 8. Use 8 vector compares when size is 8 * VEC_SIZE or less. * sysdeps/x86/cpu-features.h (index_cpu_MOVBE): New. * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add memcmp-avx2 and wmemcmp-avx2. * sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Test __memcmp_avx2 and __wmemcmp_avx2. * sysdeps/x86_64/multiarch/memcmp-avx2.S: New file. * sysdeps/x86_64/multiarch/wmemcmp-avx2.S: Likewise. * sysdeps/x86_64/multiarch/memcmp.S: Use __memcmp_avx2 on AVX 2 machines if AVX unaligned load is fast and vzeroupper is preferred. * sysdeps/x86_64/multiarch/wmemcmp.S: Use __wmemcmp_avx2 on AVX 2 machines if AVX unaligned load is fast and vzeroupper is preferred.
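Key features 1 and 2 in the commit above (big-endian loads and overlapping compares to avoid branches) can be illustrated in portable C for one small size class. This is an assumed illustration, not a transcription of memcmp-avx2.S. The function names are hypothetical, and the 4..8 byte range is chosen for simplicity rather than taken from the commit, which names 2 to 7 bytes for the movbe path.

```c
#include <stddef.h>
#include <stdint.h>

/* Load 4 bytes as a big-endian value, so that integer order matches
   lexicographic byte order (the job movbe/bswap do in assembly).  */
static uint32_t
load_be32 (const unsigned char *p)
{
  return ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16)
         | ((uint32_t) p[2] << 8) | (uint32_t) p[3];
}

/* Hypothetical branch-free compare for 4 <= n <= 8: the first four
   and the last four bytes are loaded (they overlap when n < 8) and
   compared as one 64-bit unsigned value.  */
static int
sketch_memcmp_4_to_8 (const void *s1, const void *s2, size_t n)
{
  const unsigned char *a = s1;
  const unsigned char *b = s2;
  uint64_t x = ((uint64_t) load_be32 (a) << 32) | load_be32 (a + n - 4);
  uint64_t y = ((uint64_t) load_be32 (b) << 32) | load_be32 (b + n - 4);

  return (x > y) - (x < y);
}
```

The overlap is harmless: if the first four bytes are equal, the overlapping bytes of the second load are equal too, so the first differing byte still decides the result.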
*	x86-64: Optimize wmemset with SSE2/AVX2/AVX512	H.J. Lu	2017-06-05	13	-9/+250
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The difference between memset and wmemset is byte vs int. Add stubs to SSE2/AVX2/AVX512 memset for wmemset with updated constant and size: SSE2 wmemset: shl $0x2,%rdx movd %esi,%xmm0 mov %rdi,%rax pshufd $0x0,%xmm0,%xmm0 jmp entry_from_wmemset SSE2 memset: movd %esi,%xmm0 mov %rdi,%rax punpcklbw %xmm0,%xmm0 punpcklwd %xmm0,%xmm0 pshufd $0x0,%xmm0,%xmm0 entry_from_wmemset: Since the ERMS versions of wmemset requires "rep stosl" instead of "rep stosb", only the vector store stubs of SSE2/AVX2/AVX512 wmemset are added. The SSE2 wmemset is about 3X faster and the AVX2 wmemset is about 6X faster on Haswell. * include/wchar.h (__wmemset_chk): New. * sysdeps/x86_64/memset.S (VDUP_TO_VEC0_AND_SET_RETURN): Renamed to MEMSET_VDUP_TO_VEC0_AND_SET_RETURN. (WMEMSET_VDUP_TO_VEC0_AND_SET_RETURN): New. (WMEMSET_CHK_SYMBOL): Likewise. (WMEMSET_SYMBOL): Likewise. (__wmemset): Add hidden definition. (wmemset): Add weak hidden definition. * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add wmemset_chk-nonshared. * sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Add __wmemset_sse2_unaligned, __wmemset_avx2_unaligned, __wmemset_avx512_unaligned, __wmemset_chk_sse2_unaligned, __wmemset_chk_avx2_unaligned and __wmemset_chk_avx512_unaligned. * sysdeps/x86_64/multiarch/memset-avx2-unaligned-erms.S (VDUP_TO_VEC0_AND_SET_RETURN): Renamed to ... (MEMSET_VDUP_TO_VEC0_AND_SET_RETURN): This. (WMEMSET_VDUP_TO_VEC0_AND_SET_RETURN): New. (WMEMSET_SYMBOL): Likewise. * sysdeps/x86_64/multiarch/memset-avx512-unaligned-erms.S (VDUP_TO_VEC0_AND_SET_RETURN): Renamed to ... (MEMSET_VDUP_TO_VEC0_AND_SET_RETURN): This. (WMEMSET_VDUP_TO_VEC0_AND_SET_RETURN): New. (WMEMSET_SYMBOL): Likewise. * sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: Updated. (WMEMSET_CHK_SYMBOL): New. 
(WMEMSET_CHK_SYMBOL (__wmemset_chk, unaligned)): Likewise. (WMEMSET_SYMBOL (__wmemset, unaligned)): Likewise. * sysdeps/x86_64/multiarch/memset.S (WMEMSET_SYMBOL): New. (libc_hidden_builtin_def): Also define __GI_wmemset and __GI___wmemset. (weak_alias): New. * sysdeps/x86_64/multiarch/wmemset.c: New file. * sysdeps/x86_64/multiarch/wmemset.h: Likewise. * sysdeps/x86_64/multiarch/wmemset_chk-nonshared.S: Likewise. * sysdeps/x86_64/multiarch/wmemset_chk.c: Likewise. * sysdeps/x86_64/wmemset.c: Likewise. * sysdeps/x86_64/wmemset_chk.c: Likewise.
*	x86: Add macros to implement ifunc selection in C	H.J. Lu	2017-06-05	1	-0/+40
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	These macros are used to implement ifunc selection in C. To implement an ifunc function, foo, which returns the address of __foo_sse2 or __foo_avx2: #define foo __redirect_foo #define __foo __redirect___foo #include <foo.h> #undef foo #undef __foo #define SYMBOL_NAME foo #include <init-arch.h> extern __typeof (REDIRECT_NAME) OPTIMIZE (sse2) attribute_hidden; extern __typeof (REDIRECT_NAME) OPTIMIZE (avx2) attribute_hidden; static inline void * foo_selector (void) { if (use AVX2) return OPTIMIZE (avx2); return OPTIMIZE (sse2); } libc_ifunc_redirected (__redirect_foo, foo, foo_selector ()); * sysdeps/x86/init-arch.h (PASTER1): New. (EVALUATOR1): Likewise. (PASTER2): Likewise. (EVALUATOR2): Likewise. (REDIRECT_NAME): Likewise. (OPTIMIZE): Likewise. (IFUNC_SELECTOR): Likewise.
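The selector pattern in the commit above depends on glibc-internal macros (`OPTIMIZE`, `REDIRECT_NAME`, `libc_ifunc_redirected`) and on ELF IFUNC relocations. As a rough portable analogue, the sketch below swaps in a plain function pointer resolved on first call. This is a different mechanism from IFUNC and is shown only to illustrate the "selector returns one of several implementations" shape. All names are hypothetical, both implementations are trivial stand-ins, and `__builtin_cpu_supports` is a GCC/Clang builtin assumed to be available on x86 only.

```c
#include <stddef.h>

typedef int (*foo_fn) (int);

/* Stand-ins for the baseline and AVX2 variants; a real pair would
   differ in instructions used, not in results.  */
static int foo_sse2 (int x) { return x + 1; }
static int foo_avx2 (int x) { return x + 1; }

/* Hypothetical CPU check; reports no AVX2 on non-x86 builds.  */
static int
cpu_has_avx2 (void)
{
#if defined (__GNUC__) && (defined (__x86_64__) || defined (__i386__))
  __builtin_cpu_init ();
  return __builtin_cpu_supports ("avx2") != 0;
#else
  return 0;
#endif
}

/* Analogue of foo_selector: pick an implementation once.  */
static foo_fn
foo_selector (void)
{
  return cpu_has_avx2 () ? foo_avx2 : foo_sse2;
}

/* Public entry point: resolves lazily, then calls through the
   cached pointer (IFUNC does the resolution at relocation time).  */
static int
foo (int x)
{
  static foo_fn impl;

  if (impl == NULL)
    impl = foo_selector ();
  return impl (x);
}
```

With real IFUNC the indirection lives in the GOT/PLT entry, so callers pay no explicit NULL check. The cached pointer here only approximates that behaviour.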
*	x86-64: Update strlen.S to support wcslen/wcsnlen	H.J. Lu	2017-06-05	2	-21/+47
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The difference between strlen and wcslen is byte vs int. We can replace pminub and pcmpeqb with pminud and pcmpeqd to turn strlen into wcslen. * sysdeps/x86_64/strlen.S (PMINU): New. (PCMPEQ): Likewise. (SHIFT_RETURN): Likewise. (FIND_ZERO): Replace pcmpeqb with PCMPEQ. (strlen): Add SHIFT_RETURN before ret. Replace pcmpeqb and pminub with PCMPEQ and PMINU. * sysdeps/x86_64/wcsnlen.S: New file.
*	x86_64: Remove redundant REX bytes from memrchr.S	H.J. Lu	2017-06-05	1	-19/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	By x86-64 specification, 32-bit destination registers are zero-extended to 64 bits. There is no need to use 64-bit registers when only the lower 32 bits are non-zero. Also 2 instructions in: mov %rdi, %rcx and $15, %rcx jz L(length_less16_offset0) mov %rdi, %rcx <<< redundant and $15, %rcx <<< redundant are redundant. * sysdeps/x86_64/memrchr.S (__memrchr): Use 32-bit registers for the lower 32 bits. Remove redundant instructions.