about summary refs log tree commit diff
path: root/sysdeps
Commit message (Collapse)AuthorAgeFilesLines
* [AArch64] Use hidden __GI__dl_argv in rtld startup codeSzabolcs Nagy2017-06-211-2/+2
| | | | | We rely on the symbol being locally defined so using extern symbol is not correct and the linker may complain about the relocations.
* getaddrinfo: Avoid stack copy of IPv6 addressFlorian Weimer2017-06-211-40/+5
|
* powerpc: Optimize memchr for power8Rajalakshmi Srinivasaraghavan2017-06-215-1/+371
| | | | | Vectorized loops are used for sizes greater than 32B to improve performance over power7 optimiztion.
* powerpc: Add optimized version of [l]lrintfRajalakshmi Srinivasaraghavan2017-06-215-36/+68
| | | | | This patch makes use of optimized double version of llrint for single precision as both the versions return [long] long type.
* Factor out shared definitions from bits/signum.h.Zack Weinberg2017-06-209-343/+273
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Many of the things defined by bits/signum.h are invariant across all supported operating systems. This patch factors out all of them to a new header bits/signum-generic.h, which each bits/signum.h will include and then override whichever things need adjustment. Normally that will mean, at most, adding or changing a few signal numbers. A user-visible side effect is that the obsolete signal constant SIGUNUSED (which is an alias for SIGSYS on all platforms that define it) is no longer exposed by any version of bits/signum.h. A side effect only relevant to glibc hackers is that _NSIG is now defined in terms of __SIGRTMAX, instead of the other way around. This is because __SIGRTMAX varies from platform to platform, but _NSIG==__SIGRTMAX+1 is true universally. If your platform doesn't support realtime signals, leave __SIGRTMAX equal to __SIGRTMIN. I also added a Linux-specific test to make sure that our signal constants match the ones in <asm/signal.h>, since we can't use that header (it's not even vaguely namespace-clean). * bits/signum-generic.h: Renamed from bits/signum.h. Add proper multiple include guard and misuse check. Define __SIGRTMIN = __SIGRTMAX = 32, and define _NSIG = __SIGRTMAX+1. Move definition of SIGIO to "archaic names for compatibility" section. * bits/signum.h: New file which just includes bits/signum-generic.h. * sysdeps/unix/bsd/bits/signum.h * sysdeps/unix/sysv/linux/bits/signum.h * sysdeps/unix/sysv/linux/alpha/bits/signum.h * sysdeps/unix/sysv/linux/hppa/bits/signum.h * sysdeps/unix/sysv/linux/mips/bits/signum.h * sysdeps/unix/sysv/linux/sparc/bits/signum.h Just include <bits/signum-generic.h> and then add or adjust signal constants. Do not define SIGUNUSED, SIGRTMIN, or SIGRTMAX. * signal/Makefile: Install bits/signum-generic.h. * signal/signal.h: Define SIGRTMIN and SIGRTMAX here. * sysdeps/generic/siglist.h: SIGSYS and SIGWINCH are universal. Prefer SIGPOLL to SIGIO. Simplify #ifdeffage. * sysdeps/unix/sysv/linux/tst-signal-numbers.sh: New test. * sysdeps/unix/sysv/linux/Makefile: Run it.
* Use locale_t, not __locale_t, throughout glibcZack Weinberg2017-06-2013-17/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | <locale.h> is specified to define locale_t in POSIX.1-2008, and so are all of the headers that define functions that take locale_t arguments. Under _GNU_SOURCE, the additional headers that define such functions have also always defined locale_t. Therefore, there is no need to use __locale_t in public function prototypes, nor in any internal code. * ctype/ctype-c99_l.c, ctype/ctype.h, ctype/ctype_l.c * include/monetary.h, include/stdlib.h, include/time.h * include/wchar.h, locale/duplocale.c, locale/freelocale.c * locale/global-locale.c, locale/langinfo.h, locale/locale.h * locale/localeinfo.h, locale/newlocale.c * locale/nl_langinfo_l.c, locale/uselocale.c * localedata/bug-usesetlocale.c, localedata/tst-xlocale2.c * stdio-common/vfscanf.c, stdlib/monetary.h, stdlib/stdlib.h * stdlib/strfmon_l.c, stdlib/strtod_l.c, stdlib/strtof_l.c * stdlib/strtol.c, stdlib/strtol_l.c, stdlib/strtold_l.c * stdlib/strtoll_l.c, stdlib/strtoul_l.c, stdlib/strtoull_l.c * string/strcasecmp.c, string/strcoll_l.c, string/string.h * string/strings.h, string/strncase.c, string/strxfrm_l.c * sysdeps/ieee754/float128/strtof128_l.c * sysdeps/ieee754/float128/wcstof128.c * sysdeps/ieee754/float128/wcstof128_l.c * sysdeps/ieee754/ldbl-128ibm/strtold_l.c * sysdeps/ieee754/ldbl-64-128/strtold_l.c * sysdeps/ieee754/ldbl-opt/nldbl-compat.c * sysdeps/ieee754/ldbl-opt/nldbl-strfmon_l.c * sysdeps/ieee754/ldbl-opt/nldbl-strtold_l.c * sysdeps/ieee754/ldbl-opt/nldbl-wcstold_l.c * sysdeps/powerpc/powerpc32/power7/strcasecmp.S * sysdeps/powerpc/powerpc64/power7/strcasecmp.S * sysdeps/x86_64/strcasecmp_l-nonascii.c * sysdeps/x86_64/strncase_l-nonascii.c, time/strftime_l.c * time/strptime_l.c, time/time.h, wcsmbs/mbsrtowcs_l.c * wcsmbs/wchar.h, wcsmbs/wcscasecmp.c, wcsmbs/wcsncase.c * wcsmbs/wcstod.c, wcsmbs/wcstod_l.c, wcsmbs/wcstof.c * wcsmbs/wcstof_l.c, wcsmbs/wcstol_l.c, wcsmbs/wcstold.c * wcsmbs/wcstold_l.c, wcsmbs/wcstoll_l.c, wcsmbs/wcstoul_l.c * wcsmbs/wcstoull_l.c, wctype/iswctype_l.c * wctype/towctrans_l.c, wctype/wcfuncs_l.c * wctype/wctrans_l.c, wctype/wctype.h, wctype/wctype_l.c: Change all uses of __locale_t to locale_t.
* Rename xlocale.h to bits/types/__locale_t.h.Zack Weinberg2017-06-202-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | xlocale.h is already a single-type micro-header, defining struct __locale_struct and the typedefs __locale_t and locale_t. This patch brings it into the bits/types/ scheme: there are now bits/types/__locale_t.h which defines only __locale_struct and __locale_t, and bits/types/locale_t.h which defines locale_t as well as the other two. None of *our* headers need __locale_t.h, but it appears to me that libstdc++ could make use of it. There are a lot of external uses of xlocale.h, but all the uses I checked had an autoconf test or equivalent for its existence. It has never been available from other C libraries, and it has always contained a comment reading "This file is not standardized, don't rely on it, it can go away without warning" so I think dropping it is pretty safe. I also took the opportunity to clean up comments in various public header files that still talk about the *_l interfaces as though they were completely nonstandard. There are a few of them, notably the strtoX_l and wcstoX_l families, that haven't been standardized, but the bulk are in POSIX.1-2008. * locale/xlocale.h: Rename to... * locale/bits/types/__locale_t.h: ...here. Adjust commentary. Only define struct __locale_struct and __locale_t, not locale_t. * locale/bits/types/locale_t.h: New file; define locale_t here. * locale/Makefile (headers): Update to match. * include/xlocale.h: Delete wrapper. * include/bits/types/__locale_t.h: New wrapper. * include/bits/types/locale_t.h: New wrapper. * ctype/ctype.h, include/printf.h, include/time.h * locale/langinfo.h, locale/locale.h, stdlib/monetary.h * stdlib/stdlib.h, string/string.h, string/strings.h, time/time.h * wcsmbs/wchar.h, wctype/wctype.h: Use bits/types/locale_t.h. Correct outdated comments regarding the standardization status of the functions that take locale_t arguments. * stdlib/strtod_l.c, stdlib/strtof_l.c, stdlib/strtol_l.c * stdlib/strtold_l.c, stdlib/strtoul_l.c, stdlib/strtoull_l.c * sysdeps/ieee754/ldbl-128ibm/strtold_l.c * sysdeps/ieee754/ldbl-64-128/strtold_l.c * wcsmbs/wcstod.c, wcsmbs/wcstod_l.c, wcsmbs/wcstof.c * wcsmbs/wcstof_l.c, wcsmbs/wcstold.c, wcsmbs/wcstold_l.c: Don't include xlocale.h. If necessary, include locale.h instead. * stdlib/strtold_l.c: Unconditionally include wchar.h.
* Consolidate Linux openat implementationAdhemerval Zanella2017-06-204-38/+62
| | | | | | | | | | | | | | | | | | | | This patch consolidates the open Linux syscall implementation on sysdeps/unix/sysv/linux/open{64}.c. The changes are: 1. Remove wordsize-64 openat{64}. 2. For architetures that define __OFF_T_MATCHES_OFF64_T openat64 will be default one with alias to required symbols. Otherwise openat64 will pass the required O_LARGEFILE flag on syscall. Checked on i686-linux-gnu, x86_64-linux-gnu, x86_64-linux-gnux32, arch64-linux-gnu, arm-linux-gnueabihf, and powerpc64le-linux-gnu. * sysdeps/unix/sysv/linux/openat.c (__libc_openat): Build only for !__OFF_T_MATCHES_OFF64_T. * sysdeps/unix/sysv/linux/openat64.c (__libc_openat64): New implementation based on open64. * sysdeps/unix/sysv/linux/wordsize-64/openat.c: Remove file. * sysdeps/unix/sysv/linux/wordsize-64/openat64.c: Likewise.
* Move x86 specific tunables to x86/dl-tunables.listH.J. Lu2017-06-201-0/+34
| | | | | * elf/dl-tunables.list: Move x86 specific tunables to ... * sysdeps/x86/dl-tunables.list: Here. New file.
* conformtest: XFAIL uc_mcontext test for powerpc32 (bug 21635).Joseph Myers2017-06-201-0/+5
| | | | | | | | | | | | | | | This patch XFAILs one test where the powerpc32 ucontext_t has the wrong type of a field, to allow the conform/ tests as a whole to pass once the namespace issues are fixed. Tested with build-many-glibcs.py. [BZ #21635] * sysdeps/unix/sysv/linux/powerpc/powerpc32/Makefile [$(subdir) = conform] (conformtest-xfail-conds): New variable. * conform/data/signal.h-data (uc_mcontext): XFAIL for powerpc32-linux. * conform/data/ucontext.h-data (uc_mcontext): Likewise.
* conformtest: XFAIL uc_sigmask test for ia64 (bug 21634).Joseph Myers2017-06-201-0/+5
| | | | | | | | | | | | | | This patch XFAILs one test where the ia64 ucontext_t has the wrong type of a field, to allow the conform/ tests as a whole to pass once the namespace issues are fixed. Tested with build-many-glibcs.py. [BZ #21634] * sysdeps/unix/sysv/linux/ia64/Makefile [$(subdir) = conform] (conformtest-xfail-conds): New variable. * conform/data/signal.h-data (uc_sigmask): XFAIL for ia64-linux. * conform/data/ucontext.h-data (uc_sigmask): Likewise.
* tunables: Add IFUNC selection and cache sizesH.J. Lu2017-06-205-1/+387
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current IFUNC selection is based on microbenchmarks in glibc. It should give the best performance for most workloads. But other choices may have better performance for a particular workload or on the hardware which wasn't available at the selection was made. The environment variable, GLIBC_TUNABLES=glibc.tune.ifunc=-xxx,yyy,-zzz...., can be used to enable CPU/ARCH feature yyy, disable CPU/ARCH feature yyy and zzz, where the feature name is case-sensitive and has to match the ones in cpu-features.h. It can be used by glibc developers to override the IFUNC selection to tune for a new processor or improve performance for a particular workload. It isn't intended for normal end users. NOTE: the IFUNC selection may change over time. Please check all multiarch implementations when experimenting. Also, GLIBC_TUNABLES=glibc.tune.x86_non_temporal_threshold=NUMBER is provided to set threshold to use non temporal store to NUMBER, GLIBC_TUNABLES=glibc.tune.x86_data_cache_size=NUMBER to set data cache size, GLIBC_TUNABLES=glibc.tune.x86_shared_cache_size=NUMBER to set shared cache size. * elf/dl-tunables.list (tune): Add ifunc, x86_non_temporal_threshold, x86_data_cache_size and x86_shared_cache_size. * manual/tunables.texi: Document glibc.tune.ifunc, glibc.tune.x86_data_cache_size, glibc.tune.x86_shared_cache_size and glibc.tune.x86_non_temporal_threshold. * sysdeps/unix/sysv/linux/x86/dl-sysdep.c: New file. * sysdeps/x86/cpu-tunables.c: Likewise. * sysdeps/x86/cacheinfo.c (init_cacheinfo): Check and get data cache size, shared cache size and non temporal threshold from cpu_features. * sysdeps/x86/cpu-features.c [HAVE_TUNABLES] (TUNABLE_NAMESPACE): New. [HAVE_TUNABLES] Include <unistd.h>. [HAVE_TUNABLES] Include <elf/dl-tunables.h>. [HAVE_TUNABLES] (TUNABLE_CALLBACK (set_ifunc)): Likewise. [HAVE_TUNABLES] (init_cpu_features): Use TUNABLE_GET to set IFUNC selection, data cache size, shared cache size and non temporal threshold. * sysdeps/x86/cpu-features.h (cpu_features): Add data_cache_size, shared_cache_size and non_temporal_threshold.
* Fix fallout from bits/string.h removal.Zack Weinberg2017-06-202-0/+4
| | | | | | | | | | | | | | | Remove one more string inline that was defined directly in string.h; in the absence of the rest of the inlines, it broke the build. Like other ifunc shims for these functions, x86_64/multiarch/{mem,st}pcpy.c need to define __NO_STRING_INLINES and NO_MEMPCPY_STPCPY_REDIRECT. * string/string.h (__mempcpy_inline): Delete. * sysdeps/x86_64/multiarch/mempcpy.c * sysdeps/x86_64/multiarch/stpcpy.c: Define NO_MEMPCPY_STPCPY_REDIRECT and __NO_STRING_INLINES before including string.h.
* Remove bits/string.h.Zack Weinberg2017-06-2013-2358/+181
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | These machine-dependent inline string functions have never been on by default, and even if they were a good idea at the time they were introduced, they haven't really been touched in ten to fifteen years and probably aren't a good idea on current-gen processors. Current thinking is that this class of optimization is best left to the compiler. * bits/string.h, string/bits/string.h * sysdeps/aarch64/bits/string.h * sysdeps/m68k/m680x0/m68020/bits/string.h * sysdeps/s390/bits/string.h, sysdeps/sparc/bits/string.h * sysdeps/x86/bits/string.h: Delete file. * string/string.h: Don't include bits/string.h. * string/bits/string3.h: Rename to bits/string_fortified.h. No need to undef various symbols that the removed headers might have defined as macros. * string/Makefile (headers): Remove bits/string.h, change bits/string3.h to bits/string_fortified.h. * string/string-inlines.c: Update commentary. Remove definitions of various macros that nothing looks at anymore. Don't directly include bits/string.h. Set _STRING_INLINE_unaligned here, based on compiler-predefined macros. * string/strncat.c: If STRNCAT is not defined, or STRNCAT_PRIMARY _is_ defined, provide internal hidden alias __strncat. * include/string.h: Declare internal hidden alias __strncat. Only forward __stpcpy to __builtin_stpcpy if __NO_STRING_INLINES is not defined. * include/bits/string3.h: Rename to bits/string_fortified.h, update to match above. * sysdeps/i386/string-inlines.c: Define compat symbols for everything formerly defined by sysdeps/x86/bits/string.h. Make existing definitions into compat symbols as well. Remove some no-longer-necessary messing around with macros. * sysdeps/powerpc/powerpc32/power4/multiarch/mempcpy.c * sysdeps/powerpc/powerpc64/multiarch/mempcpy.c * sysdeps/powerpc/powerpc64/multiarch/stpcpy.c * sysdeps/s390/multiarch/mempcpy.c No need to define _HAVE_STRING_ARCH_mempcpy. Do define __NO_STRING_INLINES and NO_MEMPCPY_STPCPY_REDIRECT. * sysdeps/i386/i686/multiarch/strncat-c.c * sysdeps/s390/multiarch/strncat-c.c * sysdeps/x86_64/multiarch/strncat-c.c Define STRNCAT_PRIMARY. Don't change definition of libc_hidden_def.
* Remove pre-GCC-4.9 MIPS code.Joseph Myers2017-06-192-327/+43
| | | | | | | | | | | | | | This patch removes some MIPS code in glibc that was conditional on old GCC versions no longer supported for building glibc. Tested with build-many-glibcs.py. * sysdeps/mips/atomic-machine.h (R10K_BEQZ_INSN): Remove. [__GNUC_PREREQ (4, 8) || __mips16]: Make code unconditional. [!__GNUC_PREREQ (4, 8) && !__mips16]: Remove conditional code. * sysdeps/mips/math-tests.h [_MIPS_SIM != _ABIO32 && !__GNUC_PREREQ (4, 9)]: Remove conditional code.
* S390: Sync ptrace.h with kernel. [BZ #21539]Stefan Liebler2017-06-194-22/+154
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch removes PTRACE_GETREGS, PTRACE_SETREGS, PTRACE_GETFPREGS and PTRACE_SETFPREGS as these requests does not exist on s390 kernel. But the kernel has support for PTRACE_SINGLEBLOCK, PTRACE_SECCOMP_GET_FILTER, PTRACE_PEEKUSR_AREA, PTRACE_POKEUSR_AREA, PTRACE_GET_LAST_BREAK, PTRACE_ENABLE_TE, PTRACE_DISABLE_TE and PTRACE_TE_ABORT_RAND. Thus those are defined now. The current kernel s390 specific ptrace.h file also defines PTRACE_PEEKTEXT_AREA, PTRACE_PEEKDATA_AREA, PTRACE_POKETEXT_AREA, PTRACE_POKEDATA_AREA, PTRACE_PEEK_SYSTEM_CALL, PTRACE_POKE_SYSTEM_CALL and PTRACE_PROT, but those requests are not supported. Thus those defines are skipped in glibc ptrace.h. There were old includes of ptrace.h in sysdeps/s390/fpu/fesetenv.c. The ptrace feature isn't used there anymore, thus I removed the includes. Before this patch, <glibc>/sysdeps/unix/sysv/linux/s390/sys/ptrace.h uses ptrace-request 12 for PTRACE_GETREGS, but <kernel>/include/uapi/linux/ptrace.h uses 12 for PTRACE_SINGLEBLOCK. The s390 kernel has never had support for PTRACE_GETREGS! Thus glibc ptrace.h is adjusted to match kernel ptrace.h. The new s390 specific test ensures, that PTRACE_SINGLEBLOCK defined in glibc works as expected. If the kernel would interpret it as PTRACE_GETREGS, then the testcase will not make any progress and will time out. ChangeLog: [BZ #21539] * NEWS: Mention s390 ptrace request changes. * sysdeps/unix/sysv/linux/s390/sys/ptrace.h (PTRACE_GETREGS, PTRACE_SETREGS, PTRACE_GETFPREGS, PTRACE_SETFPREGS): Remove enum constant. (PT_GETREGS, PT_SETREGS, PT_GETFPREGS, T_SETFPREGS): Remove defines. (PTRACE_SINGLEBLOCK): New enum constant. (PT_STEPBLOCK): New define. (PTRACE_PEEKUSR_AREA, PTRACE_POKEUSR_AREA, PTRACE_GET_LAST_BREAK, PTRACE_ENABLE_TE, PTRACE_DISABLE_TE, PTRACE_TE_ABORT_RAND): New enum constant and define. * sysdeps/s390/fpu/fesetenv.c: Remove ptrace.h includes. * sysdeps/unix/sysv/linux/s390/tst-ptrace-singleblock.c: New file. * sysdeps/unix/sysv/linux/s390/Makefile: Add test.
* Fix another x86 sys/ucontext.h namespace issue (bug 21457).Joseph Myers2017-06-191-1/+1
| | | | | | | | | | | This patch fixes a namespace issue for one more field in the x86 sys/ucontext.h that I missed in my previous changes. Tested for x86_64. [BZ #21457] * sysdeps/unix/sysv/linux/x86/sys/ucontext.h [__x86_64__] (struct _libc_xmmreg): Use __ctx in defining field.
* Define struct rusage in sys/wait.h when required (bug 21575).Joseph Myers2017-06-194-234/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some older standards (XPG4.2 through POSIX.1:2001, XSI only) require sys/wait.h to include the definition of struct rusage. This is missing in glibc. This patch adds the required definition. struct rusage is moved to a new header bits/types/struct_rusage.h to avoid bringing in the whole of sys/resource.h (although the standards in question do allow the whole of sys/resource.h to be brought in). In the five bits/resource.h headers, the only variation between the definitions of struct rusage is that the sysdeps/unix/sysv/linux version is prepared for x32 (by having anonymous unions with __syscall_slong_t fields) and the others are not. Thus, this version is suitable for use generically (everything other than x32 simply has __syscall_slong_t the same as long int, so there are no API or ABI changes involved, and anonymous unions are already a required language feature for glibc headers elsewhere), and this patch uses it as a base for the single implementation of bits/types/struct_rusage.h. Tested for x86_64, and with build-many-glibcs.py. [BZ #21575] * resource/bits/types/struct_rusage.h: New file. * include/bits/types/struct_rusage.h: Likewise. * bits/resource.h (struct rusage): Include <bits/types/struct_rusage.h> instead of defining here. * sysdeps/unix/sysv/linux/bits/resource.h (struct rusage): Likewise. * sysdeps/unix/sysv/linux/alpha/bits/resource.h (struct rusage): Likewise. * sysdeps/unix/sysv/linux/mips/bits/resource.h (struct rusage): Likewise. * sysdeps/unix/sysv/linux/sparc/bits/resource.h (struct rusage): Likewise. * resource/Makefile (headers): Add bits/types/struct_rusage.h. * posix/sys/wait.h [__USE_XOPEN_EXTENDED && !__USE_XOPEN2K8]: Include <bits/types/struct_rusage.h>
* Fix typo when undefining weak_aliasSiddhesh Poyarekar2017-06-191-1/+1
| | | | | | The macro directive #undef was miswritten as #undefine. * sysdeps/x86_64/multiarch/rawmemchr-sse2.S: Fix typo.
* S390: Fix build with gcc configured with --enable-default-pie. [BZ #21537]Stefan Liebler2017-06-193-24/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Building glibc with gcc configured with --enable-default-pie failed on s390 due to assembler messages: ../sysdeps/unix/sysv/linux/s390/s390-32/__makecontext_ret.S:44: Error: junk at end of line, first unrecognized character is `@' HIDDEN_JUMPTARGET was expanded to exit@PLT@GOTOFF. If SHARED is not defined, HIDDEN_JUMPTARGET is defined to JUMPTARGET in sysdeps/s390/s390-32/sysdep.h. There it expanded to exit@PLT in non SHARED case as PIC is defined if gcc is configured with --enable-default-pie. Thus I've changed the "ifdef PIC" to "ifdef SHARED" as we do not want PLTs in the static obj files. I've also changed this in sysdeps/s390/s390-64/sysdep.h. I've also adjusted sysdeps/unix/sysv/linux/s390/s390-32/__makecontext_ret.S. If glibc is configured with --disable-hidden-plt, then NO_HIDDEN is defined. In SHARED case HIDDEN_JUMPTARGET would be expanded to exit@PLT@GOTOFF instead of __GI_exit@GOTOFF. Now we jump to: - __GI_exit if SHARED is defined - exit@PLT if SHARED and NO_HIDDEN is defined - exit if both are not defined. On s390 31bit we have to setup GOT pointer in r12 if we use a PLT stub. Therefore I use SYSCALL_PIC_SETUP from sysdep.h and added the missing semicolons. ChangeLog: [BZ #21537] * sysdeps/s390/s390-32/sysdep.h (JUMPTARGET, SYSCALL_PIC_SETUP): Check SHARED instead of PIC. (SYSCALL_PIC_SETUP): Add missing semicolons. * sysdeps/s390/s390-64/sysdep.h (JUMPTARGET, SYSCALL_PIC_SETUP): Check SHARED instead of PIC. * sysdeps/unix/sysv/linux/s390/s390-32/__makecontext_ret.S (__makecontext_ret): Adjust code to jump to exit.
* s390: optimize syscall functionChristian Borntraeger2017-06-192-18/+6
| | | | | | | | | | | | | | Since kernel 2.6.0 all Linux version accept the system call number in register 1 for svc 0. There is no need to have special handling that uses EX for system calls < 256. This will simplify and speed up that code. A microbenchmark doing "syscall(__NR_getpid);" in a loops gets faster by ~12%. * sysdeps/unix/sysv/linux/s390/s390-32/syscall.S: Simplify code by always using SVC 0 instead of EX. * sysdeps/unix/sysv/linux/s390/s390-64/syscall.S: Likewise.
* linux: Consolidate sync_file_range implementationAdhemerval Zanella2017-06-152-1/+1
| | | | | | | | | | | | | | | This patch consolidates Linux sync_file_range at default sysdeps/unix/sysv/linux/sync_file_range.c implementation. It also moves the rules flags from generic io/Makefile to Linux one due the fact it is a Linux-only symbol. Checked on i686-linux-gnu and x86_64-linux-gnu. * io/Makefile (CFLAGS-sync_file_range.c): Remove rule. * sysdeps/unix/sysv/linux/Makefile (CFLAGS-sync_file_range.c): New rule. * sysdeps/unix/sysv/linux/wordsize-64/syscalls.list: Remove sync_file_range.
* x86-64: Implement strcspn/strpbrk/strspn IFUNC selectors in CH.J. Lu2017-06-1513-115/+213
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Implement strcspn/strpbrk/strspn IFUNC selectors in C All internal calls within libc.so can use IFUNC on x86-64 since unlike x86, x86-64 supports PC-relative addressing to access the GOT entry so that it can call via PLT without using an extra register. For libc.a, we can't use IFUNC for functions which are called before IFUNC has been initialized. Use IFUNC internally reduces the icache footprint since libc.so and other codes in the process use the same implementations. This patch uses IFUNC for strcspn/strpbrk/strspn functions within libc. * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add strcspn-sse2, strpbrk-sse2 and strspn-sse2. * sysdeps/x86_64/strcspn.S (STRPBRK_P): Removed. Check USE_AS_STRPBRK instead of STRPBRK_P. * sysdeps/x86_64/strpbrk.S (USE_AS_STRPBRK): New. * sysdeps/x86_64/multiarch/ifunc-sse4_2.h: New file. * sysdeps/x86_64/multiarch/strcspn-sse2.S: Likewise. * sysdeps/x86_64/multiarch/strcspn.c: Likewise. * sysdeps/x86_64/multiarch/strpbrk-sse2.S: Likewise. * sysdeps/x86_64/multiarch/strpbrk.c: Likewise. * sysdeps/x86_64/multiarch/strspn-sse2.S: Likewise. * sysdeps/x86_64/multiarch/strspn.c: Likewise. * sysdeps/x86_64/multiarch/strcspn.S: Removed. * sysdeps/x86_64/multiarch/strpbrk.S: Likewise. * sysdeps/x86_64/multiarch/strspn.S: Likewise. * sysdeps/x86_64/multiarch/strpbrk-c.c: Remove "#ifdef SHARED" and "#endif".
* x86-64: Implement wcscpy IFUNC selector in CH.J. Lu2017-06-151-17/+21
| | | | | * sysdeps/x86_64/multiarch/wcscpy.S: Removed. * sysdeps/x86_64/multiarch/wcscpy.c: New file.
* x86-64: Implement strcat family IFUNC selectors in CH.J. Lu2017-06-156-90/+95
| | | | | | | | | | | | | | | | | | | | Implement strcat family IFUNC selectors in C. All internal calls within libc.so can use IFUNC on x86-64 since unlike x86, x86-64 supports PC-relative addressing to access the GOT entry so that it can call via PLT without using an extra register. For libc.a, we can't use IFUNC for functions which are called before IFUNC has been initialized. Use IFUNC internally reduces the icache footprint since libc.so and other codes in the process use the same implementations. This patch uses IFUNC for strcat family functions within libc. * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add strcat-sse2. * sysdeps/x86_64/multiarch/strcat-sse2.S: New file. * sysdeps/x86_64/multiarch/strcat.c: Likewise. * sysdeps/x86_64/multiarch/strncat.c: Likewise. * sysdeps/x86_64/multiarch/strcat.S: Removed. * sysdeps/x86_64/multiarch/strncat.S: Likewise.
* x86-64: Implement memcmp family IFUNC selectors in CH.J. Lu2017-06-157-113/+126
| | | | | | | | | | | | | | | | | | | | | Implement memcmp family IFUNC selectors in C. All internal calls within libc.so can use IFUNC on x86-64 since unlike x86, x86-64 supports PC-relative addressing to access the GOT entry so that it can call via PLT without using an extra register. For libc.a, we can't use IFUNC for functions which are called before IFUNC has been initialized. Use IFUNC internally reduces the icache footprint since libc.so and other codes in the process use the same implementations. This patch uses IFUNC for memcmp family functions within libc. * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add memcmp-sse2. * sysdeps/x86_64/multiarch/ifunc-memcmp.h: New file. * sysdeps/x86_64/multiarch/memcmp-sse2.S: Likewise. * sysdeps/x86_64/multiarch/memcmp.c: Likewise. * sysdeps/x86_64/multiarch/wmemcmp.c: Likewise. * sysdeps/x86_64/multiarch/memcmp.S: Removed. * sysdeps/x86_64/multiarch/wmemcmp.S: Likewise.
* x86-64: Implement memset family IFUNC selectors in CH.J. Lu2017-06-1512-147/+218
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Implement memset family IFUNC selectors in C. All internal calls within libc.so can use IFUNC on x86-64 since unlike x86, x86-64 supports PC-relative addressing to access the GOT entry so that it can call via PLT without using an extra register. For libc.a, we can't use IFUNC for functions which are called before IFUNC has been initialized. Use IFUNC internally reduces the icache footprint since libc.so and other codes in the process use the same implementations. This patch uses IFUNC for memset functions within libc. 2017-06-07 H.J. Lu <hongjiu.lu@intel.com> Erich Elsen <eriche@google.com> * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add memset-sse2-unaligned-erms, and memset_chk-nonshared. * sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Add test for __memset_chk_erms. Update comments. * sysdeps/x86_64/multiarch/ifunc-memset.h: New file. * sysdeps/x86_64/multiarch/memset-sse2-unaligned-erms.S: Likewise. * sysdeps/x86_64/multiarch/memset.c: Likewise. * sysdeps/x86_64/multiarch/memset_chk-nonshared.S: Likewise. * sysdeps/x86_64/multiarch/memset_chk.c: Likewise. * sysdeps/x86_64/multiarch/memset.S: Removed. * sysdeps/x86_64/multiarch/memset_chk.S: Likewise. * sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S (__memset_chk_erms): New function.
* x86-64: Implement memmove family IFUNC selectors in CH.J. Lu2017-06-1420-474/+418
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Implement memmove family IFUNC selectors in C. All internal calls within libc.so can use IFUNC on x86-64 since unlike x86, x86-64 supports PC-relative addressing to access the GOT entry so that it can call via PLT without using an extra register. For libc.a, we can't use IFUNC for functions which are called before IFUNC has been initialized. Use IFUNC internally reduces the icache footprint since libc.so and other codes in the process use the same implementations. This patch uses IFUNC for memmove family functions within libc. * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add memmove-sse2-unaligned-erms, memcpy_chk-nonshared, mempcpy_chk-nonshared and memmove_chk-nonshared. * sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Add tests for __memmove_chk_erms, __memcpy_chk_erms and __mempcpy_chk_erms. Update comments. * sysdeps/x86_64/multiarch/ifunc-memmove.h: New file. * sysdeps/x86_64/multiarch/memcpy.c: Likewise. * sysdeps/x86_64/multiarch/memcpy_chk-nonshared.S: Likewise. * sysdeps/x86_64/multiarch/memcpy_chk.c: Likewise. * sysdeps/x86_64/multiarch/memmove-sse2-unaligned-erms.S: Likewise. * sysdeps/x86_64/multiarch/memmove.c: Likewise. * sysdeps/x86_64/multiarch/memmove_chk-nonshared.S: Likewise. * sysdeps/x86_64/multiarch/memmove_chk.c: Likewise. * sysdeps/x86_64/multiarch/mempcpy.c: Likewise. * sysdeps/x86_64/multiarch/mempcpy_chk-nonshared.S: Likewise. * sysdeps/x86_64/multiarch/mempcpy_chk.c: Likewise. * sysdeps/x86_64/multiarch/memcpy.S: Removed. * sysdeps/x86_64/multiarch/memcpy_chk.S: Likewise. * sysdeps/x86_64/multiarch/memmove.S: Likewise. * sysdeps/x86_64/multiarch/memmove_chk.S: Likewise. * sysdeps/x86_64/multiarch/mempcpy.S: Likewise. * sysdeps/x86_64/multiarch/mempcpy_chk.S: Likewise. * sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S (__mempcpy_chk_erms): New function. (__memmove_chk_erms): Likewise. (__memcpy_chk_erms): New alias.
* i686: Add missing IS_IN (libc) guards to vectorized strcspnFlorian Weimer2017-06-142-3/+7
| | | | | | | | | Since commit d957c4d3fa48d685ff2726c605c988127ef99395 (i386: Compile rtld-*.os with -mno-sse -mno-mmx -mfpmath=387), vector intrinsics can no longer be used in ld.so, even if the compiled code never makes it into the final ld.so link. This commit adds the missing IS_IN (libc) guard to the SSE 4.2 strcspn implementation, so that it can be used from ld.so in the future.
* Remove __need macros from errno.h (__need_Emath, __need_error_t).Zack Weinberg2017-06-1413-532/+606
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is fairly complicated, not because the users of __need_Emath and __need_error_t have complicated requirements, but because the core changes had a lot of fallout. __need_error_t exists for gnulib compatibility in argz.h and argp.h. error_t itself is a Hurdism, an enum containing all the E-constants, so you can do 'p (error_t) errno' in gdb and get a symbolic value. argz.h and argp.h use it for function return values, and they want to fall back to 'int' when that's not available. There is no reason why these nonstandard headers cannot just go ahead and include all of errno.h; so we do that. __need_Emath is defined only by .S files; what they _really_ need is for errno.h to avoid declaring anything other than the E-constants (e.g. 'extern int __errno_location(void);' is a syntax error in assembly language). This is replaced with a check for __ASSEMBLER__ in errno.h, plus a carefully documented requirement for bits/errno.h not to define anything other than macros. That in turn has the consequence that bits/errno.h must not define errno - fortunately, all live ports use the same definition of errno, so I've moved it to errno.h. The Hurd bits/errno.h must also take care not to define error_t when __ASSEMBLER__ is defined, which involves repeating all of the definitions twice, but it's a generated file so that's okay. * stdlib/errno.h: Remove __need_Emath and __need_error_t logic. Reorganize file. Declare errno here. When __ASSEMBLER__ is defined, don't declare anything other than the E-constants. * include/errno.h: Change conditional for exposing internal declarations to (not _ISOMAC and not __ASSEMBLER__). * bits/errno.h: Remove logic for __need_Emath. Document requirements for a port-specific bits/errno.h. * sysdeps/unix/sysv/linux/bits/errno.h * sysdeps/unix/sysv/linux/alpha/bits/errno.h * sysdeps/unix/sysv/linux/hppa/bits/errno.h * sysdeps/unix/sysv/linux/mips/bits/errno.h * sysdeps/unix/sysv/linux/sparc/bits/errno.h: Add multiple-include guard and check against improper inclusion. Remove __need_Emath logic. Don't declare errno here. Ensure all constants are defined as simple integer literals. Consistent formatting. * sysdeps/mach/hurd/errnos.awk: Likewise. Only define error_t and enum __error_t_codes if __ASSEMBLER__ is not defined. * sysdeps/mach/hurd/bits/errno.h: Regenerate. * argp/argp.h, string/argz.h: Don't define __need_error_t before including errno.h. * sysdeps/i386/i686/fpu/multiarch/s_cosf-sse2.S * sysdeps/i386/i686/fpu/multiarch/s_sincosf-sse2.S * sysdeps/i386/i686/fpu/multiarch/s_sinf-sse2.S * sysdeps/x86_64/fpu/s_cosf.S * sysdeps/x86_64/fpu/s_sincosf.S * sysdeps/x86_64/fpu/s_sinf.S: Just include errno.h; don't define __need_Emath or include bits/errno.h directly.
* Remove __need_IOV_MAX and __need_FOPEN_MAX.Zack Weinberg2017-06-143-61/+67
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | __need_FOPEN_MAX wasn't being used anywhere. __need_IOV_MAX was more complicated; the basic deal is that sys/uio.h wants to define a constant named UIO_MAXIOV and bits/xopen_lim.h wants to define a constant named IOV_MAX, with the same meaning. For no apparent reason this was being handled via bits/stdio_lim.h -- stdio.h is NOT supposed to define IOV_MAX -- and some mess in Makerules. Also, bits/uio.h on Linux was being used as a dumping ground for extension functions. So now we have bits/uio_lim.h, which defines __IOV_MAX. bits/xopen_lim.h and sys/uio.h use that to define their respective constants. We also now have bits/uio-ext.h, which is the official Proper Home for extensions to sys/uio.h. bits/uio.h is removed, and stdio_lim.h doesn't define IOV_MAX at all. * bits/uio_lim.h, sysdeps/unix/sysv/linux/bits/uio_lim.h * bits/uio-ext.h, sysdeps/unix/sysv/linux/bits/uio-ext.h: New file. * bits/uio.h, sysdeps/unix/sysv/linux/bits/uio.h: Delete file. * include/bits/xopen_lim.h: Use bits/uio_lim.h to get the value for IOV_MAX. * misc/Makefile: Install bits/uio-ext.h and bits/uio_lim.h. Don't install bits/uio.h. * misc/sys/uio.h: Don't include bits/uio.h. Do include bits/types/struct_iovec.h and bits/uio_lim.h. Set UIO_MAXIOV based on __IOV_MAX. Under __USE_GNU, also include bits/uio-ext.h. * stdio-common/stdio_lim.h.in: Remove logic for __need_FOPEN_MAX and __need_IOV_MAX. Don't define IOV_MAX at all. * Makerules (stdio_lim.h): Remove logic for setting IOV_MAX. * sysdeps/unix/sysv/linux/bits/fcntl-linux.h: Include bits/types/struct_iovec.h, not bits/uio.h. Use __ssize_t, not ssize_t, in function prototypes. Don't use hard TAB for double space after period in comments.
* PowerPC64 ELFv2 PPC64_OPT_LOCALENTRYAlan Modra2017-06-1421-35/+80
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ELFv2 functions with localentry:0 are those with a single entry point, ie. global entry == local entry, that have no requirement on r2 or r12 and guarantee r2 is unchanged on return. Such an external function can be called via the PLT without saving r2 or restoring it on return, avoiding a common load-hit-store for small functions. This patch implements the ld.so changes necessary for this optimization. ld.so needs to check that an optimized plt call sequence is in fact calling a function implemented with localentry:0, end emit a fatal error otherwise. The elf/testobj6.c change is to stop "error while loading shared libraries: expected localentry:0 `preload'" when running elf/preloadtest, which we'd get otherwise. * elf/elf.h (PPC64_OPT_LOCALENTRY): Define. * sysdeps/alpha/dl-machine.h (elf_machine_fixup_plt): Add refsym and sym parameters. Adjust callers. * sysdeps/aarch64/dl-machine.h (elf_machine_fixup_plt): Likewise. * sysdeps/arm/dl-machine.h (elf_machine_fixup_plt): Likewise. * sysdeps/generic/dl-machine.h (elf_machine_fixup_plt): Likewise. * sysdeps/hppa/dl-machine.h (elf_machine_fixup_plt): Likewise. * sysdeps/i386/dl-machine.h (elf_machine_fixup_plt): Likewise. * sysdeps/ia64/dl-machine.h (elf_machine_fixup_plt): Likewise. * sysdeps/m68k/dl-machine.h (elf_machine_fixup_plt): Likewise. * sysdeps/microblaze/dl-machine.h (elf_machine_fixup_plt): Likewise. * sysdeps/mips/dl-machine.h (elf_machine_fixup_plt): Likewise. * sysdeps/nios2/dl-machine.h (elf_machine_fixup_plt): Likewise. * sysdeps/powerpc/powerpc32/dl-machine.h (elf_machine_fixup_plt): Likewise. * sysdeps/s390/s390-32/dl-machine.h (elf_machine_fixup_plt): Likewise. * sysdeps/s390/s390-64/dl-machine.h (elf_machine_fixup_plt): Likewise. * sysdeps/sh/dl-machine.h (elf_machine_fixup_plt): Likewise. * sysdeps/sparc/sparc32/dl-machine.h (elf_machine_fixup_plt): Likewise. * sysdeps/sparc/sparc64/dl-machine.h (elf_machine_fixup_plt): Likewise. * sysdeps/tile/dl-machine.h (elf_machine_fixup_plt): Likewise. * sysdeps/x86_64/dl-machine.h (elf_machine_fixup_plt): Likewise. * sysdeps/powerpc/powerpc64/dl-machine.c (_dl_error_localentry): New. (_dl_reloc_overflow): Increase buffser size. Formatting. * sysdeps/powerpc/powerpc64/dl-machine.h (ppc64_local_entry_offset): Delete reloc param, add refsym and sym. Check optimized plt call stubs for localentry:0 functions. Adjust callers. (elf_machine_fixup_plt, elf_machine_plt_conflict): Add refsym and sym parameters. Adjust callers. (_dl_reloc_overflow): Move attribute. (_dl_error_localentry): Declare. * elf/dl-runtime.c (_dl_fixup): Save original sym. Pass refsym and sym to elf_machine_fixup_plt. * elf/testobj6.c (preload): Call printf.
* PowerPC64 ENTRY_TOCLESSAlan Modra2017-06-14102-138/+172
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A number of functions in the sysdeps/powerpc/powerpc64/ tree don't use or change r2, yet declare a global entry that sets up r2. This patch fixes that problem, and consolidates the ENTRY and EALIGN macros. * sysdeps/powerpc/powerpc64/sysdep.h: Formatting. (NOPS, ENTRY_3): New macros. (ENTRY): Rewrite. (ENTRY_TOCLESS): Define. (EALIGN, EALIGN_W_0, EALIGN_W_1, EALIGN_W_2, EALIGN_W_4, EALIGN_W_5, EALIGN_W_6, EALIGN_W_7, EALIGN_W_8): Delete. * sysdeps/powerpc/powerpc64/a2/memcpy.S: Replace EALIGN with ENTRY. * sysdeps/powerpc/powerpc64/dl-trampoline.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_ceil.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_ceilf.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_floor.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_floorf.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_nearbyint.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_nearbyintf.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_rint.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_rintf.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_round.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_roundf.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_trunc.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_truncf.S: Likewise. * sysdeps/powerpc/powerpc64/memset.S: Likewise. * sysdeps/powerpc/powerpc64/power7/fpu/s_finite.S: Likewise. * sysdeps/powerpc/powerpc64/power7/fpu/s_isinf.S: Likewise. * sysdeps/powerpc/powerpc64/power7/fpu/s_isnan.S: Likewise. * sysdeps/powerpc/powerpc64/power7/strstr.S: Likewise. * sysdeps/powerpc/powerpc64/power8/fpu/e_expf.S: Likewise. * sysdeps/powerpc/powerpc64/power8/fpu/s_cosf.S: Likewise. * sysdeps/powerpc/powerpc64/power8/fpu/s_sinf.S: Likewise. * sysdeps/powerpc/powerpc64/power8/strcasestr.S: Likewise. * sysdeps/powerpc/powerpc64/addmul_1.S: Use ENTRY_TOCLESS. * sysdeps/powerpc/powerpc64/cell/memcpy.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_copysign.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_copysignl.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_fabsl.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_isnan.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_llrint.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_llrintf.S: Likewise. * sysdeps/powerpc/powerpc64/lshift.S: Likewise. * sysdeps/powerpc/powerpc64/memcpy.S: Likewise. * sysdeps/powerpc/powerpc64/mul_1.S: Likewise. * sysdeps/powerpc/powerpc64/power4/memcmp.S: Likewise. * sysdeps/powerpc/powerpc64/power4/memcpy.S: Likewise. * sysdeps/powerpc/powerpc64/power4/memset.S: Likewise. * sysdeps/powerpc/powerpc64/power4/strncmp.S: Likewise. * sysdeps/powerpc/powerpc64/power5+/fpu/s_ceil.S: Likewise. * sysdeps/powerpc/powerpc64/power5+/fpu/s_ceilf.S: Likewise. * sysdeps/powerpc/powerpc64/power5+/fpu/s_floor.S: Likewise. * sysdeps/powerpc/powerpc64/power5+/fpu/s_floorf.S: Likewise. * sysdeps/powerpc/powerpc64/power5+/fpu/s_llround.S: Likewise. * sysdeps/powerpc/powerpc64/power5+/fpu/s_round.S: Likewise. * sysdeps/powerpc/powerpc64/power5+/fpu/s_roundf.S: Likewise. * sysdeps/powerpc/powerpc64/power5+/fpu/s_trunc.S: Likewise. * sysdeps/powerpc/powerpc64/power5+/fpu/s_truncf.S: Likewise. * sysdeps/powerpc/powerpc64/power5/fpu/s_isnan.S: Likewise. * sysdeps/powerpc/powerpc64/power6/fpu/s_copysign.S: Likewise. * sysdeps/powerpc/powerpc64/power6/fpu/s_isnan.S: Likewise. * sysdeps/powerpc/powerpc64/power6/memcpy.S: Likewise. * sysdeps/powerpc/powerpc64/power6/memset.S: Likewise. * sysdeps/powerpc/powerpc64/power6x/fpu/s_isnan.S: Likewise. * sysdeps/powerpc/powerpc64/power6x/fpu/s_llrint.S: Likewise. * sysdeps/powerpc/powerpc64/power6x/fpu/s_llround.S: Likewise. * sysdeps/powerpc/powerpc64/power7/add_n.S: Likewise. * sysdeps/powerpc/powerpc64/power7/memchr.S: Likewise. * sysdeps/powerpc/powerpc64/power7/memcmp.S: Likewise. * sysdeps/powerpc/powerpc64/power7/memcpy.S: Likewise. * sysdeps/powerpc/powerpc64/power7/memmove.S: Likewise. * sysdeps/powerpc/powerpc64/power7/mempcpy.S: Likewise. * sysdeps/powerpc/powerpc64/power7/memrchr.S: Likewise. * sysdeps/powerpc/powerpc64/power7/memset.S: Likewise. * sysdeps/powerpc/powerpc64/power7/rawmemchr.S: Likewise. * sysdeps/powerpc/powerpc64/power7/strcasecmp.S (strcasecmp_l): Likewise. * sysdeps/powerpc/powerpc64/power7/strchr.S: Likewise. * sysdeps/powerpc/powerpc64/power7/strchrnul.S: Likewise. * sysdeps/powerpc/powerpc64/power7/strcmp.S: Likewise. * sysdeps/powerpc/powerpc64/power7/strlen.S: Likewise. * sysdeps/powerpc/powerpc64/power7/strncmp.S: Likewise. * sysdeps/powerpc/powerpc64/power7/strncpy.S: Likewise. * sysdeps/powerpc/powerpc64/power7/strnlen.S: Likewise. * sysdeps/powerpc/powerpc64/power7/strrchr.S: Likewise. * sysdeps/powerpc/powerpc64/power8/fpu/s_finite.S: Likewise. * sysdeps/powerpc/powerpc64/power8/fpu/s_isinf.S: Likewise. * sysdeps/powerpc/powerpc64/power8/fpu/s_isnan.S: Likewise. * sysdeps/powerpc/powerpc64/power8/fpu/s_llrint.S: Likewise. * sysdeps/powerpc/powerpc64/power8/fpu/s_llround.S: Likewise. * sysdeps/powerpc/powerpc64/power8/memcmp.S: Likewise. * sysdeps/powerpc/powerpc64/power8/memset.S: Likewise. * sysdeps/powerpc/powerpc64/power8/strchr.S: Likewise. * sysdeps/powerpc/powerpc64/power8/strcmp.S: Likewise. * sysdeps/powerpc/powerpc64/power8/strcpy.S: Likewise. * sysdeps/powerpc/powerpc64/power8/strlen.S: Likewise. * sysdeps/powerpc/powerpc64/power8/strncmp.S: Likewise. * sysdeps/powerpc/powerpc64/power8/strncpy.S: Likewise. * sysdeps/powerpc/powerpc64/power8/strnlen.S: Likewise. * sysdeps/powerpc/powerpc64/power8/strrchr.S: Likewise. * sysdeps/powerpc/powerpc64/power8/strspn.S: Likewise. * sysdeps/powerpc/powerpc64/power9/strcmp.S: Likewise. * sysdeps/powerpc/powerpc64/power9/strncmp.S: Likewise. * sysdeps/powerpc/powerpc64/strchr.S: Likewise. * sysdeps/powerpc/powerpc64/strcmp.S: Likewise. * sysdeps/powerpc/powerpc64/strlen.S: Likewise. * sysdeps/powerpc/powerpc64/strncmp.S: Likewise. * sysdeps/powerpc/powerpc64/ppc-mcount.S: Store LR earlier. Don't add nop when SHARED. * sysdeps/powerpc/powerpc64/start.S: Fix comment. * sysdeps/powerpc/powerpc64/multiarch/strrchr-power8.S (ENTRY): Don't define. (ENTRY_TOCLESS): Define. * sysdeps/powerpc/powerpc32/sysdep.h (ENTRY_TOCLESS): Define. * sysdeps/powerpc/fpu/s_fma.S: Use ENTRY_TOCLESS. * sysdeps/powerpc/fpu/s_fmaf.S: Likewise.
* PowerPC64 strncpy, stpncpy and strstr fixesAlan Modra2017-06-148-0/+39
| | | | | | | | | | | | | | | | | | | | | Makes __stpncpy_power8 call __memset_power8 directly rather than via an IFUNC. Fixes a missing _mcount, and removes some redundant NOPS. The *_is_local defines are also used in a followup patch. * sysdeps/powerpc/powerpc64/multiarch/strncpy-power7.S: Define MEMSET_is_local. * sysdeps/powerpc/powerpc64/multiarch/strncpy-power8.S: Likewise. * sysdeps/powerpc/powerpc64/multiarch/stpncpy-power7.S: Likewise. * sysdeps/powerpc/powerpc64/multiarch/stpncpy-power8.S: Likewise. Define MEMSET. * sysdeps/powerpc/powerpc64/multiarch/strstr-power7.S: Define STRLEN_is_local, STRNLEN_is_local, and STRCHR_is_local. * sysdeps/powerpc/powerpc64/power7/strstr.S: Likewise. Don't add nop after local calls. * sysdeps/powerpc/powerpc64/power7/strncpy.S: Define MEMSET_is_local. Don't add nop after local call. * sysdeps/powerpc/powerpc64/power8/strncpy.S: Likewise. Add missing CALL_MCOUNT.
* PowerPC64 sysdep.h tidyAlan Modra2017-06-142-59/+59
| | | | | | | | | | | | | | | | | | | | | | .align on some targets takes a byte alignment, on others like powerpc, log2 of the byte alignment. It's a good idea to avoid .align, particularly since x86 and powerpc are different. This patch fixes the occurrences of .align in powerpc64/sysdep.h, renames DOT_LABEL since the macro doesn't have anything to do with adding dots, removes extraneous semicolons, and fixes some formatting. * sysdeps/powerpc/powerpc64/sysdep.h: Formatting. (FUNC_LABEL): Rename from DOT_LABEL. (ENTRY_1): Use FUNC_LABEL and remove leading space from label. Use .p2align rather than .align. (TRACEBACK, TRACEBACK_MASK): Use .p2align rather than .align. (ABORT_TRANSACTION): Likewise. (ENTRY_1, ENTRY_2, END_2, LOCALENTRY): Remove unnecessary semicolons, particularly at end. Add semicolon at invocation as necessary. (TRACEBACK, TRACEBACK_MASK, PSEUDO, PSEUDO_NOERRNO): Likewise. (PSEUDO_ERRVAL, PPC64_LOAD_FUNCPTR, OPD_ENT): Likewise. * sysdeps/powerpc/powerpc64/multiarch/strrchr-power8.S (ENTRY, END): Adjust to suit.
* PowerPC64 FRAME_PARM_SAVEAlan Modra2017-06-142-37/+16
| | | | | | | | | | | | | I think FRAME_PARM[1-9]_SAVE confuse the code, particularly FRAME_PARM9_SAVE. There are only 8 parameter save slots! * sysdeps/powerpc/powerpc64/sysdep.h: (FRAME_BACKCHAIN, FRAME_CR_SAVE, FRAME_LR_SAVE): Move out of conditional. (FRAME_PARM1_SAVE, FRAME_PARM2_SAVE, FRAME_PARM3_SAVE, FRAME_PARM4_SAVE, FRAME_PARM5_SAVE, FRAME_PARM6_SAVE, FRAME_PARM7_SAVE, FRAME_PARM8_SAVE, FRAME_PARM9_SAVE): Delete. * sysdeps/unix/sysv/linux/powerpc/powerpc64/makecontext.S: Replace uses of FRAME_PARM[1-9]_SAVE with FRAME_PARM_SAVE plus offset.
* PowerPC64, fix calls to _mcountAlan Modra2017-06-141-8/+3
| | | | | | | The macros used in assembly were broken on powerpc64 ELFv1. * sysdeps/powerpc/powerpc64/sysdep.h: (call_mcount_parm_offset): Delete. (SAVE_ARG, REST_ARG, CFI_SAVE_ARG): Correct.
* mips: Fix store/load gp registers to/from ucontext_tGordana Cmiljanovic2017-06-136-81/+180
| | | | | | | | | | | | | | | | | | | | | | | | | | General purpose registers in mcontext_t structure are 8 bytes long for both MIPS32/MIPS64. get/set/make/swap context implementations for MIPS O32 incorrectly assume that general purpose registers in this structure are 4 bytes long. This patch is fixing that. Tested for MIPS O32 LE and BE. Compared objdump of modified functions for mips n32 and mips n64. [BZ #21548] * sysdeps/unix/sysv/linux/mips/getcontext.S: Define MCONTEXT_SZGREG as 8 and use it when copying general purpose registers. * sysdeps/unix/sysv/linux/mips/makecontext.S: Likewise. * sysdeps/unix/sysv/linux/mips/mips32/Makefile: Include new test for mips o32. * sysdeps/unix/sysv/linux/mips/mips32/bug-getcontext-mips-gp.c: Added new test for mips o32. * sysdeps/unix/sysv/linux/mips/setcontext.S: Define MCONTEXT_SZGREG as 8 and use it when copying general purpose registers. * sysdeps/unix/sysv/linux/mips/swapcontext.S: Likewise.
* Remove __need_schedparam and __cpu_set_t_defined.Zack Weinberg2017-06-121-121/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | bits/sched.h has logic to expose only an impl-namespace variant of struct sched_param (i.e. struct __sched_param), but nothing uses it, and the only header that includes bits/sched.h is sched.h. The __need_schedparam logic can therefore be removed. bits/sched.h also has a great deal of code relating to cpu_set_t objects that was *almost* the same between the two versions of bits/sched.h in the tree; a little spelunking indicated that this is because some bug fixes got applied to the Linux-specific bits/sched.h but not the generic one. Introduce a new header, bits/cpu-set.h, containing the version of that code with the bugfixes, have sched.h include it directly, and delete all of the code from both versions of bits/sched.h. Also remove the unnecessary name mangling in the definition of struct sched_param -- POSIX specifies a field 'sched_priority', so there is no reason to define it as '__sched_priority' and then paper over that with a macro. (Just in case someone was using the internal name, 'sched_priority' remains a macro defined to expand to itself, and '__sched_priority' now expands to 'sched_priority'.) Finally, as long as I'm touching these files anyway, merge new constants from linux/sched.h into the Linux bits/sched.h. * bits/sched.h: Remove __need_schedparam logic and replace with a normal multiple-include guard. Change field name in struct sched_param from __sched_priority to sched_priority. Delete everything under #ifndef __cpu_set_t_defined. * sysdeps/unix/sysv/linux/bits/sched.h: Likewise. Also sync with kernel sched.h, adding SCHED_ISO and SCHED_DEADLINE constants. * posix/sched.h: Include bits/cpu-set.h as well as bits/sched.h. For compatibility, #define sched_priority to itself, and #define __sched_priority as sched_priority. * posix/bits/cpu-set.h: New file containing, verbatim, the code that was under #ifndef __cpu_set_t_defined in sysdeps/unix/sysv/linux/bits/sched.h. * include/bits/cpu-set.h: New wrapper. * posix/Makefile: Install bits/cpu-set.h.
* float128: Add strtof128, wcstof128, and related functions.Paul E. Murphy2017-06-1210-0/+291
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The implementations are contained with sysdeps/ieee754/float128 as they are only built when _Float128 is enabled within libc/m. * include/gmp.h (__mpn_construct_float128): New declaration. * include/stdlib.h: Include bits/floatn.h for _Float128 tests. (__strtof128_l): New declaration. (__strtof128_nan): Likewise. (__wcstof128_nan): Likewise. (__strtof128_internal): Likewise. (____strtof128_l_internal): Likewise. * include/wchar.h: Include bits/floatn.h for _Float128 tests. (__wcstof128_l): New declaration. (__wcstof128_internal): Likewise. * stdlib/Makefile (bug-strtod2): Link libm too. * stdlib/stdlib.h (strtof128): New declaration. (strtof128_l): Likewise. * stdlib/tst-strtod-nan-locale-main.c: Updated to use tst-strtod.h macros to ensure float128 gets tested too. * stdlib/tst-strtod-round-skeleton.c (CHOOSE_f128): New macro. * stdlib/tst-strtod.h: Include bits/floatn.h for _Float128 tests. (IF_FLOAT128): New macro. (GEN_TEST_STRTOD): Update to optionally include _Float128 in the tests. (STRTOD_TEST_FOREACH): Likewise. * sysdeps/ieee754/float128/Makefile: Insert new strtof128 and wcstof128 functions into libc. * sysdeps/ieee754/float128/Versions: Add exports for the above new functions. * sysdeps/ieee754/float128/mpn2float128.c: New file. * sysdeps/ieee754/float128/strtod_nan_float128.h: New file. * sysdeps/ieee754/float128/strtof128.c: New file. * sysdeps/ieee754/float128/strtof128_l.c: New file. * sysdeps/ieee754/float128/strtof128_nan.c: New file. * sysdeps/ieee754/float128/wcstof128.c: New file. * sysdeps/ieee754/float128/wcstof128_l.c: New file. * sysdeps/ieee754/float128/wcstof128_nan.c: New fike. * wcsmbs/Makefile: (CFLAGS-wcstof128.c): Append strtox-CFLAGS. (CFLAGS-wcstof128_l): Likewise. * wcsmbs/wchar.h: Include bits/floatn.h for _Float128 tests. (wcstof128): New declaration. (wcstof128_l): Likewise.
* x86-64: Implement strcpy family IFUNC selectors in CH.J. Lu2017-06-1214-131/+258
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Implement strcpy family IFUNC selectors in C. All internal calls within libc.so can use IFUNC on x86-64 since unlike x86, x86-64 supports PC-relative addressing to access the GOT entry so that it can call via PLT without using an extra register. For libc.a, we can't use IFUNC for functions which are called before IFUNC has been initialized. Use IFUNC internally reduces the icache footprint since libc.so and other codes in the process use the same implementations. This patch uses IFUNC for strcpy family functions within libc. * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add strcpy-sse2 and stpcpy-sse2. * sysdeps/x86_64/multiarch/ifunc-unaligned-ssse3.h: New file. * sysdeps/x86_64/multiarch/stpcpy-sse2.S: Likewise. * sysdeps/x86_64/multiarch/stpcpy.c: Likewise. * sysdeps/x86_64/multiarch/stpncpy.c: Likewise. * sysdeps/x86_64/multiarch/strcpy-sse2.S: Likewise. * sysdeps/x86_64/multiarch/strcpy.c: Likewise. * sysdeps/x86_64/multiarch/strncpy.c: Likewise. * sysdeps/x86_64/multiarch/stpcpy.S: Removed. * sysdeps/x86_64/multiarch/stpncpy.S: Likewise. * sysdeps/x86_64/multiarch/strcpy.S: Likewise. * sysdeps/x86_64/multiarch/strncpy.S: Likewise. * sysdeps/x86_64/multiarch/stpncpy-c.c (weak_alias): New. (libc_hidden_def): Always defined as empty. * sysdeps/x86_64/multiarch/strncpy-c.c (libc_hidden_builtin_def): Always Defined as empty.
* Replace all internal uses of __bzero with memset. This removes the needWilco Dijkstra2017-06-121-3/+3
| | | | | | | | | | | | | | | to redirect it to a builtin and means memset is inlined whenever possible, including with -Os. * sunrpc/bindrsvprt.c (bindresvport): Change __bzero to memset. * sunrpc/clnt_gen.c (clnt_create): Likewise. * sunrpc/des_impl.c (_des_crypt): Likewise. * sunrpc/key_call.c (key_gendes): Likewise. * sunrpc/pmap_rmt.c (clnt_broadcast): Likewise. * sunrpc/svc_simple.c (universal): Likewise. * sunrpc/svc_tcp.c (svctcp_create): Likewise. * sunrpc/svc_udp.c (svcudp_bufcreate): Likewise. * sysdeps/arm/aeabi_memclr.c (__aeabi_memclr): Likewise.
* powerpc: add sysconf support for cache geometriesPaul Clarke2017-06-093-0/+170
| | | | | | | | | | | | | | | | | | | | | | | | | | There is currently no "cross-platform" (x86 and POWER) support for determining the cacheline size. This patch adds support to sysconf() to correctly report cacheline sizes based on the information in the auxilliary vector. Thus, using sysconf() is a cross-platform (x86 and POWER) solution for determining cacheline sizes. Support is added (on powerpc) for: _SC_LEVEL1_ICACHE_SIZE _SC_LEVEL1_ICACHE_ASSOC _SC_LEVEL1_ICACHE_LINESIZE _SC_LEVEL1_DCACHE_SIZE _SC_LEVEL1_DCACHE_ASSOC _SC_LEVEL1_DCACHE_LINESIZE _SC_LEVEL2_CACHE_SIZE _SC_LEVEL2_CACHE_ASSOC _SC_LEVEL2_CACHE_LINESIZE _SC_LEVEL3_CACHE_SIZE _SC_LEVEL3_CACHE_ASSOC _SC_LEVEL3_CACHE_LINESIZE * sysdeps/unix/sysv/linux/powerpc/sysconf.c: New file. Add powerpc-specific overrides for L1, L2, L3 CACHE_SIZEs, CACHE_ASSOCs, and CACHE_LINESIZEs, retrieving from auxv. * sysdeps/unix/sysv/linux/powerpc/test-powerpc-linux-sysconf.c: New file. Invoke newly supported sysconf values for powerpc, and report results. If none are supported, report so. * sysdeps/unix/sysv/linux/powerpc/Makefile (tests): Add new test, tst-sysconf.
* Fix waitid namespace (bug 21561).Joseph Myers2017-06-091-4/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | In sys/wait.h, waitid and associated constants and types are UX-shaded in XPG4.2 (so not in XPG4), and XSI-shaded in POSIX before 2008, so should be appropriately conditional in the headers. This patch fixes the conditionals accordingly. (WCONTINUED is actually still XSI-shaded in POSIX.1:2008, but W* is also reserved there without XSI-shading, so nothing special needs to be done about the conditionals on WCONTINUED to conform to POSIX.1:2008 namespace rules.) Tested for x86_64. [BZ #21561] * posix/sys/wait.h (idtype_t): Change [__USE_XOPEN] condition to [__USE_XOPEN_EXTENDED]. (id_t): Likewise. (include of <bits/types/siginfo_t.h): Likewise. (waitid): Likewise. * sysdeps/unix/sysv/linux/bits/waitflags.h (WSTOPPED): Condition on [__USE_XOPEN_EXTENDED || __USE_XOPEN2K8]. (WEXITED): Likewise. (WCONTINUED): Likewise. (WNOWAIT): Likewise. * conform/Makefile (test-xfail-XPG4/stdlib.h/conform): Remove. (test-xfail-XPG4/sys/wait.h/conform): Likewise. (test-xfail-POSIX/sys/wait.h/conform): Likewise.
* Update nios2, sparc32 localplt.data files for recent GCC change.Joseph Myers2017-06-092-4/+5
| | | | | | | | | | | | | | | | | | | | | | A recent GCC change to expand floating-point classification built-in functions inline using integer rather than floating-point arithmetic in some cases resulted in localplt test failures for nios2 and sparc32 <https://sourceware.org/ml/libc-testresults/2017-q2/msg00320.html>. This patch updates the localplt.data files in question to mark the relevant symbols as optional / add a new optional symbol. (The GCC patch has been reverted because of other problems it caused, but one can assume it will be applied again, without changes that would affect the PLT entries generated, once those issues have been resolved.) Tested with build-many-glibcs.py. * sysdeps/unix/sysv/linux/nios2/localplt.data (__gtdf2): Mark libc.so PLT entry optional. (__gtsf2): Likewise. (__unorddf2): Likewise. (__unordsf2): Likewise. * sysdeps/unix/sysv/linux/sparc/sparc32/localplt.data (_Q_fgt): New optional libc.so PLT entry.
* x86-64: Correct comments in ifunc-impl-list.cH.J. Lu2017-06-091-6/+6
| | | | * sysdeps/x86_64/multiarch/ifunc-impl-list.c: Correct comments.
* x86-64: Optimize strrchr/wcsrchr with AVX2H.J. Lu2017-06-098-0/+368
| | | | | | | | | | | | | | | | | | | | Optimize strrchr/wcsrchr with AVX2 to check 32 bytes with vector instructions. It is as fast as SSE2 version for small data sizes and up to 1X faster for large data sizes on Haswell. Select AVX2 version on AVX2 machines where vzeroupper is preferred and AVX unaligned load is fast. * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add strrchr-sse2, strrchr-avx2, wcsrchr-sse2 and wcsrchr-avx2. * sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Add tests for __strrchr_avx2, __strrchr_sse2, __wcsrchr_avx2 and __wcsrchr_sse2. * sysdeps/x86_64/multiarch/strrchr-avx2.S: New file. * sysdeps/x86_64/multiarch/strrchr-sse2.S: Likewise. * sysdeps/x86_64/multiarch/strrchr.c: Likewise. * sysdeps/x86_64/multiarch/wcsrchr-avx2.S: Likewise. * sysdeps/x86_64/multiarch/wcsrchr-sse2.S: Likewise. * sysdeps/x86_64/multiarch/wcsrchr.c: Likewise.
* x86-64: Optimize memrchr with AVX2H.J. Lu2017-06-095-0/+424
| | | | | | | | | | | | | | | | | Optimize memrchr with AVX2 to search 32 bytes with a single vector compare instruction. It is as fast as SSE2 memrchr for small data sizes and up to 1X faster for large data sizes on Haswell. Select AVX2 memrchr on AVX2 machines where vzeroupper is preferred and AVX unaligned load is fast. * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add memrchr-sse2 and memrchr-avx2. * sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Add tests for __memrchr_avx2 and __memrchr_sse2. * sysdeps/x86_64/multiarch/memrchr-avx2.S: New file. * sysdeps/x86_64/multiarch/memrchr-sse2.S: Likewise. * sysdeps/x86_64/multiarch/memrchr.c: Likewise.
* x86-64: Optimize strchr/strchrnul/wcschr with AVX2H.J. Lu2017-06-0912-58/+492
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Optimize strchr/strchrnul/wcschr with AVX2 to search 32 bytes with vector instructions. It is as fast as SSE2 versions for size <= 16 bytes and up to 1X faster for or size > 16 bytes on Haswell. Select AVX2 version on AVX2 machines where vzeroupper is preferred and AVX unaligned load is fast. NB: It uses TZCNT instead of BSF since TZCNT produces the same result as BSF for non-zero input. TZCNT is faster than BSF and is executed as BSF if machine doesn't support TZCNT. * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add strchr-sse2, strchrnul-sse2, strchr-avx2, strchrnul-avx2, wcschr-sse2 and wcschr-avx2. * sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Add tests for __strchr_avx2, __strchrnul_avx2, __strchrnul_sse2, __wcschr_avx2 and __wcschr_sse2. * sysdeps/x86_64/multiarch/strchr-avx2.S: New file. * sysdeps/x86_64/multiarch/strchr-sse2.S: Likewise. * sysdeps/x86_64/multiarch/strchr.c: Likewise. * sysdeps/x86_64/multiarch/strchrnul-avx2.S: Likewise. * sysdeps/x86_64/multiarch/strchrnul-sse2.S: Likewise. * sysdeps/x86_64/multiarch/strchrnul.c: Likewise. * sysdeps/x86_64/multiarch/wcschr-avx2.S: Likewise. * sysdeps/x86_64/multiarch/wcschr-sse2.S: Likewise. * sysdeps/x86_64/multiarch/wcschr.c: Likewise. * sysdeps/x86_64/multiarch/strchr.S: Removed.
* x86-64: Optimize strlen/strnlen/wcslen/wcsnlen with AVX2H.J. Lu2017-06-0913-1/+621
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Optimize strlen/strnlen/wcslen/wcsnlen with AVX2 to check 32 bytes with a single vector compare instruction. It is as fast as SSE2 versions for size <= 16 bytes and up to 1X faster for or size > 16 bytes on Haswell. Select AVX2 version on AVX2 machines where vzeroupper is preferred and AVX unaligned load is fast. NB: It uses TZCNT instead of BSF since TZCNT produces the same result as BSF for non-zero input. TZCNT is faster than BSF and is executed as BSF if machine doesn't support TZCNT. * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add strlen-sse2, strnlen-sse2, strlen-avx2, strnlen-avx2, wcslen-sse2, wcslen-avx2 and wcsnlen-avx2. * sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Add tests for __strlen_avx2, __strlen_sse2, __strnlen_avx2, __strnlen_sse2, __wcslen_avx2, __wcslen_sse2 and __wcsnlen_avx2. * sysdeps/x86_64/multiarch/strlen-avx2.S: New file. * sysdeps/x86_64/multiarch/strlen-sse2.S: Likewise. * sysdeps/x86_64/multiarch/strlen.c: Likewise. * sysdeps/x86_64/multiarch/strnlen-avx2.S: Likewise. * sysdeps/x86_64/multiarch/strnlen-sse2.S: Likewise. * sysdeps/x86_64/multiarch/strnlen.c: Likewise. * sysdeps/x86_64/multiarch/wcslen-avx2.S: Likewise. * sysdeps/x86_64/multiarch/wcslen-sse2.S: Likewise. * sysdeps/x86_64/multiarch/wcslen.c: Likewise. * sysdeps/x86_64/multiarch/wcsnlen-avx2.S: Likewise. * sysdeps/x86_64/multiarch/wcsnlen.c (OPTIMIZE (avx2)): New. (IFUNC_SELECTOR): Return OPTIMIZE (avx2) on AVX2 machines where vzeroupper is preferred and AVX unaligned load is fast.