about summary refs log tree commit diff
path: root/sysdeps/x86_64
Commit message (Collapse)AuthorAgeFilesLines
* Update x86_64 libm-test-ulps.Joseph Myers2017-09-291-2/+2
| | | | * sysdeps/x86_64/fpu/libm-test-ulps: Update.
* x86: Allow undefined _DYNAMIC in static executableH.J. Lu2017-09-281-2/+9
| | | | | | | | | | | | When --enable-static-pie is used to build static PIE, _DYNAMIC is used to compute the load address of static PIE. But _DYNAMIC is undefined when creating static executable. This patch makes _DYNAMIC weak in PIE libc.a so that it can be undefined. * sysdeps/i386/dl-machine.h (elf_machine_load_address): Allow undefined _DYNAMIC in PIE libc.a. * sysdeps/x86_64/dl-machine.h (elf_machine_load_address): Likewse.
* Add SSE4.1 trunc, truncf (bug 20142).Joseph Myers2017-09-207-2/+116
| | | | | | | | | | | | | | | | | | | | This patch adds SSE4.1 versions of trunc and truncf, using the roundsd / roundss instructions, similar to the versions of ceil, floor, rint and nearbyint functions we already have. In my testing with the glibc benchtests these are about 30% faster than the C versions for double, 20% faster for float. Tested for x86_64. [BZ #20142] * sysdeps/x86_64/fpu/multiarch/Makefile (libm-sysdep_routines): Add s_trunc-c, s_truncf-c, s_trunc-sse4_1 and s_truncf-sse4_1. * sysdeps/x86_64/fpu/multiarch/s_trunc-c.c: New file. * sysdeps/x86_64/fpu/multiarch/s_trunc-sse4_1.S: Likewise. * sysdeps/x86_64/fpu/multiarch/s_trunc.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_truncf-c.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_truncf-sse4_1.S: Likewise. * sysdeps/x86_64/fpu/multiarch/s_truncf.c: Likewise.
* x86: Add MathVec_Prefer_No_AVX512 to cpu-features [BZ #21967]H.J. Lu2017-09-121-5/+8
| | | | | | | | | | | | | | | | | | | | | AVX512 functions in mathvec are used on machines with AVX512. An AVX2 wrapper is also provided and it can be used when the AVX512 version isn't profitable. MathVec_Prefer_No_AVX512 is addded to cpu-features. If glibc.tune.hwcaps=MathVec_Prefer_No_AVX512 is set in GLIBC_TUNABLES environment variable, the AVX2 wrapper will be used. Tested on x86-64 machines with and without AVX512. Also verified glibc.tune.hwcaps=MathVec_Prefer_No_AVX512 on AVX512 machine. [BZ #21967] * sysdeps/x86/cpu-features.h (bit_arch_MathVec_Prefer_No_AVX512): New. (index_arch_MathVec_Prefer_No_AVX512): Likewise. * sysdeps/x86/cpu-tunables.c (TUNABLE_CALLBACK (set_hwcaps)): Handle MathVec_Prefer_No_AVX512. * sysdeps/x86_64/fpu/multiarch/ifunc-mathvec-avx512.h (IFUNC_SELECTOR): Return AVX2 version if MathVec_Prefer_No_AVX512 is set.
* x86: Add x86_64 to x86-64 HWCAP [BZ #22093]H.J. Lu2017-09-113-0/+65
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Before glibc 2.26, ld.so set dl_platform to "x86_64" and searched the "x86_64" subdirectory when loading a shared library. ld.so in glibc 2.26 was changed to set dl_platform to "haswell" or "xeon_phi", based on supported ISAs. This led to shared library loading failure for shared libraries placed under the "x86_64" subdirectory. This patch adds "x86_64" to x86-64 dl_hwcap so that ld.so will always search the "x86_64" subdirectory when loading a shared library. NB: We can't set x86-64 dl_platform to "x86-64" since ld.so will skip the "haswell" and "xeon_phi" subdirectories on "haswell" and "xeon_phi" machines. Tested on i686 and x86-64. [BZ #22093] * sysdeps/x86/cpu-features.c (init_cpu_features): Initialize GLRO(dl_hwcap) to HWCAP_X86_64 for x86-64. * sysdeps/x86/dl-hwcap.h (HWCAP_COUNT): Updated. (HWCAP_IMPORTANT): Likewise. (HWCAP_X86_64): New enum. (HWCAP_X86_AVX512_1): Updated. * sysdeps/x86/dl-procinfo.c (_dl_x86_hwcap_flags): Add "x86_64". * sysdeps/x86_64/Makefile (tests): Add tst-x86_64-1. (modules-names): Add x86_64/tst-x86_64mod-1. (LDFLAGS-tst-x86_64mod-1.so): New. ($(objpfx)tst-x86_64-1): Likewise. ($(objpfx)x86_64/tst-x86_64mod-1.os): Likewise. (tst-x86_64-1-clean): Likewise. * sysdeps/x86_64/tst-x86_64-1.c: New file. * sysdeps/x86_64/tst-x86_64mod-1.c: Likewise.
* Update x86_64 ulps for AMD Ryzen.Markus Trippelsdorf2017-09-081-2/+2
| | | | * sysdeps/x86_64/fpu/libm-test-ulps: Update for AMD Ryzen.
* Remove remaining _HAVE_STRING_ARCH_* definitions (BZ #18858)Adhemerval Zanella2017-09-066-6/+0
| | | | | | | | | | | | | | | | | | | | | | | Since the removal of bits/string.h, _HAVE_STRING_ARCH_* are no longer used. This patch removes the unused macros from i686 and x86_64 sysdeps folder. Checked on x86_64-linux-gnu and i686-linux-gnu. * sysdeps/i386/i686/multiarch/strncpy.c (_HAVE_STRING_ARCH_strncpy): Remove define. * sysdeps/x86_64/multiarch/stpcpy.c (_HAVE_STRING_ARCH_stpcpy): Likewise. * sysdeps/x86_64/multiarch/strcspn.c (_HAVE_STRING_ARCH_strcspn): Likewise. * sysdeps/x86_64/multiarch/strncat.c (_HAVE_STRING_ARCH_strncat): Likewise. * sysdeps/x86_64/multiarch/strncpy.c (_HAVE_STRING_ARCH_strncpy): Likewise. * sysdeps/x86_64/multiarch/strpbrk.c (_HAVE_STRING_ARCH_strpbrk): Likewise. * sysdeps/x86_64/multiarch/strspn.c (_HAVE_STRING_ARCH_strspn): Likewise.
* Obsolete pow10 functions.Joseph Myers2017-09-011-30/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch obsoletes the pow10, pow10f and pow10l functions (makes them into compat symbols, not available for new ports or static linking). The exp10 names for these functions are standardized (in TS 18661-4) and were added in the same glibc version (2.1) as pow10 so source code can change to use them without any loss of portability. Since pow10 is deliberately not provided for _Float128, only exp10, this slightly simplifies moving to the new wrapper templates in the !LIBM_SVID_COMPAT case, by avoiding needing to arrange for pow10, pow10f and pow10l to be defined by those templates. Tested for x86_64, and with build-many-glibcs.py. * manual/math.texi (pow10): Do not document. (pow10f): Likewise. (pow10l): Likewise. * math/bits/mathcalls.h [__USE_GNU] (pow10): Do not declare. * math/bits/math-finite.h [__USE_GNU] (pow10): Likewise. * math/libm-test-exp10.inc (pow10_test): Remove. (do_test): Do not call pow10. * math/w_exp10_compat.c (pow10): Make into compat symbol. [NO_LONG_DOUBLE] (pow10l): Likewise. * math/w_exp10f_compat.c (pow10f): Likewise. * math/w_exp10l_compat.c (pow10l): Likewise. * sysdeps/ia64/fpu/e_exp10.S: Include <shlib-compat.h>. (pow10): Make into compat symbol. * sysdeps/ia64/fpu/e_exp10f.S: Include <shlib-compat.h>. (pow10f): Make into compat symbol. * sysdeps/ia64/fpu/e_exp10l.S: Include <shlib-compat.h>. (pow10l): Make into compat symbol. * sysdeps/ieee754/ldbl-opt/Makefile (libnldbl-calls): Remove pow10. (CFLAGS-nldbl-pow10.c): Remove variable.. * sysdeps/ieee754/ldbl-opt/nldbl-pow10.c: Remove file. * sysdeps/ieee754/ldbl-opt/w_exp10_compat.c (pow10l): Condition on [SHLIB_COMPAT (libm, GLIBC_2_1, GLIBC_2_27)]. * sysdeps/ieee754/ldbl-opt/w_exp10l_compat.c (compat_symbol): Undefine and redefine. (pow10l): Make into compat symbol. * sysdeps/aarch64/libm-test-ulps: Remove pow10 ulps. * sysdeps/alpha/fpu/libm-test-ulps: Likewise. * sysdeps/arm/libm-test-ulps: Likewise. * sysdeps/hppa/fpu/libm-test-ulps: Likewise. * sysdeps/i386/fpu/libm-test-ulps: Likewise. * sysdeps/i386/i686/fpu/multiarch/libm-test-ulps: Likewise. * sysdeps/microblaze/libm-test-ulps: Likewise. * sysdeps/mips/mips32/libm-test-ulps: Likewise. * sysdeps/mips/mips64/libm-test-ulps: Likewise. * sysdeps/nios2/libm-test-ulps: Likewise. * sysdeps/powerpc/fpu/libm-test-ulps: Likewise. * sysdeps/powerpc/nofpu/libm-test-ulps: Likewise. * sysdeps/s390/fpu/libm-test-ulps: Likewise. * sysdeps/sh/libm-test-ulps: Likewise. * sysdeps/sparc/fpu/libm-test-ulps: Likewise. * sysdeps/tile/libm-test-ulps: Likewise. * sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
* elf: Remove internal_function attributeFlorian Weimer2017-08-313-4/+2
|
* x86_64 __redirect_ieee754_expf: Change double to floatH.J. Lu2017-08-281-1/+1
| | | | | | | __redirect_ieee754_expf has type float, not double. * sysdeps/x86_64/fpu/multiarch/e_expf.c (__redirect_ieee754_expf): Change double to float.
* x86-64: Regenerate libm-test-ulps for AVX512 mathvec testsH.J. Lu2017-08-231-2/+2
| | | | | | | Update libm-test-ulps for AVX512 mathvec tests by running “make regen-ulps” on Intel Xeon processor with AVX512. * sysdeps/x86_64/fpu/libm-test-ulps: Regenerated.
* x86_64: Replace AVX512F .byte sequences with instructionsH.J. Lu2017-08-236-290/+32
| | | | | | | | | | | | | | | | | | Since binutils 2.25 or later is required to build glibc, we can replace AVX512F .byte sequences with AVX512F instructions. Tested on x86-64 and x32. There are no code differences in libmvec.so and libmvec.a. * sysdeps/x86_64/fpu/svml_d_sincos8_core.S: Replace AVX512F .byte sequences with AVX512F instructions. * sysdeps/x86_64/fpu/svml_d_wrapper_impl.h: Likewise. * sysdeps/x86_64/fpu/svml_s_sincosf16_core.S: Likewise. * sysdeps/x86_64/fpu/svml_s_wrapper_impl.h: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_sincos8_core_avx512.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_s_sincosf16_core_avx512.S: Likewise.
* Mark internal SSE2 functions with attribute_hidden [BZ #18822]H.J. Lu2017-08-192-2/+2
| | | | | | | | | | Mark internal SSE2 functions with attribute_hidden to allow direct access within libc.so and libc.a without using GOT nor PLT. [BZ #18822] * sysdeps/x86_64/multiarch/strcspn-c.c (STRCSPN_SSE2): Add attribute_hidden. (__strspn_sse2): Likewise.
* x86-64: Check FMA_Usable in ifunc-mathvec-avx2.h [BZ #21966]H.J. Lu2017-08-181-1/+2
| | | | | | | | | | Since the AVX2 version of mathvec functions uses FMA, it can only be used when FMA is usable. [BZ #21966] * sysdeps/x86_64/fpu/multiarch/ifunc-mathvec-avx2.h (IFUNC_SELECTOR): Don't use the AVX2 version if FMA isn't usable.
* x86-64: Optimize e_expf with FMA [BZ #21912]H.J. Lu2017-08-164-0/+245
| | | | | | | | | | | FMA optimized e_expf improves performance by more than 50% on Skylake. [BZ #21912] * sysdeps/x86_64/fpu/multiarch/Makefile (libm-sysdep_routines): Add e_expf-fma. * sysdeps/x86_64/fpu/multiarch/e_expf-fma.S: New file. * sysdeps/x86_64/fpu/multiarch/e_expf.c: Likewise. * sysdeps/x86_64/fpu/multiarch/ifunc-fma.h: Likewise.
* x86-64: Align L(SP_RANGE)/L(SP_INF_0) to 8 bytes [BZ #21955]H.J. Lu2017-08-151-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | sysdeps/x86_64/fpu/e_expf.S has lea L(SP_RANGE)(%rip), %rdx /* load over/underflow bound */ cmpl (%rdx,%rax,4), %ecx /* |x|<under/overflow bound ? */ ... /* Here if |x| is Inf */ lea L(SP_INF_0)(%rip), %rdx /* depending on sign of x: */ movss (%rdx,%rax,4), %xmm0 /* return zero or Inf */ ret ... .section .rodata.cst8,"aM",@progbits,8 ... .p2align 2 L(SP_RANGE): /* single precision overflow/underflow bounds */ .long 0x42b17217 /* if x>this bound, then result overflows */ .long 0x42cff1b4 /* if x<this bound, then result underflows */ .type L(SP_RANGE), @object ASM_SIZE_DIRECTIVE(L(SP_RANGE)) .p2align 2 L(SP_INF_0): .long 0x7f800000 /* single precision Inf */ .long 0 /* single precision zero */ .type L(SP_INF_0), @object ASM_SIZE_DIRECTIVE(L(SP_INF_0)) Since L(SP_RANGE) and L(SP_INF_0) are in .rodata.cst8 section, they must be aligned to 8 bytes. [BZ #21955] * sysdeps/x86_64/fpu/e_expf.S (L(SP_RANGE)): Aligned to 8 bytes. (L(SP_INF_0)): Likewise.
* ld.so: Introduce struct dl_exceptionFlorian Weimer2017-08-101-0/+2
| | | | | | This commit separates allocating and raising exceptions. This simplifies catching and re-raising them because it is no longer necessary to make a temporary, on-stack copy of the exception message.
* x86-64: Add FMA multiarch functions to libmH.J. Lu2017-08-0732-105/+493
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds multiarch functions optimized with -mfma -mavx2 to libm. e_pow-fma.c is compiled with $(config-cflags-nofma) due to PR 19003. * sysdeps/x86_64/fpu/multiarch/Makefile (libm-sysdep_routines): Add e_exp-fma, e_log-fma, e_pow-fma, s_atan-fma, e_asin-fma, e_atan2-fma, s_sin-fma, s_tan-fma, mplog-fma, mpa-fma, slowexp-fma, slowpow-fma, sincos32-fma, doasin-fma, dosincos-fma, halfulp-fma, mpexp-fma, mpatan2-fma, mpatan-fma, mpsqrt-fma, and mptan-fma. (CFLAGS-doasin-fma.c): New. (CFLAGS-dosincos-fma.c): Likewise. (CFLAGS-e_asin-fma.c): Likewise. (CFLAGS-e_atan2-fma.c): Likewise. (CFLAGS-e_exp-fma.c): Likewise. (CFLAGS-e_log-fma.c): Likewise. (CFLAGS-e_pow-fma.c): Likewise. (CFLAGS-halfulp-fma.c): Likewise. (CFLAGS-mpa-fma.c): Likewise. (CFLAGS-mpatan-fma.c): Likewise. (CFLAGS-mpatan2-fma.c): Likewise. (CFLAGS-mpexp-fma.c): Likewise. (CFLAGS-mplog-fma.c): Likewise. (CFLAGS-mpsqrt-fma.c): Likewise. (CFLAGS-mptan-fma.c): Likewise. (CFLAGS-s_atan-fma.c): Likewise. (CFLAGS-sincos32-fma.c): Likewise. (CFLAGS-slowexp-fma.c): Likewise. (CFLAGS-slowpow-fma.c): Likewise. (CFLAGS-s_sin-fma.c): Likewise. (CFLAGS-s_tan-fma.c): Likewise. * sysdeps/x86_64/fpu/multiarch/doasin-fma.c: New file. * sysdeps/x86_64/fpu/multiarch/dosincos-fma.c: Likewise. * sysdeps/x86_64/fpu/multiarch/e_asin-fma.c: Likewise. * sysdeps/x86_64/fpu/multiarch/e_atan2-fma.c: Likewise. * sysdeps/x86_64/fpu/multiarch/e_exp-fma.c: Likewise. * sysdeps/x86_64/fpu/multiarch/e_log-fma.c: Likewise. * sysdeps/x86_64/fpu/multiarch/e_pow-fma.c: Likewise. * sysdeps/x86_64/fpu/multiarch/halfulp-fma.c: Likewise. * sysdeps/x86_64/fpu/multiarch/ifunc-avx-fma4.h: Likewise. * sysdeps/x86_64/fpu/multiarch/ifunc-fma4.h: Likewise. * sysdeps/x86_64/fpu/multiarch/mpa-fma.c: Likewise. * sysdeps/x86_64/fpu/multiarch/mpatan-fma.c: Likewise. * sysdeps/x86_64/fpu/multiarch/mpatan2-fma.c: Likewise. * sysdeps/x86_64/fpu/multiarch/mpexp-fma.c: Likewise. * sysdeps/x86_64/fpu/multiarch/mplog-fma.c: Likewise. * sysdeps/x86_64/fpu/multiarch/mpsqrt-fma.c: Likewise. * sysdeps/x86_64/fpu/multiarch/mptan-fma.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_atan-fma.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_sin-fma.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_tan-fma.c: Likewise. * sysdeps/x86_64/fpu/multiarch/sincos32-fma.c: Likewise. * sysdeps/x86_64/fpu/multiarch/slowexp-fma.c: Likewise. * sysdeps/x86_64/fpu/multiarch/slowpow-fma.c: Likewise. * sysdeps/x86_64/fpu/multiarch/e_asin.c: Rewrite. * sysdeps/x86_64/fpu/multiarch/e_atan2.c: Likewise. * sysdeps/x86_64/fpu/multiarch/e_exp.c: Likewise. * sysdeps/x86_64/fpu/multiarch/e_log.c: Likewise. * sysdeps/x86_64/fpu/multiarch/e_pow.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_atan.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_sin.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_tan.c: Likewise.
* x86-64: Implement libmathvec IFUNC selectors in CH.J. Lu2017-08-0476-619/+1181
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * sysdeps/x86_64/fpu/multiarch/Makefile (libmvec-sysdep_routines) Add svml_d_cos2_core-sse2, svml_d_cos4_core-sse, svml_d_cos8_core-avx2, svml_d_exp2_core-sse2, svml_d_exp4_core-sse, svml_d_exp8_core-avx2, svml_d_log2_core-sse2, svml_d_log4_core-sse, svml_d_log8_core-avx2, svml_d_pow2_core-sse2, svml_d_pow4_core-sse, svml_d_pow8_core-avx2 svml_d_sin2_core-sse2, svml_d_sin4_core-sse, svml_d_sin8_core-avx2, svml_d_sincos2_core-sse2, svml_d_sincos4_core-sse, svml_d_sincos8_core-avx2, svml_s_cosf16_core-avx2, svml_s_cosf4_core-sse2, svml_s_cosf8_core-sse, svml_s_expf16_core-avx2, svml_s_expf4_core-sse2, svml_s_expf8_core-sse, svml_s_logf16_core-avx2, svml_s_logf4_core-sse2, svml_s_logf8_core-sse, svml_s_powf16_core-avx2, svml_s_powf4_core-sse2, svml_s_powf8_core-sse, svml_s_sincosf16_core-avx2, svml_s_sincosf4_core-sse2, svml_s_sincosf8_core-sse, svml_s_sinf16_core-avx2, svml_s_sinf4_core-sse2 and svml_s_sinf8_core-sse. * sysdeps/x86_64/fpu/multiarch/ifunc-mathvec-avx2.h: New file. * sysdeps/x86_64/fpu/multiarch/ifunc-mathvec-avx512.h: Likewise. * sysdeps/x86_64/fpu/multiarch/ifunc-mathvec-sse4_1.h: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_cos2_core.c: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_cos4_core.c: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_cos8_core.c: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_exp2_core.c: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_exp4_core.c: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_exp8_core.c: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_log2_core.c: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_log4_core.c: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_log8_core.c: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_pow2_core.c: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_pow4_core.c: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_pow8_core.c: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_sin2_core.c: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_sin4_core.c: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_sin8_core.c: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_sincos2_core.c: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_sincos4_core.c: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_sincos8_core.c: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_cosf16_core.c: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_cosf4_core.c: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_cosf8_core.c: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_expf16_core.c: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_expf4_core.c: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_expf8_core.c: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_logf16_core.c: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_logf4_core.c: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_logf8_core.c: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_powf16_core.c: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_powf4_core.c: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_powf8_core.c: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_sincosf16_core.c: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_sincosf4_core.c: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_sincosf8_core.c: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_sinf16_core.c: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_sinf4_core.c: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_sinf8_core.c: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_cos2_core.S: Renamed to ... * sysdeps/x86_64/fpu/multiarch/svml_d_cos2_core-sse2.S: This. Don't include <sysdep.h> nor <init-arch.h>. (_ZGVbN2v_cos): Removed. * sysdeps/x86_64/fpu/multiarch/svml_d_cos4_core.S: Renamed to ... * sysdeps/x86_64/fpu/multiarch/svml_d_cos4_core-sse.S: This. Don't include <sysdep.h> nor <init-arch.h>. (_ZGVdN4v_cos): Removed. * sysdeps/x86_64/fpu/multiarch/svml_d_cos8_core.S: Renamed to ... * sysdeps/x86_64/fpu/multiarch/svml_d_cos8_core-avx2.S: This. Don't include <sysdep.h> nor <init-arch.h>. (_ZGVeN8v_cos): Removed. * sysdeps/x86_64/fpu/multiarch/svml_d_exp2_core.S: Renamed to ... * sysdeps/x86_64/fpu/multiarch/svml_d_exp2_core-sse2.S: This. Don't include <sysdep.h> nor <init-arch.h>. (_ZGVbN2v_exp): Removed. * sysdeps/x86_64/fpu/multiarch/svml_d_exp4_core.S: Renamed to ... * sysdeps/x86_64/fpu/multiarch/svml_d_exp4_core-sse.S: This. Don't include <sysdep.h> nor <init-arch.h>. (_ZGVdN4v_exp): Removed. * sysdeps/x86_64/fpu/multiarch/svml_d_exp8_core.S: Renamed to ... * sysdeps/x86_64/fpu/multiarch/svml_d_exp8_core-avx2.S: This. Don't include <sysdep.h> nor <init-arch.h>. (_ZGVeN8v_exp): Removed. * sysdeps/x86_64/fpu/multiarch/svml_d_log2_core.S: Renamed to ... * sysdeps/x86_64/fpu/multiarch/svml_d_log2_core-sse2.S: This. Don't include <sysdep.h> nor <init-arch.h>. (_ZGVbN2v_log): Removed. * sysdeps/x86_64/fpu/multiarch/svml_d_log4_core.S: Renamed to ... * sysdeps/x86_64/fpu/multiarch/svml_d_log4_core-sse.S: This. Don't include <sysdep.h> nor <init-arch.h>. (_ZGVdN4v_log): Removed. * sysdeps/x86_64/fpu/multiarch/svml_d_log8_core.S: Renamed to ... * sysdeps/x86_64/fpu/multiarch/svml_d_log8_core-avx2.S: This. Don't include <sysdep.h> nor <init-arch.h>. (_ZGVeN8v_log): Removed. * sysdeps/x86_64/fpu/multiarch/svml_d_pow2_core.S: Renamed to ... * sysdeps/x86_64/fpu/multiarch/svml_d_pow2_core-sse2.S: This. Don't include <sysdep.h> nor <init-arch.h>. (_ZGVbN2vv_pow): Removed. * sysdeps/x86_64/fpu/multiarch/svml_d_pow4_core.S: Renamed to ... * sysdeps/x86_64/fpu/multiarch/svml_d_pow4_core-sse.S: This. Don't include <sysdep.h> nor <init-arch.h>. (_ZGVdN4vv_pow): Removed. * sysdeps/x86_64/fpu/multiarch/svml_d_pow8_core.S: Renamed to ... * sysdeps/x86_64/fpu/multiarch/svml_d_pow8_core-avx2.S: This. Don't include <sysdep.h> nor <init-arch.h>. (_ZGVeN8vv_pow): Removed. * sysdeps/x86_64/fpu/multiarch/svml_d_sin2_core.S: Renamed to ... * sysdeps/x86_64/fpu/multiarch/svml_d_sin2_core-sse2.S: This. Don't include <sysdep.h> nor <init-arch.h>. (_ZGVbN2v_sin): Removed. * sysdeps/x86_64/fpu/multiarch/svml_d_sin4_core.S: Renamed to ... * sysdeps/x86_64/fpu/multiarch/svml_d_sin4_core-sse.S: This. Don't include <sysdep.h> nor <init-arch.h>. (_ZGVbN4v_sin): Removed. * sysdeps/x86_64/fpu/multiarch/svml_d_sin8_core.S: Renamed to ... * sysdeps/x86_64/fpu/multiarch/svml_d_sin8_core-avx2.S: This. Don't include <sysdep.h> nor <init-arch.h>. (_ZGVbN8v_sin): Removed. * sysdeps/x86_64/fpu/multiarch/svml_d_sincos2_core.S: Renamed to ... * sysdeps/x86_64/fpu/multiarch/svml_d_sincos2_core-sse2.S: This. Don't include <sysdep.h> nor <init-arch.h>. (_ZGVbN2vvv_sincos): Removed. * sysdeps/x86_64/fpu/multiarch/svml_d_sincos4_core.S: Renamed to ... * sysdeps/x86_64/fpu/multiarch/svml_d_sincos4_core-sse.S: This. Don't include <sysdep.h> nor <init-arch.h>. (_ZGVdN4vvv_sincos): Removed. * sysdeps/x86_64/fpu/multiarch/svml_d_sincos8_core.S: Renamed to ... * sysdeps/x86_64/fpu/multiarch/svml_d_sincos8_core-avx2.S: This. Don't include <sysdep.h> nor <init-arch.h>. (_ZGVeN8vvv_sincos): Removed. * sysdeps/x86_64/fpu/multiarch/svml_d_cosf16_core.S: Renamed to ... * sysdeps/x86_64/fpu/multiarch/svml_d_cosf16_core-avx2.S: This. Don't include <sysdep.h> nor <init-arch.h>. (_ZGVeN16v_cosf): Removed. * sysdeps/x86_64/fpu/multiarch/svml_d_cosf4_core.S: Renamed to ... * sysdeps/x86_64/fpu/multiarch/svml_d_cosf4_core-sse2.S: This. Don't include <sysdep.h> nor <init-arch.h>. (_ZGVbN4v_cosf): Removed. * sysdeps/x86_64/fpu/multiarch/svml_d_cosf8_core.S: Renamed to ... * sysdeps/x86_64/fpu/multiarch/svml_d_cosf8_core-sse.S: This. Don't include <sysdep.h> nor <init-arch.h>. (_ZGVdN8v_cosf): Removed. * sysdeps/x86_64/fpu/multiarch/svml_d_expf16_core.S: Renamed to ... * sysdeps/x86_64/fpu/multiarch/svml_d_expf16_core-avx2.S: This. Don't include <sysdep.h> nor <init-arch.h>. (_ZGVeN16v_expf): Removed. * sysdeps/x86_64/fpu/multiarch/svml_d_expf4_core.S: Renamed to ... * sysdeps/x86_64/fpu/multiarch/svml_d_expf4_core-sse2.S: This. Don't include <sysdep.h> nor <init-arch.h>. (_ZGVbN4v_expf): Removed. * sysdeps/x86_64/fpu/multiarch/svml_d_expf8_core.S: Renamed to ... * sysdeps/x86_64/fpu/multiarch/svml_d_expf8_core-sse.S: This. Don't include <sysdep.h> nor <init-arch.h>. (_ZGVdN8v_expf): Removed. * sysdeps/x86_64/fpu/multiarch/svml_d_logf16_core.S: Renamed to ... * sysdeps/x86_64/fpu/multiarch/svml_d_logf16_core-avx2.S: This. Don't include <sysdep.h> nor <init-arch.h>. (_ZGVeN16v_logf): Removed. * sysdeps/x86_64/fpu/multiarch/svml_d_logf4_core.S: Renamed to ... * sysdeps/x86_64/fpu/multiarch/svml_d_logf4_core-sse2.S: This. Don't include <sysdep.h> nor <init-arch.h>. (_ZGVbN4v_logf): Removed. * sysdeps/x86_64/fpu/multiarch/svml_d_logf8_core.S: Renamed to ... * sysdeps/x86_64/fpu/multiarch/svml_d_logf8_core-sse.S: This. Don't include <sysdep.h> nor <init-arch.h>. (_ZGVdN8v_logf): Removed. * sysdeps/x86_64/fpu/multiarch/svml_d_powf16_core.S: Renamed to ... * sysdeps/x86_64/fpu/multiarch/svml_d_powf16_core-avx2.S: This. Don't include <sysdep.h> nor <init-arch.h>. (_ZGVeN16vv_powf): Removed. * sysdeps/x86_64/fpu/multiarch/svml_d_powf4_core.S: Renamed to ... * sysdeps/x86_64/fpu/multiarch/svml_d_powf4_core-sse2.S: This. Don't include <sysdep.h> nor <init-arch.h>. (_ZGVbN4vv_powf): Removed. * sysdeps/x86_64/fpu/multiarch/svml_d_powf8_core.S: Renamed to ... * sysdeps/x86_64/fpu/multiarch/svml_d_powf8_core-sse.S: This. Don't include <sysdep.h> nor <init-arch.h>. (_ZGVdN8vv_powf): Removed. * sysdeps/x86_64/fpu/multiarch/svml_d_sincosf16_core.S: Renamed to ... * sysdeps/x86_64/fpu/multiarch/svml_d_sincosf16_core-avx2.S: This. Don't include <sysdep.h> nor <init-arch.h>. (_ZGVeN16vvv_sincosf): Removed. * sysdeps/x86_64/fpu/multiarch/svml_d_sincosf4_core.S: Renamed to ... * sysdeps/x86_64/fpu/multiarch/svml_d_sincosf4_core-sse2.S: This. Don't include <sysdep.h> nor <init-arch.h>. (_ZGVbN4vvv_sincosf): Removed. * sysdeps/x86_64/fpu/multiarch/svml_d_sincosf8_core.S: Renamed to ... * sysdeps/x86_64/fpu/multiarch/svml_d_sincosf8_core-sse.S: This. Don't include <sysdep.h> nor <init-arch.h>. (_ZGVdN8vvv_sincosf): Removed. * sysdeps/x86_64/fpu/multiarch/svml_d_sinf16_core.S: Renamed to ... * sysdeps/x86_64/fpu/multiarch/svml_d_sinf16_core-avx2.S: This. Don't include <sysdep.h> nor <init-arch.h>. (_ZGVeN16v_sinf): Removed. * sysdeps/x86_64/fpu/multiarch/svml_d_sinf4_core.S: Renamed to ... * sysdeps/x86_64/fpu/multiarch/svml_d_sinf4_core-sse2.S: This. Don't include <sysdep.h> nor <init-arch.h>. (_ZGVbN4v_sinf): Removed. * sysdeps/x86_64/fpu/multiarch/svml_d_sinf8_core.S: Renamed to ... * sysdeps/x86_64/fpu/multiarch/svml_d_sinf8_core-sse.S: This. Don't include <sysdep.h> nor <init-arch.h>. (_ZGVdN8v_sinf): Removed.
* x86-64: Implement libm IFUNC selectors in CH.J. Lu2017-08-0418-120/+287
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * sysdeps/x86_64/fpu/multiarch/Makefile (libm-sysdep_routines): Add s_ceil-sse4_1, s_ceilf-sse4_1, s_floor-sse4_1, s_floorf-sse4_1, s_nearbyint-sse4_1, s_nearbyintf-sse4_1, s_rint-sse4_1 and s_rintf-sse4_1. * sysdeps/x86_64/fpu/multiarch/ifunc-sse4_1.h: New file. * sysdeps/x86_64/fpu/multiarch/s_ceil.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_ceilf.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_floor.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_floorf.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_nearbyint.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_nearbyintf.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_rint.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_rintf.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_ceil.S: Renamed to ... * sysdeps/x86_64/fpu/multiarch/s_ceil-sse4_1.S: This. Don't include <machine/asm.h> nor <init-arch.h>. Include <sysdep.h>. (__ceil): Removed. * sysdeps/x86_64/fpu/multiarch/s_ceilf.S: Renamed to ... * sysdeps/x86_64/fpu/multiarch/s_ceilf-sse4_1.S: This. Don't include <machine/asm.h> nor <init-arch.h>. Include <sysdep.h>. (__ceilf): Removed. * sysdeps/x86_64/fpu/multiarch/s_floor.S: Renamed to ... * sysdeps/x86_64/fpu/multiarch/s_floor-sse4_1.S: This. Don't include <machine/asm.h> nor <init-arch.h>. Include <sysdep.h>. (__floor): Removed. * sysdeps/x86_64/fpu/multiarch/s_floorf.S: Renamed to ... * sysdeps/x86_64/fpu/multiarch/s_floorf-sse4_1.S: This. Don't include <machine/asm.h> nor <init-arch.h>. Include <sysdep.h>. (__floorf): Removed. * sysdeps/x86_64/fpu/multiarch/s_nearbyint.S: Renamed to ... * sysdeps/x86_64/fpu/multiarch/s_nearbyint-sse4_1.S: This. Don't include <machine/asm.h> nor <init-arch.h>. Include <sysdep.h>. (__nearbyint): Removed. * sysdeps/x86_64/fpu/multiarch/s_nearbyintf.S: Renamed to ... * sysdeps/x86_64/fpu/multiarch/s_nearbyintf-sse4_1.S: This. Don't include <machine/asm.h> nor <init-arch.h>. Include <sysdep.h>. (__nearbyintf): Removed. * sysdeps/x86_64/fpu/multiarch/s_rint.S: Renamed to ... * sysdeps/x86_64/fpu/multiarch/s_rint-sse4_1.S: This. Don't include <machine/asm.h> nor <init-arch.h>. Include <sysdep.h>. (__rint): Removed. * sysdeps/x86_64/fpu/multiarch/s_rintf.S: Renamed to ... * sysdeps/x86_64/fpu/multiarch/s_rintf-sse4_1.S: This. Don't include <machine/asm.h> nor <init-arch.h>. Include <sysdep.h>. (__rintf): Removed.
* x86-64: Use IFUNC memcpy and mempcpy in libc.aH.J. Lu2017-08-049-38/+24
| | | | | | | | | | | | | | | | | | | | | | | Since apply_irel is called before memcpy and mempcpy are called, we can use IFUNC memcpy and mempcpy in libc.a. * sysdeps/x86_64/memmove.S (MEMCPY_SYMBOL): Don't check SHARED. (MEMPCPY_SYMBOL): Likewise. * sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Test memcpy and mempcpy in libc.a. * sysdeps/x86_64/multiarch/memcpy-ssse3-back.S: Also include in libc.a. * sysdeps/x86_64/multiarch/memcpy-ssse3.S: Likewise. * sysdeps/x86_64/multiarch/memmove-avx512-no-vzeroupper.S: Likewise. * sysdeps/x86_64/multiarch/memcpy.c: Also include in libc.a. (__hidden_ver1): Don't use in libc.a. * sysdeps/x86_64/multiarch/memmove-sse2-unaligned-erms.S (__mempcpy): Don't create a weak alias in libc.a. * sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S: Support libc.a. * sysdeps/x86_64/multiarch/mempcpy.c: Also include in libc.a. (__hidden_ver1): Don't use in libc.a.
* Check linker support for INSERT in linker scriptH.J. Lu2017-08-041-0/+2
| | | | | | | | | | | | | | | | | Since gold doesn't support INSERT in linker script: https://sourceware.org/bugzilla/show_bug.cgi?id=21676 tst-split-dynreloc fails to link with gold. Check if linker supports INSERT in linker script before using it. * config.make.in (have-insert): New. * configure.ac (libc_cv_insert): New. Set to yes if linker supports INSERT in linker script. (AC_SUBST(libc_cv_insert): New. * configure: Regenerated. * sysdeps/x86_64/Makefile (tests): Add tst-split-dynreloc only if $(have-insert) == yes.
* x86: Remove __memset_zero_constant_len_parameter [BZ #21790]H.J. Lu2017-08-041-6/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | __memset_zero_constant_len_parameter should be removed by commit 61062f56304750c367c5c1533351621353c112a7 Author: Ulrich Drepper <drepper@redhat.com> Date: Tue Mar 1 00:35:23 2005 +0000 2005-02-24 Roland McGrath <roland@redhat.com> * debug/Versions (libc: GLIBC_2.4): Remove __memset_zero_constant_len_parameter. * sysdeps/generic/memset_chk.c: Remove alias and warning. * misc/sys/cdefs.h (__warndecl): New macro. * debug/warning-nop.c: New file. * string/bits/string3.h (memset): Call __warn_memset_zero_len with no arguments, instead of calling __memset_zero_constant_len_parameter. Use __warndecl for __warn_memset_zero_len. * debug/Makefile (routines): Add $(static-only-routines). (static-only-routines): New variable. This patch removes the last emaining pieces of it. Tested it on i586, i686 and x86-64. [BZ #21790] * sysdeps/i386/i586/memset.S (__memset_zero_constant_len_parameter): Removed. * sysdeps/i386/i686/memset.S (__memset_zero_constant_len_parameter): Likewise. * sysdeps/i386/i686/multiarch/memset_chk.S (__memset_zero_constant_len_parameter): Likewise. * sysdeps/x86_64/memset.S (__memset_zero_constant_len_parameter): Likewise.
* x86-64: Check PIC instead of SHARED in start.SH.J. Lu2017-08-021-1/+1
| | | | | | | Since start.o may be compiled as PIC, we should check PIC instead of SHARED. * sysdeps/x86_64/start.S (_start): Check PIC instead of SHARED.
* x86-64: Test memmove_chk and memset_chk only in libc.so [BZ #21741]H.J. Lu2017-07-101-0/+4
| | | | | | | | | | Since there are no multiarch versions of memmove_chk and memset_chk, test multiarch versions of memmove_chk and memset_chk only in libc.so. [BZ #21741] * sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Test memmove_chk and memset_chk only in libc.so.
* x86-64: Update comments in IFUNC selectorsH.J. Lu2017-07-0916-21/+16
| | | | | | | | | | | | | | | | | | | * sysdeps/x86_64/multiarch/memcmp.c: Update comments. * sysdeps/x86_64/multiarch/memmove.c: Likewise. * sysdeps/x86_64/multiarch/memrchr.c: Likewise. * sysdeps/x86_64/multiarch/memset.c: Likewise. * sysdeps/x86_64/multiarch/rawmemchr.c: Likewise. * sysdeps/x86_64/multiarch/strchrnul.c: Likewise. * sysdeps/x86_64/multiarch/strlen.c: Likewise. * sysdeps/x86_64/multiarch/strnlen.c: Likewise. * sysdeps/x86_64/multiarch/wcschr.c: Likewise. * sysdeps/x86_64/multiarch/wcscpy.c: Likewise. * sysdeps/x86_64/multiarch/wcslen.c: Likewise. * sysdeps/x86_64/multiarch/wcsnlen.c: Likewise. * sysdeps/x86_64/multiarch/wmemchr.c: Likewise. * sysdeps/x86_64/multiarch/wmemcmp.c: Likewise. * sysdeps/x86_64/multiarch/wmemset.c: Likewise. * sysdeps/x86_64/multiarch/wmemset_chk.c: Likewise.
* x86-64: Update comments in ifunc-impl-list.cH.J. Lu2017-07-091-16/+16
| | | | | | | All x86-64 IFUNC selectors are written in C now. Update comments to reflect it. * sysdeps/x86_64/multiarch/ifunc-impl-list.c: Update comments.
* x86-64: Align the stack in __tls_get_addr [BZ #21609]H.J. Lu2017-07-066-2/+130
| | | | | | | | | | | | | | | | | | | | | | | | | | This change forces realignment of the stack pointer in __tls_get_addr, so that binaries compiled by GCCs older than GCC 4.9: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58066 continue to work even if vector instructions are used in glibc which require the ABI stack realignment. __tls_get_addr_slow is added to handle the slow paths in the default implementation of__tls_get_addr in elf/dl-tls.c. The new __tls_get_addr calls __tls_get_addr_slow after realigning the stack. Internal calls within ld.so go directly to the default implementation of __tls_get_addr because they do not need stack realignment. [BZ #21609] * sysdeps/x86_64/Makefile (sysdep-dl-routines): Add tls_get_addr. (gen-as-const-headers): Add rtld-offsets.sym. * sysdeps/x86_64/dl-tls.c: New file. * sysdeps/x86_64/rtld-offsets.sym: Likwise. * sysdeps/x86_64/tls_get_addr.S: Likewise. * sysdeps/x86_64/dl-tls.h: Add multiple inclusion guards. * sysdeps/x86_64/tlsdesc.sym (TI_MODULE_OFFSET): New. (TI_OFFSET_OFFSET): Likwise.
* Require binutils 2.25 or later to build glibc.Joseph Myers2017-06-282-70/+0
| | | | | | | | | | | | | | | | | | | | | | | This patch implements a requirement of binutils >= 2.25 (up from 2.22) to build glibc. Tests for 2.24 or later on x86_64 and s390 are removed. It was already the case, as indicated by buildbot results, that 2.24 was too old for building tests for 32-bit x86 (produced internal linker errors linking elf/tst-gnu2-tls1mod.so). I don't know if any configure tests for binutils features are obsolete given the increased version requirement. Tested for x86_64. * configure.ac (AS): Require binutils 2.25 or later. (LD): Likewise. * configure: Regenerated. * sysdeps/s390/configure.ac (AS): Remove version check. * sysdeps/s390/configure: Regenerated. * sysdeps/x86_64/configure.ac (AS): Remove version check. * sysdeps/x86_64/configure: Regenerated. * manual/install.texi (Tools for Compilation): Document requirement for binutils 2.25 or later. * INSTALL: Regenerated.
* x86-64: Optimize memcmp-avx2-movbe.S for short differenceH.J. Lu2017-06-271-56/+62
| | | | | | | | | | | | | Check the first 32 bytes before checking size when size >= 32 bytes to avoid unnecessary branch if the difference is in the first 32 bytes. Replace vpmovmskb/subl/jnz with vptest/jnc. On Haswell, the new version is as fast as the previous one. On Skylake, the new version is a little bit faster. * sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S (MEMCMP): Check the first 32 bytes before checking size when size >= 32 bytes. Replace vpmovmskb/subl/jnz with vptest/jnc.
* Add float128 support for x86_64, x86.Joseph Myers2017-06-262-0/+565
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch enables float128 support for x86_64 and x86. All GCC versions that can build glibc provide the required support, but since GCC 6 and before don't provide __builtin_nanq / __builtin_nansq, sNaN tests and some tests of NaN payloads need to be disabled with such compilers (this does not affect the generated glibc binaries at all, just the tests). bits/floatn.h declares float128 support to be available for GCC versions that provide the required libgcc support (4.3 for x86_64, 4.4 for i386 GNU/Linux, 4.5 for i386 GNU/Hurd); compilation-only support was present some time before then, but not really useful without the libgcc functions. fenv_private.h needed updating to avoid trying to put _Float128 values in registers. I make no assertion of optimality of the math_opt_barrier / math_force_eval definitions for this case; they are simply intended to be sufficient to work correctly. Tested for x86_64 and x86, with GCC 7 and GCC 6. (Testing for x32 was compilation tests only with build-many-glibcs.py to verify the ABI baseline updates. I have not done any testing for Hurd, although the float128 support is enabled there as for GNU/Linux.) * sysdeps/i386/Implies: Add ieee754/float128. * sysdeps/x86_64/Implies: Likewise. * sysdeps/x86/bits/floatn.h: New file. * sysdeps/x86/float128-abi.h: Likewise. * manual/math.texi (Mathematics): Document support for _Float128 on x86_64 and x86. * sysdeps/i386/fpu/fenv_private.h: Include <bits/floatn.h>. (math_opt_barrier): Do not put _Float128 values in floating-point registers. (math_force_eval): Likewise. [__x86_64__] (SET_RESTORE_ROUNDF128): New macro. * sysdeps/x86/fpu/Makefile [$(subdir) = math] (CPPFLAGS): Append to Makefile variable. * sysdeps/x86/fpu/e_sqrtf128.c: New file. * sysdeps/x86/fpu/sfp-machine.h: Likewise. Based on libgcc. * sysdeps/x86/math-tests.h: New file. * math/libm-test-support.h (XFAIL_FLOAT128_PAYLOAD): New macro. * math/libm-test-getpayload.inc (getpayload_test_data): Use XFAIL_FLOAT128_PAYLOAD. * math/libm-test-setpayload.inc (setpayload_test_data): Likewise. * math/libm-test-totalorder.inc (totalorder_test_data): Likewise. * math/libm-test-totalordermag.inc (totalordermag_test_data): Likewise. * sysdeps/unix/sysv/linux/i386/libc.abilist: Update. * sysdeps/unix/sysv/linux/i386/libm.abilist: Likewise. * sysdeps/unix/sysv/linux/x86_64/64/libc.abilist: Likewise. * sysdeps/unix/sysv/linux/x86_64/64/libm.abilist: Likewise. * sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist: Likewise. * sysdeps/unix/sysv/linux/x86_64/x32/libm.abilist: Likewise. * sysdeps/i386/fpu/libm-test-ulps: Likewise. * sysdeps/i386/i686/fpu/multiarch/libm-test-ulps: Likewise. * sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
* x86-64: Optimize L(between_2_3) in memcmp-avx2-movbe.SH.J. Lu2017-06-231-4/+2
| | | | | | | | | | | | | | | | | Turn movzbl -1(%rdi, %rdx), %edi movzbl -1(%rsi, %rdx), %esi orl %edi, %eax orl %esi, %ecx into movb -1(%rdi, %rdx), %al movb -1(%rsi, %rdx), %cl * sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S (between_2_3): Replace movzbl and orl with movb.
* x86-64: Fix comment typo in memcmp-avx2-movbe.SFlorian Weimer2017-06-231-1/+1
|
* x86-64: memcmp-avx2-movbe.S needs saturating subtraction [BZ #21662]Florian Weimer2017-06-231-10/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | This code: L(between_2_3): /* Load as big endian with overlapping loads and bswap to avoid branches. */ movzwl -2(%rdi, %rdx), %eax movzwl -2(%rsi, %rdx), %ecx shll $16, %eax shll $16, %ecx movzwl (%rdi), %edi movzwl (%rsi), %esi orl %edi, %eax orl %esi, %ecx bswap %eax bswap %ecx subl %ecx, %eax ret needs a saturating subtract because the full register is used. With this commit, only the lower 24 bits of the register are used, so a regular subtraction suffices. The test case change adds coverage for these kinds of bugs.
* x86-64: Implement strcmp family IFUNC selectors in CH.J. Lu2017-06-2124-240/+605
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Implement strcmp family IFUNC selectors in C. All internal calls within libc.so can use IFUNC on x86-64 since unlike x86, x86-64 supports PC-relative addressing to access the GOT entry so that it can call via PLT without using an extra register. For libc.a, we can't use IFUNC for functions which are called before IFUNC has been initialized. Use IFUNC internally reduces the icache footprint since libc.so and other codes in the process use the same implementations. This patch uses IFUNC for strcmp family functions within libc. * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add strcmp-sse2, strcmp-sse4_2, strncmp-sse2, strncmp-sse4_2, strcasecmp_l-sse2, strcasecmp_l-sse4_2, strcasecmp_l-avx, strncase_l-sse2, strncase_l-sse4_2 and strncase_l-avx. * sysdeps/x86_64/multiarch/ifunc-strcasecmp.h: New file. * sysdeps/x86_64/multiarch/strcasecmp.c: Likewise. * sysdeps/x86_64/multiarch/strcasecmp_l-avx.S: Likewise. * sysdeps/x86_64/multiarch/strcasecmp_l-sse2.S: Likewise. * sysdeps/x86_64/multiarch/strcasecmp_l-sse4_2.S: Likewise. * sysdeps/x86_64/multiarch/strcasecmp_l.c: Likewise. * sysdeps/x86_64/multiarch/strcmp-sse2.S: Likewise. * sysdeps/x86_64/multiarch/strcmp-sse4_2.S: Likewise. * sysdeps/x86_64/multiarch/strcmp.c: Likewise. * sysdeps/x86_64/multiarch/strncase.c: Likewise. * sysdeps/x86_64/multiarch/strncase_l-avx.S : Likewise. * sysdeps/x86_64/multiarch/strncase_l-sse2.S: Likewise. * sysdeps/x86_64/multiarch/strncase_l-sse4_2.S: Likewise. * sysdeps/x86_64/multiarch/strncase_l.c: Likewise. * sysdeps/x86_64/multiarch/strncmp-sse2.S: Likewise. * sysdeps/x86_64/multiarch/strncmp-sse4_2.S: Likewise. * sysdeps/x86_64/multiarch/strncmp.c: Likewise. * sysdeps/x86_64/multiarch/strcasecmp_l.S: Removed. * sysdeps/x86_64/multiarch/strcmp.S: Likewise. * sysdeps/x86_64/multiarch/strncase_l.S: Likewise. * sysdeps/x86_64/multiarch/strncmp.S: Likewise. * sysdeps/x86_64/multiarch/strcmp-sse42.S: Include <sysdep.h>. (STRCMP_SSE42): New. Defined to __strcmp_sse42 if not defined. [USE_AS_STRCASECMP_L || USE_AS_STRNCASECMP_L]: Include "locale-defines.h". (UPDATE_STRNCMP_COUNTER): New. (SECTION): Likewise. (GLABEL): Likewise. (LABEL): Likewise. * sysdeps/x86_64/multiarch/strncmp-ssse3.S: Rewrite and enable for libc.a.
* Use locale_t, not __locale_t, throughout glibcZack Weinberg2017-06-202-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | <locale.h> is specified to define locale_t in POSIX.1-2008, and so are all of the headers that define functions that take locale_t arguments. Under _GNU_SOURCE, the additional headers that define such functions have also always defined locale_t. Therefore, there is no need to use __locale_t in public function prototypes, nor in any internal code. * ctype/ctype-c99_l.c, ctype/ctype.h, ctype/ctype_l.c * include/monetary.h, include/stdlib.h, include/time.h * include/wchar.h, locale/duplocale.c, locale/freelocale.c * locale/global-locale.c, locale/langinfo.h, locale/locale.h * locale/localeinfo.h, locale/newlocale.c * locale/nl_langinfo_l.c, locale/uselocale.c * localedata/bug-usesetlocale.c, localedata/tst-xlocale2.c * stdio-common/vfscanf.c, stdlib/monetary.h, stdlib/stdlib.h * stdlib/strfmon_l.c, stdlib/strtod_l.c, stdlib/strtof_l.c * stdlib/strtol.c, stdlib/strtol_l.c, stdlib/strtold_l.c * stdlib/strtoll_l.c, stdlib/strtoul_l.c, stdlib/strtoull_l.c * string/strcasecmp.c, string/strcoll_l.c, string/string.h * string/strings.h, string/strncase.c, string/strxfrm_l.c * sysdeps/ieee754/float128/strtof128_l.c * sysdeps/ieee754/float128/wcstof128.c * sysdeps/ieee754/float128/wcstof128_l.c * sysdeps/ieee754/ldbl-128ibm/strtold_l.c * sysdeps/ieee754/ldbl-64-128/strtold_l.c * sysdeps/ieee754/ldbl-opt/nldbl-compat.c * sysdeps/ieee754/ldbl-opt/nldbl-strfmon_l.c * sysdeps/ieee754/ldbl-opt/nldbl-strtold_l.c * sysdeps/ieee754/ldbl-opt/nldbl-wcstold_l.c * sysdeps/powerpc/powerpc32/power7/strcasecmp.S * sysdeps/powerpc/powerpc64/power7/strcasecmp.S * sysdeps/x86_64/strcasecmp_l-nonascii.c * sysdeps/x86_64/strncase_l-nonascii.c, time/strftime_l.c * time/strptime_l.c, time/time.h, wcsmbs/mbsrtowcs_l.c * wcsmbs/wchar.h, wcsmbs/wcscasecmp.c, wcsmbs/wcsncase.c * wcsmbs/wcstod.c, wcsmbs/wcstod_l.c, wcsmbs/wcstof.c * wcsmbs/wcstof_l.c, wcsmbs/wcstol_l.c, wcsmbs/wcstold.c * wcsmbs/wcstold_l.c, wcsmbs/wcstoll_l.c, wcsmbs/wcstoul_l.c * wcsmbs/wcstoull_l.c, wctype/iswctype_l.c * wctype/towctrans_l.c, wctype/wcfuncs_l.c * wctype/wctrans_l.c, wctype/wctype.h, wctype/wctype_l.c: Change all uses of __locale_t to locale_t.
* Fix fallout from bits/string.h removal.Zack Weinberg2017-06-202-0/+4
| | | | | | | | | | | | | | | Remove one more string inline that was defined directly in string.h; in the absence of the rest of the inlines, it broke the build. Like other ifunc shims for these functions, x86_64/multiarch/{mem,st}pcpy.c need to define __NO_STRING_INLINES and NO_MEMPCPY_STPCPY_REDIRECT. * string/string.h (__mempcpy_inline): Delete. * sysdeps/x86_64/multiarch/mempcpy.c * sysdeps/x86_64/multiarch/stpcpy.c: Define NO_MEMPCPY_STPCPY_REDIRECT and __NO_STRING_INLINES before including string.h.
* Remove bits/string.h.Zack Weinberg2017-06-201-6/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | These machine-dependent inline string functions have never been on by default, and even if they were a good idea at the time they were introduced, they haven't really been touched in ten to fifteen years and probably aren't a good idea on current-gen processors. Current thinking is that this class of optimization is best left to the compiler. * bits/string.h, string/bits/string.h * sysdeps/aarch64/bits/string.h * sysdeps/m68k/m680x0/m68020/bits/string.h * sysdeps/s390/bits/string.h, sysdeps/sparc/bits/string.h * sysdeps/x86/bits/string.h: Delete file. * string/string.h: Don't include bits/string.h. * string/bits/string3.h: Rename to bits/string_fortified.h. No need to undef various symbols that the removed headers might have defined as macros. * string/Makefile (headers): Remove bits/string.h, change bits/string3.h to bits/string_fortified.h. * string/string-inlines.c: Update commentary. Remove definitions of various macros that nothing looks at anymore. Don't directly include bits/string.h. Set _STRING_INLINE_unaligned here, based on compiler-predefined macros. * string/strncat.c: If STRNCAT is not defined, or STRNCAT_PRIMARY _is_ defined, provide internal hidden alias __strncat. * include/string.h: Declare internal hidden alias __strncat. Only forward __stpcpy to __builtin_stpcpy if __NO_STRING_INLINES is not defined. * include/bits/string3.h: Rename to bits/string_fortified.h, update to match above. * sysdeps/i386/string-inlines.c: Define compat symbols for everything formerly defined by sysdeps/x86/bits/string.h. Make existing definitions into compat symbols as well. Remove some no-longer-necessary messing around with macros. * sysdeps/powerpc/powerpc32/power4/multiarch/mempcpy.c * sysdeps/powerpc/powerpc64/multiarch/mempcpy.c * sysdeps/powerpc/powerpc64/multiarch/stpcpy.c * sysdeps/s390/multiarch/mempcpy.c No need to define _HAVE_STRING_ARCH_mempcpy. Do define __NO_STRING_INLINES and NO_MEMPCPY_STPCPY_REDIRECT. * sysdeps/i386/i686/multiarch/strncat-c.c * sysdeps/s390/multiarch/strncat-c.c * sysdeps/x86_64/multiarch/strncat-c.c Define STRNCAT_PRIMARY. Don't change definition of libc_hidden_def.
* Fix typo when undefining weak_aliasSiddhesh Poyarekar2017-06-191-1/+1
| | | | | | The macro directive #undef was miswritten as #undefine. * sysdeps/x86_64/multiarch/rawmemchr-sse2.S: Fix typo.
* x86-64: Implement strcspn/strpbrk/strspn IFUNC selectors in CH.J. Lu2017-06-1513-115/+213
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Implement strcspn/strpbrk/strspn IFUNC selectors in C All internal calls within libc.so can use IFUNC on x86-64 since unlike x86, x86-64 supports PC-relative addressing to access the GOT entry so that it can call via PLT without using an extra register. For libc.a, we can't use IFUNC for functions which are called before IFUNC has been initialized. Use IFUNC internally reduces the icache footprint since libc.so and other codes in the process use the same implementations. This patch uses IFUNC for strcspn/strpbrk/strspn functions within libc. * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add strcspn-sse2, strpbrk-sse2 and strspn-sse2. * sysdeps/x86_64/strcspn.S (STRPBRK_P): Removed. Check USE_AS_STRPBRK instead of STRPBRK_P. * sysdeps/x86_64/strpbrk.S (USE_AS_STRPBRK): New. * sysdeps/x86_64/multiarch/ifunc-sse4_2.h: New file. * sysdeps/x86_64/multiarch/strcspn-sse2.S: Likewise. * sysdeps/x86_64/multiarch/strcspn.c: Likewise. * sysdeps/x86_64/multiarch/strpbrk-sse2.S: Likewise. * sysdeps/x86_64/multiarch/strpbrk.c: Likewise. * sysdeps/x86_64/multiarch/strspn-sse2.S: Likewise. * sysdeps/x86_64/multiarch/strspn.c: Likewise. * sysdeps/x86_64/multiarch/strcspn.S: Removed. * sysdeps/x86_64/multiarch/strpbrk.S: Likewise. * sysdeps/x86_64/multiarch/strspn.S: Likewise. * sysdeps/x86_64/multiarch/strpbrk-c.c: Remove "#ifdef SHARED" and "#endif".
* x86-64: Implement wcscpy IFUNC selector in CH.J. Lu2017-06-151-17/+21
| | | | | * sysdeps/x86_64/multiarch/wcscpy.S: Removed. * sysdeps/x86_64/multiarch/wcscpy.c: New file.
* x86-64: Implement strcat family IFUNC selectors in CH.J. Lu2017-06-156-90/+95
| | | | | | | | | | | | | | | | | | | | Implement strcat family IFUNC selectors in C. All internal calls within libc.so can use IFUNC on x86-64 since unlike x86, x86-64 supports PC-relative addressing to access the GOT entry so that it can call via PLT without using an extra register. For libc.a, we can't use IFUNC for functions which are called before IFUNC has been initialized. Use IFUNC internally reduces the icache footprint since libc.so and other codes in the process use the same implementations. This patch uses IFUNC for strcat family functions within libc. * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add strcat-sse2. * sysdeps/x86_64/multiarch/strcat-sse2.S: New file. * sysdeps/x86_64/multiarch/strcat.c: Likewise. * sysdeps/x86_64/multiarch/strncat.c: Likewise. * sysdeps/x86_64/multiarch/strcat.S: Removed. * sysdeps/x86_64/multiarch/strncat.S: Likewise.
* x86-64: Implement memcmp family IFUNC selectors in CH.J. Lu2017-06-157-113/+126
| | | | | | | | | | | | | | | | | | | | | Implement memcmp family IFUNC selectors in C. All internal calls within libc.so can use IFUNC on x86-64 since unlike x86, x86-64 supports PC-relative addressing to access the GOT entry so that it can call via PLT without using an extra register. For libc.a, we can't use IFUNC for functions which are called before IFUNC has been initialized. Use IFUNC internally reduces the icache footprint since libc.so and other codes in the process use the same implementations. This patch uses IFUNC for memcmp family functions within libc. * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add memcmp-sse2. * sysdeps/x86_64/multiarch/ifunc-memcmp.h: New file. * sysdeps/x86_64/multiarch/memcmp-sse2.S: Likewise. * sysdeps/x86_64/multiarch/memcmp.c: Likewise. * sysdeps/x86_64/multiarch/wmemcmp.c: Likewise. * sysdeps/x86_64/multiarch/memcmp.S: Removed. * sysdeps/x86_64/multiarch/wmemcmp.S: Likewise.
* x86-64: Implement memset family IFUNC selectors in CH.J. Lu2017-06-1512-147/+218
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Implement memset family IFUNC selectors in C. All internal calls within libc.so can use IFUNC on x86-64 since unlike x86, x86-64 supports PC-relative addressing to access the GOT entry so that it can call via PLT without using an extra register. For libc.a, we can't use IFUNC for functions which are called before IFUNC has been initialized. Use IFUNC internally reduces the icache footprint since libc.so and other codes in the process use the same implementations. This patch uses IFUNC for memset functions within libc. 2017-06-07 H.J. Lu <hongjiu.lu@intel.com> Erich Elsen <eriche@google.com> * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add memset-sse2-unaligned-erms, and memset_chk-nonshared. * sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Add test for __memset_chk_erms. Update comments. * sysdeps/x86_64/multiarch/ifunc-memset.h: New file. * sysdeps/x86_64/multiarch/memset-sse2-unaligned-erms.S: Likewise. * sysdeps/x86_64/multiarch/memset.c: Likewise. * sysdeps/x86_64/multiarch/memset_chk-nonshared.S: Likewise. * sysdeps/x86_64/multiarch/memset_chk.c: Likewise. * sysdeps/x86_64/multiarch/memset.S: Removed. * sysdeps/x86_64/multiarch/memset_chk.S: Likewise. * sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S (__memset_chk_erms): New function.
* x86-64: Implement memmove family IFUNC selectors in CH.J. Lu2017-06-1420-474/+418
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Implement memmove family IFUNC selectors in C. All internal calls within libc.so can use IFUNC on x86-64 since unlike x86, x86-64 supports PC-relative addressing to access the GOT entry so that it can call via PLT without using an extra register. For libc.a, we can't use IFUNC for functions which are called before IFUNC has been initialized. Use IFUNC internally reduces the icache footprint since libc.so and other codes in the process use the same implementations. This patch uses IFUNC for memmove family functions within libc. * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add memmove-sse2-unaligned-erms, memcpy_chk-nonshared, mempcpy_chk-nonshared and memmove_chk-nonshared. * sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Add tests for __memmove_chk_erms, __memcpy_chk_erms and __mempcpy_chk_erms. Update comments. * sysdeps/x86_64/multiarch/ifunc-memmove.h: New file. * sysdeps/x86_64/multiarch/memcpy.c: Likewise. * sysdeps/x86_64/multiarch/memcpy_chk-nonshared.S: Likewise. * sysdeps/x86_64/multiarch/memcpy_chk.c: Likewise. * sysdeps/x86_64/multiarch/memmove-sse2-unaligned-erms.S: Likewise. * sysdeps/x86_64/multiarch/memmove.c: Likewise. * sysdeps/x86_64/multiarch/memmove_chk-nonshared.S: Likewise. * sysdeps/x86_64/multiarch/memmove_chk.c: Likewise. * sysdeps/x86_64/multiarch/mempcpy.c: Likewise. * sysdeps/x86_64/multiarch/mempcpy_chk-nonshared.S: Likewise. * sysdeps/x86_64/multiarch/mempcpy_chk.c: Likewise. * sysdeps/x86_64/multiarch/memcpy.S: Removed. * sysdeps/x86_64/multiarch/memcpy_chk.S: Likewise. * sysdeps/x86_64/multiarch/memmove.S: Likewise. * sysdeps/x86_64/multiarch/memmove_chk.S: Likewise. * sysdeps/x86_64/multiarch/mempcpy.S: Likewise. * sysdeps/x86_64/multiarch/mempcpy_chk.S: Likewise. * sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S (__mempcpy_chk_erms): New function. (__memmove_chk_erms): Likewise. (__memcpy_chk_erms): New alias.
* Remove __need macros from errno.h (__need_Emath, __need_error_t).Zack Weinberg2017-06-143-6/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is fairly complicated, not because the users of __need_Emath and __need_error_t have complicated requirements, but because the core changes had a lot of fallout. __need_error_t exists for gnulib compatibility in argz.h and argp.h. error_t itself is a Hurdism, an enum containing all the E-constants, so you can do 'p (error_t) errno' in gdb and get a symbolic value. argz.h and argp.h use it for function return values, and they want to fall back to 'int' when that's not available. There is no reason why these nonstandard headers cannot just go ahead and include all of errno.h; so we do that. __need_Emath is defined only by .S files; what they _really_ need is for errno.h to avoid declaring anything other than the E-constants (e.g. 'extern int __errno_location(void);' is a syntax error in assembly language). This is replaced with a check for __ASSEMBLER__ in errno.h, plus a carefully documented requirement for bits/errno.h not to define anything other than macros. That in turn has the consequence that bits/errno.h must not define errno - fortunately, all live ports use the same definition of errno, so I've moved it to errno.h. The Hurd bits/errno.h must also take care not to define error_t when __ASSEMBLER__ is defined, which involves repeating all of the definitions twice, but it's a generated file so that's okay. * stdlib/errno.h: Remove __need_Emath and __need_error_t logic. Reorganize file. Declare errno here. When __ASSEMBLER__ is defined, don't declare anything other than the E-constants. * include/errno.h: Change conditional for exposing internal declarations to (not _ISOMAC and not __ASSEMBLER__). * bits/errno.h: Remove logic for __need_Emath. Document requirements for a port-specific bits/errno.h. * sysdeps/unix/sysv/linux/bits/errno.h * sysdeps/unix/sysv/linux/alpha/bits/errno.h * sysdeps/unix/sysv/linux/hppa/bits/errno.h * sysdeps/unix/sysv/linux/mips/bits/errno.h * sysdeps/unix/sysv/linux/sparc/bits/errno.h: Add multiple-include guard and check against improper inclusion. Remove __need_Emath logic. Don't declare errno here. Ensure all constants are defined as simple integer literals. Consistent formatting. * sysdeps/mach/hurd/errnos.awk: Likewise. Only define error_t and enum __error_t_codes if __ASSEMBLER__ is not defined. * sysdeps/mach/hurd/bits/errno.h: Regenerate. * argp/argp.h, string/argz.h: Don't define __need_error_t before including errno.h. * sysdeps/i386/i686/fpu/multiarch/s_cosf-sse2.S * sysdeps/i386/i686/fpu/multiarch/s_sincosf-sse2.S * sysdeps/i386/i686/fpu/multiarch/s_sinf-sse2.S * sysdeps/x86_64/fpu/s_cosf.S * sysdeps/x86_64/fpu/s_sincosf.S * sysdeps/x86_64/fpu/s_sinf.S: Just include errno.h; don't define __need_Emath or include bits/errno.h directly.
* PowerPC64 ELFv2 PPC64_OPT_LOCALENTRYAlan Modra2017-06-141-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ELFv2 functions with localentry:0 are those with a single entry point, ie. global entry == local entry, that have no requirement on r2 or r12 and guarantee r2 is unchanged on return. Such an external function can be called via the PLT without saving r2 or restoring it on return, avoiding a common load-hit-store for small functions. This patch implements the ld.so changes necessary for this optimization. ld.so needs to check that an optimized plt call sequence is in fact calling a function implemented with localentry:0, end emit a fatal error otherwise. The elf/testobj6.c change is to stop "error while loading shared libraries: expected localentry:0 `preload'" when running elf/preloadtest, which we'd get otherwise. * elf/elf.h (PPC64_OPT_LOCALENTRY): Define. * sysdeps/alpha/dl-machine.h (elf_machine_fixup_plt): Add refsym and sym parameters. Adjust callers. * sysdeps/aarch64/dl-machine.h (elf_machine_fixup_plt): Likewise. * sysdeps/arm/dl-machine.h (elf_machine_fixup_plt): Likewise. * sysdeps/generic/dl-machine.h (elf_machine_fixup_plt): Likewise. * sysdeps/hppa/dl-machine.h (elf_machine_fixup_plt): Likewise. * sysdeps/i386/dl-machine.h (elf_machine_fixup_plt): Likewise. * sysdeps/ia64/dl-machine.h (elf_machine_fixup_plt): Likewise. * sysdeps/m68k/dl-machine.h (elf_machine_fixup_plt): Likewise. * sysdeps/microblaze/dl-machine.h (elf_machine_fixup_plt): Likewise. * sysdeps/mips/dl-machine.h (elf_machine_fixup_plt): Likewise. * sysdeps/nios2/dl-machine.h (elf_machine_fixup_plt): Likewise. * sysdeps/powerpc/powerpc32/dl-machine.h (elf_machine_fixup_plt): Likewise. * sysdeps/s390/s390-32/dl-machine.h (elf_machine_fixup_plt): Likewise. * sysdeps/s390/s390-64/dl-machine.h (elf_machine_fixup_plt): Likewise. * sysdeps/sh/dl-machine.h (elf_machine_fixup_plt): Likewise. * sysdeps/sparc/sparc32/dl-machine.h (elf_machine_fixup_plt): Likewise. * sysdeps/sparc/sparc64/dl-machine.h (elf_machine_fixup_plt): Likewise. * sysdeps/tile/dl-machine.h (elf_machine_fixup_plt): Likewise. * sysdeps/x86_64/dl-machine.h (elf_machine_fixup_plt): Likewise. * sysdeps/powerpc/powerpc64/dl-machine.c (_dl_error_localentry): New. (_dl_reloc_overflow): Increase buffser size. Formatting. * sysdeps/powerpc/powerpc64/dl-machine.h (ppc64_local_entry_offset): Delete reloc param, add refsym and sym. Check optimized plt call stubs for localentry:0 functions. Adjust callers. (elf_machine_fixup_plt, elf_machine_plt_conflict): Add refsym and sym parameters. Adjust callers. (_dl_reloc_overflow): Move attribute. (_dl_error_localentry): Declare. * elf/dl-runtime.c (_dl_fixup): Save original sym. Pass refsym and sym to elf_machine_fixup_plt. * elf/testobj6.c (preload): Call printf.
* x86-64: Implement strcpy family IFUNC selectors in CH.J. Lu2017-06-1214-131/+258
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Implement strcpy family IFUNC selectors in C. All internal calls within libc.so can use IFUNC on x86-64 since unlike x86, x86-64 supports PC-relative addressing to access the GOT entry so that it can call via PLT without using an extra register. For libc.a, we can't use IFUNC for functions which are called before IFUNC has been initialized. Use IFUNC internally reduces the icache footprint since libc.so and other codes in the process use the same implementations. This patch uses IFUNC for strcpy family functions within libc. * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add strcpy-sse2 and stpcpy-sse2. * sysdeps/x86_64/multiarch/ifunc-unaligned-ssse3.h: New file. * sysdeps/x86_64/multiarch/stpcpy-sse2.S: Likewise. * sysdeps/x86_64/multiarch/stpcpy.c: Likewise. * sysdeps/x86_64/multiarch/stpncpy.c: Likewise. * sysdeps/x86_64/multiarch/strcpy-sse2.S: Likewise. * sysdeps/x86_64/multiarch/strcpy.c: Likewise. * sysdeps/x86_64/multiarch/strncpy.c: Likewise. * sysdeps/x86_64/multiarch/stpcpy.S: Removed. * sysdeps/x86_64/multiarch/stpncpy.S: Likewise. * sysdeps/x86_64/multiarch/strcpy.S: Likewise. * sysdeps/x86_64/multiarch/strncpy.S: Likewise. * sysdeps/x86_64/multiarch/stpncpy-c.c (weak_alias): New. (libc_hidden_def): Always defined as empty. * sysdeps/x86_64/multiarch/strncpy-c.c (libc_hidden_builtin_def): Always Defined as empty.
* x86-64: Correct comments in ifunc-impl-list.cH.J. Lu2017-06-091-6/+6
| | | | * sysdeps/x86_64/multiarch/ifunc-impl-list.c: Correct comments.
* x86-64: Optimize strrchr/wcsrchr with AVX2H.J. Lu2017-06-098-0/+368
| | | | | | | | | | | | | | | | | | | | Optimize strrchr/wcsrchr with AVX2 to check 32 bytes with vector instructions. It is as fast as SSE2 version for small data sizes and up to 1X faster for large data sizes on Haswell. Select AVX2 version on AVX2 machines where vzeroupper is preferred and AVX unaligned load is fast. * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add strrchr-sse2, strrchr-avx2, wcsrchr-sse2 and wcsrchr-avx2. * sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Add tests for __strrchr_avx2, __strrchr_sse2, __wcsrchr_avx2 and __wcsrchr_sse2. * sysdeps/x86_64/multiarch/strrchr-avx2.S: New file. * sysdeps/x86_64/multiarch/strrchr-sse2.S: Likewise. * sysdeps/x86_64/multiarch/strrchr.c: Likewise. * sysdeps/x86_64/multiarch/wcsrchr-avx2.S: Likewise. * sysdeps/x86_64/multiarch/wcsrchr-sse2.S: Likewise. * sysdeps/x86_64/multiarch/wcsrchr.c: Likewise.