about summary refs log tree commit diff
path: root/sysdeps/x86_64
Commit message (Collapse)AuthorAgeFilesLines
...
* Remove configure tests for FMA4 support.Joseph Myers2015-10-0913-99/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | GCC added support for -mfma4 in version 4.5. Thus the configure tests for this support are obsolete, and this patch removes them. Tested for x86_64 and x86 (testsuite, and that installed stripped shared libraries are unchanged by this patch). * sysdeps/i386/configure.ac (libc_cv_cc_fma4): Remove configure test. * sysdeps/i386/configure: Regenerated. * sysdeps/x86_64/configure.ac (libc_cv_cc_fma4): Remove configure test. * sysdeps/x86_64/configure: Regenerated. * sysdeps/x86_64/fpu/multiarch/Makefile [$(have-mfma4) = yes]: Make code unconditional. * sysdeps/x86_64/fpu/multiarch/e_asin.c [HAVE_FMA4_SUPPORT]: Likewise. * sysdeps/x86_64/fpu/multiarch/e_atan2.c [HAVE_FMA4_SUPPORT]: Likewise. [!HAVE_FMA4_SUPPORT]: Remove conditional code. * sysdeps/x86_64/fpu/multiarch/e_exp.c [HAVE_FMA4_SUPPORT]: Make code unconditional. [!HAVE_FMA4_SUPPORT]: Remove conditional code. * sysdeps/x86_64/fpu/multiarch/e_log.c [HAVE_FMA4_SUPPORT]: Make code unconditional. [!HAVE_FMA4_SUPPORT]: Remove conditional code. * sysdeps/x86_64/fpu/multiarch/e_pow.c [HAVE_FMA4_SUPPORT]: Make code unconditional. * sysdeps/x86_64/fpu/multiarch/s_atan.c [HAVE_FMA4_SUPPORT]: Make code unconditional. [!HAVE_FMA4_SUPPORT]: Remove conditional code. * sysdeps/x86_64/fpu/multiarch/s_fma.c [HAVE_FMA4_SUPPORT]: Make code unconditional. [!HAVE_FMA4_SUPPORT]: Remove conditional code. * sysdeps/x86_64/fpu/multiarch/s_fmaf.c [HAVE_FMA4_SUPPORT]: Make code unconditional. [!HAVE_FMA4_SUPPORT]: Remove conditional code. * sysdeps/x86_64/fpu/multiarch/s_sin.c [HAVE_FMA4_SUPPORT]: Make code unconditional. [!HAVE_FMA4_SUPPORT]: Remove conditional code. * sysdeps/x86_64/fpu/multiarch/s_tan.c [HAVE_FMA4_SUPPORT]: Make code unconditional. [!HAVE_FMA4_SUPPORT]: Remove conditional code. * config.h.in (HAVE_FMA4_SUPPORT): Remove #undef.
* Remove configure tests for AVX support.Joseph Myers2015-10-0814-190/+83
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | GCC added support for -mavx and -msse2avx in version 4.4. Thus the configure tests for this support are obsolete, and this patch removes them. Tested for x86_64 and x86 (testsuite, and that installed stripped shared libraries are unchanged by this patch). * sysdeps/i386/configure.ac (libc_cv_cc_avx): Remove configure test. (libc_cv_cc_sse2avx): Likewise. * sysdeps/i386/configure: Regenerated. * sysdeps/i386/i686/multiarch/Makefile [$(subdir)$(config-cflags-avx) = mathyes]: Change conditional to [$(subdir) = math]. * sysdeps/i386/i686/multiarch/s_fma-fma.c [HAVE_AVX_SUPPORT]: Make code unconditional. * sysdeps/i386/i686/multiarch/s_fma.c [HAVE_AVX_SUPPORT]: Likewise. * sysdeps/i386/i686/multiarch/s_fmaf-fma.c [HAVE_AVX_SUPPORT]: Likewise. * sysdeps/i386/i686/multiarch/s_fmaf.c [HAVE_AVX_SUPPORT]: Likewise. * sysdeps/x86_64/configure.ac (libc_cv_cc_avx): Remove configure test. (libc_cv_cc_sse2avx): Likewise. * sysdeps/x86_64/configure: Regenerated. * sysdeps/x86_64/Makefile [$(config-cflags-avx) = yes]: Make code unconditional. * sysdeps/x86_64/dl-trampoline.h (_dl_runtime_profile) [HAVE_AVX_SUPPORT || HAVE_AVX512_ASM_SUPPORT]: Make code unconditional. (_dl_runtime_profile) [!(HAVE_AVX_SUPPORT || HAVE_AVX512_ASM_SUPPORT)]: Remove conditional code. * sysdeps/x86_64/fpu/multiarch/Makefile [$(config-cflags-sse2avx) = yes]: Make code unconditional. * sysdeps/x86_64/fpu/multiarch/e_atan2.c [HAVE_FMA4_SUPPORT || HAVE_AVX_SUPPORT]: Likewise. * sysdeps/x86_64/fpu/multiarch/e_exp.c [HAVE_FMA4_SUPPORT || HAVE_AVX_SUPPORT]: Likewise. * sysdeps/x86_64/fpu/multiarch/e_log.c [HAVE_FMA4_SUPPORT || HAVE_AVX_SUPPORT]: Likewise. * sysdeps/x86_64/fpu/multiarch/s_atan.c [HAVE_FMA4_SUPPORT || HAVE_AVX_SUPPORT]: Likewise. * sysdeps/x86_64/fpu/multiarch/s_fma.c [HAVE_AVX_SUPPORT]: Likewise. * sysdeps/x86_64/fpu/multiarch/s_fmaf.c [HAVE_AVX_SUPPORT]: Likewise. * sysdeps/x86_64/fpu/multiarch/s_sin.c [HAVE_FMA4_SUPPORT || HAVE_AVX_SUPPORT]: Likewise. * sysdeps/x86_64/fpu/multiarch/s_tan.c [HAVE_FMA4_SUPPORT || HAVE_AVX_SUPPORT]: Likewise. * sysdeps/x86_64/multiarch/strcmp.S [HAVE_AVX_SUPPORT]: Likewise. * config.h.in (HAVE_AVX_SUPPORT): Remove #undef. (HAVE_SSE2AVX_SUPPORT): Likewise.
* Don't use dbl-64/wordsize-64 lround based on llround for ILP32 (bug 19079).Joseph Myers2015-10-071-4/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | The implementation of lround in dbl-64/wordsize-64 as an alias or wrapper for llround is always incorrect when long is not 64-bit, because it misses required exceptions in overflow cases, as shown by my recently added tests. This patch removes that alias / wrapper in the non-LP64 case, together with the REGISTER_CAST_INT32_TO_INT64 macro, restoring the previous version of lround for dbl-64/wordsize-64 (newly conditioned on !_LP64). Tested for x86_64, and for mips64 with use of dbl-64/wordsize-64 enabled. [BZ #19079] * sysdeps/ieee754/dbl-64/wordsize-64/s_lround.c: Restore previous file, conditioned on [!_LP64]. * sysdeps/ieee754/dbl-64/wordsize-64/s_llround.c [!_LP64] (__lround): Do not define as function or alias. [!_LP64] (lround): Likewise. [!_LP64] (__lroundl): Likewise. [!_LP64] (lroundl): Likewise. * sysdeps/tile/sysdep.h (REGISTER_CAST_INT32_TO_INT64): Remove macro. * sysdeps/x86_64/x32/sysdep.h (REGISTER_CAST_INT32_TO_INT64): Likewise.
* Remove configure tests for SSE4 support.Joseph Myers2015-10-065-49/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | GCC added support for -msse4 in version 4.3. Thus the configure tests for it are obsolete, and this patch removes them. Tested for x86_64 and x86 (testsuite, and that installed stripped shared libraries are unchanged by this patch). * sysdeps/i386/configure.ac (libc_cv_cc_sse4): Remove configure test. * sysdeps/i386/configure: Regenerated. * sysdeps/i386/i686/multiarch/Makefile [$(config-cflags-sse4) = yes]: Make code unconditional. * sysdeps/i386/i686/multiarch/strcspn.S [HAVE_SSE4_SUPPORT]: Likewise. * sysdeps/i386/i686/multiarch/strspn.S [HAVE_SSE4_SUPPORT]: Likewise. * sysdeps/x86_64/configure.ac (libc_cv_cc_sse4): Remove configure test. * sysdeps/x86_64/configure: Regenerated. * sysdeps/x86_64/multiarch/Makefile [$(config-cflags-sse4) = yes]: Make code unconditional. * sysdeps/x86_64/multiarch/strcspn.S [HAVE_SSE4_SUPPORT]: Likewise. * sysdeps/x86_64/multiarch/strspn.S [HAVE_SSE4_SUPPORT]: Likewise. * config.h.in (HAVE_SSE4_SUPPORT): Remove #undef.
* sysdeps/x86_64/fpu/libm-test-ulps: Regenerated on Haswell.Paul Pluzhnikov2015-10-031-1/+1
|
* Improve test coverage of real libm functions [a-e]*.Joseph Myers2015-09-301-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch improves test coverage of the real libm functions [a-e]*, ensuring that special cases and ranges of input values of potential significance (such as close to overflow and underflow thresholds) are more systematically covered. This is a followup to <https://sourceware.org/ml/libc-alpha/2013-12/msg00757.html> which covered [a-c]* (however, I found more weaknesses in the coverage of those functions when preparing this patch, hence the additional tests being added for them here). Addition of a test for acosh (-qNaN) is temporarily deferred, to be included as part of a fix for bug 19032 which was discovered in the course of adding these tests (and which illustrates the use of testing -qNaN as well as +qNaN as input even to functions for which the sign of a NaN isn't meant to be significant). Tested for x86_64 and x86. * math/auto-libm-test-in: Add more tests of acos, acosh, asin, atan, atan2, atanh, cbrt, cos, cosh, erf, erfc, exp, exp10, exp2 and expm1. * math/auto-libm-test-out: Regenerated. * math/libm-test.inc (acos_test_data): Add more tests. (asin_test_data): Likewise. (asinh_test_data): Likewise. (atan_test_data): Likewise. (atanh_test_data): Likewise. (atan2_test_data): Likewise. (cbrt_test_data): Likewise. (ceil_test_data): Likewise. (copysign_test_data): Likewise. (cos_test_data): Likewise. (cosh_test_data): Likewise. (erf_test_data): Likewise. (erfc_test_data): Likewise. (exp_test_data): Likewise. (exp10_test_data): Likewise. (exp2_test_data): Likewise. (expm1_test_data): Likewise. * sysdeps/x86_64/fpu/libm-test-ulps: Update.
* Fix clog, clog10 inaccuracy (bug 19016).Joseph Myers2015-09-281-40/+40
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For arguments with X^2 + Y^2 close to 1, clog and clog10 avoid large errors from log(hypot) by computing X^2 + Y^2 - 1 in a way that avoids cancellation error and then using log1p. However, the thresholds for using that approach still result in log being used on argument as large as sqrt(13/16) > 0.9, leading to significant errors, in some cases above the 9ulp maximum allowed in glibc libm. This patch arranges for the approach using log1p to be used in any cases where |X|, |Y| < 1 and X^2 + Y^2 >= 0.5 (with the existing allowance for cases where one of X and Y is very small), adjusting the __x2y2m1 functions to work with the wider range of inputs. This way, log only gets used on arguments below sqrt(1/2) (or substantially above 1), where the error involved is much less. Tested for x86_64, x86, mips64 and powerpc. For the ulps regeneration I removed the existing clog and clog10 ulps before regenerating to allow any reduced ulps to appear. Tests added include those found by random test generation to produce large ulps either before or after the patch, and some found by trying inputs close to the (0.75, 0.5) threshold where the potential errors from using log are largest. [BZ #19016] * sysdeps/generic/math_private.h (__x2y2m1f): Update comment to allow more cases with X^2 + Y^2 >= 0.5. * sysdeps/ieee754/dbl-64/x2y2m1.c (__x2y2m1): Likewise. Add -1 as normal element in sum instead of special-casing based on values of arguments. * sysdeps/ieee754/dbl-64/x2y2m1f.c (__x2y2m1f): Update comment. * sysdeps/ieee754/ldbl-128/x2y2m1l.c (__x2y2m1l): Likewise. Add -1 as normal element in sum instead of special-casing based on values of arguments. * sysdeps/ieee754/ldbl-128ibm/x2y2m1l.c (__x2y2m1l): Likewise. * sysdeps/ieee754/ldbl-96/x2y2m1.c [FLT_EVAL_METHOD != 0] (__x2y2m1): Update comment. * sysdeps/ieee754/ldbl-96/x2y2m1l.c (__x2y2m1l): Likewise. Add -1 as normal element in sum instead of special-casing based on values of arguments. * math/s_clog.c (__clog): Handle more cases using log1p without hypot. * math/s_clog10.c (__clog10): Likewise. * math/s_clog10f.c (__clog10f): Likewise. * math/s_clog10l.c (__clog10l): Likewise. * math/s_clogf.c (__clogf): Likewise. * math/s_clogl.c (__clogl): Likewise. * math/auto-libm-test-in: Add more tests of clog and clog10. * math/auto-libm-test-out: Regenerated. * sysdeps/i386/fpu/libm-test-ulps: Update. * sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
* Fix powf inaccuracy (bug 18956).Joseph Myers2015-09-261-8/+8
| | | | | | | | | | | | | | | | | | | | | | The flt-32 version of powf can be inaccurate because of bugs in the extra-precision calculation of (x-1)/(x+1) or (x-1.5)/(x+1.5) as part of calculating log(x) with extra precision: a constant used (as part of adding 1 or 1.5 through integer arithmetic) is incorrect, and then the code fails to mask a computed high part before using it in arithmetic that relies on s_h*t_h being exactly representable. This patch fixes these bugs. Tested for x86_64 and x86. x86_64 ulps for powf removed and regenerated to reflect reduced ulps from the increased accuracy for existing tests. [BZ #18956] * sysdeps/ieee754/flt-32/e_powf.c (__ieee754_powf): Add 0x00400000 not 0x0040000 for high bit of mantissa. Mask with 0xfffff000 when extracting high part. * math/auto-libm-test-in: Add another test of pow. * math/auto-libm-test-out: Regenerated. * sysdeps/x86_64/fpu/libm-test-ulps: Update.
* Fix pow missing underflows (bug 18825).Joseph Myers2015-09-252-0/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Similar to various other bugs in this area, pow functions can fail to raise the underflow exception when the result is tiny and inexact but one or more low bits of the intermediate result that is scaled down (or, in the i386 case, converted from a wider evaluation format) are zero. This patch forces the exception in a similar way to previous fixes, thereby concluding the fixes for known bugs with missing underflow exceptions currently filed in Bugzilla. Tested for x86_64, x86, mips64 and powerpc. [BZ #18825] * sysdeps/i386/fpu/i386-math-asm.h (FLT_NARROW_EVAL_UFLOW_NONNAN): New macro. (DBL_NARROW_EVAL_UFLOW_NONNAN): Likewise. (LDBL_CHECK_FORCE_UFLOW_NONNAN): Likewise. * sysdeps/i386/fpu/e_pow.S: Use DEFINE_DBL_MIN. (__ieee754_pow): Use DBL_NARROW_EVAL_UFLOW_NONNAN instead of DBL_NARROW_EVAL, reloading the PIC register as needed. * sysdeps/i386/fpu/e_powf.S: Use DEFINE_FLT_MIN. (__ieee754_powf): Use FLT_NARROW_EVAL_UFLOW_NONNAN instead of FLT_NARROW_EVAL. Use separate return path for case when first argument is NaN. * sysdeps/i386/fpu/e_powl.S: Include <i386-math-asm.h>. Use DEFINE_LDBL_MIN. (__ieee754_powl): Use LDBL_CHECK_FORCE_UFLOW_NONNAN, reloading the PIC register. * sysdeps/ieee754/dbl-64/e_pow.c (__ieee754_pow): Use math_check_force_underflow_nonneg. * sysdeps/ieee754/flt-32/e_powf.c (__ieee754_powf): Force underflow for subnormal result. * sysdeps/ieee754/ldbl-128/e_powl.c (__ieee754_powl): Likewise. * sysdeps/ieee754/ldbl-128ibm/e_powl.c (__ieee754_powl): Use math_check_force_underflow_nonneg. * sysdeps/x86/fpu/powl_helper.c (__powl_helper): Use math_check_force_underflow. * sysdeps/x86_64/fpu/x86_64-math-asm.h (LDBL_CHECK_FORCE_UFLOW_NONNAN): New macro. * sysdeps/x86_64/fpu/e_powl.S: Include <x86_64-math-asm.h>. Use DEFINE_LDBL_MIN. (__ieee754_powl): Use LDBL_CHECK_FORCE_UFLOW_NONNAN. * math/auto-libm-test-in: Add more tests of pow. * math/auto-libm-test-out: Regenerated.
* Refactor x86_64 libm code forcing underflow exceptions.Joseph Myers2015-09-243-30/+69
| | | | | | | | | | | | | | | | | | | | | | This patch refactors code in sysdeps/x86_64/fpu that forces underflow exceptions and closely follows corresponding i386 code to use common macros in x86_64-math-asm.h for that purpose. This is mainly about keeping the code similar to the i386 code as far as possible, since each macro apart from DEFINE_LDBL_MIN ends up used only once. It would be possible to do a further refactoring to share these macros between i386 and x86_64 (with i386 using the fcomip / fucomip versions when building for i686 and above), but I have no immediate plans to do so. Tested for x86_64. * sysdeps/x86_64/fpu/x86_64-math-asm.h: New file. * sysdeps/x86_64/fpu/e_exp2l.S: Include <x86_64-math-asm.h>. (ldbl_min): Replace with use of DEFINE_LDBL_MIN. (__ieee754_exp2l): Use LDBL_CHECK_FORCE_UFLOW_NONNEG_NAN. * sysdeps/x86_64/fpu/e_expl.S: Include <x86_64-math-asm.h>. [!USE_AS_EXPM1L] (cmin): Replace with use of DEFINE_LDBL_MIN. (IEEE754_EXPL): Use LDBL_CHECK_FORCE_UFLOW_NONNEG.
* Fix x86_64 fma4 pow inappropriate contraction (bug 19003).Joseph Myers2015-09-241-1/+1
| | | | | | | | | | | | | | | | The x86_64 fma4 version of pow fails to disable contraction of operations other than those explicitly intended to use fma instructions, so resulting in large ulps errors on processors with fma4 instructions, as in bug 18104 (165ulp for the test added for that bug; error originally reported by "blaaa" on #glibc). This patch adds $(config-cflags-nofma) for e_pow-fma4.c, corresponding to the use for e_pow.c in sysdeps/ieee754/dbl-64/Makefile. Tested for x86_64 on a processor with fma4. [BZ #19003] * sysdeps/x86_64/fpu/multiarch/Makefile (CFLAGS-e_pow-fma4.c): Add $(config-cflags-nofma).
* Make scalbn set errno (bug 6803).Joseph Myers2015-09-161-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As noted in bug 6803, scalbn fails to set errno on overflow and underflow. This patch fixes this by making scalbn an alias of ldexp, which has exactly the same semantics (for floating-point types with radix 2) and already has wrappers that deal with setting errno, instead of an alias of the internal __scalbn (which ldexp calls). Notes: * Where compat symbols were defined for scalbn functions, I didn't change what they point to (to keep the patch minimal), so such compat symbols continue to go directly to the non-errno-setting functions. * Mike, I didn't do anything with the IA64 versions of these functions, where I think both the ldexp and scalbn functions already deal with setting errno. As a cleanup (not needed to fix this bug) however you might want to make those functions into aliases for IA64; there is no need for them to be separate function implementations at all. * This concludes the fix for bug 6803 since the scalb and scalbln cases of that bug were fixed some time ago. Tested for x86_64, x86, mips64 and powerpc. [BZ #6803] * math/s_ldexp.c (scalbn): Define as weak alias of __ldexp. [NO_LONG_DOUBLE] (scalbnl): Define as weak alias of __ldexp. * math/s_ldexpf.c (scalbnf): Define as weak alias of __ldexpf. * math/s_ldexpl.c (scalbnl): Define as weak alias of __ldexpl. * sysdeps/i386/fpu/s_scalbn.S (scalbn): Remove alias. * sysdeps/i386/fpu/s_scalbnf.S (scalbnf): Likewise. * sysdeps/i386/fpu/s_scalbnl.S (scalbnl): Likewise. * sysdeps/ieee754/dbl-64/s_scalbn.c (scalbn): Likewise. [NO_LONG_DOUBLE] (scalbnl): Likewise. * sysdeps/ieee754/dbl-64/wordsize-64/s_scalbn.c (scalbn): Likewise. [NO_LONG_DOUBLE] (scalbnl): Likewise. * sysdeps/ieee754/flt-32/s_scalbnf.c (scalbnf): Likewise. * sysdeps/ieee754/ldbl-128/s_scalbnl.c (scalbnl): Likewise. * sysdeps/ieee754/ldbl-128ibm/s_scalbnl.c (scalbnl): Remove long_double_symbol calls. * sysdeps/ieee754/ldbl-64-128/s_scalbnl.c (scalbnl): Likewise. * sysdeps/ieee754/ldbl-opt/s_ldexpl.c (__ldexpl_2): Define as strong alias of __ldexpl. (scalbnl): Define using long_double_symbol. * sysdeps/m68k/m680x0/fpu/s_scalbn.c (__CONCATX(scalbn,suffix)): Remove alias. * sysdeps/sparc/sparc64/soft-fp/s_scalbnl.c (scalbnl): Likewise. * sysdeps/x86_64/fpu/s_scalbnl.S (scalbnl): Likewise. * math/libm-test.inc (scalbn_test_data): Add errno expectations. (scalbln_test_data): Add more errno expectations.
* Fix exp2 missing underflows (bug 16521).Joseph Myers2015-09-141-1/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Various exp2 implementations in glibc can miss underflow exceptions when the scaling down part of the calculation is exact (or, in the x86 case, when the conversion from extended precision to the target precision is exact). This patch forces the exception in a similar way to previous fixes. The x86 exp2f changes may in fact not be needed for this purpose - it's likely to be the case that no argument of type float has an exp2 result so close to an exact subnormal float value that it equals that value when rounded to 64 bits (even taking account of variation between different x86 implementations). However, they are included for consistency with the changes to exp2 and so as to fix the exp2f part of bug 18875 by ensuring that excess range and precision is removed from underflowing return values. Tested for x86_64, x86 and mips64. [BZ #16521] [BZ #18875] * math/e_exp2l.c (__ieee754_exp2l): Force underflow exception for small results. * sysdeps/i386/fpu/e_exp2.S (dbl_min): New object. (MO): New macro. (__ieee754_exp2): For small results, force underflow exception and remove excess range and precision from return value. * sysdeps/i386/fpu/e_exp2f.S (flt_min): New object. (MO): New macro. (__ieee754_exp2f): For small results, force underflow exception and remove excess range and precision from return value. * sysdeps/i386/fpu/e_exp2l.S (ldbl_min): New object. (MO): New macro. (__ieee754_exp2l): Force underflow exception for small results. * sysdeps/ieee754/dbl-64/e_exp2.c (__ieee754_exp2): Likewise. * sysdeps/ieee754/flt-32/e_exp2f.c (__ieee754_exp2f): Likewise. * sysdeps/x86_64/fpu/e_exp2l.S (ldbl_min): New object. (MO): New macro. (__ieee754_exp2l): Force underflow exception for small results. * math/auto-libm-test-in: Add more tests or exp2. * math/auto-libm-test-out: Regenerated.
* Add more random libm test inputs (mainly for ldbl-128).Joseph Myers2015-09-121-2/+2
| | | | | | | | | | | | | | | | | | | | | This patch adds more libm test inputs found through random test generation to increase previously known ulps. This particular test generation was run for mips64, so most of the increased ulps are for ldbl-128 (float and double having been fairly well covered by such testing for x86_64), but there's the odd ulps increase for other formats. Tested for x86_64, x86 and mips64. * math/auto-libm-test-in: Add more tests of acos, acosh, asin, asinh, atan, atan2, atanh, cabs, carg, cos, csqrt, erfc, exp, exp10, exp2, log, log1p, log2, pow, sin, sincos, sinh, tan and tanh. * math/auto-libm-test-out: Regenerated. * sysdeps/i386/fpu/libm-test-ulps: Update. * sysdeps/mips/mips32/libm-test-ulps: Likewise. * sysdeps/mips/mips64/libm-test-ulps: Likewise. * sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
* Move bits/atomic.h to atomic-machine.h (bug 14912).Joseph Myers2015-09-111-0/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It was noted in <https://sourceware.org/ml/libc-alpha/2012-09/msg00305.html> that the bits/*.h naming scheme should only be used for installed headers. This patch renames bits/atomic.h to atomic-machine.h to follow that convention. This is the only change in this series that needs to change the filename rather than simply removing a directory level (because both atomic.h and bits/atomic.h exist at present). Tested for x86_64 (testsuite, and that installed stripped shared libraries are unchanged by the patch). [BZ #14912] * sysdeps/aarch64/bits/atomic.h: Move to ... * sysdeps/aarch64/atomic-machine.h: ...here. (_AARCH64_BITS_ATOMIC_H): Rename macro to _AARCH64_ATOMIC_MACHINE_H. * sysdeps/alpha/bits/atomic.h: Move to ... * sysdeps/alpha/atomic-machine.h: ...here. * sysdeps/arm/bits/atomic.h: Move to ... * sysdeps/arm/atomic-machine.h: ...here. Update comments. * bits/atomic.h: Move to ... * sysdeps/generic/atomic-machine.h: ...here. (_BITS_ATOMIC_H): Rename macro to _ATOMIC_MACHINE_H. * sysdeps/i386/bits/atomic.h: Move to ... * sysdeps/i386/atomic-machine.h: ...here. * sysdeps/ia64/bits/atomic.h: Move to ... * sysdeps/ia64/atomic-machine.h: ...here. * sysdeps/m68k/coldfire/bits/atomic.h: Move to ... * sysdeps/m68k/coldfire/atomic-machine.h: ...here. (_BITS_ATOMIC_H): Rename macro to _ATOMIC_MACHINE_H. * sysdeps/m68k/m680x0/m68020/bits/atomic.h: Move to ... * sysdeps/m68k/m680x0/m68020/atomic-machine.h: ...here. * sysdeps/microblaze/bits/atomic.h: Move to ... * sysdeps/microblaze/atomic-machine.h: ...here. * sysdeps/mips/bits/atomic.h: Move to ... * sysdeps/mips/atomic-machine.h: ...here. (_MIPS_BITS_ATOMIC_H): Rename macro to _MIPS_ATOMIC_MACHINE_H. * sysdeps/powerpc/bits/atomic.h: Move to ... * sysdeps/powerpc/atomic-machine.h: ...here. Update comments. * sysdeps/powerpc/powerpc32/bits/atomic.h: Move to ... * sysdeps/powerpc/powerpc32/atomic-machine.h: ...here. Update comments. Include <atomic-machine.h> instead of <bits/atomic.h>. * sysdeps/powerpc/powerpc64/bits/atomic.h: Move to ... * sysdeps/powerpc/powerpc64/atomic-machine.h: ...here. Include <atomic-machine.h> instead of <bits/atomic.h>. * sysdeps/s390/bits/atomic.h: Move to ... * sysdeps/s390/atomic-machine.h: ...here. * sysdeps/sparc/sparc32/bits/atomic.h: Move to ... * sysdeps/sparc/sparc32/atomic-machine.h: ...here. (_BITS_ATOMIC_H): Rename macro to _ATOMIC_MACHINE_H. * sysdeps/sparc/sparc32/sparcv9/bits/atomic.h: Move to ... * sysdeps/sparc/sparc32/sparcv9/atomic-machine.h: ...here. * sysdeps/sparc/sparc64/bits/atomic.h: Move to ... * sysdeps/sparc/sparc64/atomic-machine.h: ...here. * sysdeps/tile/bits/atomic.h: Move to ... * sysdeps/tile/atomic-machine.h: ...here. * sysdeps/tile/tilegx/bits/atomic.h: Move to ... * sysdeps/tile/tilegx/atomic-machine.h: ...here. Include <sysdeps/tile/atomic-machine.h> instead of <sysdeps/tile/bits/atomic.h>. (_BITS_ATOMIC_H): Rename macro to _ATOMIC_MACHINE_H. * sysdeps/tile/tilepro/bits/atomic.h: Move to ... * sysdeps/tile/tilepro/atomic-machine.h: ...here. Include <sysdeps/tile/atomic-machine.h> instead of <sysdeps/tile/bits/atomic.h>. (_BITS_ATOMIC_H): Rename macro to _ATOMIC_MACHINE_H. * sysdeps/unix/sysv/linux/arm/bits/atomic.h: Move to ... * sysdeps/unix/sysv/linux/arm/atomic-machine.h: ...here. Include <sysdeps/arm/atomic-machine.h> instead of <sysdeps/arm/bits/atomic.h>. * sysdeps/unix/sysv/linux/hppa/bits/atomic.h: Move to ... * sysdeps/unix/sysv/linux/hppa/atomic-machine.h: ...here. (_BITS_ATOMIC_H): Rename macro to _ATOMIC_MACHINE_H. * sysdeps/unix/sysv/linux/m68k/coldfire/bits/atomic.h: Move to ... * sysdeps/unix/sysv/linux/m68k/coldfire/atomic-machine.h: ...here. (_BITS_ATOMIC_H): Rename macro to _ATOMIC_MACHINE_H. * sysdeps/unix/sysv/linux/nios2/bits/atomic.h: Move to ... * sysdeps/unix/sysv/linux/nios2/atomic-machine.h: ...here. (_NIOS2_BITS_ATOMIC_H): Rename macro to _NIOS2_ATOMIC_MACHINE_H. * sysdeps/unix/sysv/linux/sh/bits/atomic.h: Move to ... * sysdeps/unix/sysv/linux/sh/atomic-machine.h: ...here. * sysdeps/x86_64/bits/atomic.h: Move to ... * sysdeps/x86_64/atomic-machine.h: ...here. * include/atomic.h: Include <atomic-machine.h> instead of <bits/atomic.h>.
* Add more randomly-generated libm tests.Joseph Myers2015-09-111-33/+33
| | | | | | | | | | | | | This patch adds more libm test inputs found through random test generation to increase observed ulps on x86_64. Tested for x86_64 and x86. * math/auto-libm-test-in: Add more tests of acosh, atanh, cbrt, cosh, csqrt, erfc, expm1 and lgamma. * math/auto-libm-test-out: Regenerated. * sysdeps/i386/fpu/libm-test-ulps: Update. * sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
* Fix lgamma (negative) inaccuracy (bug 2542, bug 2543, bug 2558).Joseph Myers2015-09-101-48/+48
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The existing implementations of lgamma functions (except for the ia64 versions) use the reflection formula for negative arguments. This suffers large inaccuracy from cancellation near zeros of lgamma (near where the gamma function is +/- 1). This patch fixes this inaccuracy. For arguments above -2, there are no zeros and no large cancellation, while for sufficiently large negative arguments the zeros are so close to integers that even for integers +/- 1ulp the log(gamma(1-x)) term dominates and cancellation is not significant. Thus, it is only necessary to take special care about cancellation for arguments around a limited number of zeros. Accordingly, this patch uses precomputed tables of relevant zeros, expressed as the sum of two floating-point values. The log of the ratio of two sines can be computed accurately using log1p in cases where log would lose accuracy. The log of the ratio of two gamma(1-x) values can be computed using Stirling's approximation (the difference between two values of that approximation to lgamma being computable without computing the two values and then subtracting), with appropriate adjustments (which don't reduce accuracy too much) in cases where 1-x is too small to use Stirling's approximation directly. In the interval from -3 to -2, using the ratios of sines and of gamma(1-x) can still produce too much cancellation between those two parts of the computation (and that interval is also the worst interval for computing the ratio between gamma(1-x) values, which computation becomes more accurate, while being less critical for the final result, for larger 1-x). Because this can result in errors slightly above those accepted in glibc, this interval is instead dealt with by polynomial approximations. Separate polynomial approximations to (|gamma(x)|-1)(x-n)/(x-x0) are used for each interval of length 1/8 from -3 to -2, where n (-3 or -2) is the nearest integer to the 1/8-interval and x0 is the zero of lgamma in the relevant half-integer interval (-3 to -2.5 or -2.5 to -2). Together, the two approaches are intended to give sufficient accuracy for all negative arguments in the problem range. Outside that range, the previous implementation continues to be used. Tested for x86_64, x86, mips64 and powerpc. The mips64 and powerpc testing shows up pre-existing problems for ldbl-128 and ldbl-128ibm with large negative arguments giving spurious "invalid" exceptions (exposed by newly added tests for cases this patch doesn't affect the logic for); I'll address those problems separately. [BZ #2542] [BZ #2543] [BZ #2558] * sysdeps/ieee754/dbl-64/e_lgamma_r.c (__ieee754_lgamma_r): Call __lgamma_neg for arguments from -28.0 to -2.0. * sysdeps/ieee754/flt-32/e_lgammaf_r.c (__ieee754_lgammaf_r): Call __lgamma_negf for arguments from -15.0 to -2.0. * sysdeps/ieee754/ldbl-128/e_lgammal_r.c (__ieee754_lgammal_r): Call __lgamma_negl for arguments from -48.0 or -50.0 to -2.0. * sysdeps/ieee754/ldbl-96/e_lgammal_r.c (__ieee754_lgammal_r): Call __lgamma_negl for arguments from -33.0 to -2.0. * sysdeps/ieee754/dbl-64/lgamma_neg.c: New file. * sysdeps/ieee754/dbl-64/lgamma_product.c: Likewise. * sysdeps/ieee754/flt-32/lgamma_negf.c: Likewise. * sysdeps/ieee754/flt-32/lgamma_productf.c: Likewise. * sysdeps/ieee754/ldbl-128/lgamma_negl.c: Likewise. * sysdeps/ieee754/ldbl-128/lgamma_productl.c: Likewise. * sysdeps/ieee754/ldbl-128ibm/lgamma_negl.c: Likewise. * sysdeps/ieee754/ldbl-128ibm/lgamma_productl.c: Likewise. * sysdeps/ieee754/ldbl-96/lgamma_negl.c: Likewise. * sysdeps/ieee754/ldbl-96/lgamma_product.c: Likewise. * sysdeps/ieee754/ldbl-96/lgamma_productl.c: Likewise. * sysdeps/generic/math_private.h (__lgamma_negf): New prototype. (__lgamma_neg): Likewise. (__lgamma_negl): Likewise. (__lgamma_product): Likewise. (__lgamma_productl): Likewise. * math/Makefile (libm-calls): Add lgamma_neg and lgamma_product. * math/auto-libm-test-in: Add more tests of lgamma. * math/auto-libm-test-out: Regenerated. * sysdeps/i386/fpu/libm-test-ulps: Update. * sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
* Move bits/libc-lock.h and bits/libc-lockP.h out of bits/ (bug 14912).Joseph Myers2015-09-081-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It was noted in <https://sourceware.org/ml/libc-alpha/2012-09/msg00305.html> that the bits/*.h naming scheme should only be used for installed headers. This patch renames bits/libc-lock.h to plain libc-lock.h and bits/libc-lockP.h to plain libc-lockP.h to follow that convention. Note that I don't know where libc-lockP.h comes from for Hurd (the Hurd libc-lock.h includes libc-lockP.h, but the only libc-lockP.h in the glibc source tree is for NPTL) - some unmerged patch? - but I updated the #include in the Hurd libc-lock.h anyway. Tested for x86_64 (testsuite, and that installed stripped shared libraries are unchanged by the patch). [BZ #14912] * bits/libc-lock.h: Move to ... * sysdeps/generic/libc-lock.h: ...here. (_BITS_LIBC_LOCK_H): Rename macro to _LIBC_LOCK_H. * sysdeps/mach/hurd/bits/libc-lock.h: Move to ... * sysdeps/mach/hurd/libc-lock.h: ...here. (_BITS_LIBC_LOCK_H): Rename macro to _LIBC_LOCK_H. [_LIBC]: Include <libc-lockP.h> instead of <bits/libc-lockP.h>. * sysdeps/mach/bits/libc-lock.h: Move to ... * sysdeps/mach/libc-lock.h: ...here. (_BITS_LIBC_LOCK_H): Rename macro to _LIBC_LOCK_H. * sysdeps/nptl/bits/libc-lock.h: Move to ... * sysdeps/nptl/libc-lock.h: ...here. (_BITS_LIBC_LOCK_H): Rename macro to _LIBC_LOCK_H. * sysdeps/nptl/bits/libc-lockP.h: Move to ... * sysdeps/nptl/libc-lockP.h: ...here. (_BITS_LIBC_LOCKP_H): Rename macro to _LIBC_LOCKP_H. * crypt/crypt_util.c: Include <libc-lock.h> instead of <bits/libc-lock.h>. * dirent/scandir-tail.c: Likewise. * dlfcn/dlerror.c: Likewise. * elf/dl-close.c: Likewise. * elf/dl-iteratephdr.c: Likewise. * elf/dl-lookup.c: Likewise. * elf/dl-open.c: Likewise. * elf/dl-support.c: Likewise. * elf/dl-writev.h: Likewise. * elf/rtld.c: Likewise. * grp/fgetgrent.c: Likewise. * gshadow/fgetsgent.c: Likewise. * gshadow/sgetsgent.c: Likewise. * iconv/gconv_conf.c: Likewise. * iconv/gconv_db.c: Likewise. * iconv/gconv_dl.c: Likewise. * iconv/gconv_int.h: Likewise. * iconv/gconv_trans.c: Likewise. * include/link.h: Likewise. * inet/getnameinfo.c: Likewise. * inet/getnetgrent.c: Likewise. * inet/getnetgrent_r.c: Likewise. * intl/bindtextdom.c: Likewise. * intl/dcigettext.c: Likewise. * intl/finddomain.c: Likewise. * intl/gettextP.h: Likewise. * intl/loadmsgcat.c: Likewise. * intl/localealias.c: Likewise. * intl/textdomain.c: Likewise. * libidn/idn-stub.c: Likewise. * libio/libioP.h: Likewise. * locale/duplocale.c: Likewise. * locale/freelocale.c: Likewise. * locale/newlocale.c: Likewise. * locale/setlocale.c: Likewise. * login/getutent_r.c: Likewise. * login/getutid_r.c: Likewise. * login/getutline_r.c: Likewise. * login/utmp-private.h: Likewise. * login/utmpname.c: Likewise. * malloc/mtrace.c: Likewise. * misc/efgcvt.c: Likewise. * misc/error.c: Likewise. * misc/fstab.c: Likewise. * misc/getpass.c: Likewise. * misc/mntent.c: Likewise. * misc/syslog.c: Likewise. * nis/nis_call.c: Likewise. * nis/nis_callback.c: Likewise. * nis/nss-default.c: Likewise. * nis/nss_compat/compat-grp.c: Likewise. * nis/nss_compat/compat-initgroups.c: Likewise. * nis/nss_compat/compat-pwd.c: Likewise. * nis/nss_compat/compat-spwd.c: Likewise. * nis/nss_nis/nis-alias.c: Likewise. * nis/nss_nis/nis-ethers.c: Likewise. * nis/nss_nis/nis-grp.c: Likewise. * nis/nss_nis/nis-hosts.c: Likewise. * nis/nss_nis/nis-network.c: Likewise. * nis/nss_nis/nis-proto.c: Likewise. * nis/nss_nis/nis-pwd.c: Likewise. * nis/nss_nis/nis-rpc.c: Likewise. * nis/nss_nis/nis-service.c: Likewise. * nis/nss_nis/nis-spwd.c: Likewise. * nis/nss_nisplus/nisplus-alias.c: Likewise. * nis/nss_nisplus/nisplus-ethers.c: Likewise. * nis/nss_nisplus/nisplus-grp.c: Likewise. * nis/nss_nisplus/nisplus-hosts.c: Likewise. * nis/nss_nisplus/nisplus-initgroups.c: Likewise. * nis/nss_nisplus/nisplus-network.c: Likewise. * nis/nss_nisplus/nisplus-proto.c: Likewise. * nis/nss_nisplus/nisplus-pwd.c: Likewise. * nis/nss_nisplus/nisplus-rpc.c: Likewise. * nis/nss_nisplus/nisplus-service.c: Likewise. * nis/nss_nisplus/nisplus-spwd.c: Likewise. * nis/ypclnt.c: Likewise. * nptl/libc_pthread_init.c: Likewise. * nss/getXXbyYY.c: Likewise. * nss/getXXent.c: Likewise. * nss/getXXent_r.c: Likewise. * nss/nss_db/db-XXX.c: Likewise. * nss/nss_db/db-netgrp.c: Likewise. * nss/nss_db/nss_db.h: Likewise. * nss/nss_files/files-XXX.c: Likewise. * nss/nss_files/files-alias.c: Likewise. * nss/nsswitch.c: Likewise. * posix/regex_internal.h: Likewise. * posix/wordexp.c: Likewise. * pwd/fgetpwent.c: Likewise. * resolv/res_hconf.c: Likewise. * resolv/res_libc.c: Likewise. * shadow/fgetspent.c: Likewise. * shadow/lckpwdf.c: Likewise. * shadow/sgetspent.c: Likewise. * socket/opensock.c: Likewise. * stdio-common/reg-modifier.c: Likewise. * stdio-common/reg-printf.c: Likewise. * stdio-common/reg-type.c: Likewise. * stdio-common/vfprintf.c: Likewise. * stdio-common/vfscanf.c: Likewise. * stdlib/abort.c: Likewise. * stdlib/cxa_atexit.c: Likewise. * stdlib/fmtmsg.c: Likewise. * stdlib/random.c: Likewise. * stdlib/setenv.c: Likewise. * string/strsignal.c: Likewise. * sunrpc/auth_none.c: Likewise. * sunrpc/bindrsvprt.c: Likewise. * sunrpc/create_xid.c: Likewise. * sunrpc/key_call.c: Likewise. * sunrpc/rpc_thread.c: Likewise. * sysdeps/arm/backtrace.c: Likewise. * sysdeps/generic/ldsodefs.h: Likewise. * sysdeps/generic/stdio-lock.h: Likewise. * sysdeps/generic/unwind-dw2-fde.c: Likewise. * sysdeps/i386/backtrace.c: Likewise. * sysdeps/ieee754/ldbl-opt/nldbl-compat.c: Likewise. * sysdeps/m68k/backtrace.c: Likewise. * sysdeps/mach/hurd/cthreads.c: Likewise. * sysdeps/mach/hurd/dirstream.h: Likewise. * sysdeps/mach/hurd/malloc-machine.h: Likewise. * sysdeps/nptl/malloc-machine.h: Likewise. * sysdeps/nptl/stdio-lock.h: Likewise. * sysdeps/posix/dirstream.h: Likewise. * sysdeps/posix/getaddrinfo.c: Likewise. * sysdeps/posix/system.c: Likewise. * sysdeps/pthread/aio_suspend.c: Likewise. * sysdeps/s390/s390-32/backtrace.c: Likewise. * sysdeps/s390/s390-64/backtrace.c: Likewise. * sysdeps/unix/sysv/linux/check_pf.c: Likewise. * sysdeps/unix/sysv/linux/if_index.c: Likewise. * sysdeps/unix/sysv/linux/s390/s390-32/getutent_r.c: Likewise. * sysdeps/unix/sysv/linux/s390/s390-32/getutid_r.c: Likewise. * sysdeps/unix/sysv/linux/s390/s390-32/getutline_r.c: Likewise. * sysdeps/unix/sysv/linux/shm-directory.c: Likewise. * sysdeps/unix/sysv/linux/system.c: Likewise. * sysdeps/x86_64/backtrace.c: Likewise. * time/alt_digit.c: Likewise. * time/era.c: Likewise. * time/tzset.c: Likewise. * wcsmbs/wcsmbsload.c: Likewise. * nptl/tst-initializers1.c (do_test): Refer to <libc-lock.h> instead of <bits/libc-lock.h> in comment.
* Don't disable SSE in x86-64 ld.soH.J. Lu2015-08-261-0/+4
| | | | | | | | | | | | | | | | Since x86-64 ld.so preserves vector registers now, we can use SSE in x86-64 ld.so. We should run tst-ld-sse-use.sh only on i386. * sysdeps/x86/Makefile [$(subdir) == elf] (CFLAGS-.os, tests-special, $(objpfx)tst-ld-sse-use.out): Moved to ... * sysdeps/i386/Makefile [$(subdir) == elf] (CFLAGS-.os, tests-special, $(objpfx)tst-ld-sse-use.out): Here. Update comments. * sysdeps/x86_64/Makefile [$(subdir) == elf] (CFLAGS-.os): Add -mno-mmx for $(all-rtld-routines). * sysdeps/x86/tst-ld-sse-use.sh: Moved to ... * sysdeps/i386/tst-ld-sse-use.sh: Here. Replace x86-64 with i386.
* Use SSE2 optimized strcmp in x86-64 ld.soH.J. Lu2015-08-251-253/+216
| | | | | | | Since ld.so preserves vector registers now, we can use the same SSE2 optimized strcmp in x86-64 libc and ld.so. * sysdeps/x86_64/strcmp.S: Remove "#if !IS_IN (libc)".
* Replace %xmm[8-12] with %xmm[0-4]H.J. Lu2015-08-251-47/+47
| | | | | | | Since ld.so preserves vector registers now, we can use %xmm[0-4] to avoid the REX prefix. * sysdeps/x86_64/strlen.S: Replace %xmm[8-12] with %xmm[0-4].
* Remove x86-64 rtld-xxx.c and rtld-xxx.SH.J. Lu2015-08-256-464/+0
| | | | | | | | | | | | Since ld.so preserves vector registers now, we can use the regular, non-ifunc string and memory functions in ld.so. * sysdeps/x86_64/rtld-memcmp.c: Removed. * sysdeps/x86_64/rtld-memset.S: Likewise. * sysdeps/x86_64/rtld-strchr.S: Likewise. * sysdeps/x86_64/rtld-strlen.S: Likewise. * sysdeps/x86_64/multiarch/rtld-memcmp.c: Likewise. * sysdeps/x86_64/multiarch/rtld-memset.S: Likewise.
* Replace %xmm8 with %xmm0H.J. Lu2015-08-251-26/+26
| | | | | | | Since ld.so preserves vector registers now, we can use %xmm0 to avoid the REX prefix. * sysdeps/x86_64/memset.S: Replace %xmm8 with %xmm0.
* Fix strcpy_chk and stpcpy_chk performance.Ondřej Bílka2015-08-252-211/+0
| | | | | | | | | | | | | | | Hi, as I wrote in previous patches a performance of checked strcpy and stpcpy is terrible as these don't use sse2 and are around four times slower that strcpy and stpcpy now. As this bug shows that these functions are not performance sensitive I decided just to improve generic implementation instead for easier maintainance. * debug/strcpy_chk.c: Improve performance. * debug/stpcpy_chk.c: Likewise. * sysdeps/x86_64/strcpy_chk.S: Remove. * sysdeps/x86_64/stpcpy_chk.S: Remove.
* Save and restore vector registers in x86-64 ld.soH.J. Lu2015-08-258-501/+472
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds SSE, AVX and AVX512 versions of _dl_runtime_resolve and _dl_runtime_profile, which save and restore the first 8 vector registers used for parameter passing. elf_machine_runtime_setup selects the proper _dl_runtime_resolve or _dl_runtime_profile based on _dl_x86_cpu_features. It avoids race condition caused by FOREIGN_CALL macros, which are only used for x86-64. Performance impact of saving and restoring 8 vector registers are negligible on Nehalem, Sandy Bridge, Ivy Bridge and Haswell when ld.so is optimized with SSE2. [BZ #15128] * sysdeps/x86_64/Makefile [$(subdir) == elf] (tests): Add ifuncmain8. (modules-names): Add ifuncmod8. ($(objpfx)ifuncmain8): New rule. * sysdeps/x86_64/dl-machine.h: Include <dl-procinfo.h> and <cpuid.h>. (elf_machine_runtime_setup): Use _dl_runtime_resolve_sse, _dl_runtime_resolve_avx, or _dl_runtime_resolve_avx512, _dl_runtime_profile_sse, _dl_runtime_profile_avx, or _dl_runtime_profile_avx512, based on HAS_ARCH_FEATURE. * sysdeps/x86_64/dl-trampoline.S: Rewrite. * sysdeps/x86_64/dl-trampoline.h: Likewise. * sysdeps/x86_64/ifuncmain8.c: New file. * sysdeps/x86_64/ifuncmod8.c: Likewise. * sysdeps/x86_64/nptl/tcb-offsets.sym (RTLD_SAVESPACE_SSE): Removed. * sysdeps/x86_64/nptl/tls.h (__128bits): Removed. (tcbhead_t): Change rtld_must_xmm_save to __glibc_unused1. Change rtld_savespace_sse to __glibc_unused2. (RTLD_CHECK_FOREIGN_CALL): Removed. (RTLD_ENABLE_FOREIGN_CALL): Likewise. (RTLD_PREPARE_FOREIGN_CALL): Likewise. (RTLD_FINALIZE_FOREIGN_CALL): Likewise.
* Remove the unused IFUNC filesH.J. Lu2015-08-202-17/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | sysdeps/i386/i686/multiarch/strcasestr-c.c became unused after commit 1818483b15d22016b0eae41d37ee91cc87b37510 Author: Andreas Schwab <schwab@suse.de> Date: Wed Dec 18 11:53:27 2013 +1000 Remove use of SSE4.2 functions for strstr on i686 which contains -sysdep_routines += strcspn-c strpbrk-c strspn-c strstr-c strcasestr-c +sysdep_routines += strcspn-c strpbrk-c strspn-c sysdeps/x86_64/multiarch/strcasestr.c became useless after t 584b18eb4df61ccd447db2dfe8c8a7901f8c8598 Author: Ondřej Bílka <neleai@seznam.cz> Date: Sat Dec 14 19:33:56 2013 +0100 Add strstr with unaligned loads. Fixes bug 12100. which changes sysdeps/x86_64/multiarch/strcasestr.c to libc_ifunc (__strcasestr, __strcasestr_sse2); This patch removes these file. * i386/i686/multiarch/strcasestr-c.c: Removed. * x86_64/multiarch/strcasestr.c: Likewise. * x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Remove strcasestr.
* Move x86_64 init-arch.h to sysdeps/x86/init-arch.hH.J. Lu2015-08-202-23/+1
| | | | | | | | | | | | Move sysdeps/x86_64/multiarch/init-arch.h to sysdeps/x86/init-arch.h which can be used for both i386 and x86_64. * sysdeps/i386/i686/multiarch/init-arch.h: Removed. * sysdeps/unix/sysv/linux/x86/init-arch.h: Likewise. * sysdeps/x86_64/cacheinfo.c: Include <init-arch.h> instead of "multiarch/init-arch.h". * sysdeps/x86_64/multiarch/init-arch.h: Renamed to ... * sysdeps/x86/init-arch.h: This.
* Fix dynamic linker issue with bind-nowPetar Jovanovic2015-08-193-0/+38
| | | | | | | | | | | | | | Fix the bind-now case when DT_REL and DT_JMPREL sections are separate and there is a gap between them. [BZ #14341] * elf/dynamic-link.h (elf_machine_lazy_rel): Properly handle the case when there is a gap between DT_REL and DT_JMPREL sections. * sysdeps/x86_64/Makefile (tests): Add tst-split-dynreloc. (LDFLAGS-tst-split-dynreloc): New. (tst-split-dynreloc-ENV): Likewise. * sysdeps/x86_64/tst-split-dynreloc.c: New file. * sysdeps/x86_64/tst-split-dynreloc.lds: Likewise.
* Fix BZ #18084 -- backtrace (..., 0) dumps core on x86.Paul Pluzhnikov2015-08-151-2/+5
| | | | Other architectures also had bugs, or did unnecessary work.
* Regenerated sysdeps/x86_64/fpu/libm-test-ulps with AVX2.Paul Pluzhnikov2015-08-141-1/+1
|
* Remove incorrect register mov in floorf/nearbyint on x86_64Siddhesh Poyarekar2015-08-142-2/+0
| | | | | | | | | | | | | The change in 0b5395f052ee09cd7e3d219af4e805c38058afb5 replaced calls to __get_cpu_features@plt followed by a mov from rax to rdx, with a single macro LOAD_RTLD_GLOBAL_RO_RDX. It is pretty clear that there was a typo in s_floorf and __nearbyint due to which the (now incorrect) mov was not removed. This patch removes that mov. * sysdeps/x86_64/fpu/multiarch/s_floorf.S (__floorf): Remove unnecessary movq. * sysdeps/x86_64/fpu/multiarch/s_nearbyint.S (__nearbyint): Likewise.
* Add more random libm-test inputs.Joseph Myers2015-08-131-62/+64
| | | | | | | | | | | | | | | | This patch adds more test inputs to various libm functions found through random generation to have larger ulps errors than previously listed in libm-test-ulp, on at least one of x86_64 and x86. Tested for x86_64 and x86. * math/auto-libm-test-in: Add more tests of acos, acosh, asin, asinh, atan, atan2, atanh, cabs, cbrt, cosh, csqrt, erf, erfc, exp, exp2, lgamma, log, log1p, log2, pow, sin, sincos, tan, tanh and tgamma. * math/auto-libm-test-out: Regenerated. * sysdeps/i386/fpu/libm-test-ulps: Update. * sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
* Update libmvec multiarch functions for <cpu-features.h>H.J. Lu2015-08-1338-227/+125
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch updates libmvec multiarch functions to use the newly defined HAS_CPU_FEATURE, HAS_ARCH_FEATURE and LOAD_RTLD_GLOBAL_RO_RDX from <cpu-features.h>. * math/Makefile ($(addprefix $(objpfx), $(libm-vec-tests))): Remove $(objpfx)init-arch.o. * sysdeps/x86_64/fpu/Makefile (libmvec-support): Remove init-arch. * sysdeps/x86_64/fpu/math-tests-arch.h (avx_usable): Removed. (INIT_ARCH_EXT): Defined as empty. (CHECK_ARCH_EXT): Replace HAS_XXX with HAS_ARCH_FEATURE (XXX). * sysdeps/x86_64/fpu/multiarch/svml_d_cos2_core.S: Remove __init_cpu_features call. Replace HAS_XXX with HAS_CPU_FEATURE/HAS_ARCH_FEATURE (XXX). * sysdeps/x86_64/fpu/multiarch/svml_d_cos4_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_cos8_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_exp2_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_exp4_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_exp8_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_log2_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_log4_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_log8_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_pow2_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_pow4_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_pow8_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_sin2_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_sin4_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_sin8_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_sincos2_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_sincos4_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_sincos8_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_s_cosf16_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_s_cosf4_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_s_cosf8_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_s_expf16_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_s_expf4_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_s_expf8_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_s_logf16_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_s_logf4_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_s_logf8_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_s_powf16_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_s_powf4_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_s_powf8_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_s_sincosf16_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_s_sincosf4_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_s_sincosf8_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_s_sinf16_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_s_sinf4_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_s_sinf8_core.S: Likewise.
* Update x86_64 multiarch functions for <cpu-features.h>H.J. Lu2015-08-1339-233/+246
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch updates x86_64 multiarch functions to use the newly defined HAS_CPU_FEATURE, HAS_ARCH_FEATURE and LOAD_RTLD_GLOBAL_RO_RDX from <cpu-features.h>. * sysdeps/x86_64/fpu/multiarch/e_asin.c: Replace HAS_XXX with HAS_CPU_FEATURE/HAS_ARCH_FEATURE (XXX). * sysdeps/x86_64/fpu/multiarch/e_atan2.c: Likewise. * sysdeps/x86_64/fpu/multiarch/e_exp.c: Likewise. * sysdeps/x86_64/fpu/multiarch/e_log.c: Likewise. * sysdeps/x86_64/fpu/multiarch/e_pow.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_atan.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_fma.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_fmaf.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_sin.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_tan.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_ceil.S: Use LOAD_RTLD_GLOBAL_RO_RDX and HAS_CPU_FEATURE (SSE4_1). * sysdeps/x86_64/fpu/multiarch/s_ceilf.S: Likewise. * sysdeps/x86_64/fpu/multiarch/s_floor.S: Likewise. * sysdeps/x86_64/fpu/multiarch/s_floorf.S: Likewise. * sysdeps/x86_64/fpu/multiarch/s_nearbyint.S : Likewise. * sysdeps/x86_64/fpu/multiarch/s_nearbyintf.S: Likewise. * sysdeps/x86_64/fpu/multiarch/s_rintf.S: Likewise. * sysdeps/x86_64/fpu/multiarch/s_rintf.S : Likewise. * sysdeps/x86_64/multiarch/ifunc-impl-list.c: Likewise. * sysdeps/x86_64/multiarch/sched_cpucount.c: Likewise. * sysdeps/x86_64/multiarch/strstr.c: Likewise. * sysdeps/x86_64/multiarch/memmove.c: Likewise. * sysdeps/x86_64/multiarch/memmove_chk.c: Likewise. * sysdeps/x86_64/multiarch/test-multiarch.c: Likewise. * sysdeps/x86_64/multiarch/memcmp.S: Remove __init_cpu_features call. Add LOAD_RTLD_GLOBAL_RO_RDX. Replace HAS_XXX with HAS_CPU_FEATURE/HAS_ARCH_FEATURE (XXX). * sysdeps/x86_64/multiarch/memcpy.S: Likewise. * sysdeps/x86_64/multiarch/memcpy_chk.S: Likewise. * sysdeps/x86_64/multiarch/mempcpy.S: Likewise. * sysdeps/x86_64/multiarch/mempcpy_chk.S: Likewise. * sysdeps/x86_64/multiarch/memset.S: Likewise. * sysdeps/x86_64/multiarch/memset_chk.S: Likewise. * sysdeps/x86_64/multiarch/strcat.S: Likewise. * sysdeps/x86_64/multiarch/strchr.S: Likewise. * sysdeps/x86_64/multiarch/strcmp.S: Likewise. * sysdeps/x86_64/multiarch/strcpy.S: Likewise. * sysdeps/x86_64/multiarch/strcspn.S: Likewise. * sysdeps/x86_64/multiarch/strspn.S: Likewise. * sysdeps/x86_64/multiarch/wcscpy.S: Likewise. * sysdeps/x86_64/multiarch/wmemcmp.S: Likewise.
* Add _dl_x86_cpu_features to rtld_globalH.J. Lu2015-08-1310-546/+78
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds _dl_x86_cpu_features to rtld_global in x86 ld.so and initializes it early before __libc_start_main is called so that cpu_features is always available when it is used and we can avoid calling __init_cpu_features in IFUNC selectors. * sysdeps/i386/dl-machine.h: Include <cpu-features.c>. (dl_platform_init): Call init_cpu_features. * sysdeps/i386/dl-procinfo.c (_dl_x86_cpu_features): New. * sysdeps/i386/i686/cacheinfo.c (DISABLE_PREFERRED_MEMORY_INSTRUCTION): Removed. * sysdeps/i386/i686/multiarch/Makefile (aux): Remove init-arch. * sysdeps/i386/i686/multiarch/Versions: Removed. * sysdeps/i386/i686/multiarch/ifunc-defines.sym (KIND_OFFSET): Removed. * sysdeps/i386/ldsodefs.h: Include <cpu-features.h>. * sysdeps/unix/sysv/linux/x86/Makefile (libpthread-sysdep_routines): Remove init-arch. * sysdeps/unix/sysv/linux/x86_64/dl-procinfo.c: Include <sysdeps/x86_64/dl-procinfo.c> instead of sysdeps/generic/dl-procinfo.c>. * sysdeps/x86/Makefile [$(subdir) == csu] (gen-as-const-headers): Add cpu-features-offsets.sym and rtld-global-offsets.sym. [$(subdir) == elf] (sysdep-dl-routines): Add dl-get-cpu-features. [$(subdir) == elf] (tests): Add tst-get-cpu-features. [$(subdir) == elf] (tests-static): Add tst-get-cpu-features-static. * sysdeps/x86/Versions: New file. * sysdeps/x86/cpu-features-offsets.sym: Likewise. * sysdeps/x86/cpu-features.c: Likewise. * sysdeps/x86/cpu-features.h: Likewise. * sysdeps/x86/dl-get-cpu-features.c: Likewise. * sysdeps/x86/libc-start.c: Likewise. * sysdeps/x86/rtld-global-offsets.sym: Likewise. * sysdeps/x86/tst-get-cpu-features-static.c: Likewise. * sysdeps/x86/tst-get-cpu-features.c: Likewise. * sysdeps/x86_64/dl-procinfo.c: Likewise. * sysdeps/x86_64/cacheinfo.c (__cpuid_count): Removed. Assume USE_MULTIARCH is defined and don't check it. (is_intel): Replace __cpu_features with GLRO(dl_x86_cpu_features). (is_amd): Likewise. (max_cpuid): Likewise. (intel_check_word): Likewise. (__cache_sysconf): Don't call __init_cpu_features. (__x86_preferred_memory_instruction): Removed. (init_cacheinfo): Don't call __init_cpu_features. Replace __cpu_features with GLRO(dl_x86_cpu_features). * sysdeps/x86_64/dl-machine.h: <cpu-features.c>. (dl_platform_init): Call init_cpu_features. * sysdeps/x86_64/ldsodefs.h: Include <cpu-features.h>. * sysdeps/x86_64/multiarch/Makefile (aux): Remove init-arch. * sysdeps/x86_64/multiarch/Versions: Removed. * sysdeps/x86_64/multiarch/cacheinfo.c: Likewise. * sysdeps/x86_64/multiarch/init-arch.c: Likewise. * sysdeps/x86_64/multiarch/ifunc-defines.sym (KIND_OFFSET): Removed. * sysdeps/x86_64/multiarch/init-arch.h: Rewrite.
* Add more tests of various libm functions.Joseph Myers2015-08-111-12/+12
| | | | | | | | | | | | | | This patch adds more tests of various libm functions found through random test generation to give increased ulps on 32-bit x86. Tested for x86_64 and x86. * math/auto-libm-test-in: Add more tests of acosh, asin, asinh, atanh, cabs, carg, cbrt, cosh, csqrt, erf, erfc, exp, exp10, expm1, hypot, log, log10, log1p, log2, pow, sinh, tan and tgamma. * math/auto-libm-test-out: Regenerated. * sysdeps/i386/fpu/libm-test-ulps: Update. * sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
* Align stack to 16 bytes when calling __errno_locationH.J. Lu2015-08-053-0/+18
| | | | | | | | | | We should align stack to 16 bytes when calling __errno_location. [BZ #18661] * sysdeps/x86_64/fpu/s_cosf.S (__cosf): Align stack to 16 bytes when calling __errno_location. * sysdeps/x86_64/fpu/s_sincosf.S (__sincosf): Likewise. * sysdeps/x86_64/fpu/s_sinf.S (__sinf): Likewise.
* Compile {memcpy,strcmp}-sse2-unaligned.S only for libcH.J. Lu2015-08-052-0/+8
| | | | | | | | {memcpy,strcmp}-sse2-unaligned.S aren't needed in ld.so. * sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S: Compile only for libc. * sysdeps/x86_64/multiarch/strcmp-sse2-unaligned.S: Likewise.
* Prevent runtime fail of SSE vector math tests on non SSE4.1 machine.Andrew Senkevich2015-07-301-2/+0
| | | | | | | | [BZ #18740] * sysdeps/x86_64/fpu/Makefile (double-vlen2-arch-ext-cflags, float-vlen4-arch-ext-cflags): Removed. * math/Makefile (CFLAGS-test-double-vlen2-wrappers.c, CFLAGS-test-float-vlen4-wrappers.c): Likewise.
* Extend local PLT reference checkH.J. Lu2015-07-291-0/+19
| | | | | | | | | | | | | | On x86, linker in binutils 2.26 and newer consolidates R_*_JUMP_SLOT with R_*_GLOB_DAT relocation against the same symbol. This patch extends local PLT reference check to support alternate relocations. [BZ #18078] * scripts/check-localplt.awk: Support alternate relocations. * scripts/localplt.awk: Also check relocations in DT_RELA/DT_REL sections. * sysdeps/unix/sysv/linux/i386/localplt.data: Mark free and malloc entries with + REL R_386_GLOB_DAT. * sysdeps/x86_64/localplt.data: New file.
* Added runtime check for AVX vector math tests.Andrew Senkevich2015-07-293-1/+27
| | | | | | | [BZ #18731] * sysdeps/x86_64/fpu/math-tests-arch.h: Added AVX runtime check. * sysdeps/x86_64/fpu/test-double-vlen4.c: Likewise. * sysdeps/x86_64/fpu/test-float-vlen8.c: Likewise.
* Fixed several libmvec bugs found during testing on KNL hardware.Andrew Senkevich2015-07-2415-223/+201
| | | | | | | | | | | | | | | | | | | | | | AVX512 IFUNC implementations, implementations of wrappers to AVX2 versions and KNL expf implementation fixed. * sysdeps/x86_64/fpu/multiarch/svml_d_cos8_core.S: Fixed AVX512 IFUNC. * sysdeps/x86_64/fpu/multiarch/svml_d_exp8_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_log8_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_pow8_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_sin8_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_sincos8_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_s_cosf16_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_s_expf16_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_s_logf16_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_s_powf16_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_s_sincosf16_core.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_s_sinf16_core.S: Likewise. * sysdeps/x86_64/fpu/svml_d_wrapper_impl.h: Fixed wrappers to AVX2. * sysdeps/x86_64/fpu/svml_s_wrapper_impl.h: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_s_expf16_core_avx512.S: Fixed KNL implementation.
* Modify several tests to use test-skeleton.cArjun Shankar2015-07-151-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | These tests were skipped by the use-test-skeleton conversion done in commit 29955b5d because they were reused in other tests via the #include directive, and so deemed worth an inspection before they were modified. This has now been done. ChangeLog: 2015-07-09 Arjun Shankar <arjun.is@lostca.se> * elf/tst-leaks1.c (main): Converted to ... (do_test): ... this. (TEST_FUNCTION): New macro. Include test-skeleton.c. * localedata/tst-langinfo.c (main): Converted to ... (do_test): ... this. (TEST_FUNCTION): New macro. Include test-skeleton.c. * math/test-fpucw.c (main): Converted to ... (do_test): ... this. (TEST_FUNCTION): New macro. Include test-skeleton.c. * math/test-tgmath.c (main): Converted to ... (do_test): ... this. (TEST_FUNCTION): New macro. Include test-skeleton.c. * math/test-tgmath2.c (main): Converted to ... (do_test): ... this. (TEST_FUNCTION): New macro. Include test-skeleton.c. * setjmp/tst-setjmp.c (main): Converted to ... (do_test): ... this. (TEST_FUNCTION): New macro. Include test-skeleton.c. * stdio-common/tst-sscanf.c (main): Converted to ... (do_test): ... this. (TEST_FUNCTION): New macro. Include test-skeleton.c. * sysdeps/x86_64/tst-audit6.c (main): Converted to ... (do_test): ... this. (TEST_FUNCTION): New macro. Include test-skeleton.c.
* Improve bndmov encoding with zero displacementH.J. Lu2015-07-091-0/+8
| | | | | | | | | | If x86-64 assembler doesn't support MPX, we encode bndmov instruction by hand. When displacement is zero, assembler generates shorter encoding. This patch improves bndmov encoding with zero displacement so that ld.so is identical when using assemblers with and without MPX support. * sysdeps/x86_64/dl-trampoline.S (_dl_runtime_resolve): Improve bndmov encoding with zero displacement.
* Preserve bound registers for pointer pass/returnIgor Zamyatin2015-07-092-20/+25
| | | | | | | | | | | | | | | | | | | | | | | We need to save/restore bound registers and add a BND prefix before branches in _dl_runtime_profile so that bound registers for pointer pass and return are preserved when LD_AUDIT is used. [BZ #18134] * sysdeps/i386/configure.ac: Set HAVE_MPX_SUPPORT. * sysdeps/i386/configure: Regenerated. * sysdeps/i386/dl-trampoline.S (PRESERVE_BND_REGS_PREFIX): New. (_dl_runtime_profile): Save and restore Intel MPX return bound registers when calling _dl_call_pltexit. Add PRESERVE_BND_REGS_PREFIX before return. * sysdeps/i386/link-defines.sym (LRV_BND0_OFFSET): New. (LRV_BND1_OFFSET): Likewise. * sysdeps/x86/bits/link.h (La_i86_retval): Add lrv_bnd0 and lrv_bnd1. * sysdeps/x86_64/dl-trampoline.S (_dl_runtime_profile): Fix typo in bndmov encoding. * sysdeps/x86_64/dl-trampoline.h: Properly save and restore Intel MPX bound registers. Add PRESERVE_BND_REGS_PREFIX before branch instructions to preserve bounds.
* Add la_symbind32 to x86-64 audit testsH.J. Lu2015-07-076-0/+60
| | | | | | | | | | | | la_symbind32 is used for x32 in x86-64 audit tests. We should define both la_symbind32 and la_symbind64 in x86-64 audit tests. * sysdeps/x86_64/tst-auditmod10b.c (la_symbind32): New. * sysdeps/x86_64/tst-auditmod4b.c (la_symbind32): Likewise. * sysdeps/x86_64/tst-auditmod5b.c (la_symbind32): Likewise. * sysdeps/x86_64/tst-auditmod6b.c (la_symbind32): Likewise. * sysdeps/x86_64/tst-auditmod6c.c (la_symbind32): Likewise. * sysdeps/x86_64/tst-auditmod7b.c (la_symbind32): Likewise.
* Clean up BUSY_WAIT_NOP and atomic_delay.Torvald Riegel2015-06-301-1/+1
| | | | | | | | | This patch combines BUSY_WAIT_NOP and atomic_delay into a new atomic_spin_nop function and adjusts all clients. The new function is put into atomic.h because what is best done in a spin loop is architecture-specific, and atomics must be used for spinning. The function name is meant to tell users that this has no effect on synchronization semantics but is a performance aid for spinning.
* Improve tgamma accuracy (bug 18613).Joseph Myers2015-06-291-4/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In non-default rounding modes, tgamma can be slightly less accurate than permitted by glibc's accuracy goals. Part of the problem is error accumulation, addressed in this patch by setting round-to-nearest for internal computations. However, there was also a bug in the code dealing with computing pow (x + n, x + n) where x + n is not exactly representable, providing another source of error even in round-to-nearest mode; it was necessary to address both bugs to get errors for all testcases within glibc's accuracy goals. Given this second fix, accuracy in round-to-nearest mode is also improved (hence regeneration of ulps for tgamma should be from scratch - truncate libm-test-ulps or at least remove existing tgamma entries - so that the expected ulps can be reduced). Some additional complications also arose. Certain tgamma tests should strictly, according to IEEE semantics, overflow or not depending on the rounding mode; this is beyond the scope of glibc's accuracy goals for any function without exactly-determined results, but gen-auto-libm-tests doesn't handle being lax there as it does for underflow. (libm-test.inc also doesn't handle being lax about whether the result in cases very close to the overflow threshold is infinity or a finite value close to overflow, but that doesn't cause problems in this case though I've seen it cause problems with random test generation for some functions.) Thus, spurious-overflow markings, with a comment, are added to auto-libm-test-in (no bug in Bugzilla because the issue is with the testsuite, not a user-visible bug in glibc). And on x86, after the patch I saw ERANGE issues as previously reported by Carlos (see my commentary in <https://sourceware.org/ml/libc-alpha/2015-01/msg00485.html>), which needed addressing by ensuring excess range and precision were eliminated at various points if FLT_EVAL_METHOD != 0. I also noticed and fixed a cosmetic issue where 1.0f was used in long double functions and should have been 1.0L. This completes the move of all functions to testing in all rounding modes with ALL_RM_TEST, so gen-libm-have-vector-test.sh is updated to remove the workaround for some functions not using ALL_RM_TEST. Tested for x86_64, x86, mips64 and powerpc. [BZ #18613] * sysdeps/ieee754/dbl-64/e_gamma_r.c (gamma_positive): Take log of X_ADJ not X when adjusting exponent. (__ieee754_gamma_r): Do intermediate computations in round-to-nearest then adjust overflowing and underflowing results as needed. * sysdeps/ieee754/flt-32/e_gammaf_r.c (gammaf_positive): Take log of X_ADJ not X when adjusting exponent. (__ieee754_gammaf_r): Do intermediate computations in round-to-nearest then adjust overflowing and underflowing results as needed. * sysdeps/ieee754/ldbl-128/e_gammal_r.c (gammal_positive): Take log of X_ADJ not X when adjusting exponent. (__ieee754_gammal_r): Do intermediate computations in round-to-nearest then adjust overflowing and underflowing results as needed. Use 1.0L not 1.0f as numerator of division. * sysdeps/ieee754/ldbl-128ibm/e_gammal_r.c (gammal_positive): Take log of X_ADJ not X when adjusting exponent. (__ieee754_gammal_r): Do intermediate computations in round-to-nearest then adjust overflowing and underflowing results as needed. Use 1.0L not 1.0f as numerator of division. * sysdeps/ieee754/ldbl-96/e_gammal_r.c (gammal_positive): Take log of X_ADJ not X when adjusting exponent. (__ieee754_gammal_r): Do intermediate computations in round-to-nearest then adjust overflowing and underflowing results as needed. Use 1.0L not 1.0f as numerator of division. * math/libm-test.inc (tgamma_test_data): Remove one test. Moved to auto-libm-test-in. (tgamma_test): Use ALL_RM_TEST. * math/auto-libm-test-in: Add one test of tgamma. Mark some other tests of tgamma with spurious-overflow. * math/auto-libm-test-out: Regenerated. * math/gen-libm-have-vector-test.sh: Do not check for START. * sysdeps/i386/fpu/libm-test-ulps: Update. * sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
* Use round-to-nearest internally in jn, test with ALL_RM_TEST (bug 18602).Joseph Myers2015-06-251-0/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some existing jn tests, if run in non-default rounding modes, produce errors above those accepted in glibc, which causes problems for moving tests of jn to use ALL_RM_TEST. This patch makes jn set rounding to-nearest internally, as was done for yn some time ago, then computes the appropriate underflowing value for results that underflowed to zero in to-nearest, and moves the tests to ALL_RM_TEST. It does nothing about the general inaccuracy of Bessel function implementations in glibc, though it should make jn more accurate on average in non-default rounding modes through reduced error accumulation. The recomputation of results that underflowed to zero should as a side-effect fix some cases of bug 16559, where jn just used an exact zero, but that is *not* the goal of this patch and other cases of that bug remain unfixed. (Most of the changes in the patch are reindentation to add new scopes for SET_RESTORE_ROUND*.) Tested for x86_64, x86, powerpc and mips64. [BZ #16559] [BZ #18602] * sysdeps/ieee754/dbl-64/e_jn.c (__ieee754_jn): Set round-to-nearest internally then recompute results that underflowed to zero in the original rounding mode. * sysdeps/ieee754/flt-32/e_jnf.c (__ieee754_jnf): Likewise. * sysdeps/ieee754/ldbl-128/e_jnl.c (__ieee754_jnl): Likewise. * sysdeps/ieee754/ldbl-128ibm/e_jnl.c (__ieee754_jnl): Likewise. * sysdeps/ieee754/ldbl-96/e_jnl.c (__ieee754_jnl): Likewise * math/libm-test.inc (jn_test): Use ALL_RM_TEST. * sysdeps/i386/fpu/libm-test-ulps: Update. * sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
* Refactor libm tests.Joseph Myers2015-06-248-8/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch refactors the libm tests using libm-test.inc to reduce the level of duplicate definitions. New headers are created for the definitions shared by tests for a particular type; by tests of inline functions; by tests of non-inline functions; by scalar tests; and by vector tests. The unused MATHCONST macro is removed. A new macro VEC_LEN is added to the vector headers to allow the macros defining wrappers for vector functions to be defined once, instead of six times each (differing only in vector length) as before. There is still scope for further refactoring, but this seems a useful start. Tested for x86_64. * math/test-double.h: New file. * math/test-float.h: Likewise. * math/test-ldouble.h: Likewise. * math/test-math-inline.h: Likewise. * math/test-math-no-inline.h: Likewise. * math/test-math-scalar.h: Likewise. * math/test-math-vector.h: Likewise. * math/test-vec-loop.h: Remove file. Contents moved into test-math-vector.h. * math/libm-test.inc (MATHCONST): Do not document macro. * math/test-double.c: Include test-double.h, test-math-no-inline.h and test-math-scalar.h. (FUNC): Remove macro. (FUNC_TEST): Likewise. (FLOAT): Likewise. (MATHCONST): Likewise. (PRINTF_EXPR): Likewise. (PRINTF_XEXPR): Likewise. (PRINTF_NEXPR): Likewise. (TEST_DOUBLE): Likewise. (TEST_MATHVEC): Likewise. (__NO_MATH_INLINES): Likewise. * math/test-float.c: Include test-float.h, test-math-no-inline.h and test-math-scalar.h. (FUNC): Remove macro. (FUNC_TEST): Likewise. (FLOAT): Likewise. (MATHCONST): Likewise. (PRINTF_EXPR): Likewise. (PRINTF_XEXPR): Likewise. (PRINTF_NEXPR): Likewise. (TEST_FLOAT): Likewise. (TEST_MATHVEC): Likewise. (__NO_MATH_INLINES): Likewise. * math/test-idouble.c: Include test-double.h, test-math-inline.h and test-math-scalar.h. (FUNC): Remove macro. (FUNC_TEST): Likewise. (FLOAT): Likewise. (MATHCONST): Likewise. (PRINTF_EXPR): Likewise. (PRINTF_XEXPR): Likewise. (PRINTF_NEXPR): Likewise. (TEST_DOUBLE): Likewise. (TEST_MATHVEC): Likewise. (TEST_INLINE): Likewise. (__NO_MATH_INLINES): Likewise. * math/test-ifloat.c: Include test-float.h, test-math-inline.h and test-math-scalar.h. (FUNC): Remove macro. (FUNC_TEST): Likewise. (FLOAT): Likewise. (MATHCONST): Likewise. (PRINTF_EXPR): Likewise. (PRINTF_XEXPR): Likewise. (PRINTF_NEXPR): Likewise. (TEST_FLOAT): Likewise. (TEST_MATHVEC): Likewise. (TEST_INLINE): Likewise. (__NO_MATH_INLINES): Likewise. * math/test-ildoubl.c: Include test-ldouble.h, test-math-inline.h and test-math-scalar.h. (FUNC): Remove macro. (FUNC_TEST): Likewise. (FLOAT): Likewise. (MATHCONST): Likewise. (PRINTF_EXPR): Likewise. (PRINTF_XEXPR): Likewise. (PRINTF_NEXPR): Likewise. (TEST_LDOUBLE): Likewise. (TEST_MATHVEC): Likewise. (TEST_INLINE): Likewise. (__NO_MATH_INLINES): Likewise. * math/test-ldouble.c: Include test-ldouble.h, test-math-no-inline.h and test-math-scalar.h. (FUNC): Remove macro. (FUNC_TEST): Likewise. (FLOAT): Likewise. (MATHCONST): Likewise. (PRINTF_EXPR): Likewise. (PRINTF_XEXPR): Likewise. (PRINTF_NEXPR): Likewise. (TEST_LDOUBLE): Likewise. (TEST_MATHVEC): Likewise. (__NO_MATH_INLINES): Likewise. * math/test-double-vlen2.h: Include test-double.h, test-math-no-inline.h and test-math-vector.h. (FLOAT): Remove macro. (FUNC): Likewise. (MATHCONST): Likewise. (PRINTF_EXPR): Likewise. (PRINTF_XEXPR): Likewise. (PRINTF_NEXPR): Likewise. (TEST_DOUBLE): Likewise. (TEST_MATHVEC): Likewise. (__NO_MATH_INLINES): Likewise. (CNCT): Likewise. (CONCAT): Likewise. (WRAPPER_NAME): Likewise. (WRAPPER_DECL): Likewise. (WRAPPER_DECL_ff): Likewise. (WRAPPER_DECL_fFF): Likewise. (VECTOR_WRAPPER): Likewise. (VECTOR_WRAPPER_ff): Likewise. (VECTOR_WRAPPER_fFF): Likewise. (VEC_LEN): New macro. * math/test-double-vlen4.h: Include test-double.h, test-math-no-inline.h and test-math-vector.h. (FLOAT): Remove macro. (FUNC): Likewise. (MATHCONST): Likewise. (PRINTF_EXPR): Likewise. (PRINTF_XEXPR): Likewise. (PRINTF_NEXPR): Likewise. (TEST_DOUBLE): Likewise. (TEST_MATHVEC): Likewise. (__NO_MATH_INLINES): Likewise. (CNCT): Likewise. (CONCAT): Likewise. (WRAPPER_NAME): Likewise. (WRAPPER_DECL): Likewise. (WRAPPER_DECL_ff): Likewise. (WRAPPER_DECL_fFF): Likewise. (VECTOR_WRAPPER): Likewise. (VECTOR_WRAPPER_ff): Likewise. (VECTOR_WRAPPER_fFF): Likewise. (VEC_LEN): New macro. * math/test-double-vlen8.h: Include test-double.h, test-math-no-inline.h and test-math-vector.h. (FLOAT): Remove macro. (FUNC): Likewise. (MATHCONST): Likewise. (PRINTF_EXPR): Likewise. (PRINTF_XEXPR): Likewise. (PRINTF_NEXPR): Likewise. (TEST_DOUBLE): Likewise. (TEST_MATHVEC): Likewise. (__NO_MATH_INLINES): Likewise. (CNCT): Likewise. (CONCAT): Likewise. (WRAPPER_NAME): Likewise. (WRAPPER_DECL): Likewise. (WRAPPER_DECL_ff): Likewise. (WRAPPER_DECL_fFF): Likewise. (VECTOR_WRAPPER): Likewise. (VECTOR_WRAPPER_ff): Likewise. (VECTOR_WRAPPER_fFF): Likewise. (VEC_LEN): New macro. * math/test-float-vlen4.h: Include test-float.h, test-math-no-inline.h and test-math-vector.h. (FLOAT): Remove macro. (FUNC): Likewise. (MATHCONST): Likewise. (PRINTF_EXPR): Likewise. (PRINTF_XEXPR): Likewise. (PRINTF_NEXPR): Likewise. (TEST_FLOAT): Likewise. (TEST_MATHVEC): Likewise. (__NO_MATH_INLINES): Likewise. (CNCT): Likewise. (CONCAT): Likewise. (WRAPPER_NAME): Likewise. (WRAPPER_DECL): Likewise. (WRAPPER_DECL_ff): Likewise. (WRAPPER_DECL_fFF): Likewise. (VECTOR_WRAPPER): Likewise. (VECTOR_WRAPPER_ff): Likewise. (VECTOR_WRAPPER_fFF): Likewise. (VEC_LEN): New macro. * math/test-float-vlen8.h: Include test-float.h, test-math-no-inline.h and test-math-vector.h. (FLOAT): Remove macro. (FUNC): Likewise. (MATHCONST): Likewise. (PRINTF_EXPR): Likewise. (PRINTF_XEXPR): Likewise. (PRINTF_NEXPR): Likewise. (TEST_FLOAT): Likewise. (TEST_MATHVEC): Likewise. (__NO_MATH_INLINES): Likewise. (CNCT): Likewise. (CONCAT): Likewise. (WRAPPER_NAME): Likewise. (WRAPPER_DECL): Likewise. (WRAPPER_DECL_ff): Likewise. (WRAPPER_DECL_fFF): Likewise. (VECTOR_WRAPPER): Likewise. (VECTOR_WRAPPER_ff): Likewise. (VECTOR_WRAPPER_fFF): Likewise. (VEC_LEN): New macro. * math/test-float-vlen16.h: Include test-float.h, test-math-no-inline.h and test-math-vector.h. (FLOAT): Remove macro. (FUNC): Likewise. (MATHCONST): Likewise. (PRINTF_EXPR): Likewise. (PRINTF_XEXPR): Likewise. (PRINTF_NEXPR): Likewise. (TEST_FLOAT): Likewise. (TEST_MATHVEC): Likewise. (__NO_MATH_INLINES): Likewise. (CNCT): Likewise. (CONCAT): Likewise. (WRAPPER_NAME): Likewise. (WRAPPER_DECL): Likewise. (WRAPPER_DECL_ff): Likewise. (WRAPPER_DECL_fFF): Likewise. (VECTOR_WRAPPER): Likewise. (VECTOR_WRAPPER_ff): Likewise. (VECTOR_WRAPPER_fFF): Likewise. (VEC_LEN): New macro. * sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c: Do not include test-vec-loop.h. * sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c: Likewise. * sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c: Likewise. * sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c: Likewise. * sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c: Likewise. * sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c: Likewise. * sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c: Likewise. * sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c: Likewise.