about summary refs log tree commit diff
path: root/sysdeps/x86_64/fpu/libm-test-ulps
Commit message (Collapse)AuthorAgeFilesLines
* math: Add more inputs to atan2 accuracy tests [BZ #28765]Sunil K Pandey2022-01-141-4/+4
| | | | | | | | | | | | | | | This patch adds following inputs: 0x1.bcab29da0e947p-54 0x1.bc41f4d2294b8p-54 0x1.a11891ec004d4p-348 0x1.814830510be26p-348 0x1.b836ed678be29p-588 0x1.b7be6f5a03a8cp-588 0x1.a83f842ef3f73p-633 0x1.a799d8a6677ep-633 to atan2 tests and updates x86_64 double atan2 ulps. This fixes BZ #28765. Reviewed-By: Paul Zimmermann <Paul.Zimmermann@inria.fr>
* x86-64: Add vector tan/tanf implementation to libmvecSunil K Pandey2021-12-301-0/+20
| | | | | | | | Implement vectorized tan/tanf containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector tan/tanf with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
* x86-64: Add vector erfc/erfcf implementation to libmvecSunil K Pandey2021-12-301-0/+20
| | | | | | | | Implement vectorized erfc/erfcf containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector erfc/erfcf with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
* x86-64: Add vector asinh/asinhf implementation to libmvecSunil K Pandey2021-12-291-0/+17
| | | | | | | | Implement vectorized asinh/asinhf containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector asinh/asinhf with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
* x86-64: Add vector tanh/tanhf implementation to libmvecSunil K Pandey2021-12-291-0/+15
| | | | | | | | Implement vectorized tanh/tanhf containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector tanh/tanhf with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
* x86-64: Add vector erf/erff implementation to libmvecSunil K Pandey2021-12-291-0/+20
| | | | | | | | Implement vectorized erf/erff containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector erf/erff with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
* x86-64: Add vector acosh/acoshf implementation to libmvecSunil K Pandey2021-12-291-0/+20
| | | | | | | | Implement vectorized acosh/acoshf containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector acosh/acoshf with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
* x86-64: Add vector atanh/atanhf implementation to libmvecSunil K Pandey2021-12-291-0/+20
| | | | | | | | Implement vectorized atanh/atanhf containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector atanh/atanhf with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
* x86-64: Add vector log1p/log1pf implementation to libmvecSunil K Pandey2021-12-291-0/+20
| | | | | | | | Implement vectorized log1p/log1pf containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector log1p/log1pf with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
* x86-64: Add vector log2/log2f implementation to libmvecSunil K Pandey2021-12-291-0/+20
| | | | | | | | Implement vectorized log2/log2f containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector log2/log2f with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
* x86-64: Add vector log10/log10f implementation to libmvecSunil K Pandey2021-12-291-0/+20
| | | | | | | | Implement vectorized log10/log10f containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector log10/log10f with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
* x86-64: Add vector atan2/atan2f implementation to libmvecSunil K Pandey2021-12-291-0/+20
| | | | | | | | Implement vectorized atan2/atan2f containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector atan2/atan2f with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
* x86-64: Add vector cbrt/cbrtf implementation to libmvecSunil K Pandey2021-12-291-0/+20
| | | | | | | | Implement vectorized cbrt/cbrtf containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector cbrt/cbrtf with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
* x86-64: Add vector sinh/sinhf implementation to libmvecSunil K Pandey2021-12-291-0/+20
| | | | | | | | Implement vectorized sinh/sinhf containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector sinh/sinhf with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
* x86-64: Add vector expm1/expm1f implementation to libmvecSunil K Pandey2021-12-291-0/+20
| | | | | | | | Implement vectorized expm1/expm1f containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector expm1/expm1f with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
* x86-64: Add vector cosh/coshf implementation to libmvecSunil K Pandey2021-12-291-0/+20
| | | | | | | | Implement vectorized cosh/coshf containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector cosh/coshf with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
* x86-64: Add vector exp10/exp10f implementation to libmvecSunil K Pandey2021-12-291-0/+20
| | | | | | | | Implement vectorized exp10/exp10f containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector exp10/exp10f with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
* x86-64: Add vector exp2/exp2f implementation to libmvecSunil K Pandey2021-12-291-0/+20
| | | | | | | | Implement vectorized exp2/exp2f containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector exp2/exp2f with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
* x86-64: Add vector hypot/hypotf implementation to libmvecSunil K Pandey2021-12-291-0/+20
| | | | | | | | Implement vectorized hypot/hypotf containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector hypot/hypotf with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
* x86-64: Add vector asin/asinf implementation to libmvecSunil K Pandey2021-12-291-0/+20
| | | | | | | | Implement vectorized asin/asinf containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector asin/asinf with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
* x86-64: Add vector atan/atanf implementation to libmvecSunil K Pandey2021-12-291-0/+20
| | | | | | | | Implement vectorized atan/atanf containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector atan/atanf with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
* x86-64: Add vector acos/acosf implementation to libmvecSunil K Pandey2021-12-221-0/+20
| | | | | | | | Implement vectorized acos/acosf containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector acos/acosf with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
* Regenerate ulps on x86_64 with GCC 12H.J. Lu2021-12-201-1/+1
| | | | | | | | | Fix FAIL: math/test-float-clog10 FAIL: math/test-float32-clog10 on Intel Core i7-1165G7 with GCC 12.
* regenerate ulps on x86_64 with -march=nativePaul Zimmermann2021-04-281-3/+3
| | | | | | | On x86_64, when configuring glibc with CFLAGS="-O2 -g -march=native", some tests fail. After this patch, "make check" succeeds. Tested on Intel Core i5-4590 with gcc 10.2.1.
* Improve the accuracy of tgamma (BZ #26983)Paul Zimmermann2021-04-071-3/+3
| | | | | | | | | | | | With this patch, the maximal known error for tgamma is now reduced to 9 ulps for dbl-64, for all rounding modes. Since exhaustive testing is not possible for dbl-64, it might be that there are still cases with an error larger than 9 ulps, but all known cases are fixed (intensive tests were done to find cases with large errors). Tested on x86_64 and powerpc (and by Adhemerval Zanella on aarch64, arm, s390x, sparc, and i686). Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* Fix the inaccuracy of j0f/j1f/y0f/y1f [BZ #14469, #14470, #14471, #14472]Paul Zimmermann2021-04-021-38/+38
| | | | | | | | | | | | | | | | | | | | | | | For j0f/j1f/y0f/y1f, the largest error for all binary32 inputs is reduced to at most 9 ulps for all rounding modes. The new code is enabled only when there is a cancellation at the very end of the j0f/j1f/y0f/y1f computation, or for very large inputs, thus should not give any visible slowdown on average. Two different algorithms are used: * around the first 64 zeros of j0/j1/y0/y1, approximation polynomials of degree 3 are used, computed using the Sollya tool (https://www.sollya.org/) * for large inputs, an asymptotic formula from [1] is used [1] Fast and Accurate Bessel Function Computation, John Harrison, Proceedings of Arith 19, 2009. Inputs yielding the new largest errors are added to auto-libm-test-in, and ulps are regenerated for various targets (thanks Adhemerval Zanella). Tested on x86_64 with --disable-multi-arch and on powerpc64le-linux-gnu. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* math: Remove slow paths from asin and acos [BZ #15267]Wilco Dijkstra2021-03-111-1/+3
| | | | | | | | | | | This patch series removes all remaining slow paths and related code. First asin/acos, tan, atan, atan2 implementations are updated, and the final patch removes the unused mpa files, headers and probes. Passes buildmanyglibc. Remove slow paths from asin/acos. Add ULP annotations based on previous slow path checks (which are approximate). Update AArch64 and x86_64 libm-test-ulps. Reviewed-By: Paul Zimmermann <Paul.Zimmermann@inria.fr>
* Add inputs that generate larger error boundsPaul Zimmermann2021-02-271-24/+26
| | | | (Using values from https://members.loria.fr/PZimmermann/papers/accuracy.pdf)
* add inputs to auto-libm-test-in yielding larger errors (binary64, x86_64)Paul Zimmermann2020-12-211-11/+13
|
* math: Update x86_64 ulpsAdhemerval Zanella2020-08-081-3/+3
| | | | From new j0 test.
* Update x86-64 libm-test-ulpsAndreas K. Hüttel2020-07-251-2/+2
| | | | | | x86_64 Intel(R) Core(TM) i5-8265U gcc (Gentoo 10.1.0-r2 p3) 10.1.0 Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* math: Add inputs that yield larger errors for float type (x86_64)Paul Zimmermann2020-03-311-42/+46
| | | | | | | | | | | The corner cases included were generated using exhaustive search for all float/binary32 values on x86_64 (comparing to MPFR for correct rounding to nearest). For the j0/j1/y0 functions, only cases with ulp error <= 9 were included. Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* math: Remove inline math testsAdhemerval Zanella2020-03-191-1116/+0
| | | | | | | | | | | | | With mathinline removal there is no need to keep building and testing inline math tests. The gen-libm-tests.py support to generate ULP_I_* is removed and all libm-test-ulps files are updated to longer have the i{float,double,ldouble} entries. The support for no-test-inline is also removed from both gen-auto-libm-tests and the auto-libm-test-out-* were regenerated. Checked on x86_64-linux-gnu and i686-linux-gnu.
* Regenerate sysdeps/x86_64/fpu/libm-test-ulpsH.J. Lu2018-12-261-0/+6
| | | | * sysdeps/x86_64/fpu/libm-test-ulps: Regenerated.
* Add new exp and exp2 implementationsSzabolcs Nagy2018-09-051-44/+44
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Optimized exp and exp2 implementations using a lookup table for fractional powers of 2. There are several variants, see e_exp_data.c, they can be selected by modifying math_config.h allowing different tradeoffs. The default selection should be acceptable as generic libm code. Worst case error is 0.509 ULP for exp and 0.507 ULP for exp2, on aarch64 the rodata size is 2160 bytes, shared between exp and exp2. On aarch64 .text + .rodata size decreased by 24912 bytes. The non-nearest rounding error is less than 1 ULP even on targets without efficient round implementation (although the error rate is higher in that case). Targets with single instruction, rounding mode independent, to nearest integer rounding and conversion can use them by setting TOINT_INTRINSICS and adding the necessary code to their math_private.h. The __exp1 code uses the same algorithm, so the error bound of pow increased a bit. New double precision error handling code was added following the style of the single precision error handling code. Improvements on Cortex-A72 compared to current glibc master: exp thruput: 1.61x in [-9.9 9.9] exp latency: 1.53x in [-9.9 9.9] exp thruput: 1.13x in [0.5 1] exp latency: 1.30x in [0.5 1] exp2 thruput: 2.03x in [-9.9 9.9] exp2 latency: 1.64x in [-9.9 9.9] For small (< 1) inputs the current exp code uses a separate algorithm so the speed up there is less. Was tested on aarch64-linux-gnu (TOINT_INTRINSICS, fma contraction) and arm-linux-gnueabihf (!TOINT_INTRINSICS, no fma contraction) and x86_64-linux-gnu (!TOINT_INTRINSICS, no fma contraction) and powerpc64le-linux-gnu (!TOINT_INTRINSICS, fma contraction) targets, only non-nearest rounding ulp errors increase and they are within acceptable bounds (ulp updates are in separate patches). * NEWS: Mention exp and exp2 improvements. * math/Makefile (libm-support): Remove t_exp. (type-double-routines): Add math_err and e_exp_data. * sysdeps/aarch64/libm-test-ulps: Update. * sysdeps/arm/libm-test-ulps: Update. * sysdeps/i386/fpu/e_exp_data.c: New file. * sysdeps/i386/fpu/math_err.c: New file. * sysdeps/i386/fpu/t_exp.c: Remove. * sysdeps/ia64/fpu/e_exp_data.c: New file. * sysdeps/ia64/fpu/math_err.c: New file. * sysdeps/ia64/fpu/t_exp.c: Remove. * sysdeps/ieee754/dbl-64/e_exp.c: Rewrite. * sysdeps/ieee754/dbl-64/e_exp2.c: Rewrite. * sysdeps/ieee754/dbl-64/e_exp_data.c: New file. * sysdeps/ieee754/dbl-64/e_pow.c (__ieee754_pow): Update error bound. * sysdeps/ieee754/dbl-64/eexp.tbl: Remove. * sysdeps/ieee754/dbl-64/math_config.h: New file. * sysdeps/ieee754/dbl-64/math_err.c: New file. * sysdeps/ieee754/dbl-64/t_exp.c: Remove. * sysdeps/ieee754/dbl-64/t_exp2.h: Remove. * sysdeps/ieee754/dbl-64/uexp.h: Remove. * sysdeps/ieee754/dbl-64/uexp.tbl: Remove. * sysdeps/m68k/m680x0/fpu/e_exp_data.c: New file. * sysdeps/m68k/m680x0/fpu/math_err.c: New file. * sysdeps/m68k/m680x0/fpu/t_exp.c: Remove. * sysdeps/powerpc/fpu/libm-test-ulps: Update. * sysdeps/x86_64/fpu/libm-test-ulps: Update.
* Fix spaces in x86_64 ULP fileWilco Dijkstra2018-08-151-3/+3
| | | | | | | | Fix a few missing spaces, it's now identical to the regenerated version. Passes GLIBC tests on x64. * sysdeps/x86_64/fpu/libm-test-ulps: Regenerate to fix spaces.
* Improve performance of sinf and cosfWilco Dijkstra2018-08-141-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | | The second patch improves performance of sinf and cosf using the same algorithms and polynomials. The returned values are identical to sincosf for the same input. ULP definitions for AArch64 and x64 are updated. sinf/cosf througput gains on Cortex-A72: * |x| < 0x1p-12 : 1.2x * |x| < M_PI_4 : 1.8x * |x| < 2 * M_PI: 1.7x * |x| < 120.0 : 2.3x * |x| < Inf : 3.0x * NEWS: Mention sinf, cosf, sincosf. * sysdeps/aarch64/libm-test-ulps: Update ULP for sinf, cosf, sincosf. * sysdeps/x86_64/fpu/libm-test-ulps: Update ULP for sinf and cosf. * sysdeps/x86_64/fpu/multiarch/s_sincosf-fma.c: Add definitions of constants rather than including generic sincosf.h. * sysdeps/x86_64/fpu/s_sincosf_data.c: Remove. * sysdeps/ieee754/flt-32/s_cosf.c (cosf): Rewrite. * sysdeps/ieee754/flt-32/s_sincosf.h (reduced_sin): Remove. (reduced_cos): Remove. (sinf_poly): New function. * sysdeps/ieee754/flt-32/s_sinf.c (sinf): Rewrite.
* Update ulps with "make regen-ulps" on AMD Ryzen 7 1800X.Paul Pluzhnikov2018-05-301-1/+1
| | | | | | | 2018-05-30 Paul Pluzhnikov <ppluzhnikov@google.com> * sysdeps/x86_64/fpu/libm-test-ulps (log_vlen8_avx2): Update for AMD Ryzen 7 1800X.
* [PATCH 1/7] sin/cos slow paths: avoid slow paths for small inputsWilco Dijkstra2018-04-031-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | This series of patches removes the slow patchs from sin, cos and sincos. Besides greatly simplifying the implementation, the new version is also much faster for inputs up to PI (41% faster) and for large inputs needing range reduction (27% faster). ULP is ~0.55 with no errors found after testing 1.6 billion inputs across most of the range with mpsin and mpcos. The number of incorrectly rounded results (ie. ULP >0.5) is at most ~2750 per million inputs between 0.125 and 0.5, the average is ~850 per million between 0 and PI. Tested on AArch64 and x86_64 with no regressions. The first patch removes the slow paths for the cases where the input is small and doesn't require range reduction. Update ULP tables for sin, cos and sincos on AArch64 and x86_64. * sysdeps/aarch64/libm-test-ulps: Update ULP for sin, cos, sincos. * sysdeps/ieee754/dbl-64/s_sin.c (__sin): Remove slow paths for small inputs. (__cos): Likewise. * sysdeps/x86_64/fpu/libm-test-ulps: Update ULP for sin, cos, sincos.
* Remove slow paths from powWilco Dijkstra2018-02-121-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove the slow paths from pow. Like several other double precision math functions, pow is exactly rounded. This is not required from math functions and causes major overheads as it requires multiple fallbacks using higher precision arithmetic if a result is close to 0.5ULP. Ridiculous slowdowns of up to 100000x have been reported when the highest precision path triggers. All GLIBC math tests pass on AArch64 and x64 (with ULP of pow set to 1). The worst case error is ~0.506ULP. A simple test over a few hundred million values shows pow is 10% faster on average. This fixes BZ #13932. [BZ #13932] * sysdeps/ieee754/dbl-64/uexp.h (err_1): Remove. * benchtests/pow-inputs: Update comment for slow path cases. * manual/probes.texi (slowpow_p10): Delete removed probe. (slowpow_p10): Likewise. * math/Makefile: Remove halfulp.c and slowpow.c. * sysdeps/aarch64/libm-test-ulps: Set ULP of pow to 1. * sysdeps/generic/math_private.h (__exp1): Remove error argument. (__halfulp): Remove. (__slowpow): Remove. * sysdeps/i386/fpu/halfulp.c: Delete file. * sysdeps/i386/fpu/slowpow.c: Likewise. * sysdeps/ia64/fpu/halfulp.c: Likewise. * sysdeps/ia64/fpu/slowpow.c: Likewise. * sysdeps/ieee754/dbl-64/e_exp.c (__exp1): Remove error argument, improve comments and add error analysis. * sysdeps/ieee754/dbl-64/e_pow.c (__ieee754_pow): Add error analysis. (power1): Remove function: (log1): Remove error argument, add error analysis. (my_log2): Remove function. * sysdeps/ieee754/dbl-64/halfulp.c: Delete file. * sysdeps/ieee754/dbl-64/slowpow.c: Likewise. * sysdeps/m68k/m680x0/fpu/halfulp.c: Likewise. * sysdeps/m68k/m680x0/fpu/slowpow.c: Likewise. * sysdeps/powerpc/power4/fpu/Makefile: Remove CPPFLAGS-slowpow.c. * sysdeps/x86_64/fpu/libm-test-ulps: Set ULP of pow to 1. * sysdeps/x86_64/fpu/multiarch/Makefile: Remove slowpow-fma.c, slowpow-fma4.c, halfulp-fma.c, halfulp-fma4.c. * sysdeps/x86_64/fpu/multiarch/e_pow-fma.c (__slowpow): Remove define. * sysdeps/x86_64/fpu/multiarch/e_pow-fma4.c (__slowpow): Likewise. * sysdeps/x86_64/fpu/multiarch/halfulp-fma.c: Delete file. * sysdeps/x86_64/fpu/multiarch/halfulp-fma4.c: Likewise. * sysdeps/x86_64/fpu/multiarch/slowpow-fma.c: Likewise. * sysdeps/x86_64/fpu/multiarch/slowpow-fma4.c: Likewise.
* Revert exp reimplementation (causes test failures).Joseph Myers2017-12-191-10/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Revert: 2017-12-19 Joseph Myers <joseph@codesourcery.com> * sysdeps/x86_64/fpu/libm-test-ulps: Update. 2017-12-19 Patrick McGehearty <patrick.mcgehearty@oracle.com> * sysdeps/ieee754/dbl-64/e_exp.c: Include <math-svid-compat.h> and <errno.h>. Include "eexp.tbl". (half): New constant. (one): Likewise. (__ieee754_exp): Rewrite. (__slowexp): Remove prototype. * sysdeps/ieee754/dbl-64/eexp.tbl: New file. * sysdeps/ieee754/dbl-64/slowexp.c: Remove file. * sysdeps/i386/fpu/slowexp.c: Likewise. * sysdeps/ia64/fpu/slowexp.c: Likewise. * sysdeps/m68k/m680x0/fpu/slowexp.c: Likewise. * sysdeps/x86_64/fpu/multiarch/slowexp-avx.c: Likewise. * sysdeps/x86_64/fpu/multiarch/slowexp-fma.c: Likewise. * sysdeps/x86_64/fpu/multiarch/slowexp-fma4.c: Likewise. * sysdeps/generic/math_private.h (__slowexp): Remove prototype. * sysdeps/ieee754/dbl-64/e_pow.c: Remove mention of slowexp.c in comment. * sysdeps/powerpc/power4/fpu/Makefile [$(subdir) = math] (CPPFLAGS-slowexp.c): Remove variable. * sysdeps/x86_64/fpu/multiarch/Makefile (libm-sysdep_routines): Remove slowexp-fma, slowexp-fma4 and slowexp-avx. (CFLAGS-slowexp-fma.c): Remove variable. (CFLAGS-slowexp-fma4.c): Likewise. (CFLAGS-slowexp-avx.c): Likewise. * sysdeps/x86_64/fpu/multiarch/e_exp-avx.c (__slowexp): Do not define as macro. * sysdeps/x86_64/fpu/multiarch/e_exp-fma.c (__slowexp): Likewise. * sysdeps/x86_64/fpu/multiarch/e_exp-fma4.c (__slowexp): Likewise. * math/Makefile (type-double-routines): Remove slowexp. * manual/probes.texi (slowexp_p6): Remove. (slowexp_p32): Likewise.
* Update x86_64 libm-test-ulps.Joseph Myers2017-12-191-8/+10
| | | | * sysdeps/x86_64/fpu/libm-test-ulps: Update.
* x86-64: Replace assembly versions of e_expf with generic e_expf.cH.J. Lu2017-10-221-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch replaces x86-64 assembly versions of e_expf with generic e_expf.c. For workload-spec2017.wrf, on Nehalem, it improves performance by: Before After Improvement reciprocal-throughput 36.039 20.7749 73% latency 58.8096 40.8715 43% On Skylake, it improves Before After Improvement reciprocal-throughput 18.4436 11.1693 65% latency 47.5162 37.5411 26% * sysdeps/x86_64/fpu/e_expf.S: Removed. * sysdeps/x86_64/fpu/multiarch/e_expf-fma.S: Likewise. * sysdeps/x86_64/fpu/w_expf.c: Likewise. * sysdeps/x86_64/fpu/libm-test-ulps: Updated for generic e_expf.c. * sysdeps/x86_64/fpu/multiarch/Makefile (CFLAGS-e_expf-fma.c): New. * sysdeps/x86_64/fpu/multiarch/e_expf-fma.c: New file. * sysdeps/x86_64/fpu/multiarch/e_expf.c (__redirect_ieee754_expf): Renamed to ... (__redirect_expf): This. (SYMBOL_NAME): Changed to expf. (__ieee754_expf): Renamed to ... (__expf): This. (__GI___expf): This. (__ieee754_expf): Add strong_alias. (__expf_finite): Likewise. (__expf): New. Include <sysdeps/ieee754/flt-32/e_expf.c>.
* Update x86_64 libm-test-ulps.Joseph Myers2017-09-291-2/+2
| | | | * sysdeps/x86_64/fpu/libm-test-ulps: Update.
* Update x86_64 ulps for AMD Ryzen.Markus Trippelsdorf2017-09-081-2/+2
| | | | * sysdeps/x86_64/fpu/libm-test-ulps: Update for AMD Ryzen.
* Obsolete pow10 functions.Joseph Myers2017-09-011-30/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch obsoletes the pow10, pow10f and pow10l functions (makes them into compat symbols, not available for new ports or static linking). The exp10 names for these functions are standardized (in TS 18661-4) and were added in the same glibc version (2.1) as pow10 so source code can change to use them without any loss of portability. Since pow10 is deliberately not provided for _Float128, only exp10, this slightly simplifies moving to the new wrapper templates in the !LIBM_SVID_COMPAT case, by avoiding needing to arrange for pow10, pow10f and pow10l to be defined by those templates. Tested for x86_64, and with build-many-glibcs.py. * manual/math.texi (pow10): Do not document. (pow10f): Likewise. (pow10l): Likewise. * math/bits/mathcalls.h [__USE_GNU] (pow10): Do not declare. * math/bits/math-finite.h [__USE_GNU] (pow10): Likewise. * math/libm-test-exp10.inc (pow10_test): Remove. (do_test): Do not call pow10. * math/w_exp10_compat.c (pow10): Make into compat symbol. [NO_LONG_DOUBLE] (pow10l): Likewise. * math/w_exp10f_compat.c (pow10f): Likewise. * math/w_exp10l_compat.c (pow10l): Likewise. * sysdeps/ia64/fpu/e_exp10.S: Include <shlib-compat.h>. (pow10): Make into compat symbol. * sysdeps/ia64/fpu/e_exp10f.S: Include <shlib-compat.h>. (pow10f): Make into compat symbol. * sysdeps/ia64/fpu/e_exp10l.S: Include <shlib-compat.h>. (pow10l): Make into compat symbol. * sysdeps/ieee754/ldbl-opt/Makefile (libnldbl-calls): Remove pow10. (CFLAGS-nldbl-pow10.c): Remove variable.. * sysdeps/ieee754/ldbl-opt/nldbl-pow10.c: Remove file. * sysdeps/ieee754/ldbl-opt/w_exp10_compat.c (pow10l): Condition on [SHLIB_COMPAT (libm, GLIBC_2_1, GLIBC_2_27)]. * sysdeps/ieee754/ldbl-opt/w_exp10l_compat.c (compat_symbol): Undefine and redefine. (pow10l): Make into compat symbol. * sysdeps/aarch64/libm-test-ulps: Remove pow10 ulps. * sysdeps/alpha/fpu/libm-test-ulps: Likewise. * sysdeps/arm/libm-test-ulps: Likewise. * sysdeps/hppa/fpu/libm-test-ulps: Likewise. * sysdeps/i386/fpu/libm-test-ulps: Likewise. * sysdeps/i386/i686/fpu/multiarch/libm-test-ulps: Likewise. * sysdeps/microblaze/libm-test-ulps: Likewise. * sysdeps/mips/mips32/libm-test-ulps: Likewise. * sysdeps/mips/mips64/libm-test-ulps: Likewise. * sysdeps/nios2/libm-test-ulps: Likewise. * sysdeps/powerpc/fpu/libm-test-ulps: Likewise. * sysdeps/powerpc/nofpu/libm-test-ulps: Likewise. * sysdeps/s390/fpu/libm-test-ulps: Likewise. * sysdeps/sh/libm-test-ulps: Likewise. * sysdeps/sparc/fpu/libm-test-ulps: Likewise. * sysdeps/tile/libm-test-ulps: Likewise. * sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
* x86-64: Regenerate libm-test-ulps for AVX512 mathvec testsH.J. Lu2017-08-231-2/+2
| | | | | | | Update libm-test-ulps for AVX512 mathvec tests by running “make regen-ulps” on Intel Xeon processor with AVX512. * sysdeps/x86_64/fpu/libm-test-ulps: Regenerated.
* Add float128 support for x86_64, x86.Joseph Myers2017-06-261-0/+564
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch enables float128 support for x86_64 and x86. All GCC versions that can build glibc provide the required support, but since GCC 6 and before don't provide __builtin_nanq / __builtin_nansq, sNaN tests and some tests of NaN payloads need to be disabled with such compilers (this does not affect the generated glibc binaries at all, just the tests). bits/floatn.h declares float128 support to be available for GCC versions that provide the required libgcc support (4.3 for x86_64, 4.4 for i386 GNU/Linux, 4.5 for i386 GNU/Hurd); compilation-only support was present some time before then, but not really useful without the libgcc functions. fenv_private.h needed updating to avoid trying to put _Float128 values in registers. I make no assertion of optimality of the math_opt_barrier / math_force_eval definitions for this case; they are simply intended to be sufficient to work correctly. Tested for x86_64 and x86, with GCC 7 and GCC 6. (Testing for x32 was compilation tests only with build-many-glibcs.py to verify the ABI baseline updates. I have not done any testing for Hurd, although the float128 support is enabled there as for GNU/Linux.) * sysdeps/i386/Implies: Add ieee754/float128. * sysdeps/x86_64/Implies: Likewise. * sysdeps/x86/bits/floatn.h: New file. * sysdeps/x86/float128-abi.h: Likewise. * manual/math.texi (Mathematics): Document support for _Float128 on x86_64 and x86. * sysdeps/i386/fpu/fenv_private.h: Include <bits/floatn.h>. (math_opt_barrier): Do not put _Float128 values in floating-point registers. (math_force_eval): Likewise. [__x86_64__] (SET_RESTORE_ROUNDF128): New macro. * sysdeps/x86/fpu/Makefile [$(subdir) = math] (CPPFLAGS): Append to Makefile variable. * sysdeps/x86/fpu/e_sqrtf128.c: New file. * sysdeps/x86/fpu/sfp-machine.h: Likewise. Based on libgcc. * sysdeps/x86/math-tests.h: New file. * math/libm-test-support.h (XFAIL_FLOAT128_PAYLOAD): New macro. * math/libm-test-getpayload.inc (getpayload_test_data): Use XFAIL_FLOAT128_PAYLOAD. * math/libm-test-setpayload.inc (setpayload_test_data): Likewise. * math/libm-test-totalorder.inc (totalorder_test_data): Likewise. * math/libm-test-totalordermag.inc (totalordermag_test_data): Likewise. * sysdeps/unix/sysv/linux/i386/libc.abilist: Update. * sysdeps/unix/sysv/linux/i386/libm.abilist: Likewise. * sysdeps/unix/sysv/linux/x86_64/64/libc.abilist: Likewise. * sysdeps/unix/sysv/linux/x86_64/64/libm.abilist: Likewise. * sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist: Likewise. * sysdeps/unix/sysv/linux/x86_64/x32/libm.abilist: Likewise. * sysdeps/i386/fpu/libm-test-ulps: Likewise. * sysdeps/i386/i686/fpu/multiarch/libm-test-ulps: Likewise. * sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
* Move tests of catan, catanh to auto-libm-test-*.Joseph Myers2017-02-171-10/+22
| | | | | | | | | | | | | | | | | | | | | | | | This patch moves tests of catan and catanh with finite inputs (other than the divide-by-zero cases producing an exact infinity) to using the auto-libm-test machinery. Each of auto-libm-test-out-catan and auto-libm-test-out-catanh takes about three seconds to generate on my system (so in fact it wasn't necessary after all to defer the move to auto-libm-test-* until the output files were split up by function). Tested for x86_64 and x86 and ulps updated accordingly. * math/auto-libm-test-in: Add tests of catan and catanh. * math/auto-libm-test-out-catan: New generated file. * math/auto-libm-test-out-catanh: Likewise. * math/libm-test-catan.inc (catan_test_data): Use AUTO_TESTS_c_c. Move tests with finite inputs, except divide-by-zero cases, to auto-libm-test-in. * math/libm-test-catanh.inc (catanh_test_data): Likewise. * math/Makefile (libm-test-funcs-auto): Add catan and catanh. (libm-test-funcs-noauto): Remove catan and catanh. * sysdeps/i386/fpu/libm-test-ulps: Update. * sysdeps/i386/i686/fpu/multiarch/libm-test-ulps: Likewise. * sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
* Move tests of casin, casinh to auto-libm-test-*.Joseph Myers2017-02-171-38/+38
| | | | | | | | | | | | | | | | | | | | | | | This patch moves tests of casin and casinh with finite inputs to using the auto-libm-test machinery. Each of auto-libm-test-out-casin and auto-libm-test-out-casinh takes about 38 minutes to generate on my system because of MPC slowness on special cases that appear in the tests (with MPC 1.0.3; I don't know to what extent current MPC master might speed it up). Tested for x86_64 and x86 and ulps updated accordingly. * math/auto-libm-test-in: Add tests of casin and casinh. * math/auto-libm-test-out-casin: New generated file. * math/auto-libm-test-out-casinh: Likewise. * math/libm-test-casin.inc (casin_test_data): Use AUTO_TESTS_c_c. Move tests with finite inputs to auto-libm-test-in. * math/libm-test-casinh.inc (casinh_test_data): Likewise. * math/Makefile (libm-test-funcs-auto): Add casin and casinh. (libm-test-funcs-noauto): Remove casin and casinh. * sysdeps/i386/fpu/libm-test-ulps: Update. * sysdeps/i386/i686/fpu/multiarch/libm-test-ulps: Likewise. * sysdeps/x86_64/fpu/libm-test-ulps: Likewise.