| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
C23 adds various <math.h> function families originally defined in TS
18661-4. Add the exp2m1 and exp10m1 functions (exp2(x)-1 and
exp10(x)-1, like expm1).
As with other such functions, these use type-generic templates that
could be replaced with faster and more accurate type-specific
implementations in future. Test inputs are copied from those for
expm1, plus some additions close to the overflow threshold (copied
from exp2 and exp10) and also some near the underflow threshold.
exp2m1 has the unusual property of having an input (M_MAX_EXP) where
whether the function overflows (under IEEE semantics) depends on the
rounding mode. Although these could reasonably be XFAILed in the
testsuite (as we do in some cases for arguments very close to a
function's overflow threshold when an error of a few ulps in the
implementation can result in the implementation not agreeing with an
ideal one on whether overflow takes place - the testsuite isn't smart
enough to handle this automatically), since these functions aren't
required to be correctly rounding, I made the implementation check for
and handle this case specially.
The Makefile ordering expected by lint-makefiles for the new functions
is a bit peculiar, but I implemented it in this patch so that the test
passes; I don't know why log2 also needed moving in one Makefile
variable setting when it didn't in my previous patches, but the
failure showed a different place was expected for that function as
well.
The powerpc64le IFUNC setup seems not to be as self-contained as one
might hope; it shouldn't be necessary to add IFUNCs for new functions
such as these simply to get them building, but without setting up
IFUNCs for the new functions, there were undefined references to
__GI___expm1f128 (that IFUNC machinery results in no such function
being defined, but doesn't stop include/math.h from doing the
redirection resulting in the exp2m1f128 and exp10m1f128
implementations expecting to call it).
Tested for x86_64 and x86, and with build-many-glibcs.py.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
C23 adds various <math.h> function families originally defined in TS
18661-4. Add the log10p1 functions (log10(1+x): like log1p, but for
base-10 logarithms).
This is directly analogous to the log2p1 implementation (except that
whereas log2p1 has a smaller underflow range than log1p, log10p1 has a
larger underflow range). The test inputs are copied from those for
log1p and log2p1, plus a few more inputs in that wider underflow
range.
Tested for x86_64 and x86, and with build-many-glibcs.py.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
C23 adds various <math.h> function families originally defined in TS
18661-4. Add the logp1 functions (aliases for log1p functions - the
name is intended to be more consistent with the new log2p1 and
log10p1, where clearly it would have been very confusing to name those
functions log21p and log101p). As aliases rather than new functions,
the content of this patch is somewhat different from those actually
adding new functions.
Tests are shared with log1p, so this patch *does* mechanically update
all affected libm-test-ulps files to expect the same errors for both
functions.
The vector versions of log1p on aarch64 and x86_64 are *not* updated
to have logp1 aliases (and thus there are no corresponding header,
tests, abilist or ulps changes for vector functions either). It would
be reasonable for such vector aliases and corresponding changes to
other files to be made separately. For now, the log1p tests instead
avoid testing logp1 in the vector case (a Makefile change is needed to
avoid problems with grep, used in generating the .c files for vector
function tests, matching more than one ALL_RM_TEST line in a file
testing multiple functions with the same inputs, when it assumes that
the .inc file only has a single such line).
Tested for x86_64 and x86, and with build-many-glibcs.py.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
C23 adds various <math.h> function families originally defined in TS
18661-4. Add the log2p1 functions (log2(1+x): like log1p, but for
base-2 logarithms).
This illustrates the intended structure of implementations of all
these function families: define them initially with a type-generic
template implementation. If someone wishes to add type-specific
implementations, it is likely such implementations can be both faster
and more accurate than the type-generic one and can then override it
for types for which they are implemented (adding benchmarks would be
desirable in such cases to demonstrate that a new implementation is
indeed faster).
The test inputs are copied from those for log1p. Note that these
changes make gen-auto-libm-tests depend on MPFR 4.2 (or later).
The bulk of the changes are fairly generic for any such new function.
(sysdeps/powerpc/nofpu/Makefile only needs changing for those
type-generic templates that use fabs.)
Tested for x86_64 and x86, and with build-many-glibcs.py.
|
|
|
|
|
|
|
|
|
| |
Based on feedback by Mike Gilbert <floppym@gentoo.org>
Linux-6.1.38-dist x86_64 AMD Phenom-tm- II X6 1055T Processor
-march=amdfam10
failures occur for x32 ABI
Signed-off-by: Andreas K. Hüttel <dilfridge@gentoo.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Update libm test ulps for
commit 3efbf11fdf15ed991d2c41743921c524a867e145
Author: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Date: Tue Feb 14 11:24:59 2023 +0100
update auto-libm-test-out-hypot
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds following inputs:
0x1.bcab29da0e947p-54 0x1.bc41f4d2294b8p-54
0x1.a11891ec004d4p-348 0x1.814830510be26p-348
0x1.b836ed678be29p-588 0x1.b7be6f5a03a8cp-588
0x1.a83f842ef3f73p-633 0x1.a799d8a6677ep-633
to atan2 tests and updates x86_64 double atan2 ulps.
This fixes BZ #28765.
Reviewed-By: Paul Zimmermann <Paul.Zimmermann@inria.fr>
|
|
|
|
|
|
|
|
| |
Implement vectorized tan/tanf containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI. It also contains
accuracy and ABI tests for vector tan/tanf with regenerated ulps.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
|
|
|
|
|
|
|
| |
Implement vectorized erfc/erfcf containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI. It also contains
accuracy and ABI tests for vector erfc/erfcf with regenerated ulps.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
|
|
|
|
|
|
|
| |
Implement vectorized asinh/asinhf containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI. It also contains
accuracy and ABI tests for vector asinh/asinhf with regenerated ulps.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
|
|
|
|
|
|
|
| |
Implement vectorized tanh/tanhf containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI. It also contains
accuracy and ABI tests for vector tanh/tanhf with regenerated ulps.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
|
|
|
|
|
|
|
| |
Implement vectorized erf/erff containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI. It also contains
accuracy and ABI tests for vector erf/erff with regenerated ulps.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
|
|
|
|
|
|
|
| |
Implement vectorized acosh/acoshf containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI. It also contains
accuracy and ABI tests for vector acosh/acoshf with regenerated ulps.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
|
|
|
|
|
|
|
| |
Implement vectorized atanh/atanhf containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI. It also contains
accuracy and ABI tests for vector atanh/atanhf with regenerated ulps.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
|
|
|
|
|
|
|
| |
Implement vectorized log1p/log1pf containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI. It also contains
accuracy and ABI tests for vector log1p/log1pf with regenerated ulps.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
|
|
|
|
|
|
|
| |
Implement vectorized log2/log2f containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI. It also contains
accuracy and ABI tests for vector log2/log2f with regenerated ulps.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
|
|
|
|
|
|
|
| |
Implement vectorized log10/log10f containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI. It also contains
accuracy and ABI tests for vector log10/log10f with regenerated ulps.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
|
|
|
|
|
|
|
| |
Implement vectorized atan2/atan2f containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI. It also contains
accuracy and ABI tests for vector atan2/atan2f with regenerated ulps.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
|
|
|
|
|
|
|
| |
Implement vectorized cbrt/cbrtf containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI. It also contains
accuracy and ABI tests for vector cbrt/cbrtf with regenerated ulps.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
|
|
|
|
|
|
|
| |
Implement vectorized sinh/sinhf containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI. It also contains
accuracy and ABI tests for vector sinh/sinhf with regenerated ulps.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
|
|
|
|
|
|
|
| |
Implement vectorized expm1/expm1f containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI. It also contains
accuracy and ABI tests for vector expm1/expm1f with regenerated ulps.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
|
|
|
|
|
|
|
| |
Implement vectorized cosh/coshf containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI. It also contains
accuracy and ABI tests for vector cosh/coshf with regenerated ulps.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
|
|
|
|
|
|
|
| |
Implement vectorized exp10/exp10f containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI. It also contains
accuracy and ABI tests for vector exp10/exp10f with regenerated ulps.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
|
|
|
|
|
|
|
| |
Implement vectorized exp2/exp2f containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI. It also contains
accuracy and ABI tests for vector exp2/exp2f with regenerated ulps.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
|
|
|
|
|
|
|
| |
Implement vectorized hypot/hypotf containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI. It also contains
accuracy and ABI tests for vector hypot/hypotf with regenerated ulps.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
|
|
|
|
|
|
|
| |
Implement vectorized asin/asinf containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI. It also contains
accuracy and ABI tests for vector asin/asinf with regenerated ulps.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
|
|
|
|
|
|
|
| |
Implement vectorized atan/atanf containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI. It also contains
accuracy and ABI tests for vector atan/atanf with regenerated ulps.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
|
|
|
|
|
|
|
| |
Implement vectorized acos/acosf containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI. It also contains
accuracy and ABI tests for vector acos/acosf with regenerated ulps.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
|
|
|
|
|
|
|
|
| |
Fix
FAIL: math/test-float-clog10
FAIL: math/test-float32-clog10
on Intel Core i7-1165G7 with GCC 12.
|
|
|
|
|
|
|
| |
On x86_64, when configuring glibc with CFLAGS="-O2 -g -march=native",
some tests fail. After this patch, "make check" succeeds.
Tested on Intel Core i5-4590 with gcc 10.2.1.
|
|
|
|
|
|
|
|
|
|
|
|
| |
With this patch, the maximal known error for tgamma is now reduced to 9 ulps
for dbl-64, for all rounding modes. Since exhaustive testing is not possible
for dbl-64, it might be that there are still cases with an error larger than
9 ulps, but all known cases are fixed (intensive tests were done to find cases
with large errors).
Tested on x86_64 and powerpc (and by Adhemerval Zanella on aarch64, arm,
s390x, sparc, and i686).
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For j0f/j1f/y0f/y1f, the largest error for all binary32
inputs is reduced to at most 9 ulps for all rounding modes.
The new code is enabled only when there is a cancellation at the very end of
the j0f/j1f/y0f/y1f computation, or for very large inputs, thus should not
give any visible slowdown on average. Two different algorithms are used:
* around the first 64 zeros of j0/j1/y0/y1, approximation polynomials of
degree 3 are used, computed using the Sollya tool (https://www.sollya.org/)
* for large inputs, an asymptotic formula from [1] is used
[1] Fast and Accurate Bessel Function Computation,
John Harrison, Proceedings of Arith 19, 2009.
Inputs yielding the new largest errors are added to auto-libm-test-in,
and ulps are regenerated for various targets (thanks Adhemerval Zanella).
Tested on x86_64 with --disable-multi-arch and on powerpc64le-linux-gnu.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
|
|
|
|
|
|
|
|
|
| |
This patch series removes all remaining slow paths and related code.
First asin/acos, tan, atan, atan2 implementations are updated, and the final
patch removes the unused mpa files, headers and probes. Passes buildmanyglibc.
Remove slow paths from asin/acos. Add ULP annotations based on previous slow
path checks (which are approximate). Update AArch64 and x86_64 libm-test-ulps.
Reviewed-By: Paul Zimmermann <Paul.Zimmermann@inria.fr>
|
|
|
|
| |
(Using values from https://members.loria.fr/PZimmermann/papers/accuracy.pdf)
|
| |
|
|
|
|
| |
From new j0 test.
|
|
|
|
|
|
| |
x86_64 Intel(R) Core(TM) i5-8265U
gcc (Gentoo 10.1.0-r2 p3) 10.1.0
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
The corner cases included were generated using exhaustive search
for all float/binary32 values on x86_64 (comparing to MPFR for
correct rounding to nearest).
For the j0/j1/y0 functions, only cases with ulp error <= 9 were
included.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With mathinline removal there is no need to keep building and testing
inline math tests.
The gen-libm-tests.py support to generate ULP_I_* is removed and all
libm-test-ulps files are updated to longer have the
i{float,double,ldouble} entries. The support for no-test-inline is
also removed from both gen-auto-libm-tests and the
auto-libm-test-out-* were regenerated.
Checked on x86_64-linux-gnu and i686-linux-gnu.
|
|
|
|
| |
* sysdeps/x86_64/fpu/libm-test-ulps: Regenerated.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Optimized exp and exp2 implementations using a lookup table for
fractional powers of 2. There are several variants, see e_exp_data.c,
they can be selected by modifying math_config.h allowing different
tradeoffs.
The default selection should be acceptable as generic libm code.
Worst case error is 0.509 ULP for exp and 0.507 ULP for exp2, on
aarch64 the rodata size is 2160 bytes, shared between exp and exp2.
On aarch64 .text + .rodata size decreased by 24912 bytes.
The non-nearest rounding error is less than 1 ULP even on targets
without efficient round implementation (although the error rate is
higher in that case). Targets with single instruction, rounding mode
independent, to nearest integer rounding and conversion can use them
by setting TOINT_INTRINSICS and adding the necessary code to their
math_private.h.
The __exp1 code uses the same algorithm, so the error bound of pow
increased a bit.
New double precision error handling code was added following the
style of the single precision error handling code.
Improvements on Cortex-A72 compared to current glibc master:
exp thruput: 1.61x in [-9.9 9.9]
exp latency: 1.53x in [-9.9 9.9]
exp thruput: 1.13x in [0.5 1]
exp latency: 1.30x in [0.5 1]
exp2 thruput: 2.03x in [-9.9 9.9]
exp2 latency: 1.64x in [-9.9 9.9]
For small (< 1) inputs the current exp code uses a separate algorithm
so the speed up there is less.
Was tested on
aarch64-linux-gnu (TOINT_INTRINSICS, fma contraction) and
arm-linux-gnueabihf (!TOINT_INTRINSICS, no fma contraction) and
x86_64-linux-gnu (!TOINT_INTRINSICS, no fma contraction) and
powerpc64le-linux-gnu (!TOINT_INTRINSICS, fma contraction) targets,
only non-nearest rounding ulp errors increase and they are within
acceptable bounds (ulp updates are in separate patches).
* NEWS: Mention exp and exp2 improvements.
* math/Makefile (libm-support): Remove t_exp.
(type-double-routines): Add math_err and e_exp_data.
* sysdeps/aarch64/libm-test-ulps: Update.
* sysdeps/arm/libm-test-ulps: Update.
* sysdeps/i386/fpu/e_exp_data.c: New file.
* sysdeps/i386/fpu/math_err.c: New file.
* sysdeps/i386/fpu/t_exp.c: Remove.
* sysdeps/ia64/fpu/e_exp_data.c: New file.
* sysdeps/ia64/fpu/math_err.c: New file.
* sysdeps/ia64/fpu/t_exp.c: Remove.
* sysdeps/ieee754/dbl-64/e_exp.c: Rewrite.
* sysdeps/ieee754/dbl-64/e_exp2.c: Rewrite.
* sysdeps/ieee754/dbl-64/e_exp_data.c: New file.
* sysdeps/ieee754/dbl-64/e_pow.c (__ieee754_pow): Update error bound.
* sysdeps/ieee754/dbl-64/eexp.tbl: Remove.
* sysdeps/ieee754/dbl-64/math_config.h: New file.
* sysdeps/ieee754/dbl-64/math_err.c: New file.
* sysdeps/ieee754/dbl-64/t_exp.c: Remove.
* sysdeps/ieee754/dbl-64/t_exp2.h: Remove.
* sysdeps/ieee754/dbl-64/uexp.h: Remove.
* sysdeps/ieee754/dbl-64/uexp.tbl: Remove.
* sysdeps/m68k/m680x0/fpu/e_exp_data.c: New file.
* sysdeps/m68k/m680x0/fpu/math_err.c: New file.
* sysdeps/m68k/m680x0/fpu/t_exp.c: Remove.
* sysdeps/powerpc/fpu/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulps: Update.
|
|
|
|
|
|
|
|
| |
Fix a few missing spaces, it's now identical to the regenerated version.
Passes GLIBC tests on x64.
* sysdeps/x86_64/fpu/libm-test-ulps: Regenerate to fix spaces.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The second patch improves performance of sinf and cosf using the same
algorithms and polynomials. The returned values are identical to sincosf
for the same input. ULP definitions for AArch64 and x64 are updated.
sinf/cosf througput gains on Cortex-A72:
* |x| < 0x1p-12 : 1.2x
* |x| < M_PI_4 : 1.8x
* |x| < 2 * M_PI: 1.7x
* |x| < 120.0 : 2.3x
* |x| < Inf : 3.0x
* NEWS: Mention sinf, cosf, sincosf.
* sysdeps/aarch64/libm-test-ulps: Update ULP for sinf, cosf, sincosf.
* sysdeps/x86_64/fpu/libm-test-ulps: Update ULP for sinf and cosf.
* sysdeps/x86_64/fpu/multiarch/s_sincosf-fma.c: Add definitions of
constants rather than including generic sincosf.h.
* sysdeps/x86_64/fpu/s_sincosf_data.c: Remove.
* sysdeps/ieee754/flt-32/s_cosf.c (cosf): Rewrite.
* sysdeps/ieee754/flt-32/s_sincosf.h (reduced_sin): Remove.
(reduced_cos): Remove.
(sinf_poly): New function.
* sysdeps/ieee754/flt-32/s_sinf.c (sinf): Rewrite.
|
|
|
|
|
|
|
| |
2018-05-30 Paul Pluzhnikov <ppluzhnikov@google.com>
* sysdeps/x86_64/fpu/libm-test-ulps (log_vlen8_avx2): Update for
AMD Ryzen 7 1800X.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This series of patches removes the slow patchs from sin, cos and sincos.
Besides greatly simplifying the implementation, the new version is also much
faster for inputs up to PI (41% faster) and for large inputs needing range
reduction (27% faster).
ULP is ~0.55 with no errors found after testing 1.6 billion inputs across most
of the range with mpsin and mpcos. The number of incorrectly rounded results
(ie. ULP >0.5) is at most ~2750 per million inputs between 0.125 and 0.5,
the average is ~850 per million between 0 and PI.
Tested on AArch64 and x86_64 with no regressions.
The first patch removes the slow paths for the cases where the input is small
and doesn't require range reduction. Update ULP tables for sin, cos and sincos
on AArch64 and x86_64.
* sysdeps/aarch64/libm-test-ulps: Update ULP for sin, cos, sincos.
* sysdeps/ieee754/dbl-64/s_sin.c (__sin): Remove slow paths for small
inputs.
(__cos): Likewise.
* sysdeps/x86_64/fpu/libm-test-ulps: Update ULP for sin, cos, sincos.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Remove the slow paths from pow. Like several other double precision math
functions, pow is exactly rounded. This is not required from math functions
and causes major overheads as it requires multiple fallbacks using higher
precision arithmetic if a result is close to 0.5ULP. Ridiculous slowdowns
of up to 100000x have been reported when the highest precision path triggers.
All GLIBC math tests pass on AArch64 and x64 (with ULP of pow set to 1).
The worst case error is ~0.506ULP. A simple test over a few hundred million
values shows pow is 10% faster on average. This fixes BZ #13932.
[BZ #13932]
* sysdeps/ieee754/dbl-64/uexp.h (err_1): Remove.
* benchtests/pow-inputs: Update comment for slow path cases.
* manual/probes.texi (slowpow_p10): Delete removed probe.
(slowpow_p10): Likewise.
* math/Makefile: Remove halfulp.c and slowpow.c.
* sysdeps/aarch64/libm-test-ulps: Set ULP of pow to 1.
* sysdeps/generic/math_private.h (__exp1): Remove error argument.
(__halfulp): Remove.
(__slowpow): Remove.
* sysdeps/i386/fpu/halfulp.c: Delete file.
* sysdeps/i386/fpu/slowpow.c: Likewise.
* sysdeps/ia64/fpu/halfulp.c: Likewise.
* sysdeps/ia64/fpu/slowpow.c: Likewise.
* sysdeps/ieee754/dbl-64/e_exp.c (__exp1): Remove error argument,
improve comments and add error analysis.
* sysdeps/ieee754/dbl-64/e_pow.c (__ieee754_pow): Add error analysis.
(power1): Remove function:
(log1): Remove error argument, add error analysis.
(my_log2): Remove function.
* sysdeps/ieee754/dbl-64/halfulp.c: Delete file.
* sysdeps/ieee754/dbl-64/slowpow.c: Likewise.
* sysdeps/m68k/m680x0/fpu/halfulp.c: Likewise.
* sysdeps/m68k/m680x0/fpu/slowpow.c: Likewise.
* sysdeps/powerpc/power4/fpu/Makefile: Remove CPPFLAGS-slowpow.c.
* sysdeps/x86_64/fpu/libm-test-ulps: Set ULP of pow to 1.
* sysdeps/x86_64/fpu/multiarch/Makefile: Remove slowpow-fma.c,
slowpow-fma4.c, halfulp-fma.c, halfulp-fma4.c.
* sysdeps/x86_64/fpu/multiarch/e_pow-fma.c (__slowpow): Remove define.
* sysdeps/x86_64/fpu/multiarch/e_pow-fma4.c (__slowpow): Likewise.
* sysdeps/x86_64/fpu/multiarch/halfulp-fma.c: Delete file.
* sysdeps/x86_64/fpu/multiarch/halfulp-fma4.c: Likewise.
* sysdeps/x86_64/fpu/multiarch/slowpow-fma.c: Likewise.
* sysdeps/x86_64/fpu/multiarch/slowpow-fma4.c: Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Revert:
2017-12-19 Joseph Myers <joseph@codesourcery.com>
* sysdeps/x86_64/fpu/libm-test-ulps: Update.
2017-12-19 Patrick McGehearty <patrick.mcgehearty@oracle.com>
* sysdeps/ieee754/dbl-64/e_exp.c: Include <math-svid-compat.h> and
<errno.h>. Include "eexp.tbl".
(half): New constant.
(one): Likewise.
(__ieee754_exp): Rewrite.
(__slowexp): Remove prototype.
* sysdeps/ieee754/dbl-64/eexp.tbl: New file.
* sysdeps/ieee754/dbl-64/slowexp.c: Remove file.
* sysdeps/i386/fpu/slowexp.c: Likewise.
* sysdeps/ia64/fpu/slowexp.c: Likewise.
* sysdeps/m68k/m680x0/fpu/slowexp.c: Likewise.
* sysdeps/x86_64/fpu/multiarch/slowexp-avx.c: Likewise.
* sysdeps/x86_64/fpu/multiarch/slowexp-fma.c: Likewise.
* sysdeps/x86_64/fpu/multiarch/slowexp-fma4.c: Likewise.
* sysdeps/generic/math_private.h (__slowexp): Remove prototype.
* sysdeps/ieee754/dbl-64/e_pow.c: Remove mention of slowexp.c in
comment.
* sysdeps/powerpc/power4/fpu/Makefile [$(subdir) = math]
(CPPFLAGS-slowexp.c): Remove variable.
* sysdeps/x86_64/fpu/multiarch/Makefile (libm-sysdep_routines):
Remove slowexp-fma, slowexp-fma4 and slowexp-avx.
(CFLAGS-slowexp-fma.c): Remove variable.
(CFLAGS-slowexp-fma4.c): Likewise.
(CFLAGS-slowexp-avx.c): Likewise.
* sysdeps/x86_64/fpu/multiarch/e_exp-avx.c (__slowexp): Do not
define as macro.
* sysdeps/x86_64/fpu/multiarch/e_exp-fma.c (__slowexp): Likewise.
* sysdeps/x86_64/fpu/multiarch/e_exp-fma4.c (__slowexp): Likewise.
* math/Makefile (type-double-routines): Remove slowexp.
* manual/probes.texi (slowexp_p6): Remove.
(slowexp_p32): Likewise.
|
|
|
|
| |
* sysdeps/x86_64/fpu/libm-test-ulps: Update.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch replaces x86-64 assembly versions of e_expf with generic
e_expf.c. For workload-spec2017.wrf, on Nehalem, it improves
performance by:
Before After Improvement
reciprocal-throughput 36.039 20.7749 73%
latency 58.8096 40.8715 43%
On Skylake, it improves
Before After Improvement
reciprocal-throughput 18.4436 11.1693 65%
latency 47.5162 37.5411 26%
* sysdeps/x86_64/fpu/e_expf.S: Removed.
* sysdeps/x86_64/fpu/multiarch/e_expf-fma.S: Likewise.
* sysdeps/x86_64/fpu/w_expf.c: Likewise.
* sysdeps/x86_64/fpu/libm-test-ulps: Updated for generic
e_expf.c.
* sysdeps/x86_64/fpu/multiarch/Makefile (CFLAGS-e_expf-fma.c):
New.
* sysdeps/x86_64/fpu/multiarch/e_expf-fma.c: New file.
* sysdeps/x86_64/fpu/multiarch/e_expf.c (__redirect_ieee754_expf):
Renamed to ...
(__redirect_expf): This.
(SYMBOL_NAME): Changed to expf.
(__ieee754_expf): Renamed to ...
(__expf): This.
(__GI___expf): This.
(__ieee754_expf): Add strong_alias.
(__expf_finite): Likewise.
(__expf): New.
Include <sysdeps/ieee754/flt-32/e_expf.c>.
|
|
|
|
| |
* sysdeps/x86_64/fpu/libm-test-ulps: Update.
|