math: Use exp10m1f from CORE-MATH - mirror/glibc - mirror of git://sourceware.org/git/glibc.git


author	Adhemerval Zanella <adhemerval.zanella@linaro.org>	2024-10-25 15:21:47 -0300
committer	Adhemerval Zanella <adhemerval.zanella@linaro.org>	2024-11-01 11:27:26 -0300
commit	5fa89852fa12fe56c315a119998affa267200f1b (patch)
tree	78405c4b6dc8631840a34021e4663fa2fba6021f /sysdeps/x86_64/fpu/libm-test-ulps
parent	48767cbb76e17d0ee03b2cf0a43bcf01e7295b8b (diff)

math: Use exp10m1f from CORE-MATH

The CORE-MATH implementation is correctly rounded (for any rounding mode)
and generally performs better than the generic exp10m1f.
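
"Correctly rounded" means the returned float is the exact value of
10^x - 1 rounded to float in the current rounding mode. A correctly
rounded result is therefore 0 ulp away from a correctly rounded
reference. The helper below is a hypothetical sketch, not glibc's test
harness. It measures the distance in ulps between two finite floats by
mapping their bit patterns onto an ordered integer line.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: map a float's bit pattern onto a signed integer
   line so that adjacent floats differ by exactly 1 and both -0.0 and
   +0.0 map to 0.  Intended for finite inputs only.  */
static int64_t
float_to_ordered (float f)
{
  int32_t i;
  memcpy (&i, &f, sizeof i);
  /* For negative floats, negate the magnitude bits so that more
     negative values get smaller keys.  */
  return i < 0 ? -(int64_t) (i & 0x7fffffff) : (int64_t) i;
}

/* Hypothetical helper: distance in units in the last place between two
   finite floats (0 means the two results are bit-identical up to the
   sign of zero).  */
static uint64_t
ulp_distance (float a, float b)
{
  int64_t d = float_to_ordered (a) - float_to_ordered (b);
  return d < 0 ? (uint64_t) -d : (uint64_t) d;
}
```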

The code was adapted to glibc style and to use the definitions from
math_config.h (to handle errno, overflow, and underflow).  Beyond that,
the changes are mostly small fixes for corner cases (sNaN handling,
-INFINITY, and a specific overflow check).
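
Corner cases such as sNaN and -INFINITY are usually dispatched on the
input's bit pattern before the main approximation runs. The sketch below
is a hypothetical helper, not the CORE-MATH or glibc code. It covers only
the inputs whose results are exact (NaN, +/-Inf, +/-0). It omits the
overflow and underflow paths, because this message does not spell out
their thresholds or how the math_config.h handling is applied.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not the CORE-MATH/glibc code): handle the inputs
   of exp10m1f whose results are known exactly.  Returns 1 and stores
   the result in *res for such inputs, otherwise returns 0.  */
static int
exp10m1f_special (float x, float *res)
{
  uint32_t ux;
  memcpy (&ux, &x, sizeof ux);
  uint32_t ax = ux & 0x7fffffffu;   /* bits of |x| */

  if (ax > 0x7f800000u)             /* NaN */
    {
      /* x + x returns a quiet NaN; for a signaling NaN it also raises
         FE_INVALID, as required for sNaN inputs.  */
      *res = x + x;
      return 1;
    }
  if (ax == 0x7f800000u)            /* +/-Inf */
    {
      /* 10^-Inf - 1 = -1 exactly; 10^+Inf - 1 = +Inf.  */
      *res = (ux >> 31) ? -1.0f : x;
      return 1;
    }
  if (ax == 0)                      /* +/-0 */
    {
      /* 10^(+/-0) - 1 = +/-0, keeping the sign of the input.  */
      *res = x;
      return 1;
    }
  return 0;                         /* finite nonzero: main path */
}
```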

Benchtests on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):

Latency                  master    patched   improvement
x86_64                  45.4690    49.5845        -9.05%
x86_64v2                46.1604    36.2665        21.43%
x86_64v3                37.8442    31.0359        17.99%
i686                    121.367    93.0079        23.37%
aarch64                 21.1126    15.0165        28.87%
power10                 12.7426     8.4929        33.35%

reciprocal-throughput    master    patched   improvement
x86_64                  19.6005    17.4005        11.22%
x86_64v2                19.6008    11.1977        42.87%
x86_64v3                17.5427    10.2898        41.34%
i686                    59.4215    60.9675        -2.60%
aarch64                 13.9814     7.9173        43.37%
power10                  6.7814     6.4258         5.24%
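
Every row of the improvement column matches (master - patched) / master
* 100. A positive value therefore means the patched build takes less
time. The sketch below shows that arithmetic; the function name is
hypothetical.

```c
/* Hypothetical helper: relative improvement in percent, as it appears
   to be computed in the tables above.  Lower timings are better, so a
   positive result means the patched build is faster.  */
static double
improvement_pct (double master, double patched)
{
  return (master - patched) / master * 100.0;
}
```

For example, the x86_64 latency row gives
(45.4690 - 49.5845) / 45.4690 * 100, which is about -9.05%.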

The generic implementation calls __ieee754_exp10f, which has an
optimized (although not correctly rounded) version.  That is the main
reason for the latency regression on x86_64 and the reciprocal-throughput
regression on i686.

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>

Diffstat (limited to 'sysdeps/x86_64/fpu/libm-test-ulps')

-rw-r--r-- sysdeps/x86_64/fpu/libm-test-ulps
1 file changed, 0 insertions, 4 deletions

