| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- math.h will use compiler builtin for gcc 4.4 when built without
-fsignaling-nans and the builtin is expanded inline for all
support architectures. As an example, there is no intra isnan
call on libm for the architecture I checked, x86, arm, aarch64,
and powerpc.
- The resulting binary difference on 32 bits architecture is minimum
for the non hotspot symbol.
- It helps wordsize-64 architectures that use ldbl-opt.
- It add some code simplification with reduction of duplicated
implementations.
Checked on powerpc-linux-gnu (built without --with-cpu, with
--with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch),
powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+
and --disable-multi-arch).
* sysdeps/ieee754/dbl-64/wordsize-64/s_isnan.c: Move to ...
* sysdeps/ieee754/dbl-64/s_isnan.c: ... here and format code.
Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
One group of warnings seen with -Wextra is warnings for static or
inline not at the start of a declaration (-Wold-style-declaration).
This patch fixes various such cases for inline, ensuring it comes at
the start of the declaration (after any static). A common case of the
fix is "static inline <type> __always_inline"; the definition of
__always_inline starts with __inline, so the natural change is to
"static __always_inline <type>". Other cases of the warning may be
harder to fix (one pattern is a function definition that gets
rewritten to be static by an including file, "#define funcname static
wrapped_funcname" or similar), but it seems worth fixing these cases
with inline anyway.
Tested for x86_64.
* elf/dl-load.h (_dl_postprocess_loadcmd): Use __always_inline
before return type, without separate inline.
* elf/dl-tunables.c (maybe_enable_malloc_check): Likewise.
* elf/dl-tunables.h (tunable_is_name): Likewise.
* malloc/malloc.c (do_set_trim_threshold): Likewise.
(do_set_top_pad): Likewise.
(do_set_mmap_threshold): Likewise.
(do_set_mmaps_max): Likewise.
(do_set_mallopt_check): Likewise.
(do_set_perturb_byte): Likewise.
(do_set_arena_test): Likewise.
(do_set_arena_max): Likewise.
(do_set_tcache_max): Likewise.
(do_set_tcache_count): Likewise.
(do_set_tcache_unsorted_limit): Likewise.
* nis/nis_subr.c (count_dots): Likewise.
* nptl/allocatestack.c (advise_stack_range): Likewise.
* sysdeps/ieee754/dbl-64/s_sin.c (do_cos): Likewise.
(do_sin): Likewise.
(reduce_sincos): Likewise.
(do_sincos): Likewise.
* sysdeps/unix/sysv/linux/x86/elision-conf.c
(do_set_elision_enable): Likewise.
(TUNABLE_CALLBACK_FNDECL): Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Apply the following spelling fixes:
$ git grep -F -l 'relevent' |
xargs sed -i 's/relevent/relevant/g'
$ git grep -F -l 'checked fot' |
xargs sed -i 's/checked fot/checked for/g'
$ git grep -F -l "could't" |
xargs sed -i "s/could't/couldn't/g"
$ git grep -F -l 'wheter' | grep -Fv ChangeLog.old |
xargs sed -i 's/wheter/whether/g'
$ git grep -F -l 'neccessary' | grep -Fv ChangeLog.old |
xargs sed -i 's/neccessary/necessary/g'
$ git grep -F -l 'ouput' |
xargs sed -i 's/ouput/output/g'
$ git grep -F -w -l 'iput' |
xargs sed -i 's/iput/input/g'
This is inspired by a gnulib bug report at
https://lists.gnu.org/archive/html/bug-gnulib/2019-01/msg00081.html
* argp/argp-help.c: Fix typo in comment.
* misc/sys/cdefs.h: Likewise.
* posix/regexec.c (sift_states_iter_mb): Likewise.
* socket/sockatmark.c: Likewise.
* socket/sys/socket.h: Likewise.
* sysdeps/ia64/fpu/libm_sincos_large.S: Likewise.
* sysdeps/ia64/fpu/libm_sincosl.S: Likewise.
* sysdeps/ia64/fpu/s_cosl.S: Likewise.
* sysdeps/ieee754/dbl-64/k_rem_pio2.c: Likewise.
* sysdeps/unix/sockatmark.c: Likewise.
* time/strptime_l.c: Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With -O included in CFLAGS it fails to build with:
../sysdeps/ieee754/ldbl-96/e_jnl.c: In function '__ieee754_jnl':
../sysdeps/ieee754/ldbl-96/e_jnl.c:146:20: error: 'temp' may be used uninitialized in this function [-Werror=maybe-uninitialized]
b = invsqrtpi * temp / sqrtl (x);
~~~~~~~~~~^~~~~~
../sysdeps/ieee754/ldbl-96/e_jnl.c: In function '__ieee754_ynl':
../sysdeps/ieee754/ldbl-96/e_jnl.c:375:16: error: 'temp' may be used uninitialized in this function [-Werror=maybe-uninitialized]
b = invsqrtpi * temp / sqrtl (x);
~~~~~~~~~~^~~~~~
../sysdeps/ieee754/dbl-64/e_jn.c: In function '__ieee754_jn':
../sysdeps/ieee754/dbl-64/e_jn.c:113:20: error: 'temp' may be used uninitialized in this function [-Werror=maybe-uninitialized]
b = invsqrtpi * temp / sqrt (x);
~~~~~~~~~~^~~~~~
../sysdeps/ieee754/dbl-64/e_jn.c: In function '__ieee754_yn':
../sysdeps/ieee754/dbl-64/e_jn.c:320:16: error: 'temp' may be used uninitialized in this function [-Werror=maybe-uninitialized]
b = invsqrtpi * temp / sqrt (x);
~~~~~~~~~~^~~~~~
Build tested with Yocto for ARM, AARCH64, X86, X86_64, PPC, MIPS, MIPS64
with -O, -O1, -Os.
For AARCH64 it needs one more fix in locale for -Os:
https://sourceware.org/ml/libc-alpha/2018-09/msg00539.html
[BZ #19444]
* sysdeps/ieee754/dbl-64/e_jn.c (__ieee754_jn): Use
__builtin_unreachable for default case in switch.
(__ieee754_yn): Likewise.
* sysdeps/ieee754/ldbl-96/e_jnl.c (__ieee754_jnl): Likewise.
(__ieee754_ynl): Likewise.
* sysdeps/ieee754/ldbl-128/e_jnl.c (__ieee754_jnl): Likewise.
(__ieee754_ynl): Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_jnl.c (__ieee754_jnl): Likewise.
(__ieee754_ynl): Likewise.
|
|
|
|
|
|
|
| |
* All files with FSF copyright notices: Update copyright dates
using scripts/update-copyrights.
* locale/programs/charmap-kw.h: Regenerated.
* locale/programs/locfile-kw.h: Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Introduce new pow symbol version that doesn't do SVID compatible error
handling. The standard errno and fp exception based error handling is
inline in the new code and does not have significant overhead.
The wrapper is disabled for sysdeps/ieee754/dbl-64 by using empty
w_pow.c and enabled for targets with their own pow implementation or
ifunc dispatch on __ieee754_pow by including math/w_pow.c.
The compatibility symbol version still uses the wrapper with SVID error
handling around the new code. There is no new symbol version nor
compatibility code on !LIBM_SVID_COMPAT targets (e.g. riscv).
On targets where previously powl was an alias of pow, now it points to
the compatibility symbol with the wrapper, because it still need the
SVID compatible error handling. This affects NO_LONG_DOUBLE (e.g. arm)
and LONG_DOUBLE_COMPAT (e.g. alpha) targets as well.
The __pow_finite symbol is now an alias of pow. Both __pow_finite and
pow set errno and thus not const functions.
The ia64 asm is changed so the compat and new symbol versions map to the
same address.
On x86_64 #include <math.h> was added before macro definitions that
may affect that header.
Tested with build-many-glibcs.py.
* math/Versions (GLIBC_2.29): Add pow.
* math/w_pow_compat.c (__pow_compat): Change to versioned compat
symbol.
* math/w_pow.c: New file.
* sysdeps/i386/fpu/w_pow.c: New file.
* sysdeps/ia64/fpu/e_pow.S: Add versioned symbols.
* sysdeps/ieee754/dbl-64/e_pow.c (__ieee754_pow): Rename to __pow
and add necessary aliases.
* sysdeps/ieee754/dbl-64/w_pow.c: New file.
* sysdeps/m68k/m680x0/fpu/w_pow.c: New file.
* sysdeps/mach/hurd/i386/libm.abilist: Update.
* sysdeps/unix/sysv/linux/aarch64/libm.abilist: Update.
* sysdeps/unix/sysv/linux/alpha/libm.abilist: Update.
* sysdeps/unix/sysv/linux/arm/libm.abilist: Update.
* sysdeps/unix/sysv/linux/hppa/libm.abilist: Update.
* sysdeps/unix/sysv/linux/i386/libm.abilist: Update.
* sysdeps/unix/sysv/linux/ia64/libm.abilist: Update.
* sysdeps/unix/sysv/linux/m68k/coldfire/libm.abilist: Update.
* sysdeps/unix/sysv/linux/m68k/m680x0/libm.abilist: Update.
* sysdeps/unix/sysv/linux/microblaze/libm.abilist: Update.
* sysdeps/unix/sysv/linux/mips/mips32/libm.abilist: Update.
* sysdeps/unix/sysv/linux/mips/mips64/libm.abilist: Update.
* sysdeps/unix/sysv/linux/nios2/libm.abilist: Update.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libm.abilist: Update.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libm.abilist: Update.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/libm-le.abilist: Update.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/libm.abilist: Update.
* sysdeps/unix/sysv/linux/s390/s390-32/libm.abilist: Update.
* sysdeps/unix/sysv/linux/s390/s390-64/libm.abilist: Update.
* sysdeps/unix/sysv/linux/sh/libm.abilist: Update.
* sysdeps/unix/sysv/linux/sparc/sparc32/libm.abilist: Update.
* sysdeps/unix/sysv/linux/sparc/sparc64/libm.abilist: Update.
* sysdeps/unix/sysv/linux/x86_64/64/libm.abilist: Update.
* sysdeps/unix/sysv/linux/x86_64/x32/libm.abilist: Update.
* sysdeps/x86_64/fpu/multiarch/e_pow-fma.c (__ieee754_pow): Rename to
__pow.
* sysdeps/x86_64/fpu/multiarch/e_pow-fma4.c (__ieee754_pow): Likewise.
* sysdeps/x86_64/fpu/multiarch/e_pow.c (__ieee754_pow): Likewise.
* sysdeps/x86_64/fpu/multiarch/w_pow.c: New file.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Introduce new log2 symbol version that doesn't do SVID compatible error
handling. The standard errno and fp exception based error handling is
inline in the new code and does not have significant overhead.
The wrapper is disabled for sysdeps/ieee754/dbl-64 by using empty
w_log2.c and enabled for targets with their own log2 implementation by
including math/w_log2.c.
The compatibility symbol version still uses the wrapper with SVID error
handling around the new code. There is no new symbol version nor
compatibility code on !LIBM_SVID_COMPAT targets (e.g. riscv).
On targets where previously log2l was an alias of log2, now it points to
the compatibility symbol with the wrapper, because it still need the
SVID compatible error handling. This affects NO_LONG_DOUBLE (e.g. arm)
and LONG_DOUBLE_COMPAT (e.g. alpha) targets as well.
The __log2_finite symbol is now an alias of log2. Both __log2_finite
and log2 set errno and thus not const functions.
The ia64 asm is changed so the compat and new symbol versions map to the
same address.
Tested with build-many-glibcs.py.
* math/Versions (GLIBC_2.29): Add log2.
* math/w_log2_compat.c (__log2_compat): Change to versioned compat
symbol.
* math/w_log2.c: New file.
* sysdeps/i386/fpu/w_log2.c: New file.
* sysdeps/ia64/fpu/e_log2.S: Add versioned symbols.
* sysdeps/ieee754/dbl-64/e_log2.c (__ieee754_log2): Rename to __log2
and add necessary aliases.
* sysdeps/ieee754/dbl-64/w_log2.c: New file.
* sysdeps/m68k/m680x0/fpu/w_log2.c: New file.
* sysdeps/mach/hurd/i386/libm.abilist: Update.
* sysdeps/unix/sysv/linux/aarch64/libm.abilist: Update.
* sysdeps/unix/sysv/linux/alpha/libm.abilist: Update.
* sysdeps/unix/sysv/linux/arm/libm.abilist: Update.
* sysdeps/unix/sysv/linux/hppa/libm.abilist: Update.
* sysdeps/unix/sysv/linux/i386/libm.abilist: Update.
* sysdeps/unix/sysv/linux/ia64/libm.abilist: Update.
* sysdeps/unix/sysv/linux/m68k/coldfire/libm.abilist: Update.
* sysdeps/unix/sysv/linux/m68k/m680x0/libm.abilist: Update.
* sysdeps/unix/sysv/linux/microblaze/libm.abilist: Update.
* sysdeps/unix/sysv/linux/mips/mips32/libm.abilist: Update.
* sysdeps/unix/sysv/linux/mips/mips64/libm.abilist: Update.
* sysdeps/unix/sysv/linux/nios2/libm.abilist: Update.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libm.abilist: Update.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libm.abilist: Update.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/libm-le.abilist: Update.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/libm.abilist: Update.
* sysdeps/unix/sysv/linux/s390/s390-32/libm.abilist: Update.
* sysdeps/unix/sysv/linux/s390/s390-64/libm.abilist: Update.
* sysdeps/unix/sysv/linux/sh/libm.abilist: Update.
* sysdeps/unix/sysv/linux/sparc/sparc32/libm.abilist: Update.
* sysdeps/unix/sysv/linux/sparc/sparc64/libm.abilist: Update.
* sysdeps/unix/sysv/linux/x86_64/64/libm.abilist: Update.
* sysdeps/unix/sysv/linux/x86_64/x32/libm.abilist: Update.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Introduce new log symbol version that doesn't do SVID compatible error
handling. The standard errno and fp exception based error handling is
inline in the new code and does not have significant overhead.
The wrapper is disabled for sysdeps/ieee754/dbl-64 by using empty
w_log.c and enabled for targets with their own log implementation by
including math/w_log.c.
The compatibility symbol version still uses the wrapper with SVID error
handling around the new code. There is no new symbol version nor
compatibility code on !LIBM_SVID_COMPAT targets (e.g. riscv).
On targets where previously logl was an alias of log, now it points to
the compatibility symbol with the wrapper, because it still need the
SVID compatible error handling. This affects NO_LONG_DOUBLE (e.g. arm)
and LONG_DOUBLE_COMPAT (e.g. alpha) targets as well.
The __log_finite symbol is now an alias of log. Both __log_finite and
log set errno and thus not const functions.
The ia64 asm is changed so the compat and new symbol versions map to the
same address.
On x86_64 #include <math.h> was added before macro definitions that may
affect that header.
Tested with build-many-glibcs.py.
* math/Versions (GLIBC_2.29): Add log.
* math/w_log_compat.c (__log_compat): Change to versioned compat
symbol.
* math/w_log.c: New file.
* sysdeps/i386/fpu/w_log.c: New file.
* sysdeps/ia64/fpu/e_log.S: Update.
* sysdeps/ieee754/dbl-64/e_log.c (__ieee754_log): Rename to __log
and add necessary aliases.
* sysdeps/ieee754/dbl-64/w_log.c: New file.
* sysdeps/m68k/m680x0/fpu/w_log.c: New file.
* sysdeps/mach/hurd/i386/libm.abilist: Update.
* sysdeps/unix/sysv/linux/aarch64/libm.abilist: Update.
* sysdeps/unix/sysv/linux/alpha/libm.abilist: Update.
* sysdeps/unix/sysv/linux/arm/libm.abilist: Update.
* sysdeps/unix/sysv/linux/hppa/libm.abilist: Update.
* sysdeps/unix/sysv/linux/i386/libm.abilist: Update.
* sysdeps/unix/sysv/linux/ia64/libm.abilist: Update.
* sysdeps/unix/sysv/linux/m68k/coldfire/libm.abilist: Update.
* sysdeps/unix/sysv/linux/m68k/m680x0/libm.abilist: Update.
* sysdeps/unix/sysv/linux/microblaze/libm.abilist: Update.
* sysdeps/unix/sysv/linux/mips/mips32/libm.abilist: Update.
* sysdeps/unix/sysv/linux/mips/mips64/libm.abilist: Update.
* sysdeps/unix/sysv/linux/nios2/libm.abilist: Update.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libm.abilist: Update.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libm.abilist: Update.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/libm-le.abilist: Update.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/libm.abilist: Update.
* sysdeps/unix/sysv/linux/s390/s390-32/libm.abilist: Update.
* sysdeps/unix/sysv/linux/s390/s390-64/libm.abilist: Update.
* sysdeps/unix/sysv/linux/sh/libm.abilist: Update.
* sysdeps/unix/sysv/linux/sparc/sparc32/libm.abilist: Update.
* sysdeps/unix/sysv/linux/sparc/sparc64/libm.abilist: Update.
* sysdeps/unix/sysv/linux/x86_64/64/libm.abilist: Update.
* sysdeps/unix/sysv/linux/x86_64/x32/libm.abilist: Update.
* sysdeps/x86_64/fpu/multiarch/e_log-avx.c (__ieee754_log): Rename to
__log.
* sysdeps/x86_64/fpu/multiarch/e_log-fma.c (__ieee754_log): Likewise.
* sysdeps/x86_64/fpu/multiarch/e_log-fma4.c (__ieee754_log): Likewise.
* sysdeps/x86_64/fpu/multiarch/e_log.c (__ieee754_log): Likewise.
* sysdeps/x86_64/fpu/multiarch/w_log.c: New file.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Introduce new exp and exp2 symbol version that don't do SVID compatible
error handling. The standard errno and fp exception based error handling
is inline in the new code and does not have significant overhead.
The double precision wrappers are disabled for sysdeps/ieee754/dbl-64
by using empty w_exp.c and w_exp2.c files, the math/w_exp.c and
math/w_exp2.c files use the wrapper template and can be included by
targets that have their own exp and exp2 implementations or use ifunc
on the glibc internal __ieee754_exp symbol.
The compatibility symbol versions still use the wrapper with SVID error
handling around the new code. There is no new symbol version nor
compatibility code on !LIBM_SVID_COMPAT targets (e.g. riscv).
On targets where previously expl and exp2l were aliases of exp and exp2,
now they point to the compatibility symbols with the wrapper, because
they still need the SVID compatible error handling. This affects
NO_LONG_DOUBLE (e.g arm) and LONG_DOUBLE_COMPAT (e.g. alpha) targets
as well.
The _finite symbols are now aliases of the standard symbols (they have
no performance advantage anymore). Both the standard symbols and
_finite symbols set errno and thus not const functions.
The ia64 asm is changed so the compat and new symbol versions map to the
same address.
On x86_64 #include <math.h> was added before macro definitions that may
affect that header (the new macro name is __exp instead of __ieee754_exp
which breaks some math.h macros).
Tested with build-many-glibcs.py.
* math/Versions (GLIBC_2.29): Add exp and exp2.
* math/w_exp2_compat.c (__exp2_compat): Change to versioned compat
symbol, handle NO_LONG_DOUBLE and LONG_DOUBLE_COMPAT explicitly.
* math/w_exp_compat.c (__exp_compat): Likewise.
* math/w_exp.c: New file.
* math/w_exp2.c: New file.
* sysdeps/i386/fpu/w_exp.c: New file.
* sysdeps/i386/fpu/w_exp2.c: New file.
* sysdeps/ia64/fpu/e_exp.S: Add versioned symbols.
* sysdeps/ia64/fpu/e_exp2.S: Likewise.
* sysdeps/ieee754/dbl-64/e_exp.c (__ieee754_exp): Rename to __exp
and add necessary aliases.
* sysdeps/ieee754/dbl-64/e_exp2.c (__ieee754_exp2): Rename to __exp2
and add necessary aliases.
* sysdeps/ieee754/dbl-64/w_exp.c: New file.
* sysdeps/ieee754/dbl-64/w_exp2.c: New file.
* sysdeps/m68k/m680x0/fpu/w_exp.c: New file.
* sysdeps/m68k/m680x0/fpu/w_exp2.c: New file.
* sysdeps/mach/hurd/i386/libm.abilist: Update.
* sysdeps/unix/sysv/linux/aarch64/libm.abilist: Update.
* sysdeps/unix/sysv/linux/alpha/libm.abilist: Update.
* sysdeps/unix/sysv/linux/arm/libm.abilist: Update.
* sysdeps/unix/sysv/linux/hppa/libm.abilist: Update.
* sysdeps/unix/sysv/linux/i386/libm.abilist: Update.
* sysdeps/unix/sysv/linux/ia64/libm.abilist: Update.
* sysdeps/unix/sysv/linux/m68k/coldfire/libm.abilist: Update.
* sysdeps/unix/sysv/linux/m68k/m680x0/libm.abilist: Update.
* sysdeps/unix/sysv/linux/microblaze/libm.abilist: Update.
* sysdeps/unix/sysv/linux/mips/mips32/libm.abilist: Update.
* sysdeps/unix/sysv/linux/mips/mips64/libm.abilist: Update.
* sysdeps/unix/sysv/linux/nios2/libm.abilist: Update.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libm.abilist: Update.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libm.abilist: Update.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/libm-le.abilist: Update.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/libm.abilist: Update.
* sysdeps/unix/sysv/linux/s390/s390-32/libm.abilist: Update.
* sysdeps/unix/sysv/linux/s390/s390-64/libm.abilist: Update.
* sysdeps/unix/sysv/linux/sh/libm.abilist: Update.
* sysdeps/unix/sysv/linux/sparc/sparc32/libm.abilist: Update.
* sysdeps/unix/sysv/linux/sparc/sparc64/libm.abilist: Update.
* sysdeps/unix/sysv/linux/x86_64/64/libm.abilist: Update.
* sysdeps/unix/sysv/linux/x86_64/x32/libm.abilist: Update.
* sysdeps/x86_64/fpu/multiarch/e_exp-avx.c (__exp1): Remove.
(__ieee754_exp): Rename to __exp.
* sysdeps/x86_64/fpu/multiarch/e_exp-fma.c (__exp1): Remove.
(__ieee754_exp): Rename to __exp.
* sysdeps/x86_64/fpu/multiarch/e_exp-fma4.c (__exp1): Remove.
(__ieee754_exp): Rename to __exp.
* sysdeps/x86_64/fpu/multiarch/e_exp.c (__ieee754_exp): Rename to
__exp.
* sysdeps/x86_64/fpu/multiarch/w_exp.c: New file.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
After my changes to move various macros, inlines and other content
from math_private.h to more specific headers, many files including
math_private.h no longer need to do so. Furthermore, since the
optimized inlines of various functions have been moved to
include/fenv.h or replaced by use of function names GCC inlines
automatically, a missing math_private.h include where one is
appropriate will reliably cause a build failure rather than possibly
causing code to be less well optimized while still building
successfully. Thus, this patch removes includes of math_private.h
that are now unnecessary. In the case of two RISC-V files, the
include is replaced by one of stdbool.h because the files in question
were relying on math_private.h to get a definition of bool.
Tested for x86_64 and x86, and with build-many-glibcs.py.
* math/fromfp.h: Do not include <math_private.h>.
* math/s_cacosh_template.c: Likewise.
* math/s_casin_template.c: Likewise.
* math/s_casinh_template.c: Likewise.
* math/s_ccos_template.c: Likewise.
* math/s_cproj_template.c: Likewise.
* math/s_fdim_template.c: Likewise.
* math/s_fmaxmag_template.c: Likewise.
* math/s_fminmag_template.c: Likewise.
* math/s_iseqsig_template.c: Likewise.
* math/s_ldexp_template.c: Likewise.
* math/s_nextdown_template.c: Likewise.
* math/w_log1p_template.c: Likewise.
* math/w_scalbln_template.c: Likewise.
* sysdeps/aarch64/fpu/feholdexcpt.c: Likewise.
* sysdeps/aarch64/fpu/fesetround.c: Likewise.
* sysdeps/aarch64/fpu/fgetexcptflg.c: Likewise.
* sysdeps/aarch64/fpu/ftestexcept.c: Likewise.
* sysdeps/aarch64/fpu/s_llrint.c: Likewise.
* sysdeps/aarch64/fpu/s_llrintf.c: Likewise.
* sysdeps/aarch64/fpu/s_lrint.c: Likewise.
* sysdeps/aarch64/fpu/s_lrintf.c: Likewise.
* sysdeps/i386/fpu/s_atanl.c: Likewise.
* sysdeps/i386/fpu/s_f32xaddf64.c: Likewise.
* sysdeps/i386/fpu/s_f32xsubf64.c: Likewise.
* sysdeps/i386/fpu/s_fdim.c: Likewise.
* sysdeps/i386/fpu/s_logbl.c: Likewise.
* sysdeps/i386/fpu/s_rintl.c: Likewise.
* sysdeps/i386/fpu/s_significandl.c: Likewise.
* sysdeps/ia64/fpu/s_matherrf.c: Likewise.
* sysdeps/ia64/fpu/s_matherrl.c: Likewise.
* sysdeps/ieee754/dbl-64/s_atan.c: Likewise.
* sysdeps/ieee754/dbl-64/s_cbrt.c: Likewise.
* sysdeps/ieee754/dbl-64/s_fma.c: Likewise.
* sysdeps/ieee754/dbl-64/s_fmaf.c: Likewise.
* sysdeps/ieee754/flt-32/s_cbrtf.c: Likewise.
* sysdeps/ieee754/k_standardf.c: Likewise.
* sysdeps/ieee754/k_standardl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_copysignl.c: Likewise.
* sysdeps/ieee754/ldbl-64-128/s_finitel.c: Likewise.
* sysdeps/ieee754/ldbl-64-128/s_fpclassifyl.c: Likewise.
* sysdeps/ieee754/ldbl-64-128/s_isinfl.c: Likewise.
* sysdeps/ieee754/ldbl-64-128/s_isnanl.c: Likewise.
* sysdeps/ieee754/ldbl-64-128/s_signbitl.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_cbrtl.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_fma.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_fmal.c: Likewise.
* sysdeps/ieee754/s_signgam.c: Likewise.
* sysdeps/powerpc/power5+/fpu/s_modf.c: Likewise.
* sysdeps/powerpc/power5+/fpu/s_modff.c: Likewise.
* sysdeps/powerpc/power7/fpu/s_logbf.c: Likewise.
* sysdeps/riscv/rv64/rvd/s_ceil.c: Likewise.
* sysdeps/riscv/rv64/rvd/s_floor.c: Likewise.
* sysdeps/riscv/rv64/rvd/s_nearbyint.c: Likewise.
* sysdeps/riscv/rv64/rvd/s_round.c: Likewise.
* sysdeps/riscv/rv64/rvd/s_roundeven.c: Likewise.
* sysdeps/riscv/rv64/rvd/s_trunc.c: Likewise.
* sysdeps/riscv/rvd/s_finite.c: Likewise.
* sysdeps/riscv/rvd/s_fmax.c: Likewise.
* sysdeps/riscv/rvd/s_fmin.c: Likewise.
* sysdeps/riscv/rvd/s_fpclassify.c: Likewise.
* sysdeps/riscv/rvd/s_isinf.c: Likewise.
* sysdeps/riscv/rvd/s_isnan.c: Likewise.
* sysdeps/riscv/rvd/s_issignaling.c: Likewise.
* sysdeps/riscv/rvf/fegetround.c: Likewise.
* sysdeps/riscv/rvf/feholdexcpt.c: Likewise.
* sysdeps/riscv/rvf/fesetenv.c: Likewise.
* sysdeps/riscv/rvf/fesetround.c: Likewise.
* sysdeps/riscv/rvf/feupdateenv.c: Likewise.
* sysdeps/riscv/rvf/fgetexcptflg.c: Likewise.
* sysdeps/riscv/rvf/ftestexcept.c: Likewise.
* sysdeps/riscv/rvf/s_ceilf.c: Likewise.
* sysdeps/riscv/rvf/s_finitef.c: Likewise.
* sysdeps/riscv/rvf/s_floorf.c: Likewise.
* sysdeps/riscv/rvf/s_fmaxf.c: Likewise.
* sysdeps/riscv/rvf/s_fminf.c: Likewise.
* sysdeps/riscv/rvf/s_fpclassifyf.c: Likewise.
* sysdeps/riscv/rvf/s_isinff.c: Likewise.
* sysdeps/riscv/rvf/s_isnanf.c: Likewise.
* sysdeps/riscv/rvf/s_issignalingf.c: Likewise.
* sysdeps/riscv/rvf/s_nearbyintf.c: Likewise.
* sysdeps/riscv/rvf/s_roundevenf.c: Likewise.
* sysdeps/riscv/rvf/s_roundf.c: Likewise.
* sysdeps/riscv/rvf/s_truncf.c: Likewise.
* sysdeps/riscv/rv64/rvd/s_rint.c: Include <stdbool.h> instead of
<math_private.h>.
* sysdeps/riscv/rvf/s_rintf.c: Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Continuing the move to use, within libm, public names for libm
functions that can be inlined as built-in functions on many
architectures, this patch moves calls to __copysign functions to call
the corresponding copysign names instead, with asm redirection to
__copysign when the calls are not inlined (all cases are inlined
except for IBM long double for powerpc soft-float / e500v1). This
eliminates the need for an inline function defining __copysign in
terms of __builtin_copysign.
Tested for x86_64, and with build-many-glibcs.py.
* include/math.h [!_ISOMAC && !(__FINITE_MATH_ONLY__ &&
__FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT]
(MATH_REDIRECT_BINARY_ARGS): New macro.
[!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0)
&& !NO_MATH_REDIRECT] (copysign): Redirect using MATH_REDIRECT.
* sysdeps/alpha/fpu/s_copysign.c: Define NO_MATH_REDIRECT before
header inclusion.
* sysdeps/alpha/fpu/s_copysignf.c: Likewise.
* sysdeps/ieee754/dbl-64/s_copysign.c: Likewise.
* sysdeps/ieee754/float128/s_copysignf128.c: Likewise.
* sysdeps/ieee754/flt-32/s_copysignf.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_copysignl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_copysignl.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_copysignl.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_copysign.c:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_copysignf.c:
Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_copysign.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_copysignf.c: Likewise.
* sysdeps/riscv/rvd/s_copysign.c: Likewise.
* sysdeps/riscv/rvf/s_copysignf.c: Likewise.
* sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_copysign.c:
Likewise.
* sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_copysignf.c:
Likewise.
* sysdeps/generic/math_private_calls.h
[!__MATH_DECLARING_LONG_DOUBLE || !NO_LONG_DOUBLE] (__copysign):
Do not declare and define as an inline function.
* math/divtc3.c (__divtc3): Use copysign functions instead of
__copysign variants.
* math/multc3.c (__multc3): Likewise.
* sysdeps/generic/math-type-macros.h (M_COPYSIGN): Likewise.
* sysdeps/ieee754/dbl-64/e_atan2.c (signArctan2): Likewise.
* sysdeps/ieee754/dbl-64/e_atanh.c (__ieee754_atanh): Likewise.
* sysdeps/ieee754/dbl-64/e_gamma_r.c (__ieee754_gamma_r):
Likewise.
* sysdeps/ieee754/dbl-64/e_jn.c (__ieee754_jn): Likewise.
(__ieee754_yn): Likewise.
* sysdeps/ieee754/dbl-64/s_asinh.c (__asinh): Likewise.
* sysdeps/ieee754/dbl-64/s_atan.c (__signArctan): Likewise.
* sysdeps/ieee754/dbl-64/s_scalbln.c (__scalbln): Likewise.
* sysdeps/ieee754/dbl-64/s_scalbn.c (__scalbn): Likewise.
* sysdeps/ieee754/dbl-64/s_sin.c (do_sin): Likewise.
(__sin): Likewise.
* sysdeps/ieee754/dbl-64/s_sincos.c (__sincos): Likewise.
* sysdeps/ieee754/dbl-64/wordsize-64/s_nearbyint.c (__nearbyint):
Likewise.
* sysdeps/ieee754/dbl-64/wordsize-64/s_scalbln.c (__scalbln):
Likewise.
* sysdeps/ieee754/dbl-64/wordsize-64/s_scalbn.c (__scalbn):
Likewise.
* sysdeps/ieee754/flt-32/e_atanhf.c (__ieee754_atanhf): Likewise.
* sysdeps/ieee754/flt-32/e_gammaf_r.c (__ieee754_gammaf_r):
Likewise.
* sysdeps/ieee754/flt-32/e_jnf.c (__ieee754_jnf): Likewise.
(__ieee754_ynf): Likewise.
* sysdeps/ieee754/flt-32/s_asinhf.c (__asinhf): Likewise.
* sysdeps/ieee754/flt-32/s_scalbnf.c (__scalbnf): Likewise.
* sysdeps/ieee754/k_standard.c (__kernel_standard): Likewise.
* sysdeps/ieee754/ldbl-128/e_gammal_r.c (__ieee754_gammal_r):
Likewise.
* sysdeps/ieee754/ldbl-128/e_jnl.c (__ieee754_jnl): Likewise.
(__ieee754_ynl): Likewise.
* sysdeps/ieee754/ldbl-128/s_scalblnl.c (__scalblnl): Likewise.
* sysdeps/ieee754/ldbl-128/s_scalbnl.c (__scalbnl): Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_gammal_r.c (__ieee754_gammal_r):
Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_jnl.c (__ieee754_jnl): Likewise.
(__ieee754_ynl): Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_fmal.c (__fmal): Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_scalblnl.c (__scalblnl): Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_scalbnl.c (__scalbnl): Likewise.
* sysdeps/ieee754/ldbl-96/e_gammal_r.c (__ieee754_gammal_r):
Likewise.
* sysdeps/ieee754/ldbl-96/e_jnl.c (__ieee754_jnl): Likewise.
(__ieee754_ynl)
* sysdeps/ieee754/ldbl-96/s_asinhl.c (__asinhl): Likewise.
* sysdeps/ieee754/ldbl-96/s_scalblnl.c (__scalblnl): Likewise.
* sysdeps/ieee754/ldbl-opt/nldbl-copysign.c (copysignl): Likewise.
* sysdeps/powerpc/power5+/fpu/s_modf.c (__modf): Likewise.
* sysdeps/powerpc/power5+/fpu/s_modff.c (__modff): Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Continuing the move to use, within libm, public names for libm
functions that can be inlined as built-in functions on many
architectures, this patch moves calls to __round functions to call the
corresponding round names instead, with asm redirection to __round
when the calls are not inlined.
An additional complication arises in
sysdeps/ieee754/ldbl-128ibm/e_expl.c, where a call to roundl, with the
result converted to int, gets converted by the compiler to call
lroundl in the case of 32-bit long, so resulting in localplt test
failures. It's logically correct to let the compiler make such an
optimization; an appropriate asm redirection of lroundl to __lroundl
is thus added to that file (it's not needed anywhere else).
Tested for x86_64, and with build-many-glibcs.py.
* include/math.h [!_ISOMAC && !(__FINITE_MATH_ONLY__ &&
__FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (round): Redirect
using MATH_REDIRECT.
* sysdeps/aarch64/fpu/s_round.c: Define NO_MATH_REDIRECT before
header inclusion.
* sysdeps/aarch64/fpu/s_roundf.c: Likewise.
* sysdeps/ieee754/dbl-64/s_round.c: Likewise.
* sysdeps/ieee754/dbl-64/wordsize-64/s_round.c: Likewise.
* sysdeps/ieee754/float128/s_roundf128.c: Likewise.
* sysdeps/ieee754/flt-32/s_roundf.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_roundl.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_roundl.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_round.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_roundf.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_round.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_roundf.c: Likewise.
* sysdeps/riscv/rv64/rvd/s_round.c: Likewise.
* sysdeps/riscv/rvf/s_roundf.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_roundl.c: Likewise.
(round): Redirect to __round.
(__roundl): Call round instead of __round.
* sysdeps/powerpc/fpu/math_private.h [_ARCH_PWR5X] (__round):
Remove macro.
[_ARCH_PWR5X] (__roundf): Likewise.
* sysdeps/ieee754/dbl-64/e_gamma_r.c (gamma_positive): Use round
functions instead of __round variants.
* sysdeps/ieee754/flt-32/e_gammaf_r.c (gammaf_positive): Likewise.
* sysdeps/ieee754/ldbl-128/e_gammal_r.c (gammal_positive):
Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_gammal_r.c (gammal_positive):
Likewise.
* sysdeps/ieee754/ldbl-96/e_gammal_r.c (gammal_positive):
Likewise.
* sysdeps/x86/fpu/powl_helper.c (__powl_helper): Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_expl.c (lroundl): Redirect to
__lroundl.
(__ieee754_expl): Call roundl instead of __roundl.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Continuing the move to use, within libm, public names for libm
functions that can be inlined as built-in functions on many
architectures, this patch moves calls to __trunc functions to call the
corresponding trunc names instead, with asm redirection to __trunc
when the calls are not inlined.
Tested for x86_64, and with build-many-glibcs.py.
* include/math.h [!_ISOMAC && !(__FINITE_MATH_ONLY__ &&
__FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (trunc): Redirect
using MATH_REDIRECT.
* sysdeps/aarch64/fpu/s_trunc.c: Define NO_MATH_REDIRECT before
header inclusion.
* sysdeps/aarch64/fpu/s_truncf.c: Likewise.
* sysdeps/ieee754/dbl-64/wordsize-64/s_trunc.c: Likewise.
* sysdeps/ieee754/float128/s_truncf128.c: Likewise.
* sysdeps/ieee754/dbl-64/s_trunc.c: Likewise.
* sysdeps/ieee754/flt-32/s_truncf.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_truncl.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_trunc.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_truncf.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_trunc.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_truncf.c: Likewise.
* sysdeps/riscv/rv64/rvd/s_trunc.c: Likewise.
* sysdeps/riscv/rvf/s_truncf.c: Likewise.
* sysdeps/sparc/sparc64/fpu/multiarch/s_trunc.c: Likewise.
* sysdeps/sparc/sparc64/fpu/multiarch/s_truncf.c: Likewise.
* sysdeps/x86_64/fpu/multiarch/s_trunc.c: Likewise.
* sysdeps/x86_64/fpu/multiarch/s_truncf.c: Likewise.
* sysdeps/m68k/m680x0/fpu/s_trunc_template.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_truncl.c: Likewise.
(ceil): Redirect to __ceil.
(floor): Redirect to __floor.
(trunc): Redirect to __trunc.
(__truncl): Call trunc instead of __trunc.
* sysdeps/powerpc/fpu/math_private.h [_ARCH_PWR5X] (__trunc):
Remove macro.
[_ARCH_PWR5X] (__truncf): Likewise.
* sysdeps/ieee754/dbl-64/e_gamma_r.c (__ieee754_gamma_r): Use
trunc functions instead of __trunc variants.
* sysdeps/ieee754/flt-32/e_gammaf_r.c (__ieee754_gammaf_r):
Likewise.
* sysdeps/ieee754/ldbl-128/e_gammal_r.c (__ieee754_gammal_r):
Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_gammal_r.c (__ieee754_gammal_r):
Likewise.
* sysdeps/ieee754/ldbl-96/e_gammal_r.c (__ieee754_gammal_r):
Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The algorithm is exp(y * log(x)), where log(x) is computed with about
1.3*2^-68 relative error (1.5*2^-68 without fma), returning the result
in two doubles, and the exp part uses the same algorithm (and lookup
tables) as exp, but takes the input as two doubles and a sign (to handle
negative bases with odd integer exponent). The __exp1 internal symbol
is no longer necessary.
There is separate code path when fma is not available but the worst case
error is about 0.54 ULP in both cases. The lookup table and consts for
log are 4168 bytes. The .rodata+.text is decreased by 37908 bytes on
aarch64. The non-nearest rounding error is less than 1 ULP.
Improvements on Cortex-A72 compared to current glibc master:
pow thruput: 2.40x in [0.01 11.1]x[0.01 11.1]
pow latency: 1.84x in [0.01 11.1]x[0.01 11.1]
Tested on
aarch64-linux-gnu (defined __FP_FAST_FMA, TOINT_INTRINSICS) and
arm-linux-gnueabihf (!defined __FP_FAST_FMA, !TOINT_INTRINSICS) and
x86_64-linux-gnu (!defined __FP_FAST_FMA, !TOINT_INTRINSICS) and
powerpc64le-linux-gnu (defined __FP_FAST_FMA, !TOINT_INTRINSICS) targets.
* NEWS: Mention pow improvements.
* math/Makefile (type-double-routines): Add e_pow_log_data.
* sysdeps/generic/math_private.h (__exp1): Remove.
* sysdeps/i386/fpu/e_pow_log_data.c: New file.
* sysdeps/ia64/fpu/e_pow_log_data.c: New file.
* sysdeps/ieee754/dbl-64/Makefile (CFLAGS-e_pow.c): Allow fma
contraction.
* sysdeps/ieee754/dbl-64/e_exp.c (__exp1): Remove.
(exp_inline): Remove.
(__ieee754_exp): Only single double input is handled.
* sysdeps/ieee754/dbl-64/e_pow.c: Rewrite.
* sysdeps/ieee754/dbl-64/e_pow_log_data.c: New file.
* sysdeps/ieee754/dbl-64/math_config.h (issignaling_inline): Define.
(__pow_log_data): Define.
* sysdeps/ieee754/dbl-64/upow.h: Remove.
* sysdeps/ieee754/dbl-64/upow.tbl: Remove.
* sysdeps/m68k/m680x0/fpu/e_pow_log_data.c: New file.
* sysdeps/x86_64/fpu/multiarch/Makefile (CFLAGS-e_pow-fma.c): Allow fma
contraction.
(CFLAGS-e_pow-fma4.c): Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Continuing the move to use, within libm, public names for libm
functions that can be inlined as built-in functions on many
architectures, this patch moves calls to __ceil functions to call the
corresponding ceil names instead, with asm redirection to __ceil when
the calls are not inlined.
Tested for x86_64, and with build-many-glibcs.py.
* include/math.h [!_ISOMAC && !(__FINITE_MATH_ONLY__ &&
__FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (ceil): Redirect
using MATH_REDIRECT.
* sysdeps/aarch64/fpu/s_ceil.c: Define NO_MATH_REDIRECT before
header inclusion.
* sysdeps/aarch64/fpu/s_ceilf.c: Likewise.
* sysdeps/ieee754/dbl-64/s_ceil.c: Likewise.
* sysdeps/ieee754/dbl-64/wordsize-64/s_ceil.c: Likewise.
* sysdeps/ieee754/float128/s_ceilf128.c: Likewise.
* sysdeps/ieee754/flt-32/s_ceilf.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_ceill.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_ceill.c: Likewise.
* sysdeps/m68k/m680x0/fpu/s_ceil_template.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_ceil.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_ceilf.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_ceil.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_ceilf.c: Likewise.
* sysdeps/riscv/rv64/rvd/s_ceil.c: Likewise.
* sysdeps/riscv/rvf/s_ceilf.c: Likewise.
* sysdeps/sparc/sparc64/fpu/multiarch/s_ceil.c: Likewise.
* sysdeps/sparc/sparc64/fpu/multiarch/s_ceilf.c: Likewise.
* sysdeps/x86_64/fpu/multiarch/s_ceil.c: Likewise.
* sysdeps/x86_64/fpu/multiarch/s_ceilf.c: Likewise.
* sysdeps/powerpc/fpu/math_private.h [_ARCH_PWR5X] (__ceil):
Remove macro.
* sysdeps/ieee754/dbl-64/e_gamma_r.c (gamma_positive): Use ceil
functions instead of __ceil variants.
* sysdeps/ieee754/flt-32/e_gammaf_r.c (gammaf_positive): Likewise.
* sysdeps/ieee754/ldbl-128/e_gammal_r.c (gammal_positive):
Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_gammal_r.c (gammal_positive):
Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_truncl.c (__truncl): Likewise.
* sysdeps/ieee754/ldbl-96/e_gammal_r.c (gammal_positive):
Likewise.
* sysdeps/powerpc/power5+/fpu/s_modf.c (__modf): Likewise.
* sysdeps/powerpc/power5+/fpu/s_modff.c (__modff): Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Continuing the move to use, within libm, public names for libm
functions that can be inlined as built-in functions on many
architectures, this patch moves calls to __rint functions to call the
corresponding rint names instead, with asm redirection to __rint when
the calls are not inlined. The x86_64 math_private.h is removed as no
longer useful after this patch.
This patch is relative to a tree with my floor patch
<https://sourceware.org/ml/libc-alpha/2018-09/msg00148.html> applied,
and much the same considerations arise regarding possibly replacing an
IFUNC call with a direct inline expansion.
Tested for x86_64, and with build-many-glibcs.py.
* include/math.h [!_ISOMAC && !(__FINITE_MATH_ONLY__ &&
__FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (rint): Redirect
using MATH_REDIRECT.
* sysdeps/aarch64/fpu/s_rint.c: Define NO_MATH_REDIRECT before
header inclusion.
* sysdeps/aarch64/fpu/s_rintf.c: Likewise.
* sysdeps/alpha/fpu/s_rint.c: Likewise.
* sysdeps/alpha/fpu/s_rintf.c: Likewise.
* sysdeps/i386/fpu/s_rintl.c: Likewise.
* sysdeps/ieee754/dbl-64/s_rint.c: Likewise.
* sysdeps/ieee754/dbl-64/wordsize-64/s_rint.c: Likewise.
* sysdeps/ieee754/float128/s_rintf128.c: Likewise.
* sysdeps/ieee754/flt-32/s_rintf.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_rintl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_rintl.c: Likewise.
* sysdeps/m68k/coldfire/fpu/s_rint.c: Likewise.
* sysdeps/m68k/coldfire/fpu/s_rintf.c: Likewise.
* sysdeps/m68k/m680x0/fpu/s_rint.c: Likewise.
* sysdeps/m68k/m680x0/fpu/s_rintf.c: Likewise.
* sysdeps/m68k/m680x0/fpu/s_rintl.c: Likewise.
* sysdeps/powerpc/fpu/s_rint.c: Likewise.
* sysdeps/powerpc/fpu/s_rintf.c: Likewise.
* sysdeps/riscv/rv64/rvd/s_rint.c: Likewise.
* sysdeps/riscv/rvf/s_rintf.c: Likewise.
* sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_rint.c: Likewise.
* sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_rintf.c: Likewise.
* sysdeps/sparc/sparc64/fpu/multiarch/s_rint.c: Likewise.
* sysdeps/sparc/sparc64/fpu/multiarch/s_rintf.c: Likewise.
* sysdeps/x86_64/fpu/multiarch/s_rint.c: Likewise.
* sysdeps/x86_64/fpu/multiarch/s_rintf.c: Likewise.
* sysdeps/x86_64/fpu/math_private.h: Remove file.
* math/e_scalb.c (invalid_fn): Use rint functions instead of
__rint variants.
* math/e_scalbf.c (invalid_fn): Likewise.
* math/e_scalbl.c (invalid_fn): Likewise.
* sysdeps/ieee754/dbl-64/e_gamma_r.c (__ieee754_gamma_r):
Likewise.
* sysdeps/ieee754/flt-32/e_gammaf_r.c (__ieee754_gammaf_r):
Likewise.
* sysdeps/ieee754/k_standard.c (__kernel_standard): Likewise.
* sysdeps/ieee754/k_standardl.c (__kernel_standard_l): Likewise.
* sysdeps/ieee754/ldbl-128/e_gammal_r.c (__ieee754_gammal_r):
Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_gammal_r.c (__ieee754_gammal_r):
Likewise.
* sysdeps/ieee754/ldbl-96/e_gammal_r.c (__ieee754_gammal_r):
Likewise.
* sysdeps/powerpc/powerpc32/fpu/s_llrint.c (__llrint): Likewise.
* sysdeps/powerpc/powerpc32/fpu/s_llrintf.c (__llrintf): Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Similar to the changes that were made to call sqrt functions directly
in glibc, instead of __ieee754_sqrt variants, so that the compiler
could inline them automatically without needing special inline
definitions in lots of math_private.h headers, this patch makes libm
code call floor functions directly instead of __floor variants,
removing the inlines / macros for x86_64 (SSE4.1) and powerpc
(POWER5).
The redirection used to ensure that __ieee754_sqrt does still get
called when the compiler doesn't inline a built-in function expansion
is refactored so it can be applied to other functions; the refactoring
is arranged so it's not limited to unary functions either (it would be
reasonable to use this mechanism for copysign - removing the inline in
math_private_calls.h but also eliminating unnecessary local PLT entry
use in the cases (powerpc soft-float and e500v1, for IBM long double)
where copysign calls don't get inlined).
The point of this change is that more architectures can get floor
calls inlined where they weren't previously (AArch64, for example),
without needing special inline definitions in their math_private.h,
and existing such definitions in math_private.h headers can be
removed.
Note that it's possible that in some cases an inline may be used where
an IFUNC call was previously used - this is the case on x86_64, for
example. I think the direct calls to floor are still appropriate; if
there's any significant performance cost from inline SSE2 floor
instead of an IFUNC call ending up with SSE4.1 floor, that indicates
that either the function should be doing something else that's faster
than using floor at all, or it should itself have IFUNC variants, or
that the compiler choice of inlining for generic tuning should change
to allow for the possibility that, by not inlining, an SSE4.1 IFUNC
might be called at runtime - but not that glibc should avoid calling
floor internally. (After all, all the same considerations would apply
to any user program calling floor, where it might either be inlined or
left as an out-of-line call allowing for a possible IFUNC.)
Tested for x86_64, and with build-many-glibcs.py.
* include/math.h [!_ISOMAC && !(__FINITE_MATH_ONLY__ &&
__FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (MATH_REDIRECT):
New macro.
[!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0)
&& !NO_MATH_REDIRECT] (MATH_REDIRECT_LDBL): Likewise.
[!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0)
&& !NO_MATH_REDIRECT] (MATH_REDIRECT_F128): Likewise.
[!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0)
&& !NO_MATH_REDIRECT] (MATH_REDIRECT_UNARY_ARGS): Likewise.
[!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0)
&& !NO_MATH_REDIRECT] (sqrt): Redirect using MATH_REDIRECT.
[!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0)
&& !NO_MATH_REDIRECT] (floor): Likewise.
* sysdeps/aarch64/fpu/s_floor.c: Define NO_MATH_REDIRECT before
header inclusion.
* sysdeps/aarch64/fpu/s_floorf.c: Likewise.
* sysdeps/ieee754/dbl-64/s_floor.c: Likewise.
* sysdeps/ieee754/dbl-64/wordsize-64/s_floor.c: Likewise.
* sysdeps/ieee754/float128/s_floorf128.c: Likewise.
* sysdeps/ieee754/flt-32/s_floorf.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_floorl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_floorl.c: Likewise.
* sysdeps/m68k/m680x0/fpu/s_floor_template.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_floor.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_floorf.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_floor.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_floorf.c: Likewise.
* sysdeps/riscv/rv64/rvd/s_floor.c: Likewise.
* sysdeps/riscv/rvf/s_floorf.c: Likewise.
* sysdeps/sparc/sparc64/fpu/multiarch/s_floor.c: Likewise.
* sysdeps/sparc/sparc64/fpu/multiarch/s_floorf.c: Likewise.
* sysdeps/x86_64/fpu/multiarch/s_floor.c: Likewise.
* sysdeps/x86_64/fpu/multiarch/s_floorf.c: Likewise.
* sysdeps/powerpc/fpu/math_private.h [_ARCH_PWR5X] (__floor):
Remove macro.
[_ARCH_PWR5X] (__floorf): Likewise.
* sysdeps/x86_64/fpu/math_private.h [__SSE4_1__] (__floor): Remove
inline function.
[__SSE4_1__] (__floorf): Likewise.
* math/w_lgamma_main.c (LGFUNC (__lgamma)): Use floor functions
instead of __floor variants.
* math/w_lgamma_r_compat.c (__lgamma_r): Likewise.
* math/w_lgammaf_main.c (LGFUNC (__lgammaf)): Likewise.
* math/w_lgammaf_r_compat.c (__lgammaf_r): Likewise.
* math/w_lgammal_main.c (LGFUNC (__lgammal)): Likewise.
* math/w_lgammal_r_compat.c (__lgammal_r): Likewise.
* math/w_tgamma_compat.c (__tgamma): Likewise.
* math/w_tgamma_template.c (M_DECL_FUNC (__tgamma)): Likewise.
* math/w_tgammaf_compat.c (__tgammaf): Likewise.
* math/w_tgammal_compat.c (__tgammal): Likewise.
* sysdeps/ieee754/dbl-64/e_lgamma_r.c (sin_pi): Likewise.
* sysdeps/ieee754/dbl-64/k_rem_pio2.c (__kernel_rem_pio2):
Likewise.
* sysdeps/ieee754/dbl-64/lgamma_neg.c (__lgamma_neg): Likewise.
* sysdeps/ieee754/flt-32/e_lgammaf_r.c (sin_pif): Likewise.
* sysdeps/ieee754/flt-32/lgamma_negf.c (__lgamma_negf): Likewise.
* sysdeps/ieee754/ldbl-128/e_lgammal_r.c (__ieee754_lgammal_r):
Likewise.
* sysdeps/ieee754/ldbl-128/e_powl.c (__ieee754_powl): Likewise.
* sysdeps/ieee754/ldbl-128/lgamma_negl.c (__lgamma_negl):
Likewise.
* sysdeps/ieee754/ldbl-128/s_expm1l.c (__expm1l): Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_lgammal_r.c (__ieee754_lgammal_r):
Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_powl.c (__ieee754_powl): Likewise.
* sysdeps/ieee754/ldbl-128ibm/lgamma_negl.c (__lgamma_negl):
Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_expm1l.c (__expm1l): Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_truncl.c (__truncl): Likewise.
* sysdeps/ieee754/ldbl-96/e_lgammal_r.c (sin_pi): Likewise.
* sysdeps/ieee754/ldbl-96/lgamma_negl.c (__lgamma_negl): Likewise.
* sysdeps/powerpc/power5+/fpu/s_modf.c (__modf): Likewise.
* sysdeps/powerpc/power5+/fpu/s_modff.c (__modff): Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Similar algorithm is used as in log: log2(2^k x) = k + log2(c) + log2(x/c)
where the last term is approximated by a polynomial of x/c - 1, the first
order coefficient is about 1/ln2 in this case.
There is separate code path when fma instruction is not available for
computing x/c - 1 precisely, for which the table size is doubled.
The worst case error is 0.547 ULP (0.55 without fma), the read only
global data size is 1168 bytes (2192 without fma) on aarch64. The
non-nearest rounding error is less than 1 ULP.
Improvements on Cortex-A72 compared to current glibc master:
log2 thruput: 2.00x in [0.01 11.1]
log2 latency: 2.04x in [0.01 11.1]
log2 thruput: 2.17x in [0.999 1.001]
log2 latency: 2.88x in [0.999 1.001]
Tested on
aarch64-linux-gnu (defined __FP_FAST_FMA)
arm-linux-gnueabihf (!defined __FP_FAST_FMA)
x86_64-linux-gnu (!defined __FP_FAST_FMA)
powerpc64le-linxu-gnu (defined __FP_FAST_FMA)
targets.
* NEWS: Mention log2 improvements.
* math/Makefile (type-double-routines): Add e_log2_data.
* sysdeps/i386/fpu/e_log2_data.c: New file.
* sysdeps/ia64/fpu/e_log2_data.c: New file.
* sysdeps/ieee754/dbl-64/e_log2.c: Rewrite.
* sysdeps/ieee754/dbl-64/e_log2_data.c: New file.
* sysdeps/ieee754/dbl-64/math_config.h (__log2_data): Add.
* sysdeps/ieee754/dbl-64/wordsize-64/e_log2.c: Remove.
* sysdeps/m68k/m680x0/fpu/e_log2_data.c: New file.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Optimized log using carefully generated lookup table with 1/c and log(c)
values for small intervalls around 1. The log(c) is very near a double
precision value, it has about 62 bits precision. The algorithm is
log(2^k x) = k log(2) + log(c) + log(x/c), where the last term is
approximated by a polynomial of x/c - 1. Near 1 a single polynomial of
x - 1 is used.
There is separate code path when fma instruction is not available for
computing x/c - 1 precisely, in which case the table size is doubled.
The code uses __builtin_fma under __FP_FAST_FMA to ensure it is inlined
as an instruction.
With the default configuration settings the worst case error is 0.519 ULP
(and 0.520 without fma), the rodata size is 2192 bytes (4240 without fma).
The non-nearest rounding error is less than 1 ULP.
Improvements on Cortex-A72 compared to current glibc master:
log thruput: 3.28x in [0.01 11.1]
log latency: 2.23x in [0.01 11.1]
log thruput: 1.56x in [0.999 1.001]
log latency: 1.57x in [0.999 1.001]
Tested on
aarch64-linux-gnu (defined __FP_FAST_FMA)
arm-linux-gnueabihf (!defined __FP_FAST_FMA)
x86_64-linux-gnu (!defined __FP_FAST_FMA)
powerpc64le-linux-gnu (defined __FP_FAST_FMA)
targets.
* NEWS: Mention log improvement.
* math/Makefile (type-double-routines): Add e_log_data.
* sysdeps/i386/fpu/e_log_data.c: New file.
* sysdeps/ia64/fpu/e_log_data.c: New file.
* sysdeps/ieee754/dbl-64/e_log.c: Rewrite.
* sysdeps/ieee754/dbl-64/e_log_data.c: New file.
* sysdeps/ieee754/dbl-64/math_config.h (__log_data): Add.
* sysdeps/ieee754/dbl-64/ulog.h: Remove.
* sysdeps/ieee754/dbl-64/ulog.tbl: Remove.
* sysdeps/m68k/m680x0/fpu/e_log_data.c: New file.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Optimized exp and exp2 implementations using a lookup table for
fractional powers of 2. There are several variants, see e_exp_data.c,
they can be selected by modifying math_config.h allowing different
tradeoffs.
The default selection should be acceptable as generic libm code.
Worst case error is 0.509 ULP for exp and 0.507 ULP for exp2, on
aarch64 the rodata size is 2160 bytes, shared between exp and exp2.
On aarch64 .text + .rodata size decreased by 24912 bytes.
The non-nearest rounding error is less than 1 ULP even on targets
without efficient round implementation (although the error rate is
higher in that case). Targets with single instruction, rounding mode
independent, to nearest integer rounding and conversion can use them
by setting TOINT_INTRINSICS and adding the necessary code to their
math_private.h.
The __exp1 code uses the same algorithm, so the error bound of pow
increased a bit.
New double precision error handling code was added following the
style of the single precision error handling code.
Improvements on Cortex-A72 compared to current glibc master:
exp thruput: 1.61x in [-9.9 9.9]
exp latency: 1.53x in [-9.9 9.9]
exp thruput: 1.13x in [0.5 1]
exp latency: 1.30x in [0.5 1]
exp2 thruput: 2.03x in [-9.9 9.9]
exp2 latency: 1.64x in [-9.9 9.9]
For small (< 1) inputs the current exp code uses a separate algorithm
so the speed up there is less.
Was tested on
aarch64-linux-gnu (TOINT_INTRINSICS, fma contraction) and
arm-linux-gnueabihf (!TOINT_INTRINSICS, no fma contraction) and
x86_64-linux-gnu (!TOINT_INTRINSICS, no fma contraction) and
powerpc64le-linux-gnu (!TOINT_INTRINSICS, fma contraction) targets,
only non-nearest rounding ulp errors increase and they are within
acceptable bounds (ulp updates are in separate patches).
* NEWS: Mention exp and exp2 improvements.
* math/Makefile (libm-support): Remove t_exp.
(type-double-routines): Add math_err and e_exp_data.
* sysdeps/aarch64/libm-test-ulps: Update.
* sysdeps/arm/libm-test-ulps: Update.
* sysdeps/i386/fpu/e_exp_data.c: New file.
* sysdeps/i386/fpu/math_err.c: New file.
* sysdeps/i386/fpu/t_exp.c: Remove.
* sysdeps/ia64/fpu/e_exp_data.c: New file.
* sysdeps/ia64/fpu/math_err.c: New file.
* sysdeps/ia64/fpu/t_exp.c: Remove.
* sysdeps/ieee754/dbl-64/e_exp.c: Rewrite.
* sysdeps/ieee754/dbl-64/e_exp2.c: Rewrite.
* sysdeps/ieee754/dbl-64/e_exp_data.c: New file.
* sysdeps/ieee754/dbl-64/e_pow.c (__ieee754_pow): Update error bound.
* sysdeps/ieee754/dbl-64/eexp.tbl: Remove.
* sysdeps/ieee754/dbl-64/math_config.h: New file.
* sysdeps/ieee754/dbl-64/math_err.c: New file.
* sysdeps/ieee754/dbl-64/t_exp.c: Remove.
* sysdeps/ieee754/dbl-64/t_exp2.h: Remove.
* sysdeps/ieee754/dbl-64/uexp.h: Remove.
* sysdeps/ieee754/dbl-64/uexp.tbl: Remove.
* sysdeps/m68k/m680x0/fpu/e_exp_data.c: New file.
* sysdeps/m68k/m680x0/fpu/math_err.c: New file.
* sysdeps/m68k/m680x0/fpu/t_exp.c: Remove.
* sysdeps/powerpc/fpu/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulps: Update.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
<fenv_private.h> has inline versions of various <fenv.h> functions,
and their __fe* variants, for systems (generally soft-float) without
support for floating-point exceptions, rounding modes or both.
Having these inlines in a separate header introduces a risk of a
source file including <fenv.h> and compiling OK on x86_64, but failing
to compile (because the feraiseexcept inline is actually a macro that
discards its argument, to avoid the need for #ifdef FE_INVALID
conditionals), or not being properly optimized, on systems without the
exceptions and rounding modes support (when these inlines were in
math_private.h, we had a few cases where this broke the build because
there was no obvious reason for a file to need math_private.h and it
didn't need that header on x86_64). By moving those inlines to
include/fenv.h, this risk can be avoided, and fenv_private.h becomes
more clearly defined as specifically the header for the internal
libc_fe* and SET_RESTORE_ROUND* interfaces.
This patch makes that move, removing fenv_private.h includes that are
no longer needed (or replacing them by fenv.h includes in a few cases
that didn't already have such an include).
Tested for x86_64 and x86, and tested with build-many-glibcs.py that
installed stripped shared libraries are unchanged by the patch.
* sysdeps/generic/fenv_private.h [FE_ALL_EXCEPT == 0]: Move this
code ....
[!FE_HAVE_ROUNDING_MODES]: And this code ....
* include/fenv.h [!_ISOMAC]: ... to here.
* math/fraiseexcpt.c (__feraiseexcept): Undefine as macro.
(feraiseexcept): Likewise.
* math/fromfp.h: Do not include <fenv_private.h>.
* math/s_cexp_template.c: Likewise.
* math/s_csin_template.c: Likewise.
* math/s_csinh_template.c: Likewise.
* math/s_ctan_template.c: Likewise.
* math/s_ctanh_template.c: Likewise.
* math/s_iseqsig_template.c: Likewise.
* math/w_acos_compat.c: Likewise.
* math/w_acosf_compat.c: Likewise.
* math/w_acosl_compat.c: Likewise.
* math/w_asin_compat.c: Likewise.
* math/w_asinf_compat.c: Likewise.
* math/w_asinl_compat.c: Likewise.
* math/w_j0_compat.c: Likewise.
* math/w_j0f_compat.c: Likewise.
* math/w_j0l_compat.c: Likewise.
* math/w_j1_compat.c: Likewise.
* math/w_j1f_compat.c: Likewise.
* math/w_j1l_compat.c: Likewise.
* math/w_jn_compat.c: Likewise.
* math/w_jnf_compat.c: Likewise.
* math/w_log10_compat.c: Likewise.
* math/w_log10f_compat.c: Likewise.
* math/w_log10l_compat.c: Likewise.
* math/w_log2_compat.c: Likewise.
* math/w_log2f_compat.c: Likewise.
* math/w_log2l_compat.c: Likewise.
* math/w_log_compat.c: Likewise.
* math/w_logf_compat.c: Likewise.
* math/w_logl_compat.c: Likewise.
* sysdeps/ieee754/dbl-64/s_llrint.c: Likewise.
* sysdeps/ieee754/dbl-64/s_llround.c: Likewise.
* sysdeps/ieee754/dbl-64/s_lrint.c: Likewise.
* sysdeps/ieee754/dbl-64/s_lround.c: Likewise.
* sysdeps/ieee754/dbl-64/wordsize-64/s_lround.c: Likewise.
* sysdeps/ieee754/flt-32/s_llrintf.c: Likewise.
* sysdeps/ieee754/flt-32/s_llroundf.c: Likewise.
* sysdeps/ieee754/flt-32/s_lrintf.c: Likewise.
* sysdeps/ieee754/flt-32/s_lroundf.c: Likewise.
* sysdeps/ieee754/k_standardl.c: Likewise.
* sysdeps/ieee754/ldbl-128/e_expl.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_fmal.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_llrintl.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_llroundl.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_lrintl.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_lroundl.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_nearbyintl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_llrintl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_llroundl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_lrintl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_lroundl.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_fma.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_fmal.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_llrintl.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_llroundl.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_lrintl.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_lroundl.c: Likewise.
* math/w_ilogb_template.c: Include <fenv.h> instead of
<fenv_private.h>.
* math/w_llogb_template.c: Likewise.
* sysdeps/powerpc/fpu/e_sqrt.c: Likewise.
* sysdeps/powerpc/fpu/e_sqrtf.c: Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Continuing the clean-up related to the catch-all math_private.h
header, this patch stops math_private.h from including fenv_private.h.
Instead, fenv_private.h is included directly from those users of
math_private.h that also used interfaces from fenv_private.h. No
attempt is made to remove unused includes of math_private.h, but that
is a natural followup.
(However, since math_private.h sometimes defines optimized versions of
math.h interfaces or __* variants thereof, as well as defining its own
interfaces, I think it might make sense to get all those optimized
versions included from include/math.h, not requiring a separate header
at all, before eliminating unused math_private.h includes - that
avoids a file quietly becoming less-optimized if someone adds a call
to one of those interfaces without restoring a math_private.h include
to that file.)
There is still a pitfall that if code uses plain fe* and __fe*
interfaces, but only includes fenv.h and not fenv_private.h or (before
this patch) math_private.h, it will compile on platforms with
exceptions and rounding modes but not get the optimized versions (and
possibly not compile) on platforms without exception and rounding mode
support, so making it easy to break the build for such platforms
accidentally.
I think it would be most natural to move the inlines / macros for fe*
and __fe* in the case of no exceptions and rounding modes into
include/fenv.h, so that all code including fenv.h with _ISOMAC not
defined automatically gets them. Then fenv_private.h would be purely
the header for the libc_fe*, SET_RESTORE_ROUND etc. internal
interfaces and the risk of breaking the build on other platforms than
the one you tested on because of a missing fenv_private.h include
would be much reduced (and there would be some unused fenv_private.h
includes to remove along with unused math_private.h includes).
Tested for x86_64 and x86, and tested with build-many-glibcs.py that
installed stripped shared libraries are unchanged by this patch.
* sysdeps/generic/math_private.h: Do not include <fenv_private.h>.
* math/fromfp.h: Include <fenv_private.h>.
* math/math-narrow.h: Likewise.
* math/s_cexp_template.c: Likewise.
* math/s_csin_template.c: Likewise.
* math/s_csinh_template.c: Likewise.
* math/s_ctan_template.c: Likewise.
* math/s_ctanh_template.c: Likewise.
* math/s_iseqsig_template.c: Likewise.
* math/w_acos_compat.c: Likewise.
* math/w_acosf_compat.c: Likewise.
* math/w_acosl_compat.c: Likewise.
* math/w_asin_compat.c: Likewise.
* math/w_asinf_compat.c: Likewise.
* math/w_asinl_compat.c: Likewise.
* math/w_ilogb_template.c: Likewise.
* math/w_j0_compat.c: Likewise.
* math/w_j0f_compat.c: Likewise.
* math/w_j0l_compat.c: Likewise.
* math/w_j1_compat.c: Likewise.
* math/w_j1f_compat.c: Likewise.
* math/w_j1l_compat.c: Likewise.
* math/w_jn_compat.c: Likewise.
* math/w_jnf_compat.c: Likewise.
* math/w_llogb_template.c: Likewise.
* math/w_log10_compat.c: Likewise.
* math/w_log10f_compat.c: Likewise.
* math/w_log10l_compat.c: Likewise.
* math/w_log2_compat.c: Likewise.
* math/w_log2f_compat.c: Likewise.
* math/w_log2l_compat.c: Likewise.
* math/w_log_compat.c: Likewise.
* math/w_logf_compat.c: Likewise.
* math/w_logl_compat.c: Likewise.
* sysdeps/aarch64/fpu/feholdexcpt.c: Likewise.
* sysdeps/aarch64/fpu/fesetround.c: Likewise.
* sysdeps/aarch64/fpu/fgetexcptflg.c: Likewise.
* sysdeps/aarch64/fpu/ftestexcept.c: Likewise.
* sysdeps/ieee754/dbl-64/e_atan2.c: Likewise.
* sysdeps/ieee754/dbl-64/e_exp.c: Likewise.
* sysdeps/ieee754/dbl-64/e_exp2.c: Likewise.
* sysdeps/ieee754/dbl-64/e_gamma_r.c: Likewise.
* sysdeps/ieee754/dbl-64/e_jn.c: Likewise.
* sysdeps/ieee754/dbl-64/e_pow.c: Likewise.
* sysdeps/ieee754/dbl-64/e_remainder.c: Likewise.
* sysdeps/ieee754/dbl-64/e_sqrt.c: Likewise.
* sysdeps/ieee754/dbl-64/gamma_product.c: Likewise.
* sysdeps/ieee754/dbl-64/lgamma_neg.c: Likewise.
* sysdeps/ieee754/dbl-64/s_atan.c: Likewise.
* sysdeps/ieee754/dbl-64/s_fma.c: Likewise.
* sysdeps/ieee754/dbl-64/s_fmaf.c: Likewise.
* sysdeps/ieee754/dbl-64/s_llrint.c: Likewise.
* sysdeps/ieee754/dbl-64/s_llround.c: Likewise.
* sysdeps/ieee754/dbl-64/s_lrint.c: Likewise.
* sysdeps/ieee754/dbl-64/s_lround.c: Likewise.
* sysdeps/ieee754/dbl-64/s_nearbyint.c: Likewise.
* sysdeps/ieee754/dbl-64/s_sin.c: Likewise.
* sysdeps/ieee754/dbl-64/s_sincos.c: Likewise.
* sysdeps/ieee754/dbl-64/s_tan.c: Likewise.
* sysdeps/ieee754/dbl-64/wordsize-64/s_lround.c: Likewise.
* sysdeps/ieee754/dbl-64/wordsize-64/s_nearbyint.c: Likewise.
* sysdeps/ieee754/dbl-64/x2y2m1.c: Likewise.
* sysdeps/ieee754/float128/float128_private.h: Likewise.
* sysdeps/ieee754/flt-32/e_gammaf_r.c: Likewise.
* sysdeps/ieee754/flt-32/e_j1f.c: Likewise.
* sysdeps/ieee754/flt-32/e_jnf.c: Likewise.
* sysdeps/ieee754/flt-32/lgamma_negf.c: Likewise.
* sysdeps/ieee754/flt-32/s_llrintf.c: Likewise.
* sysdeps/ieee754/flt-32/s_llroundf.c: Likewise.
* sysdeps/ieee754/flt-32/s_lrintf.c: Likewise.
* sysdeps/ieee754/flt-32/s_lroundf.c: Likewise.
* sysdeps/ieee754/flt-32/s_nearbyintf.c: Likewise.
* sysdeps/ieee754/k_standardl.c: Likewise.
* sysdeps/ieee754/ldbl-128/e_expl.c: Likewise.
* sysdeps/ieee754/ldbl-128/e_gammal_r.c: Likewise.
* sysdeps/ieee754/ldbl-128/e_j1l.c: Likewise.
* sysdeps/ieee754/ldbl-128/e_jnl.c: Likewise.
* sysdeps/ieee754/ldbl-128/gamma_productl.c: Likewise.
* sysdeps/ieee754/ldbl-128/lgamma_negl.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_fmal.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_llrintl.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_llroundl.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_lrintl.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_lroundl.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_nearbyintl.c: Likewise.
* sysdeps/ieee754/ldbl-128/x2y2m1l.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_expl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_gammal_r.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_j1l.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_jnl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/lgamma_negl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_fmal.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_llrintl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_llroundl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_lrintl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_lroundl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_rintl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/x2y2m1l.c: Likewise.
* sysdeps/ieee754/ldbl-96/e_gammal_r.c: Likewise.
* sysdeps/ieee754/ldbl-96/e_jnl.c: Likewise.
* sysdeps/ieee754/ldbl-96/gamma_productl.c: Likewise.
* sysdeps/ieee754/ldbl-96/lgamma_negl.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_fma.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_fmal.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_llrintl.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_llroundl.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_lrintl.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_lroundl.c: Likewise.
* sysdeps/ieee754/ldbl-96/x2y2m1l.c: Likewise.
* sysdeps/powerpc/fpu/e_sqrt.c: Likewise.
* sysdeps/powerpc/fpu/e_sqrtf.c: Likewise.
* sysdeps/riscv/rv64/rvd/s_ceil.c: Likewise.
* sysdeps/riscv/rv64/rvd/s_floor.c: Likewise.
* sysdeps/riscv/rv64/rvd/s_nearbyint.c: Likewise.
* sysdeps/riscv/rv64/rvd/s_round.c: Likewise.
* sysdeps/riscv/rv64/rvd/s_roundeven.c: Likewise.
* sysdeps/riscv/rv64/rvd/s_trunc.c: Likewise.
* sysdeps/riscv/rvd/s_finite.c: Likewise.
* sysdeps/riscv/rvd/s_fmax.c: Likewise.
* sysdeps/riscv/rvd/s_fmin.c: Likewise.
* sysdeps/riscv/rvd/s_fpclassify.c: Likewise.
* sysdeps/riscv/rvd/s_isinf.c: Likewise.
* sysdeps/riscv/rvd/s_isnan.c: Likewise.
* sysdeps/riscv/rvd/s_issignaling.c: Likewise.
* sysdeps/riscv/rvf/fegetround.c: Likewise.
* sysdeps/riscv/rvf/feholdexcpt.c: Likewise.
* sysdeps/riscv/rvf/fesetenv.c: Likewise.
* sysdeps/riscv/rvf/fesetround.c: Likewise.
* sysdeps/riscv/rvf/feupdateenv.c: Likewise.
* sysdeps/riscv/rvf/fgetexcptflg.c: Likewise.
* sysdeps/riscv/rvf/ftestexcept.c: Likewise.
* sysdeps/riscv/rvf/s_ceilf.c: Likewise.
* sysdeps/riscv/rvf/s_finitef.c: Likewise.
* sysdeps/riscv/rvf/s_floorf.c: Likewise.
* sysdeps/riscv/rvf/s_fmaxf.c: Likewise.
* sysdeps/riscv/rvf/s_fminf.c: Likewise.
* sysdeps/riscv/rvf/s_fpclassifyf.c: Likewise.
* sysdeps/riscv/rvf/s_isinff.c: Likewise.
* sysdeps/riscv/rvf/s_isnanf.c: Likewise.
* sysdeps/riscv/rvf/s_issignalingf.c: Likewise.
* sysdeps/riscv/rvf/s_nearbyintf.c: Likewise.
* sysdeps/riscv/rvf/s_roundevenf.c: Likewise.
* sysdeps/riscv/rvf/s_roundf.c: Likewise.
* sysdeps/riscv/rvf/s_truncf.c: Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Remove empty files due to the sin/cos improvements: k_sinf.c, k_cosf.c,
k_cos.c, k_sin.c. After the tanf change s_rem_pio2f.c and k_rem_pio2f.c
(and the ia64, m68k and powerpc equivalents) are no longer used,
so remove them. All e_rem_pio2.c files were already empty or commented
out, so remove them too. Passes build-many-glibcs.
* math/Makefile: Remove empty files k_sin(f).c, k_cos(f).c.
Remove unused files e_rem_pio2(f).c, k_rem_pio2f.c.
* sysdeps/i386/fpu/e_rem_pio2.c: Delete file.
* sysdeps/ia64/fpu/e_rem_pio2.c: Likewise.
* sysdeps/ia64/fpu/e_rem_pio2f.c: Likewise.
* sysdeps/ia64/fpu/k_rem_pio2f.c: Likewise.
* sysdeps/ieee754/dbl-64/e_rem_pio2.c: Likewise.
* sysdeps/ieee754/dbl-64/k_cos.c: Likewise.
* sysdeps/ieee754/dbl-64/k_sin.c: Likewise.
* sysdeps/ieee754/flt-32/e_rem_pio2f.c: Likewise.
* sysdeps/ieee754/flt-32/k_cosf.c: Likewise.
* sysdeps/ieee754/flt-32/k_rem_pio2f.c: Likewise.
* sysdeps/ieee754/flt-32/k_sinf.c: Likewise.
* sysdeps/m68k/m680x0/fpu/e_rem_pio2.c: Likewise
* sysdeps/m68k/m680x0/fpu/e_rem_pio2f.c: Likewise
* sysdeps/m68k/m680x0/fpu/k_rem_pio2f.c: Likewise
* sysdeps/powerpc/fpu/e_rem_pio2f.c: Likewise.
* sysdeps/powerpc/fpu/k_rem_pio2f.c: Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Building with recent GCC mainline for i686-linux-gnu is failing with:
../sysdeps/ieee754/flt-32/k_rem_pio2f.c: In function '__kernel_rem_pio2f':
../sysdeps/ieee754/flt-32/k_rem_pio2f.c:186:28: error: 'fq[0]' may be used uninitialized in this function [-Werror=maybe-uninitialized]
fv = math_narrow_eval (fq[0]-fv);
^
and
../sysdeps/ieee754/dbl-64/k_rem_pio2.c: In function '__kernel_rem_pio2':
../sysdeps/ieee754/dbl-64/k_rem_pio2.c:333:32: error: 'fq[0]' may be used uninitialized in this function [-Werror=maybe-uninitialized]
fv = math_narrow_eval (fq[0] - fv);
^
These are similar to -Warray-bounds cases for which the DIAG_* macros
are already used in those files: the array element is in fact always
initialized, but the reasoning that it is depends on another array not
having been all zero at an earlier point, which depends on the
functions not being called with zero arguments. Thus, this patch uses
DIAG_* to disable -Wmaybe-uninitialized for this code.
(The warning may be i686-specific because of math_narrow_eval somehow
perturbing what the compiler does with this code enough to cause the
warning. I don't know why it doesn't appear for i686-gnu.)
Tested with build-many-glibcs.py that this fixes the i686 build in
this configuration.
* sysdeps/ieee754/dbl-64/k_rem_pio2.c (__kernel_rem_pio2): Ignore
-Wmaybe-uninitialized around access to fq[0].
* sysdeps/ieee754/flt-32/k_rem_pio2f.c (__kernel_rem_pio2f):
Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds the narrowing divide functions from TS 18661-1 to
glibc's libm: fdiv, fdivl, ddivl, f32divf64, f32divf32x, f32xdivf64
for all configurations; f32divf64x, f32divf128, f64divf64x,
f64divf128, f32xdivf64x, f32xdivf128, f64xdivf128 for configurations
with _Float64x and _Float128; __nldbl_ddivl for ldbl-opt.
The changes are mostly essentially the same as for the other narrowing
functions, so the description of those generally applies to this patch
as well.
Tested for x86_64, x86, mips64 (all three ABIs, both hard and soft
float) and powerpc, and with build-many-glibcs.py.
* math/Makefile (libm-narrow-fns): Add div.
(libm-test-funcs-narrow): Likewise.
* math/Versions (GLIBC_2.28): Add narrowing divide functions.
* math/bits/mathcalls-narrow.h (div): Use __MATHCALL_NARROW.
* math/gen-auto-libm-tests.c (test_functions): Add div.
* math/math-narrow.h (CHECK_NARROW_DIV): New macro.
(NARROW_DIV_ROUND_TO_ODD): Likewise.
(NARROW_DIV_TRIVIAL): Likewise.
* sysdeps/ieee754/float128/float128_private.h (__fdivl): New
macro.
(__ddivl): Likewise.
* sysdeps/ieee754/ldbl-opt/Makefile (libnldbl-calls): Add fdiv and
ddiv.
(CFLAGS-nldbl-ddiv.c): New variable.
(CFLAGS-nldbl-fdiv.c): Likewise.
* sysdeps/ieee754/ldbl-opt/Versions (GLIBC_2.28): Add
__nldbl_ddivl.
* sysdeps/ieee754/ldbl-opt/nldbl-compat.h (__nldbl_ddivl): New
prototype.
* manual/arith.texi (Misc FP Arithmetic): Document fdiv, fdivl,
ddivl, fMdivfN, fMdivfNx, fMxdivfN and fMxdivfNx.
* math/auto-libm-test-in: Add tests of div.
* math/auto-libm-test-out-narrow-div: New generated file.
* math/libm-test-narrow-div.inc: New file.
* sysdeps/i386/fpu/s_f32xdivf64.c: Likewise.
* sysdeps/ieee754/dbl-64/s_f32xdivf64.c: Likewise.
* sysdeps/ieee754/dbl-64/s_fdiv.c: Likewise.
* sysdeps/ieee754/float128/s_f32divf128.c: Likewise.
* sysdeps/ieee754/float128/s_f64divf128.c: Likewise.
* sysdeps/ieee754/float128/s_f64xdivf128.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_ddivl.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_f64xdivf128.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_fdivl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_ddivl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_fdivl.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_ddivl.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_fdivl.c: Likewise.
* sysdeps/ieee754/ldbl-opt/nldbl-ddiv.c: Likewise.
* sysdeps/ieee754/ldbl-opt/nldbl-fdiv.c: Likewise.
* sysdeps/ieee754/soft-fp/s_ddivl.c: Likewise.
* sysdeps/ieee754/soft-fp/s_fdiv.c: Likewise.
* sysdeps/ieee754/soft-fp/s_fdivl.c: Likewise.
* sysdeps/powerpc/fpu/libm-test-ulps: Update.
* sysdeps/mach/hurd/i386/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/aarch64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/alpha/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/arm/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/hppa/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/i386/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/ia64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/m68k/coldfire/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/m68k/m680x0/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/microblaze/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/mips/mips32/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/mips/mips64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/nios2/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/libm-le.abilist: Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/riscv/rv64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/s390/s390-32/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/s390/s390-64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/sh/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc32/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/x86_64/64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/x86_64/x32/libm.abilist: Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds the narrowing multiply functions from TS 18661-1 to
glibc's libm: fmul, fmull, dmull, f32mulf64, f32mulf32x, f32xmulf64
for all configurations; f32mulf64x, f32mulf128, f64mulf64x,
f64mulf128, f32xmulf64x, f32xmulf128, f64xmulf128 for configurations
with _Float64x and _Float128; __nldbl_dmull for ldbl-opt.
The changes are mostly essentially the same as for the narrowing add
functions, so the description of those generally applies to this patch
as well. f32xmulf64 for i386 cannot use precision control as used for
add and subtract, because that would result in double rounding for
subnormal results, so that uses round-to-odd with long double
intermediate result instead. The soft-fp support involves adding a
new FP_TRUNC_COOKED since soft-fp multiplication uses cooked inputs
and outputs.
Tested for x86_64, x86, mips64 (all three ABIs, both hard and soft
float) and powerpc, and with build-many-glibcs.py.
* math/Makefile (libm-narrow-fns): Add mul.
(libm-test-funcs-narrow): Likewise.
* math/Versions (GLIBC_2.28): Add narrowing multiply functions.
* math/bits/mathcalls-narrow.h (mul): Use __MATHCALL_NARROW.
* math/gen-auto-libm-tests.c (test_functions): Add mul.
* math/math-narrow.h (CHECK_NARROW_MUL): New macro.
(NARROW_MUL_ROUND_TO_ODD): Likewise.
(NARROW_MUL_TRIVIAL): Likewise.
* soft-fp/op-common.h (FP_TRUNC_COOKED): Likewise.
* sysdeps/ieee754/float128/float128_private.h (__fmull): New
macro.
(__dmull): Likewise.
* sysdeps/ieee754/ldbl-opt/Makefile (libnldbl-calls): Add fmul and
dmul.
(CFLAGS-nldbl-dmul.c): New variable.
(CFLAGS-nldbl-fmul.c): Likewise.
* sysdeps/ieee754/ldbl-opt/Versions (GLIBC_2.28): Add
__nldbl_dmull.
* sysdeps/ieee754/ldbl-opt/nldbl-compat.h (__nldbl_dmull): New
prototype.
* manual/arith.texi (Misc FP Arithmetic): Document fmul, fmull,
dmull, fMmulfN, fMmulfNx, fMxmulfN and fMxmulfNx.
* math/auto-libm-test-in: Add tests of mul.
* math/auto-libm-test-out-narrow-mul: New generated file.
* math/libm-test-narrow-mul.inc: New file.
* sysdeps/i386/fpu/s_f32xmulf64.c: Likewise.
* sysdeps/ieee754/dbl-64/s_f32xmulf64.c: Likewise.
* sysdeps/ieee754/dbl-64/s_fmul.c: Likewise.
* sysdeps/ieee754/float128/s_f32mulf128.c: Likewise.
* sysdeps/ieee754/float128/s_f64mulf128.c: Likewise.
* sysdeps/ieee754/float128/s_f64xmulf128.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_dmull.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_f64xmulf128.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_fmull.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_dmull.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_fmull.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_dmull.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_fmull.c: Likewise.
* sysdeps/ieee754/ldbl-opt/nldbl-dmul.c: Likewise.
* sysdeps/ieee754/ldbl-opt/nldbl-fmul.c: Likewise.
* sysdeps/ieee754/soft-fp/s_dmull.c: Likewise.
* sysdeps/ieee754/soft-fp/s_fmul.c: Likewise.
* sysdeps/ieee754/soft-fp/s_fmull.c: Likewise.
* sysdeps/powerpc/fpu/libm-test-ulps: Update.
* sysdeps/mach/hurd/i386/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/aarch64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/alpha/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/arm/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/hppa/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/i386/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/ia64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/m68k/coldfire/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/m68k/m680x0/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/microblaze/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/mips/mips32/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/mips/mips64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/nios2/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/libm-le.abilist: Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/riscv/rv64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/s390/s390-32/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/s390/s390-64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/sh/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc32/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/x86_64/64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/x86_64/x32/libm.abilist: Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch continues the math_private.h cleanup by stopping
math_private.h from including math-barriers.h and making the users of
the barrier macros include the latter header directly. No attempt is
made to remove any math_private.h includes that are now unused, except
in strtod_l.c where that is done to avoid line number changes in
assertions, so that installed stripped shared libraries can be
compared before and after the patch. (I think the floating-point
environment support in math_private.h should also move out - some
architectures already have fenv_private.h as an architecture-internal
header included from their math_private.h - and after moving that out
might be a better time to identify unused math_private.h includes.)
Tested for x86_64 and x86, and tested with build-many-glibcs.py that
installed stripped shared libraries are unchanged by the patch.
* sysdeps/generic/math_private.h: Do not include
<math-barriers.h>.
* stdlib/strtod_l.c: Include <math-barriers.h> instead of
<math_private.h>.
* math/fromfp.h: Include <math-barriers.h>.
* math/math-narrow.h: Likewise.
* math/s_nextafter.c: Likewise.
* math/s_nexttowardf.c: Likewise.
* sysdeps/aarch64/fpu/s_llrint.c: Likewise.
* sysdeps/aarch64/fpu/s_llrintf.c: Likewise.
* sysdeps/aarch64/fpu/s_lrint.c: Likewise.
* sysdeps/aarch64/fpu/s_lrintf.c: Likewise.
* sysdeps/i386/fpu/s_nextafterl.c: Likewise.
* sysdeps/i386/fpu/s_nexttoward.c: Likewise.
* sysdeps/i386/fpu/s_nexttowardf.c: Likewise.
* sysdeps/ieee754/dbl-64/e_atan2.c: Likewise.
* sysdeps/ieee754/dbl-64/e_atanh.c: Likewise.
* sysdeps/ieee754/dbl-64/e_exp.c: Likewise.
* sysdeps/ieee754/dbl-64/e_exp2.c: Likewise.
* sysdeps/ieee754/dbl-64/e_j0.c: Likewise.
* sysdeps/ieee754/dbl-64/e_sqrt.c: Likewise.
* sysdeps/ieee754/dbl-64/s_expm1.c: Likewise.
* sysdeps/ieee754/dbl-64/s_fma.c: Likewise.
* sysdeps/ieee754/dbl-64/s_fmaf.c: Likewise.
* sysdeps/ieee754/dbl-64/s_log1p.c: Likewise.
* sysdeps/ieee754/dbl-64/s_nearbyint.c: Likewise.
* sysdeps/ieee754/dbl-64/wordsize-64/s_nearbyint.c: Likewise.
* sysdeps/ieee754/flt-32/e_atanhf.c: Likewise.
* sysdeps/ieee754/flt-32/e_j0f.c: Likewise.
* sysdeps/ieee754/flt-32/s_expm1f.c: Likewise.
* sysdeps/ieee754/flt-32/s_log1pf.c: Likewise.
* sysdeps/ieee754/flt-32/s_nearbyintf.c: Likewise.
* sysdeps/ieee754/flt-32/s_nextafterf.c: Likewise.
* sysdeps/ieee754/k_standardl.c: Likewise.
* sysdeps/ieee754/ldbl-128/e_asinl.c: Likewise.
* sysdeps/ieee754/ldbl-128/e_expl.c: Likewise.
* sysdeps/ieee754/ldbl-128/e_powl.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_fmal.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_nearbyintl.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_nextafterl.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_nexttoward.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_nexttowardf.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_asinl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_fmal.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_nextafterl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_nexttoward.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_nexttowardf.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_rintl.c: Likewise.
* sysdeps/ieee754/ldbl-96/e_atanhl.c: Likewise.
* sysdeps/ieee754/ldbl-96/e_j0l.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_fma.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_fmal.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_nexttoward.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_nexttowardf.c: Likewise.
* sysdeps/ieee754/ldbl-opt/s_nexttowardfd.c: Likewise.
* sysdeps/m68k/m680x0/fpu/s_nextafterl.c: Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch continues cleaning up math_private.h by moving the
math_check_force_underflow set of macros to a separate header
math-underflow.h.
This header is included by the files that need it rather than from
math_private.h. Moving these macros to a separate file removes the
math_private.h uses of macros from float.h, so the inclusion of
float.h in math_private.h is also removed; files that were depending
on that inclusion are fixed to include float.h directly. The
inclusion of math-barriers.h from math_private.h will be removed in a
separate patch.
Tested for x86_64 and x86. Also tested with build-many-glibcs.py that
installed stripped shared libraries are unchanged by this patch.
* math/math-underflow.h: New file.
* sysdeps/generic/math_private.h: Do not include <float.h>.
(fabs_tg): Remove macro. Moved to math-underflow.h.
(min_of_type_f): Likewise.
(min_of_type_): Likewise.
(min_of_type_l): Likewise.
(min_of_type_f128): Likewise.
(min_of_type): Likewise.
(math_check_force_underflow): Likewise.
(math_check_force_underflow_nonneg): Likewise.
(math_check_force_underflow_complex): Likewise.
* math/e_exp2_template.c: Include <math-underflow.h>.
* math/k_casinh_template.c: Likewise.
* math/s_catan_template.c: Likewise.
* math/s_catanh_template.c: Likewise.
* math/s_ccosh_template.c: Likewise.
* math/s_cexp_template.c: Likewise.
* math/s_clog10_template.c: Likewise.
* math/s_clog_template.c: Likewise.
* math/s_csin_template.c: Likewise.
* math/s_csinh_template.c: Likewise.
* math/s_csqrt_template.c: Likewise.
* math/s_ctan_template.c: Likewise.
* math/s_ctanh_template.c: Likewise.
* sysdeps/ieee754/dbl-64/e_asin.c: Likewise.
* sysdeps/ieee754/dbl-64/e_atanh.c: Likewise.
* sysdeps/ieee754/dbl-64/e_exp2.c: Likewise.
* sysdeps/ieee754/dbl-64/e_gamma_r.c: Likewise.
* sysdeps/ieee754/dbl-64/e_hypot.c: Likewise.
* sysdeps/ieee754/dbl-64/e_j1.c: Likewise.
* sysdeps/ieee754/dbl-64/e_jn.c: Likewise.
* sysdeps/ieee754/dbl-64/e_pow.c: Likewise.
* sysdeps/ieee754/dbl-64/e_sinh.c: Likewise.
* sysdeps/ieee754/dbl-64/s_asinh.c: Likewise.
* sysdeps/ieee754/dbl-64/s_atan.c: Likewise.
* sysdeps/ieee754/dbl-64/s_erf.c: Likewise.
* sysdeps/ieee754/dbl-64/s_expm1.c: Likewise.
* sysdeps/ieee754/dbl-64/s_log1p.c: Likewise.
* sysdeps/ieee754/dbl-64/s_sin.c: Likewise.
* sysdeps/ieee754/dbl-64/s_sincos.c: Likewise.
* sysdeps/ieee754/dbl-64/s_tan.c: Likewise.
* sysdeps/ieee754/dbl-64/s_tanh.c: Likewise.
* sysdeps/ieee754/flt-32/e_asinf.c: Likewise.
* sysdeps/ieee754/flt-32/e_atanhf.c: Likewise.
* sysdeps/ieee754/flt-32/e_gammaf_r.c: Likewise.
* sysdeps/ieee754/flt-32/e_j1f.c: Likewise.
* sysdeps/ieee754/flt-32/e_jnf.c: Likewise.
* sysdeps/ieee754/flt-32/e_sinhf.c: Likewise.
* sysdeps/ieee754/flt-32/k_sinf.c: Likewise.
* sysdeps/ieee754/flt-32/k_tanf.c: Likewise.
* sysdeps/ieee754/flt-32/s_asinhf.c: Likewise.
* sysdeps/ieee754/flt-32/s_atanf.c: Likewise.
* sysdeps/ieee754/flt-32/s_erff.c: Likewise.
* sysdeps/ieee754/flt-32/s_expm1f.c: Likewise.
* sysdeps/ieee754/flt-32/s_log1pf.c: Likewise.
* sysdeps/ieee754/flt-32/s_tanhf.c: Likewise.
* sysdeps/ieee754/ldbl-128/e_asinl.c: Likewise.
* sysdeps/ieee754/ldbl-128/e_atanhl.c: Likewise.
* sysdeps/ieee754/ldbl-128/e_expl.c: Likewise.
* sysdeps/ieee754/ldbl-128/e_gammal_r.c: Likewise.
* sysdeps/ieee754/ldbl-128/e_hypotl.c: Likewise.
* sysdeps/ieee754/ldbl-128/e_j1l.c: Likewise.
* sysdeps/ieee754/ldbl-128/e_jnl.c: Likewise.
* sysdeps/ieee754/ldbl-128/e_sinhl.c: Likewise.
* sysdeps/ieee754/ldbl-128/k_sincosl.c: Likewise.
* sysdeps/ieee754/ldbl-128/k_sinl.c: Likewise.
* sysdeps/ieee754/ldbl-128/k_tanl.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_asinhl.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_atanl.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_erfl.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_expm1l.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_log1pl.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_tanhl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_asinl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_atanhl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_gammal_r.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_hypotl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_j1l.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_jnl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_powl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_sinhl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/k_sincosl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/k_sinl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/k_tanl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_asinhl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_atanl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_erfl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_fmal.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_tanhl.c: Likewise.
* sysdeps/ieee754/ldbl-96/e_asinl.c: Likewise.
* sysdeps/ieee754/ldbl-96/e_atanhl.c: Likewise.
* sysdeps/ieee754/ldbl-96/e_gammal_r.c: Likewise.
* sysdeps/ieee754/ldbl-96/e_hypotl.c: Likewise.
* sysdeps/ieee754/ldbl-96/e_j1l.c: Likewise.
* sysdeps/ieee754/ldbl-96/e_jnl.c: Likewise.
* sysdeps/ieee754/ldbl-96/e_sinhl.c: Likewise.
* sysdeps/ieee754/ldbl-96/k_sinl.c: Likewise.
* sysdeps/ieee754/ldbl-96/k_tanl.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_asinhl.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_erfl.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_tanhl.c: Likewise.
* sysdeps/powerpc/fpu/e_hypot.c: Likewise.
* sysdeps/x86/fpu/powl_helper.c: Likewise.
* sysdeps/ieee754/dbl-64/s_nextup.c: Include <float.h>.
* sysdeps/ieee754/flt-32/s_nextupf.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_nextupl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_nextupl.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_nextupl.c: Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch continues cleaning up the math_private.h header, which
contains lots of different definitions many of which are only needed
by a limited subset of files using that header (and some of which are
overridden by architectures that only want to override selected parts
of the header), by moving the math_narrow_eval macro out to a separate
math-narrow-eval.h header, only included by those files that need it.
That header is placed in include/ (since it's used in stdlib/, not
just files built in math/, but no sysdeps variants are needed at
present).
Tested for x86_64, and with build-many-glibcs.py. (Installed stripped
shared libraries change because of line numbers in assertions in
strtod_l.c.)
* include/math-narrow-eval.h: New file. Contents moved from ....
* sysdeps/generic/math_private.h: ... here.
(math_narrow_eval): Remove macro. Moved to math-narrow-eval.h.
[FLT_EVAL_METHOD != 0] (excess_precision): Likewise.
* math/s_fdim_template.c: Include <math-narrow-eval.h>.
* stdlib/strtod_l.c: Likewise.
* sysdeps/i386/fpu/s_f32xaddf64.c: Likewise.
* sysdeps/i386/fpu/s_f32xsubf64.c: Likewise.
* sysdeps/i386/fpu/s_fdim.c: Likewise.
* sysdeps/ieee754/dbl-64/e_cosh.c: Likewise.
* sysdeps/ieee754/dbl-64/e_gamma_r.c: Likewise.
* sysdeps/ieee754/dbl-64/e_j1.c: Likewise.
* sysdeps/ieee754/dbl-64/e_jn.c: Likewise.
* sysdeps/ieee754/dbl-64/e_lgamma_r.c: Likewise.
* sysdeps/ieee754/dbl-64/e_sinh.c: Likewise.
* sysdeps/ieee754/dbl-64/gamma_productf.c: Likewise.
* sysdeps/ieee754/dbl-64/k_rem_pio2.c: Likewise.
* sysdeps/ieee754/dbl-64/lgamma_neg.c: Likewise.
* sysdeps/ieee754/dbl-64/s_erf.c: Likewise.
* sysdeps/ieee754/dbl-64/s_llrint.c: Likewise.
* sysdeps/ieee754/dbl-64/s_lrint.c: Likewise.
* sysdeps/ieee754/flt-32/e_coshf.c: Likewise.
* sysdeps/ieee754/flt-32/e_exp2f.c: Likewise.
* sysdeps/ieee754/flt-32/e_expf.c: Likewise.
* sysdeps/ieee754/flt-32/e_gammaf_r.c: Likewise.
* sysdeps/ieee754/flt-32/e_j1f.c: Likewise.
* sysdeps/ieee754/flt-32/e_jnf.c: Likewise.
* sysdeps/ieee754/flt-32/e_lgammaf_r.c: Likewise.
* sysdeps/ieee754/flt-32/e_sinhf.c: Likewise.
* sysdeps/ieee754/flt-32/k_rem_pio2f.c: Likewise.
* sysdeps/ieee754/flt-32/lgamma_negf.c: Likewise.
* sysdeps/ieee754/flt-32/s_erff.c: Likewise.
* sysdeps/ieee754/flt-32/s_llrintf.c: Likewise.
* sysdeps/ieee754/flt-32/s_lrintf.c: Likewise.
* sysdeps/ieee754/ldbl-96/gamma_product.c: Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Adds a fast path to e_exp.c when |x| < 1.03972053527832.
When values are tested in isolation, reduction in execution
time is: aarch 30%, sparc 18%, x86 37%.
When comparing benchtests/bench.out which includes values
outside that range, the gains are:
aarch 8%, sparc 5%, x86 9%.
make check is clean (no increase in ulp for any math test).
Testing 20M values for each rounding mode in that range shows
approximately one in 200 values is off by 1 ulp. No value tested
for exp(x) changed by 2 or more ulp.
No observed change in performance or accuracy for x outside
fast path range.
These changes will be active for all platforms that don't provide
their own exp() routines. They will also be active for ieee754
versions of ccos, ccosh, cosh, csin, csinh, sinh, exp10, gamma, and
erf.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Refactor the sincos implementation - rather than rely on odd partial inlining
of preprocessed portions from sin and cos, explicitly write out the cases.
This makes sincos much easier to maintain and provides an additional 16-20%
speedup between 0 and 2^27. The overall speedup of sincos is 48% over this range.
Between 0 and PI it is 66% faster.
* sysdeps/ieee754/dbl-64/s_sin.c (__sin): Cleanup ifdefs.
(__cos): Likewise.
* sysdeps/ieee754/dbl-64/s_sin.c (__sincos): Refactor using the same
logic as sin and cos.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Refactor duplicated code into do_sin. Since all calls to do_sin use copysign to
set the sign of the result, move it inside do_sin. Small inputs use a separate
polynomial, so move this into do_sin as well (the check is based on the more
conservative case when doing large range reduction, but could be relaxed).
* sysdeps/ieee754/dbl-64/s_sin.c (do_sin): Use TAYLOR_SIN for small
inputs. Return correct sign.
(do_sincos): Remove small input check before do_sin, let do_sin set
the sign.
(__sin): Likewise.
(__cos): Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Remove all unused slowpath functions.
* sysdeps/ieee754/dbl-64/s_sin.c (TAYLOR_SLOW): Remove.
(do_cos_slow): Likewise.
(do_sin_slow): Likewise.
(reduce_and_compute): Likewise.
(slow): Likewise.
(slow1): Likewise.
(slow2): Likewise.
(sloww): Likewise.
(sloww1): Likewise.
(sloww2): Likewise.
(bslow): Likewise.
(bslow1): Likewise.
(bslow2): Likewise.
(cslow2): Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For huge inputs use the improved do_sincos function as well. Now no cases use
the correction factor returned by do_sin, do_cos and TAYLOR_SIN, so remove it.
* sysdeps/ieee754/dbl-64/s_sin.c (TAYLOR_SIN): Remove cor parameter.
(do_cos): Remove corp parameter and calculations.
(do_sin): Likewise.
(do_sincos): Remove cor variable.
(__sin): Use do_sincos for huge inputs.
(__cos): Likewise.
* sysdeps/ieee754/dbl-64/s_sincos.c (__sincos): Likewise.
(reduce_and_compute_sincos): Remove unused function.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch improves the accuracy of the range reduction. When the input is
large (2^27) and very close to a multiple of PI/2, using 110 bits of PI is not
enough. Improve range reduction accuracy to 136 bits. As a result the special
checks for results close to zero can be removed. The ULP of the polynomials is
at worst 0.55ULP, so there is no reason for the slow functions, and they can be
removed.
* sysdeps/ieee754/dbl-64/s_sin.c (reduce_sincos_1): Rename to
reduce_sincos, improve accuracy to 136 bits.
(do_sincos_1): Rename to do_sincos, remove fallbacks to slow functions.
(__sin): Use improved reduction and simplified do_sincos calculation.
(__cos): Likewise.
* sysdeps/ieee754/dbl-64/s_sincos.c (__sincos): Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch removes the large range reduction code and defers to the huge range
reduction code. The first level range reducer supports inputs up to 2^27,
which is way too large given that inputs for sin/cos are typically small
(< 10), and optimizing for a smaller range would give a significant speedup.
Input values above 2^27 are practically never used, so there is no reason for
supporting range reduction between 2^27 and 2^48. Removing it significantly
simplifies code and enables further speedups. There is about a 2.3x slowdown
in this range due to __branred being extremely slow (a better algorithm could
easily more than double performance).
* sysdeps/ieee754/dbl-64/s_sin.c (reduce_sincos_2): Remove function.
(do_sincos_2): Likewise.
(__sin): Remove middle range reduction case.
(__cos): Likewise.
* sysdeps/ieee754/dbl-64/s_sincos.c (__sincos): Remove middle range
reduction case.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This series of patches removes the slow patchs from sin, cos and sincos.
Besides greatly simplifying the implementation, the new version is also much
faster for inputs up to PI (41% faster) and for large inputs needing range
reduction (27% faster).
ULP is ~0.55 with no errors found after testing 1.6 billion inputs across most
of the range with mpsin and mpcos. The number of incorrectly rounded results
(ie. ULP >0.5) is at most ~2750 per million inputs between 0.125 and 0.5,
the average is ~850 per million between 0 and PI.
Tested on AArch64 and x86_64 with no regressions.
The first patch removes the slow paths for the cases where the input is small
and doesn't require range reduction. Update ULP tables for sin, cos and sincos
on AArch64 and x86_64.
* sysdeps/aarch64/libm-test-ulps: Update ULP for sin, cos, sincos.
* sysdeps/ieee754/dbl-64/s_sin.c (__sin): Remove slow paths for small
inputs.
(__cos): Likewise.
* sysdeps/x86_64/fpu/libm-test-ulps: Update ULP for sin, cos, sincos.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds the narrowing subtract functions from TS 18661-1 to
glibc's libm: fsub, fsubl, dsubl, f32subf64, f32subf32x, f32xsubf64
for all configurations; f32subf64x, f32subf128, f64subf64x,
f64subf128, f32xsubf64x, f32xsubf128, f64xsubf128 for configurations
with _Float64x and _Float128; __nldbl_dsubl for ldbl-opt.
The changes are essentially the same as for the narrowing add
functions, so the description of those generally applies to this patch
as well.
Tested for x86_64, x86, mips64 (all three ABIs, both hard and soft
float) and powerpc, and with build-many-glibcs.py.
* math/Makefile (libm-narrow-fns): Add sub.
(libm-test-funcs-narrow): Likewise.
* math/Versions (GLIBC_2.28): Add narrowing subtract functions.
* math/bits/mathcalls-narrow.h (sub): Use __MATHCALL_NARROW.
* math/gen-auto-libm-tests.c (test_functions): Add sub.
* math/math-narrow.h (CHECK_NARROW_SUB): New macro.
(NARROW_SUB_ROUND_TO_ODD): Likewise.
(NARROW_SUB_TRIVIAL): Likewise.
* sysdeps/ieee754/float128/float128_private.h (__fsubl): New
macro.
(__dsubl): Likewise.
* sysdeps/ieee754/ldbl-opt/Makefile (libnldbl-calls): Add fsub and
dsub.
(CFLAGS-nldbl-dsub.c): New variable.
(CFLAGS-nldbl-fsub.c): Likewise.
* sysdeps/ieee754/ldbl-opt/Versions (GLIBC_2.28): Add
__nldbl_dsubl.
* sysdeps/ieee754/ldbl-opt/nldbl-compat.h (__nldbl_dsubl): New
prototype.
* manual/arith.texi (Misc FP Arithmetic): Document fsub, fsubl,
dsubl, fMsubfN, fMsubfNx, fMxsubfN and fMxsubfNx.
* math/auto-libm-test-in: Add tests of sub.
* math/auto-libm-test-out-narrow-sub: New generated file.
* math/libm-test-narrow-sub.inc: New file.
* sysdeps/i386/fpu/s_f32xsubf64.c: Likewise.
* sysdeps/ieee754/dbl-64/s_f32xsubf64.c: Likewise.
* sysdeps/ieee754/dbl-64/s_fsub.c: Likewise.
* sysdeps/ieee754/float128/s_f32subf128.c: Likewise.
* sysdeps/ieee754/float128/s_f64subf128.c: Likewise.
* sysdeps/ieee754/float128/s_f64xsubf128.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_dsubl.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_f64xsubf128.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_fsubl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_dsubl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_fsubl.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_dsubl.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_fsubl.c: Likewise.
* sysdeps/ieee754/ldbl-opt/nldbl-dsub.c: Likewise.
* sysdeps/ieee754/ldbl-opt/nldbl-fsub.c: Likewise.
* sysdeps/ieee754/soft-fp/s_dsubl.c: Likewise.
* sysdeps/ieee754/soft-fp/s_fsub.c: Likewise.
* sysdeps/ieee754/soft-fp/s_fsubl.c: Likewise.
* sysdeps/powerpc/fpu/libm-test-ulps: Update.
* sysdeps/mach/hurd/i386/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/aarch64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/alpha/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/arm/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/hppa/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/i386/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/ia64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/m68k/coldfire/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/m68k/m680x0/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/microblaze/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/mips/mips32/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/mips/mips64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/nios2/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/libm-le.abilist: Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/riscv/rv64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/s390/s390-32/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/s390/s390-64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/sh/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc32/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/tile/tilegx32/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/tile/tilegx64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/x86_64/64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/x86_64/x32/libm.abilist: Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Use sqrt(f/l) to enable inlining by GCC - if inlining doesn't happen,
the asm redirect ensures we will still call __ieee754_sqrt(f/l).
* sysdeps/ieee754/dbl-64/e_acosh.c (__ieee754_acosh): Use sqrt.
* sysdeps/ieee754/dbl-64/e_gamma_r.c (gamma_positive): Likewise.
* sysdeps/ieee754/dbl-64/e_hypot.c (__ieee754_hypot): Likewise.
* sysdeps/ieee754/dbl-64/e_j0.c (__ieee754_j0): Likewise.
* sysdeps/ieee754/dbl-64/e_j1.c (__ieee754_j1): Likewise.
* sysdeps/ieee754/dbl-64/e_jn.c (__ieee754_jn): Likewise.
* sysdeps/ieee754/dbl-64/s_asinh.c (__asinh): Likewise.
* sysdeps/ieee754/dbl-64/wordsize-64/e_acosh.c (__ieee754_acosh): Likewise.
* sysdeps/ieee754/flt-32/e_acosf.c (__ieee754_acosf): Likewise.
* sysdeps/ieee754/flt-32/e_acoshf.c (__ieee754_acoshf): Likewise.
* sysdeps/ieee754/flt-32/e_asinf.c (__ieee754_asinf): Likewise.
* sysdeps/ieee754/flt-32/e_gammaf_r.c (gammaf_positive): Likewise.
* sysdeps/ieee754/flt-32/e_hypotf.c (__ieee754_hypotf): Likewise.
* sysdeps/ieee754/flt-32/e_j0f.c (__ieee754_j0f): Likewise.
* sysdeps/ieee754/flt-32/e_j1f.c (__ieee754_j1f): Likewise.
* sysdeps/ieee754/flt-32/e_powf.c (__ieee754_powf): Likewise.
* sysdeps/ieee754/flt-32/s_asinhf.c (__asinhf): Likewise.
* sysdeps/ieee754/ldbl-128/e_acoshl.c (__ieee754_acoshl): Use sqrtl.
* sysdeps/ieee754/ldbl-128/e_acosl.c (__ieee754_acosl): Likewise.
* sysdeps/ieee754/ldbl-128/e_asinl.c (__ieee754_asinl): Likewise.
* sysdeps/ieee754/ldbl-128/e_gammal_r.c (gammal_positive): Likewise.
* sysdeps/ieee754/ldbl-128/e_hypotl.c (__ieee754_hypotl): Likewise.
* sysdeps/ieee754/ldbl-128/e_j0l.c (__ieee754_j0l): Likewise.
* sysdeps/ieee754/ldbl-128/e_j1l.c (__ieee754_j1l): Likewise.
* sysdeps/ieee754/ldbl-128/e_jnl.c (__ieee754_jnl): Likewise.
* sysdeps/ieee754/ldbl-128/e_powl.c (__ieee754_powl): Likewise.
* sysdeps/ieee754/ldbl-128/s_asinhl.c (__ieee754_asinhl): Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_acoshl.c (__ieee754_acoshl): Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_acosl.c (__ieee754_acosl): Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_asinl.c (__ieee754_asinl): Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_gammal_r.c (gammal_positive): Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_hypotl.c (__ieee754_hypotl): Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_j0l.c (__ieee754_j0l): Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_j1l.c (__ieee754_j1l): Likewise
* sysdeps/ieee754/ldbl-128ibm/e_jnl.c (__ieee754_jnl): Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_powl.c (__ieee754_powl): Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_asinhl.c (__ieee754_asinhl): Likewise.
* sysdeps/ieee754/ldbl-96/e_acoshl.c (__ieee754_acoshl): Use sqrtl.
* sysdeps/ieee754/ldbl-96/e_asinl.c (__ieee754_asinl): Likewise.
* sysdeps/ieee754/ldbl-96/e_gammal_r.c (gammal_positive): Likewise.
* sysdeps/ieee754/ldbl-96/e_hypotl.c (__ieee754_hypotl): Likewise.
* sysdeps/ieee754/ldbl-96/e_j0l.c (__ieee754_j0l): Likewise.
* sysdeps/ieee754/ldbl-96/e_j1l.c (__ieee754_j1l): Likewise.
* sysdeps/ieee754/ldbl-96/e_jnl.c (__ieee754_jnl): Likewise.
* sysdeps/ieee754/ldbl-96/s_asinhl.c (__ieee754_asinhl): Likewise.
* sysdeps/m68k/m680x0/fpu/e_pow.c (__ieee754_pow): Likewise.
* sysdeps/powerpc/fpu/e_hypot.c (__ieee754_hypot): Likewise.
* sysdeps/powerpc/fpu/e_hypotf.c (__ieee754_hypotf): Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Remove the now unused mplog and mpexp files.
* math/Makefile: Remove mpexp.c and mplog.c
* sysdeps/i386/fpu/mpexp.c: Delete file.
* sysdeps/i386/fpu/mplog.c: Likewise.
* sysdeps/ia64/fpu/mpexp.c: Likewise.
* sysdeps/ia64/fpu/mplog.c: Likewise.
* sysdeps/ieee754/dbl-64/e_exp.c: Remove mention of mpexp and mplog.
* sysdeps/ieee754/dbl-64/mpa.h (__pow_mp): Remove unused function.
* sysdeps/ieee754/dbl-64/mpexp.c: Delete file.
* sysdeps/ieee754/dbl-64/mplog.c: Likewise.
* sysdeps/m68k/m680x0/fpu/mpexp.c: Likewise.
* sysdeps/m68k/m680x0/fpu/mplog.c: Likewise.
* sysdeps/x86_64/fpu/multiarch/Makefile: Remove mpexp* and mplog*.
* sysdeps/x86_64/fpu/multiarch/e_log-avx.c: Remove unused defines.
* sysdeps/x86_64/fpu/multiarch/e_log-fma.c: Likewise.
* sysdeps/x86_64/fpu/multiarch/e_log-fma4.c: Likewise.
* sysdeps/x86_64/fpu/multiarch/mpexp-avx.c: Delete file.
* sysdeps/x86_64/fpu/multiarch/mpexp-fma.c: Likewise.
* sysdeps/x86_64/fpu/multiarch/mpexp-fma4.c: Likewise.
* sysdeps/x86_64/fpu/multiarch/mplog-avx.c: Likewise.
* sysdeps/x86_64/fpu/multiarch/mplog-fma.c: Likewise.
* sysdeps/x86_64/fpu/multiarch/mplog-fma4.c: Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Remove the __slowexp code, so exp is no longer correctly rounded. The
result is computed to about 70 bits precision so the worst case ulp
error is about 0.500007 in nearest rounding mode.
* manual/probes.texi: Remove slowexp probes.
* math/Makefile: Remove slowexp.
* sysdeps/generic/math_private.h (__slowexp): Remove.
* sysdeps/ieee754/dbl-64/e_exp.c (__ieee754_exp): Remove __slowexp and
document error bounds.
* sysdeps/i386/fpu/slowexp.c: Remove.
* sysdeps/ia64/fpu/slowexp.c: Remove.
* sysdeps/ieee754/dbl-64/slowexp.c: Remove.
* sysdeps/ieee754/dbl-64/uexp.h (err_0): Remove.
* sysdeps/m68k/m680x0/fpu/slowexp.c: Remove.
* sysdeps/powerpc/power4/fpu/Makefile (CPPFLAGS-slowexp.c): Remove.
* sysdeps/x86_64/fpu/multiarch/Makefile: Remove slowexp-fma.
* sysdeps/x86_64/fpu/multiarch/e_exp-avx.c (__slowexp): Remove.
* sysdeps/x86_64/fpu/multiarch/e_exp-fma.c (__slowexp): Remove.
* sysdeps/x86_64/fpu/multiarch/e_exp-fma4.c (__slowexp): Remove.
* sysdeps/x86_64/fpu/multiarch/slowexp-avx.c: Remove.
* sysdeps/x86_64/fpu/multiarch/slowexp-fma.c: Remove.
* sysdeps/x86_64/fpu/multiarch/slowexp-fma4.c: Remove.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Remove the slow paths from pow. Like several other double precision math
functions, pow is exactly rounded. This is not required from math functions
and causes major overheads as it requires multiple fallbacks using higher
precision arithmetic if a result is close to 0.5ULP. Ridiculous slowdowns
of up to 100000x have been reported when the highest precision path triggers.
All GLIBC math tests pass on AArch64 and x64 (with ULP of pow set to 1).
The worst case error is ~0.506ULP. A simple test over a few hundred million
values shows pow is 10% faster on average. This fixes BZ #13932.
[BZ #13932]
* sysdeps/ieee754/dbl-64/uexp.h (err_1): Remove.
* benchtests/pow-inputs: Update comment for slow path cases.
* manual/probes.texi (slowpow_p10): Delete removed probe.
(slowpow_p10): Likewise.
* math/Makefile: Remove halfulp.c and slowpow.c.
* sysdeps/aarch64/libm-test-ulps: Set ULP of pow to 1.
* sysdeps/generic/math_private.h (__exp1): Remove error argument.
(__halfulp): Remove.
(__slowpow): Remove.
* sysdeps/i386/fpu/halfulp.c: Delete file.
* sysdeps/i386/fpu/slowpow.c: Likewise.
* sysdeps/ia64/fpu/halfulp.c: Likewise.
* sysdeps/ia64/fpu/slowpow.c: Likewise.
* sysdeps/ieee754/dbl-64/e_exp.c (__exp1): Remove error argument,
improve comments and add error analysis.
* sysdeps/ieee754/dbl-64/e_pow.c (__ieee754_pow): Add error analysis.
(power1): Remove function:
(log1): Remove error argument, add error analysis.
(my_log2): Remove function.
* sysdeps/ieee754/dbl-64/halfulp.c: Delete file.
* sysdeps/ieee754/dbl-64/slowpow.c: Likewise.
* sysdeps/m68k/m680x0/fpu/halfulp.c: Likewise.
* sysdeps/m68k/m680x0/fpu/slowpow.c: Likewise.
* sysdeps/powerpc/power4/fpu/Makefile: Remove CPPFLAGS-slowpow.c.
* sysdeps/x86_64/fpu/libm-test-ulps: Set ULP of pow to 1.
* sysdeps/x86_64/fpu/multiarch/Makefile: Remove slowpow-fma.c,
slowpow-fma4.c, halfulp-fma.c, halfulp-fma4.c.
* sysdeps/x86_64/fpu/multiarch/e_pow-fma.c (__slowpow): Remove define.
* sysdeps/x86_64/fpu/multiarch/e_pow-fma4.c (__slowpow): Likewise.
* sysdeps/x86_64/fpu/multiarch/halfulp-fma.c: Delete file.
* sysdeps/x86_64/fpu/multiarch/halfulp-fma4.c: Likewise.
* sysdeps/x86_64/fpu/multiarch/slowpow-fma.c: Likewise.
* sysdeps/x86_64/fpu/multiarch/slowpow-fma4.c: Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds the narrowing add functions from TS 18661-1 to glibc's
libm: fadd, faddl, daddl, f32addf64, f32addf32x, f32xaddf64 for all
configurations; f32addf64x, f32addf128, f64addf64x, f64addf128,
f32xaddf64x, f32xaddf128, f64xaddf128 for configurations with
_Float64x and _Float128; __nldbl_daddl for ldbl-opt. As discussed for
the build infrastructure patch, tgmath.h support is deliberately
deferred, and FP_FAST_* macros are not applicable without optimized
function implementations.
Function implementations are added for all relevant pairs of formats
(including certain cases of a format and itself where more than one
type has that format). The main implementations use round-to-odd, or
a trivial computation in the case where both formats are the same or
where the wider format is IBM long double (in which case we don't
attempt to be correctly rounding). The sysdeps/ieee754/soft-fp
implementations use soft-fp, and are used automatically for
configurations without exceptions and rounding modes by virtue of
existing Implies files. As previously discussed, optimized versions
for particular architectures are possible, but not included.
i386 gets a special version of f32xaddf64 to avoid problems with
double rounding (similar to the existing fdim version), since this
function must round just once without an intermediate rounding to long
double. (No such special version is needed for any other function,
because the nontrivial functions use round-to-odd, which does the
intermediate computation with the rounding mode set to round-to-zero,
and double rounding is OK except in round-to-nearest mode, so is OK
for that intermediate round-to-zero computation.) mul and div will
need slightly different special versions for i386 (using round-to-odd
on long double instead of precision control) because of the
possibility of inexact intermediate results in the subnormal range for
double.
To reduce duplication among the different function implementations,
math-narrow.h gets macros CHECK_NARROW_ADD, NARROW_ADD_ROUND_TO_ODD
and NARROW_ADD_TRIVIAL.
In the trivial cases and for any architecture-specific optimized
implementations, the overhead of the errno setting might be
significant, but I think that's best handled through compiler built-in
functions rather than providing separate no-errno versions in glibc
(and likewise there are no __*_finite entry points for these function
provided, __*_finite effectively being no-errno versions at present in
most cases).
Tested for x86_64 and x86, with both GCC 6 and GCC 7. Tested for
mips64 (all three ABIs, both hard and soft float) and powerpc with GCC
7. Tested with build-many-glibcs.py with both GCC 6 and GCC 7.
* math/Makefile (libm-narrow-fns): Add add.
(libm-test-funcs-narrow): Likewise.
* math/Versions (GLIBC_2.28): Add narrowing add functions.
* math/bits/mathcalls-narrow.h (add): Use __MATHCALL_NARROW .
* math/gen-auto-libm-tests.c (test_functions): Add add.
* math/math-narrow.h (CHECK_NARROW_ADD): New macro.
(NARROW_ADD_ROUND_TO_ODD): Likewise.
(NARROW_ADD_TRIVIAL): Likewise.
* sysdeps/ieee754/float128/float128_private.h (__faddl): New
macro.
(__daddl): Likewise.
* sysdeps/ieee754/ldbl-opt/Makefile (libnldbl-calls): Add fadd and
dadd.
(CFLAGS-nldbl-dadd.c): New variable.
(CFLAGS-nldbl-fadd.c): Likewise.
* sysdeps/ieee754/ldbl-opt/Versions (GLIBC_2.28): Add
__nldbl_daddl.
* sysdeps/ieee754/ldbl-opt/nldbl-compat.h (__nldbl_daddl): New
prototype.
* manual/arith.texi (Misc FP Arithmetic): Document fadd, faddl,
daddl, fMaddfN, fMaddfNx, fMxaddfN and fMxaddfNx.
* math/auto-libm-test-in: Add tests of add.
* math/auto-libm-test-out-narrow-add: New generated file.
* math/libm-test-narrow-add.inc: New file.
* sysdeps/i386/fpu/s_f32xaddf64.c: Likewise.
* sysdeps/ieee754/dbl-64/s_f32xaddf64.c: Likewise.
* sysdeps/ieee754/dbl-64/s_fadd.c: Likewise.
* sysdeps/ieee754/float128/s_f32addf128.c: Likewise.
* sysdeps/ieee754/float128/s_f64addf128.c: Likewise.
* sysdeps/ieee754/float128/s_f64xaddf128.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_daddl.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_f64xaddf128.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_faddl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_daddl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_faddl.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_daddl.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_faddl.c: Likewise.
* sysdeps/ieee754/ldbl-opt/nldbl-dadd.c: Likewise.
* sysdeps/ieee754/ldbl-opt/nldbl-fadd.c: Likewise.
* sysdeps/ieee754/soft-fp/s_daddl.c: Likewise.
* sysdeps/ieee754/soft-fp/s_fadd.c: Likewise.
* sysdeps/ieee754/soft-fp/s_faddl.c: Likewise.
* sysdeps/powerpc/fpu/libm-test-ulps: Update.
* sysdeps/mach/hurd/i386/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/aarch64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/alpha/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/arm/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/hppa/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/i386/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/ia64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/m68k/coldfire/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/m68k/m680x0/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/microblaze/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/mips/mips32/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/mips/mips64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/nios2/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/libm-le.abilist: Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/riscv/rv64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/s390/s390-32/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/s390/s390-64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/sh/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc32/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/tile/tilegx32/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/tile/tilegx64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/x86_64/64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/x86_64/x32/libm.abilist: Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Remove the slow paths from log. Like several other double precision math
functions, log is exactly rounded. This is not required from math functions
and causes major overheads as it requires multiple fallbacks using higher
precision arithmetic if a result is close to 0.5ULP. Ridiculous slowdowns
of up to 100000x have been reported when the highest precision path triggers.
Interestingly removing the slow paths makes hardly any difference in practice:
the worst case error is still ~0.502ULP, and exp(log(x)) shows identical results
before/after on many millions of random cases. All GLIBC math tests pass on
AArch64 and x64 with no change in ULP error. A simple test over a few hundred
million values shows log is now 18% faster on average.
* manual/probes.texi (slowlog): Delete documentation of removed probe.
(slowlog_inexact): Likewise
* sysdeps/ieee754/dbl-64/e_log.c (__ieee754_log): Remove slow paths.
* sysdeps/ieee754/dbl-64/ulog.h: Remove unused declarations.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The general rule in glibc is that it's better for a macro to be always
defined, and tested with #if, than for it to be tested with #ifdef,
because the latter is prone to typos in the macro name as well as to
the header with the macro accidentally not being included in a file
testing it. (Testing with an "if" statement is even better, in those
cases where it's possible to do things that way, as it then means both
cases in the code get checked for syntax in glibc builds with either
value of the condition.)
math_private.h has several different groups of macros, meaning that
architectures wanting to override some of them need to define those
then include the generic version, which then defines macros if not
already defined. It's hard to avoid that arrangement completely, but
various cases can be improved by splitting out macros or groups of
macros into separate files.
This patch splits out the LDBL_CLASSIFY_COMPAT macro into a separate
ldbl-classify-compat.h header. This macro is tested with #ifdef; this
patch changes it to testing with #if, with a default definition to 0
in the generic header and then architecture-specific headers defining
it to 1.
Tested with build-many-glibcs.py that installed stripped shared
libraries are unchanged by the patch.
* sysdeps/generic/ldbl-classify-compat.h: New file.
* sysdeps/arm/ldbl-classify-compat.h: Likewise.
* sysdeps/m68k/coldfire/ldbl-classify-compat.h: Likewise.
* sysdeps/microblaze/ldbl-classify-compat.h: Likewise.
* sysdeps/mips/ldbl-classify-compat.h: Likewise.
* sysdeps/nios2/ldbl-classify-compat.h: Likewise.
* sysdeps/sh/ldbl-classify-compat.h: Likewise.
* sysdeps/ieee754/dbl-64/s_finite.c: Include
<ldbl-classify-compat.h>.
[LDBL_CLASSIFY_COMPAT]: Test value, not whether defined.
* sysdeps/ieee754/dbl-64/s_isinf.c: Include
<ldbl-classify-compat.h>.
[LDBL_CLASSIFY_COMPAT]: Test value, not whether defined.
* sysdeps/ieee754/dbl-64/s_isnan.c: Include
<ldbl-classify-compat.h>.
[LDBL_CLASSIFY_COMPAT]: Test value, not whether defined.
* sysdeps/ieee754/dbl-64/wordsize-64/s_finite.c: Include
<ldbl-classify-compat.h>.
[LDBL_CLASSIFY_COMPAT]: Test value, not whether defined.
* sysdeps/ieee754/dbl-64/wordsize-64/s_isinf.c: Include
<ldbl-classify-compat.h>.
[LDBL_CLASSIFY_COMPAT]: Test value, not whether defined.
* sysdeps/ieee754/dbl-64/wordsize-64/s_isnan.c: Include
<ldbl-classify-compat.h>.
[LDBL_CLASSIFY_COMPAT]: Test value, not whether defined.
* sysdeps/arm/math_private.h (LDBL_CLASSIFY_COMPAT): Remove macro.
* sysdeps/mips/math_private.h (LDBL_CLASSIFY_COMPAT): Likewise.
* sysdeps/m68k/coldfire/math_private.h: Remove file.
* sysdeps/microblaze/math_private.h: Likewise.
* sysdeps/nios2/math_private.h: Likewise.
* sysdeps/sh/math_private.h: Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As reported in bug 21314, building log1p and log1pf fails with -Os
because of a spurious -Wmaybe-uninitialized warning (reported there
for GCC 5 for MIPS, I see it also with GCC 7 for x86_64). This patch,
based on the patches in the bug, fixes this using the DIAG_* macros.
Tested for x86_64 with -Os that this eliminates those warnings and so
allows the build to progress further.
2018-02-01 Carlos O'Donell <carlos@redhat.com>
Ramin Seyed-Moussavi <lordrasmus@gmail.com>
Joseph Myers <joseph@codesourcery.com>
[BZ #21314]
* sysdeps/ieee754/dbl-64/s_log1p.c: Include <libc-diag.h>.
(__log1p): Disable -Wmaybe-uninitialized for -Os around
computation using c.
* sysdeps/ieee754/flt-32/s_log1pf.c: Include <libc-diag.h>.
(__log1pf): Disable -Wmaybe-uninitialized for -Os around
computation using c.
|
|
|
|
|
|
|
| |
* All files with FSF copyright notices: Update copyright dates
using scripts/update-copyrights.
* locale/programs/charmap-kw.h: Regenerated.
* locale/programs/locfile-kw.h: Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As noted in bug 21309, dbl-64/e_pow.c contains signed int shifts that,
although the shift count is in the range [0, 31], shift bits into and
beyond the sign bit and so are undefined in ISO C. Although this is
defined in GNU C, this patch from the bug cleans up the code to avoid
those shifts.
Tested for x86_64.
[BZ #21309]
* sysdeps/ieee754/dbl-64/e_pow.c (checkint): Make m and n
unsigned.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Revert:
2017-12-19 Joseph Myers <joseph@codesourcery.com>
* sysdeps/x86_64/fpu/libm-test-ulps: Update.
2017-12-19 Patrick McGehearty <patrick.mcgehearty@oracle.com>
* sysdeps/ieee754/dbl-64/e_exp.c: Include <math-svid-compat.h> and
<errno.h>. Include "eexp.tbl".
(half): New constant.
(one): Likewise.
(__ieee754_exp): Rewrite.
(__slowexp): Remove prototype.
* sysdeps/ieee754/dbl-64/eexp.tbl: New file.
* sysdeps/ieee754/dbl-64/slowexp.c: Remove file.
* sysdeps/i386/fpu/slowexp.c: Likewise.
* sysdeps/ia64/fpu/slowexp.c: Likewise.
* sysdeps/m68k/m680x0/fpu/slowexp.c: Likewise.
* sysdeps/x86_64/fpu/multiarch/slowexp-avx.c: Likewise.
* sysdeps/x86_64/fpu/multiarch/slowexp-fma.c: Likewise.
* sysdeps/x86_64/fpu/multiarch/slowexp-fma4.c: Likewise.
* sysdeps/generic/math_private.h (__slowexp): Remove prototype.
* sysdeps/ieee754/dbl-64/e_pow.c: Remove mention of slowexp.c in
comment.
* sysdeps/powerpc/power4/fpu/Makefile [$(subdir) = math]
(CPPFLAGS-slowexp.c): Remove variable.
* sysdeps/x86_64/fpu/multiarch/Makefile (libm-sysdep_routines):
Remove slowexp-fma, slowexp-fma4 and slowexp-avx.
(CFLAGS-slowexp-fma.c): Remove variable.
(CFLAGS-slowexp-fma4.c): Likewise.
(CFLAGS-slowexp-avx.c): Likewise.
* sysdeps/x86_64/fpu/multiarch/e_exp-avx.c (__slowexp): Do not
define as macro.
* sysdeps/x86_64/fpu/multiarch/e_exp-fma.c (__slowexp): Likewise.
* sysdeps/x86_64/fpu/multiarch/e_exp-fma4.c (__slowexp): Likewise.
* math/Makefile (type-double-routines): Remove slowexp.
* manual/probes.texi (slowexp_p6): Remove.
(slowexp_p32): Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
These changes will be active for all platforms that don't provide
their own exp() routines. They will also be active for ieee754
versions of ccos, ccosh, cosh, csin, csinh, sinh, exp10, gamma, and
erf.
Typical performance gains is typically around 5x when measured on
Sparc s7 for common values between exp(1) and exp(40).
Using the glibc perf tests on sparc,
sparc (nsec) x86 (nsec)
old new old new
max 17629 395 5173 144
min 399 54 15 13
mean 5317 200 1349 23
The extreme max times for the old (ieee754) exp are due to the
multiprecision computation in the old algorithm when the true value is
very near 0.5 ulp away from an value representable in double
precision. The new algorithm does not take special measures for those
cases. The current glibc exp perf tests overrepresent those values.
Informal testing suggests approximately one in 200 cases might
invoke the high cost computation. The performance advantage of the new
algorithm for other values is still large but not as large as indicated
by the chart above.
Glibc correctness tests for exp() and expf() were run. Within the
test suite 3 input values were found to cause 1 bit differences (ulp)
when "FE_TONEAREST" rounding mode is set. No differences in exp() were
seen for the tested values for the other rounding modes.
Typical example:
exp(-0x1.760cd2p+0) (-1.46113312244415283203125)
new code: 2.31973271630014299393707e-01 0x1.db14cd799387ap-3
old code: 2.31973271630014271638132e-01 0x1.db14cd7993879p-3
exp = 2.31973271630014285508337 (high precision)
Old delta: off by 0.49 ulp
New delta: off by 0.51 ulp
In addition, because ieee754_exp() is used by other routines, cexp()
showed test results with very small imaginary input values where the
imaginary portion of the result was off by 3 ulp when in upward
rounding mode, but not in the other rounding modes. For x86, tgamma
showed a few values where the ulp increased to 6 (max ulp for tgamma
is 5). Sparc tgamma did not show these failures. I presume the tgamma
differences are due to compiler optimization differences within the
gamma function.The gamma function is known to be difficult to compute
accurately.
* sysdeps/ieee754/dbl-64/e_exp.c: Include <math-svid-compat.h> and
<errno.h>. Include "eexp.tbl".
(half): New constant.
(one): Likewise.
(__ieee754_exp): Rewrite.
(__slowexp): Remove prototype.
* sysdeps/ieee754/dbl-64/eexp.tbl: New file.
* sysdeps/ieee754/dbl-64/slowexp.c: Remove file.
* sysdeps/i386/fpu/slowexp.c: Likewise.
* sysdeps/ia64/fpu/slowexp.c: Likewise.
* sysdeps/m68k/m680x0/fpu/slowexp.c: Likewise.
* sysdeps/x86_64/fpu/multiarch/slowexp-avx.c: Likewise.
* sysdeps/x86_64/fpu/multiarch/slowexp-fma.c: Likewise.
* sysdeps/x86_64/fpu/multiarch/slowexp-fma4.c: Likewise.
* sysdeps/generic/math_private.h (__slowexp): Remove prototype.
* sysdeps/ieee754/dbl-64/e_pow.c: Remove mention of slowexp.c in
comment.
* sysdeps/powerpc/power4/fpu/Makefile [$(subdir) = math]
(CPPFLAGS-slowexp.c): Remove variable.
* sysdeps/x86_64/fpu/multiarch/Makefile (libm-sysdep_routines):
Remove slowexp-fma, slowexp-fma4 and slowexp-avx.
(CFLAGS-slowexp-fma.c): Remove variable.
(CFLAGS-slowexp-fma4.c): Likewise.
(CFLAGS-slowexp-avx.c): Likewise.
* sysdeps/x86_64/fpu/multiarch/e_exp-avx.c (__slowexp): Do not
define as macro.
* sysdeps/x86_64/fpu/multiarch/e_exp-fma.c (__slowexp): Likewise.
* sysdeps/x86_64/fpu/multiarch/e_exp-fma4.c (__slowexp): Likewise.
* math/Makefile (type-double-routines): Remove slowexp.
* manual/probes.texi (slowexp_p6): Remove.
(slowexp_p32): Likewise.
|