| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The math_private.h macro min_of_type has broken _Float128 handling:
instead of passing its type argument to the key __EXPR_FLT128 macro,
it passes x, which is not a macro argument but whatever variable
called x happens to be visible in the calling function. If that
variable has the wrong type, the wrong one of long double and
_Float128 can get chosen. In particular, this applies to some
_Complex long double functions (where x happens to have type _Complex
long double, resulting in min_of_type returning a _Float128 value when
it should return a long double value). For some reason, this only
caused test failures for me on x86_64 with GCC 6 but not GCC 7 (I
suspect it triggers known bugs in conversions from x86 long double to
_Float128 that are present in GCC 6's soft-fp).
Tested for x86_64 (in conjunction with float128 patches).
* sysdeps/generic/math_private.h (__EXPR_FLT128): Do not apply
typeof to argument passed to __builtin_types_compatible_p.
(min_of_type): Pass type argument, not x, to __EXPR_FLT128.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add the necessary bits to the private headers to support
building the _Float128 libm functions.
A local override for float.h is provided to include the
missing *FLT128 macros implied by TS 18661-3 for this
type when compiling prior to GCC 7.
* include/complex.h (__kernel_casinhf128): New declaration.
* include/float.h: New file.
* include/math.h (__finitef128): Add a hidden def.
(__isinff128): Likewise.
(__isnanf128): Likewise.
(__fpclassify): Likewise.
(__issignalling): Likewise.
(__expf128): Likewise.
(__expm1f128): Likewise.
* sysdeps/generic/fix-fp-int-convert-overflow.h:
(FIX_FLT128_LONG_CONVERT_OVERFLOW): New macro.
(FIX_FLT128_LLONG_CONVERT_OVERFLOW): Likewise.
* sysdeps/generic/math-type-macros-float128.h: New file.
* sysdeps/generic/math_private.h: Include bits/floatn.h and
math_private_calls.h for _Float128.
(__isinff128): New inline implementation used when GCC < 7.0,
since in this case __builtin_isinf_sign is broken.
(fabsf128): New inline implementation that calls the builtin.
(__EXPR_FLT128): New macro.
(min_of_type): Optionally include _Float128 types too.
* sysdeps/generic/math_private_calls.h (__kernel_sincos):
Declare for _Float128.
(__kernel_rem_pio2): Likewise.
* sysdeps/ieee754/ldbl-opt/s_sin.c:
(__DECL_SIMD_sincos_disablef128): New macro.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This provides a extra macro expansion before invoking
the hidden_def macro. This is necessary to build the
ldbl-128 files as float128 correctly.
* sysdeps/generic/math_private.h:
(mathx_hidden_def): New macro.
* sysdeps/ieee754/ldbl-128/s_finitel.c: Replace hidden_def with
the above.
* sysdeps/ieee754/ldbl-128/s_isinfl.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_isnanl.c: Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch moves the declaration of many floating-point functions from
math_private.h to math_private_calls.h and macroize the declaration to
be dependent on floating-point type. For each of float, double, and
long double, the new header file is included once. This reduces the
amount of repetitive boilerplate that will be required when adding
float128 versions of these functions.
Tested for powerpc64le and s390x.
* sysdeps/generic/math_private.h: Move the declaration of many
functions to sysdeps/generic/math_private_calls.h.
* sysdeps/generic/math_private_calls.h: New file with the
declarations of the functions removed from math_private.h
macroized by floating-point type.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The declarations of many functions in math_private.h are not required
since __MATHDECL and __MATHDECLX, in math.h, already provide the
declarations for these functions. This patch removes the declarations
from math_private.h. It also adds the inclusion of math.h to the files
which depended on the declaration of functions in math_private.h.
Tested for powerpc64le and s390x.
* sysdeps/generic/math_private.h: Remove declarations of
many functions that are already declared in math.h.
* sysdeps/ieee754/ldbl-128/e_logl.c: Include math.h to get the
declaration for __frexpl.
* sysdeps/ieee754/ldbl-128ibm/e_logl.c: Include math.h to get
the declarations for __scalbnl and fabsl.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The implementation of __ieee754_rem_pio2l in ldbl-128, ldbl-128ibm,
and ldbl-96 return the type int32_t, whereas math_private.h declares
it as returning int. This patch changes the declaration to match the
declaration in thoses directories, as well as it changes the stub
implementation in math/e_rem_pio2l.c, similarly.
* math/e_rem_pio2l.c (__ieee754_rem_pio2l): Change return type
to int32_t.
* sysdeps/generic/math_private.h: Declare __ieee754_rem_pio2l
as returning int32_t.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A few 'long double'-related tests include math_private.h just for
their variety of math_ldbl.h, which contains macros for assembling and
disassembling the binary representation of 'long double'. math_ldbl.h
insists on being included from math_private.h, but if we relax this
restriction (and fix some portability sloppiness) we can use it
directly and not have to expose all of math_private.h to the testsuite.
* sysdeps/generic/math_private.h: Use __BIG_ENDIAN and
__LITTLE_ENDIAN, not BIG_ENDIAN and LITTLE_ENDIAN.
* sysdeps/generic/math_ldbl.h
* sysdeps/ia64/fpu/math_ldbl.h
* sysdeps/ieee754/ldbl-128/math_ldbl.h
* sysdeps/ieee754/ldbl-128ibm/math_ldbl.h
* sysdeps/ieee754/ldbl-96/math_ldbl.h
* sysdeps/powerpc/fpu/math_ldbl.h
* sysdeps/x86_64/fpu/math_ldbl.h:
Allow direct inclusion. Use uintNN_t instead of u_intNN_t.
Use __BIG_ENDIAN and __LITTLE_ENDIAN, not BIG_ENDIAN and
LITTLE_ENDIAN. Include endian.h and/or stdint.h if necessary.
Add copyright notices.
* sysdeps/ieee754/ldbl-128ibm/math_ldbl.h (ldbl_canonicalize_int):
Don't use EXTRACT_WORDS64.
* sysdeps/ieee754/ldbl-96/test-canonical-ldbl-96.c
* sysdeps/ieee754/ldbl-96/test-totalorderl-ldbl-96.c
* sysdeps/ieee754/ldbl-128ibm/test-canonical-ldbl-128ibm.c
* sysdeps/ieee754/ldbl-128ibm/test-totalorderl-ldbl-128ibm.c:
Include math_ldbl.h, not math_private.h.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch refactors some type-generic libm macros, in both math.h and
math_private.h, to be based on a common __MATH_TG macro rather than
all replicating similar logic to choose a function to call based on
the type of the argument.
This should serve to illustrate what I think float128 support for such
macros should look like: common macros such as __MATH_TG may need
different definitions depending on whether float128 is supported in
glibc, so that the individual macros themselves do not need
conditionals on float128 support.
Tested for x86_64, x86, mips64 and powerpc.
* math/math.h (__MATH_TG): New macro.
[__USE_ISOC99] (fpclassify): Define using __MATH_TG.
[__USE_ISOC99] (signbit): Likewise.
[__USE_ISOC99] (isfinite): Likewise.
[__USE_ISOC99] (isnan): Likewise.
[__USE_ISOC99] (isinf): Likewise.
[__GLIBC_USE (IEC_60559_BFP_EXT)] (issignaling): Likewise.
[__GLIBC_USE (IEC_60559_BFP_EXT)] (__MATH_EVAL_FMT2): New macro.
[__GLIBC_USE (IEC_60559_BFP_EXT)] (iseqsig): Define using
__MATH_TG and __MATH_EVAL_FMT2.
* sysdeps/generic/math_private.h (fabs_tg): Define using
__MATH_TG.
* sysdeps/ieee754/ldbl-128ibm/bits/iscanonical.h
[!__NO_LONG_DOUBLE_MATH] (__iscanonicalf): New macro.
[!__NO_LONG_DOUBLE_MATH] (__iscanonical): Likewise.
[!__NO_LONG_DOUBLE_MATH] (iscanonical): Define using __MATH_TG.
* sysdeps/ieee754/ldbl-96/bits/iscanonical.h (__iscanonicalf): New
macro.
(__iscanonical): Likewise.
(iscanonical): Define using __MATH_TG.
|
|
|
|
|
|
| |
Use the GCC builtin instead. With the exception of the
files built from a template, they are unused. This
is preparation for making the s_nanF objects generated.
|
|
|
|
|
|
|
|
|
|
| |
This is only used for the float and double variants.
Instead, just add it to the type specific list of files,
and remove all stubs, and remove the declaration from
math_private.h.
I verified x86_64, i486, ia64, m68k, and ppc64 build.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For arguments with X^2 + Y^2 close to 1, clog and clog10 avoid large
errors from log(hypot) by computing X^2 + Y^2 - 1 in a way that avoids
cancellation error and then using log1p.
However, the thresholds for using that approach still result in log
being used on argument as large as sqrt(13/16) > 0.9, leading to
significant errors, in some cases above the 9ulp maximum allowed in
glibc libm. This patch arranges for the approach using log1p to be
used in any cases where |X|, |Y| < 1 and X^2 + Y^2 >= 0.5 (with the
existing allowance for cases where one of X and Y is very small),
adjusting the __x2y2m1 functions to work with the wider range of
inputs. This way, log only gets used on arguments below sqrt(1/2) (or
substantially above 1), where the error involved is much less.
Tested for x86_64, x86, mips64 and powerpc. For the ulps regeneration
I removed the existing clog and clog10 ulps before regenerating to
allow any reduced ulps to appear. Tests added include those found by
random test generation to produce large ulps either before or after
the patch, and some found by trying inputs close to the (0.75, 0.5)
threshold where the potential errors from using log are largest.
[BZ #19016]
* sysdeps/generic/math_private.h (__x2y2m1f): Update comment to
allow more cases with X^2 + Y^2 >= 0.5.
* sysdeps/ieee754/dbl-64/x2y2m1.c (__x2y2m1): Likewise. Add -1 as
normal element in sum instead of special-casing based on values of
arguments.
* sysdeps/ieee754/dbl-64/x2y2m1f.c (__x2y2m1f): Update comment.
* sysdeps/ieee754/ldbl-128/x2y2m1l.c (__x2y2m1l): Likewise. Add
-1 as normal element in sum instead of special-casing based on
values of arguments.
* sysdeps/ieee754/ldbl-128ibm/x2y2m1l.c (__x2y2m1l): Likewise.
* sysdeps/ieee754/ldbl-96/x2y2m1.c [FLT_EVAL_METHOD != 0]
(__x2y2m1): Update comment.
* sysdeps/ieee754/ldbl-96/x2y2m1l.c (__x2y2m1l): Likewise. Add -1
as normal element in sum instead of special-casing based on values
of arguments.
* math/s_clog.c (__clog): Handle more cases using log1p without
hypot.
* math/s_clog10.c (__clog10): Likewise.
* math/s_clog10f.c (__clog10f): Likewise.
* math/s_clog10l.c (__clog10l): Likewise.
* math/s_clogf.c (__clogf): Likewise.
* math/s_clogl.c (__clogl): Likewise.
* math/auto-libm-test-in: Add more tests of clog and clog10.
* math/auto-libm-test-out: Regenerated.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Various floating-point functions have code to force underflow
exceptions if a tiny result was computed in a way that might not have
resulted in such exceptions even though the result is inexact. This
typically uses math_force_eval to ensure that the underflowing
expression is evaluated, but sometimes uses volatile.
This patch refactors such code to use three new macros
math_check_force_underflow, math_check_force_underflow_nonneg and
math_check_force_underflow_complex (which in turn use
math_force_eval). In the limited number of cases not suited to a
simple conversion to these macros, existing uses of volatile are
changed to use math_force_eval instead. The converted code does not
always execute exactly the same sequence of operations as the original
code, but the overall effects should be the same.
Tested for x86_64, x86, mips64 and powerpc.
* sysdeps/generic/math_private.h (fabs_tg): New macro.
(min_of_type): Likewise.
(math_check_force_underflow): Likewise.
(math_check_force_underflow_nonneg): Likewise.
(math_check_force_underflow_complex): Likewise.
* math/e_exp2l.c (__ieee754_exp2l): Use
math_check_force_underflow_nonneg.
* math/k_casinh.c (__kernel_casinh): Likewise.
* math/k_casinhf.c (__kernel_casinhf): Likewise.
* math/k_casinhl.c (__kernel_casinhl): Likewise.
* math/s_catan.c (__catan): Use
math_check_force_underflow_complex.
* math/s_catanf.c (__catanf): Likewise.
* math/s_catanh.c (__catanh): Likewise.
* math/s_catanhf.c (__catanhf): Likewise.
* math/s_catanhl.c (__catanhl): Likewise.
* math/s_catanl.c (__catanl): Likewise.
* math/s_ccosh.c (__ccosh): Likewise.
* math/s_ccoshf.c (__ccoshf): Likewise.
* math/s_ccoshl.c (__ccoshl): Likewise.
* math/s_cexp.c (__cexp): Likewise.
* math/s_cexpf.c (__cexpf): Likewise.
* math/s_cexpl.c (__cexpl): Likewise.
* math/s_clog.c (__clog): Use math_check_force_underflow_nonneg.
* math/s_clog10.c (__clog10): Likewise.
* math/s_clog10f.c (__clog10f): Likewise.
* math/s_clog10l.c (__clog10l): Likewise.
* math/s_clogf.c (__clogf): Likewise.
* math/s_clogl.c (__clogl): Likewise.
* math/s_csin.c (__csin): Use math_check_force_underflow_complex.
* math/s_csinf.c (__csinf): Likewise.
* math/s_csinh.c (__csinh): Likewise.
* math/s_csinhf.c (__csinhf): Likewise.
* math/s_csinhl.c (__csinhl): Likewise.
* math/s_csinl.c (__csinl): Likewise.
* math/s_csqrt.c (__csqrt): Use math_check_force_underflow.
* math/s_csqrtf.c (__csqrtf): Likewise.
* math/s_csqrtl.c (__csqrtl): Likewise.
* math/s_ctan.c (__ctan): Use math_check_force_underflow_complex.
* math/s_ctanf.c (__ctanf): Likewise.
* math/s_ctanh.c (__ctanh): Likewise.
* math/s_ctanhf.c (__ctanhf): Likewise.
* math/s_ctanhl.c (__ctanhl): Likewise.
* math/s_ctanl.c (__ctanl): Likewise.
* stdlib/strtod_l.c (round_and_return): Use math_force_eval
instead of volatile.
* sysdeps/ieee754/dbl-64/e_asin.c (__ieee754_asin): Use
math_check_force_underflow.
* sysdeps/ieee754/dbl-64/e_atanh.c (__ieee754_atanh): Likewise.
* sysdeps/ieee754/dbl-64/e_exp.c (__ieee754_exp): Do not use
volatile when forcing underflow.
* sysdeps/ieee754/dbl-64/e_exp2.c (__ieee754_exp2): Use
math_check_force_underflow_nonneg.
* sysdeps/ieee754/dbl-64/e_gamma_r.c (__ieee754_gamma_r):
Likewise.
* sysdeps/ieee754/dbl-64/e_j1.c (__ieee754_j1): Use
math_check_force_underflow.
* sysdeps/ieee754/dbl-64/e_jn.c (__ieee754_jn): Likewise.
* sysdeps/ieee754/dbl-64/e_sinh.c (__ieee754_sinh): Likewise.
* sysdeps/ieee754/dbl-64/s_asinh.c (__asinh): Likewise.
* sysdeps/ieee754/dbl-64/s_atan.c (atan): Use
math_check_force_underflow_nonneg.
* sysdeps/ieee754/dbl-64/s_erf.c (__erf): Use
math_check_force_underflow.
* sysdeps/ieee754/dbl-64/s_expm1.c (__expm1): Likewise.
* sysdeps/ieee754/dbl-64/s_fma.c (__fma): Use math_force_eval
instead of volatile.
* sysdeps/ieee754/dbl-64/s_log1p.c (__log1p): Use
math_check_force_underflow.
* sysdeps/ieee754/dbl-64/s_sin.c (__sin): Likewise.
* sysdeps/ieee754/dbl-64/s_tan.c (tan): Use
math_check_force_underflow_nonneg.
* sysdeps/ieee754/dbl-64/s_tanh.c (__tanh): Use
math_check_force_underflow.
* sysdeps/ieee754/flt-32/e_asinf.c (__ieee754_asinf): Likewise.
* sysdeps/ieee754/flt-32/e_atanhf.c (__ieee754_atanhf): Likewise.
* sysdeps/ieee754/flt-32/e_exp2f.c (__ieee754_exp2f): Use
math_check_force_underflow_nonneg.
* sysdeps/ieee754/flt-32/e_gammaf_r.c (__ieee754_gammaf_r):
Likewise.
* sysdeps/ieee754/flt-32/e_j1f.c (__ieee754_j1f): Use
math_check_force_underflow.
* sysdeps/ieee754/flt-32/e_jnf.c (__ieee754_jnf): Likewise.
* sysdeps/ieee754/flt-32/e_sinhf.c (__ieee754_sinhf): Likewise.
* sysdeps/ieee754/flt-32/k_sinf.c (__kernel_sinf): Likewise.
* sysdeps/ieee754/flt-32/k_tanf.c (__kernel_tanf): Likewise.
* sysdeps/ieee754/flt-32/s_asinhf.c (__asinhf): Likewise.
* sysdeps/ieee754/flt-32/s_atanf.c (__atanf): Likewise.
* sysdeps/ieee754/flt-32/s_erff.c (__erff): Likewise.
* sysdeps/ieee754/flt-32/s_expm1f.c (__expm1f): Likewise.
* sysdeps/ieee754/flt-32/s_log1pf.c (__log1pf): Likewise.
* sysdeps/ieee754/flt-32/s_tanhf.c (__tanhf): Likewise.
* sysdeps/ieee754/ldbl-128/e_asinl.c (__ieee754_asinl): Likewise.
* sysdeps/ieee754/ldbl-128/e_atanhl.c (__ieee754_atanhl):
Likewise.
* sysdeps/ieee754/ldbl-128/e_expl.c (__ieee754_expl): Use
math_check_force_underflow_nonneg.
* sysdeps/ieee754/ldbl-128/e_gammal_r.c (__ieee754_gammal_r):
Likewise.
* sysdeps/ieee754/ldbl-128/e_j1l.c (__ieee754_j1l): Use
math_check_force_underflow.
* sysdeps/ieee754/ldbl-128/e_jnl.c (__ieee754_jnl): Likewise.
* sysdeps/ieee754/ldbl-128/e_sinhl.c (__ieee754_sinhl): Likewise.
* sysdeps/ieee754/ldbl-128/k_sincosl.c (__kernel_sincosl):
Likewise.
* sysdeps/ieee754/ldbl-128/k_sinl.c (__kernel_sinl): Likewise.
* sysdeps/ieee754/ldbl-128/k_tanl.c (__kernel_tanl): Likewise.
* sysdeps/ieee754/ldbl-128/s_asinhl.c (__asinhl): Likewise.
* sysdeps/ieee754/ldbl-128/s_atanl.c (__atanl): Likewise.
* sysdeps/ieee754/ldbl-128/s_erfl.c (__erfl): Likewise.
* sysdeps/ieee754/ldbl-128/s_expm1l.c (__expm1l): Likewise.
* sysdeps/ieee754/ldbl-128/s_fmal.c (__fmal): Use math_force_eval
instead of volatile.
* sysdeps/ieee754/ldbl-128/s_log1pl.c (__log1pl): Use
math_check_force_underflow.
* sysdeps/ieee754/ldbl-128/s_tanhl.c (__tanhl): Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_asinl.c (__ieee754_asinl): Use
math_check_force_underflow.
* sysdeps/ieee754/ldbl-128ibm/e_atanhl.c (__ieee754_atanhl):
Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_gammal_r.c (__ieee754_gammal_r):
Use math_check_force_underflow_nonneg.
* sysdeps/ieee754/ldbl-128ibm/e_jnl.c (__ieee754_jnl): Use
math_check_force_underflow.
* sysdeps/ieee754/ldbl-128ibm/e_sinhl.c (__ieee754_sinhl):
Likewise.
* sysdeps/ieee754/ldbl-128ibm/k_sincosl.c (__kernel_sincosl):
Likewise.
* sysdeps/ieee754/ldbl-128ibm/k_sinl.c (__kernel_sinl): Likewise.
* sysdeps/ieee754/ldbl-128ibm/k_tanl.c (__kernel_tanl): Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_asinhl.c (__asinhl): Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_atanl.c (__atanl): Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_erfl.c (__erfl): Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_tanhl.c (__tanhl): Likewise.
* sysdeps/ieee754/ldbl-96/e_asinl.c (__ieee754_asinl): Likewise.
* sysdeps/ieee754/ldbl-96/e_atanhl.c (__ieee754_atanhl): Likewise.
* sysdeps/ieee754/ldbl-96/e_gammal_r.c (__ieee754_gammal_r): Use
math_check_force_underflow_nonneg.
* sysdeps/ieee754/ldbl-96/e_j1l.c (__ieee754_j1l): Use
math_check_force_underflow.
* sysdeps/ieee754/ldbl-96/e_jnl.c (__ieee754_jnl): Likewise.
* sysdeps/ieee754/ldbl-96/e_sinhl.c (__ieee754_sinhl): Likewise.
* sysdeps/ieee754/ldbl-96/k_sinl.c (__kernel_sinl): Likewise.
* sysdeps/ieee754/ldbl-96/k_tanl.c (__kernel_tanl): Use
math_check_force_underflow_nonneg.
* sysdeps/ieee754/ldbl-96/s_asinhl.c (__asinhl): Use
math_check_force_underflow.
* sysdeps/ieee754/ldbl-96/s_erfl.c (__erfl): Likewise.
* sysdeps/ieee754/ldbl-96/s_fmal.c (__fmal): Use math_force_eval
instead of volatile.
* sysdeps/ieee754/ldbl-96/s_tanhl.c (__tanhl): Use
math_check_force_underflow.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Various i386 libm functions return values with excess range and
precision; Wilco Dijkstra's patches to make isfinite etc. expand
inline cause this pre-existing issue to result in test failures (when
e.g. a result that overflows float but not long double gets counted as
overflowing for some purposes but not others).
This patch addresses those cases arising from functions defined in C,
adding a math_narrow_eval macro that forces values to memory to
eliminate excess precision if FLT_EVAL_METHOD indicates this is
needed, and is a no-op otherwise. I'll convert existing uses of
volatile and asm for this purpose to use the new macro later, once
i386 has clean test results again (which requires fixes for .S files
as well).
Tested for x86_64 and x86. Committed.
[BZ #18980]
* sysdeps/generic/math_private.h: Include <float.h>.
(math_narrow_eval): New macro.
[FLT_EVAL_METHOD != 0] (excess_precision): Likewise.
* sysdeps/ieee754/dbl-64/e_cosh.c (__ieee754_cosh): Use
math_narrow_eval on overflowing return value.
* sysdeps/ieee754/dbl-64/e_lgamma_r.c (__ieee754_lgamma_r):
Likewise.
* sysdeps/ieee754/dbl-64/e_sinh.c (__ieee754_sinh): Likewise.
* sysdeps/ieee754/flt-32/e_coshf.c (__ieee754_coshf): Likewise.
* sysdeps/ieee754/flt-32/e_lgammaf_r.c (__ieee754_lgammaf_r):
Likewise.
* sysdeps/ieee754/flt-32/e_sinhf.c (__ieee754_sinhf): Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The existing implementations of lgamma functions (except for the ia64
versions) use the reflection formula for negative arguments. This
suffers large inaccuracy from cancellation near zeros of lgamma (near
where the gamma function is +/- 1).
This patch fixes this inaccuracy. For arguments above -2, there are
no zeros and no large cancellation, while for sufficiently large
negative arguments the zeros are so close to integers that even for
integers +/- 1ulp the log(gamma(1-x)) term dominates and cancellation
is not significant. Thus, it is only necessary to take special care
about cancellation for arguments around a limited number of zeros.
Accordingly, this patch uses precomputed tables of relevant zeros,
expressed as the sum of two floating-point values. The log of the
ratio of two sines can be computed accurately using log1p in cases
where log would lose accuracy. The log of the ratio of two gamma(1-x)
values can be computed using Stirling's approximation (the difference
between two values of that approximation to lgamma being computable
without computing the two values and then subtracting), with
appropriate adjustments (which don't reduce accuracy too much) in
cases where 1-x is too small to use Stirling's approximation directly.
In the interval from -3 to -2, using the ratios of sines and of
gamma(1-x) can still produce too much cancellation between those two
parts of the computation (and that interval is also the worst interval
for computing the ratio between gamma(1-x) values, which computation
becomes more accurate, while being less critical for the final result,
for larger 1-x). Because this can result in errors slightly above
those accepted in glibc, this interval is instead dealt with by
polynomial approximations. Separate polynomial approximations to
(|gamma(x)|-1)(x-n)/(x-x0) are used for each interval of length 1/8
from -3 to -2, where n (-3 or -2) is the nearest integer to the
1/8-interval and x0 is the zero of lgamma in the relevant half-integer
interval (-3 to -2.5 or -2.5 to -2).
Together, the two approaches are intended to give sufficient accuracy
for all negative arguments in the problem range. Outside that range,
the previous implementation continues to be used.
Tested for x86_64, x86, mips64 and powerpc. The mips64 and powerpc
testing shows up pre-existing problems for ldbl-128 and ldbl-128ibm
with large negative arguments giving spurious "invalid" exceptions
(exposed by newly added tests for cases this patch doesn't affect the
logic for); I'll address those problems separately.
[BZ #2542]
[BZ #2543]
[BZ #2558]
* sysdeps/ieee754/dbl-64/e_lgamma_r.c (__ieee754_lgamma_r): Call
__lgamma_neg for arguments from -28.0 to -2.0.
* sysdeps/ieee754/flt-32/e_lgammaf_r.c (__ieee754_lgammaf_r): Call
__lgamma_negf for arguments from -15.0 to -2.0.
* sysdeps/ieee754/ldbl-128/e_lgammal_r.c (__ieee754_lgammal_r):
Call __lgamma_negl for arguments from -48.0 or -50.0 to -2.0.
* sysdeps/ieee754/ldbl-96/e_lgammal_r.c (__ieee754_lgammal_r):
Call __lgamma_negl for arguments from -33.0 to -2.0.
* sysdeps/ieee754/dbl-64/lgamma_neg.c: New file.
* sysdeps/ieee754/dbl-64/lgamma_product.c: Likewise.
* sysdeps/ieee754/flt-32/lgamma_negf.c: Likewise.
* sysdeps/ieee754/flt-32/lgamma_productf.c: Likewise.
* sysdeps/ieee754/ldbl-128/lgamma_negl.c: Likewise.
* sysdeps/ieee754/ldbl-128/lgamma_productl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/lgamma_negl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/lgamma_productl.c: Likewise.
* sysdeps/ieee754/ldbl-96/lgamma_negl.c: Likewise.
* sysdeps/ieee754/ldbl-96/lgamma_product.c: Likewise.
* sysdeps/ieee754/ldbl-96/lgamma_productl.c: Likewise.
* sysdeps/generic/math_private.h (__lgamma_negf): New prototype.
(__lgamma_neg): Likewise.
(__lgamma_negl): Likewise.
(__lgamma_product): Likewise.
(__lgamma_productl): Likewise.
* math/Makefile (libm-calls): Add lgamma_neg and lgamma_product.
* math/auto-libm-test-in: Add more tests of lgamma.
* math/auto-libm-test-out: Regenerated.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Concluding the fixes for C90 libm functions calling C99 fe* functions,
this patch fixes the case of feupdateenv by making it a weak alias for
__feupdateenv and making the affected code call __feupdateenv.
Tested for x86_64 (testsuite, and that installed stripped shared
libraries are unchanged by the patch). Also tested for ARM
(soft-float) that the math.h linknamespace tests now pass.
[BZ #17748]
* include/fenv.h (__feupdateenv): Use libm_hidden_proto.
* math/feupdateenv.c (__feupdateenv): Use libm_hidden_def.
* sysdeps/aarch64/fpu/feupdateenv.c (feupdateenv): Rename to
__feupdateenv and define as weak alias of __feupdateenv. Use
libm_hidden_weak.
* sysdeps/alpha/fpu/feupdateenv.c (__feupdateenv): Use
libm_hidden_def.
* sysdeps/arm/feupdateenv.c (feupdateenv): Rename to __feupdateenv
and define as weak alias of __feupdateenv. Use libm_hidden_weak.
* sysdeps/hppa/fpu/feupdateenv.c (feupdateenv): Likewise.
* sysdeps/i386/fpu/feupdateenv.c (__feupdateenv): Use
libm_hidden_def.
* sysdeps/ia64/fpu/feupdateenv.c (feupdateenv): Rename to
__feupdateenv and define as weak alias of __feupdateenv. Use
libm_hidden_weak.
* sysdeps/m68k/fpu/feupdateenv.c (__feupdateenv): Use
libm_hidden_def.
* sysdeps/mips/fpu/feupdateenv.c (feupdateenv): Rename to
__feupdateenv and define as weak alias of __feupdateenv. Use
libm_hidden_weak.
* sysdeps/powerpc/fpu/feupdateenv.c (__feupdateenv): Use
libm_hidden_def.
* sysdeps/powerpc/nofpu/feupdateenv.c (__feupdateenv): Likewise.
* sysdeps/powerpc/powerpc32/e500/nofpu/feupdateenv.c
(__feupdateenv): Likewise.
* sysdeps/s390/fpu/feupdateenv.c (feupdateenv): Rename to
__feupdateenv and define as weak alias of __feupdateenv. Use
libm_hidden_weak.
* sysdeps/sh/sh4/fpu/feupdateenv.c (feupdateenv): Likewise.
* sysdeps/sparc/fpu/feupdateenv.c (__feupdateenv): Use
libm_hidden_def.
* sysdeps/tile/math_private.h (__feupdateenv): New inline
function.
* sysdeps/x86_64/fpu/feupdateenv.c (__feupdateenv): Use
libm_hidden_def.
* sysdeps/generic/math_private.h (default_libc_feupdateenv): Call
__feupdateenv instead of feupdateenv.
(default_libc_feupdateenv_test): Likewise.
(libc_feresetround_ctx): Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Continuing the fixes for C90 libm functions calling C99 fe* functions,
this patch fixes the case of fesetround by making it a weak alias of
__fesetround and making the affected code call __fesetround. An
existing __fesetround function in fenv_libc.h for powerpc is renamed
to __fesetround_inline.
Tested for x86_64 (testsuite, and that disassembly of installed shared
libraries is unchanged by the patch). Also tested for ARM
(soft-float) that fesetround failures disappear from the linknamespace
test results (feupdateenv remains to be addressed to complete fixing
bug 17748).
[BZ #17748]
* include/fenv.h (__fesetround): Declare. Use libm_hidden_proto.
* math/fesetround.c (fesetround): Rename to __fesetround and
define as weak alias of __fesetround. Use libm_hidden_weak.
* sysdeps/aarch64/fpu/fesetround.c (fesetround): Likewise.
* sysdeps/alpha/fpu/fesetround.c (fesetround): Likewise.
* sysdeps/arm/fesetround.c (fesetround): Likewise.
* sysdeps/hppa/fpu/fesetround.c (fesetround): Likewise.
* sysdeps/i386/fpu/fesetround.c (fesetround): Likewise.
* sysdeps/ia64/fpu/fesetround.c (fesetround): Likewise.
* sysdeps/m68k/fpu/fesetround.c (fesetround): Likewise.
* sysdeps/mips/fpu/fesetround.c (fesetround): Likewise.
* sysdeps/powerpc/fpu/fenv_libc.h (__fesetround): Rename to
__fesetround_inline.
* sysdeps/powerpc/fpu/fenv_private.h (libc_fesetround_ppc): Call
__fesetround_inline instead of __fesetround.
* sysdeps/powerpc/fpu/fesetround.c (fesetround): Rename to
__fesetround and define as weak alias of __fesetround. Use
libm_hidden_weak. Call __fesetround_inline instead of
__fesetround.
* sysdeps/powerpc/nofpu/fesetround.c (fesetround): Rename to
__fesetround and define as weak alias of __fesetround. Use
libm_hidden_weak.
* sysdeps/powerpc/powerpc32/e500/nofpu/fesetround.c (fesetround):
Likewise.
* sysdeps/s390/fpu/fesetround.c (fesetround): Likewise.
* sysdeps/sh/sh4/fpu/fesetround.c (fesetround): Likewise.
* sysdeps/sparc/fpu/fesetround.c (fesetround): Likewise.
* sysdeps/tile/math_private.h (__fesetround): New inline function.
* sysdeps/x86_64/fpu/fesetround.c (fesetround): Rename to
__fesetround and define as weak alias of __fesetround. Use
libm_hidden_weak.
* sysdeps/generic/math_private.h (default_libc_fesetround): Call
__fesetround instead of fesetround.
(default_libc_feholdexcept_setround): Likewise.
(libc_feholdsetround_ctx): Likewise.
(libc_feholdsetround_noex_ctx): Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Continuing the fixes for C90 libm functions calling C99 fe* functions,
this patch fixes the case of fesetenv by making it a weak alias of
__fesetenv and making the affected code (including various copies of
feupdateenv which also gets called from C90 functions) call
__fesetenv.
Tested for x86_64 (testsuite, and that disassembly of installed shared
libraries is unchanged by the patch). Also tested for ARM
(soft-float) that fesetenv failures disappear from the linknamespace
test results (fsetround and feupdateenv remain to be addressed to
complete fixing bug 17748).
[BZ #17748]
* include/fenv.h (__fesetenv): Use libm_hidden_proto.
* math/fesetenv.c (__fesetenv): Use libm_hidden_def.
* sysdeps/aarch64/fpu/fesetenv.c (fesetenv): Rename to __fesetenv
and define as weak alias of __fesetenv. Use libm_hidden_weak.
* sysdeps/alpha/fpu/fesetenv.c (__fesetenv): Use libm_hidden_def.
* sysdeps/arm/fesetenv.c (fesetenv): Rename to __fesetenv and
define as weak alias of __fesetenv. Use libm_hidden_weak.
* sysdeps/hppa/fpu/fesetenv.c (fesetenv): Likewise.
* sysdeps/i386/fpu/fesetenv.c (__fesetenv): Use libm_hidden_def.
* sysdeps/ia64/fpu/fesetenv.c (fesetenv): Rename to __fesetenv and
define as weak alias of __fesetenv. Use libm_hidden_weak.
* sysdeps/m68k/fpu/fesetenv.c (__fesetenv): Use libm_hidden_def.
* sysdeps/mips/fpu/fesetenv.c (fesetenv): Rename to __fesetenv and
define as weak alias of __fesetenv. Use libm_hidden_weak.
* sysdeps/powerpc/fpu/fesetenv.c (__fesetenv): Use
libm_hidden_def.
* sysdeps/powerpc/nofpu/fesetenv.c (__fesetenv): Likewise.
* sysdeps/powerpc/powerpc32/e500/nofpu/fesetenv.c (__fesetenv):
Likewise.
* sysdeps/s390/fpu/fesetenv.c (fesetenv): Rename to __fesetenv and
define as weak alias of __fesetenv. Use libm_hidden_weak.
* sysdeps/sh/sh4/fpu/fesetenv.c (fesetenv): Likewise.
* sysdeps/sparc/fpu/fesetenv.c (__fesetenv): Use libm_hidden_def.
* sysdeps/tile/math_private.h (__fesetenv): New inline function.
* sysdeps/x86_64/fpu/fesetenv.c (fesetenv): Rename to __fesetenv
and define as weak alias of __fesetenv. Use libm_hidden_weak.
* sysdeps/generic/math_private.h (default_libc_fesetenv): Use
__fesetenv instead of fesetenv.
(libc_feresetround_noex_ctx): Likewise.
* sysdeps/alpha/fpu/feupdateenv.c (__feupdateenv): Likewise.
* sysdeps/hppa/fpu/feupdateenv.c (feupdateenv): Likewise.
* sysdeps/i386/fpu/feupdateenv.c (__feupdateenv): Likewise.
* sysdeps/ia64/fpu/feupdateenv.c (feupdateenv): Likewise.
* sysdeps/m68k/fpu/feupdateenv.c (__feupdateenv): Likewise.
* sysdeps/mips/fpu/feupdateenv.c (feupdateenv): Likewise.
* sysdeps/powerpc/nofpu/feupdateenv.c (__feupdateenv): Likewise.
* sysdeps/powerpc/powerpc32/e500/nofpu/feupdateenv.c
(__feupdateenv): Likewise.
* sysdeps/s390/fpu/feupdateenv.c (feupdateenv): Likewise.
* sysdeps/sh/sh4/fpu/feupdateenv.c (feupdateenv): Likewise.
* sysdeps/sparc/fpu/feupdateenv.c (__feupdateenv): Likewise.
* sysdeps/x86_64/fpu/feupdateenv.c (__feupdateenv): Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Continuing the fixes for C90 libm functions calling C99 fe* functions,
this patch fixes the case of feholdexcept by making it a weak alias of
__feholdexcept and making the affected code call __feholdexcept.
Tested for x86_64 (testsuite, and that disassembly of installed shared
libraries is unchanged by the patch). Also tested for ARM
(soft-float) that feholdexcept failures disappear from the
linknamespace test failures (fesetenv, fsetround and feupdateenv
remain to be addressed to complete fixing bug 17748).
[BZ #17748]
* include/fenv.h (__feholdexcept): Declare. Use
libm_hidden_proto.
* math/feholdexcpt.c (feholdexcept): Rename to __feholdexcept and
define as weak alias of __feholdexcept. Use libm_hidden_weak.
* sysdeps/aarch64/fpu/feholdexcpt.c (feholdexcept): Likewise.
* sysdeps/alpha/fpu/feholdexcpt.c (feholdexcept): Likewise.
* sysdeps/arm/feholdexcpt.c (feholdexcept): Likewise.
* sysdeps/hppa/fpu/feholdexcpt.c (feholdexcept): Likewise.
* sysdeps/i386/fpu/feholdexcpt.c (feholdexcept): Likewise.
* sysdeps/ia64/fpu/feholdexcpt.c (feholdexcept): Likewise.
* sysdeps/m68k/fpu/feholdexcpt.c (feholdexcept): Likewise.
* sysdeps/mips/fpu/feholdexcpt.c (feholdexcept): Likewise.
* sysdeps/powerpc/fpu/feholdexcpt.c (feholdexcept): Likewise.
* sysdeps/powerpc/nofpu/feholdexcpt.c (feholdexcept): Likewise.
* sysdeps/powerpc/powerpc32/e500/nofpu/feholdexcpt.c
(feholdexcept): Likewise.
* sysdeps/s390/fpu/feholdexcpt.c (feholdexcept): Likewise.
* sysdeps/sh/sh4/fpu/feholdexcpt.c (feholdexcept): Likewise.
* sysdeps/sparc/fpu/feholdexcpt.c (feholdexcept): Likewise.
* sysdeps/x86_64/fpu/feholdexcpt.c (feholdexcept): Likewise.
* sysdeps/generic/math_private.h (default_libc_feholdexcept): Use
__feholdexcept instead of feholdexcept.
(default_libc_feholdexcept_setround): Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some C90 libm functions call fegetenv via libc_feholdsetround*
functions in math_private.h. This patch makes them call __fegetenv
instead, making fegetenv into a weak alias for __fegetenv as needed.
Tested for x86_64 (testsuite, and that disassembly of installed shared
libraries is unchanged by the patch). Also tested for ARM
(soft-float) that fegetenv failures disappear from the linknamespace
test failures (however, similar fixes will also be needed for
fegetround, feholdexcept, fesetenv, fesetround and feupdateenv before
this set of namespace issues covered by bug 17748 is fully fixed and
those linknamespace tests start passing).
[BZ #17748]
* include/fenv.h (__fegetenv): Use libm_hidden_proto.
* math/fegetenv.c (__fegetenv): Use libm_hidden_def.
* sysdeps/aarch64/fpu/fegetenv.c (fegetenv): Rename to __fegetenv
and define as weak alias of __fegetenv. Use libm_hidden_weak.
* sysdeps/alpha/fpu/fegetenv.c (__fegetenv): Use libm_hidden_def.
* sysdeps/arm/fegetenv.c (fegetenv): Rename to __fegetenv and
define as weak alias of __fegetenv. Use libm_hidden_weak.
* sysdeps/hppa/fpu/fegetenv.c (fegetenv): Likewise.
* sysdeps/i386/fpu/fegetenv.c (__fegetenv): Use libm_hidden_def.
* sysdeps/ia64/fpu/fegetenv.c (fegetenv): Rename to __fegetenv and
define as weak alias of __fegetenv. Use libm_hidden_weak.
* sysdeps/m68k/fpu/fegetenv.c (__fegetenv): Use libm_hidden_def.
* sysdeps/mips/fpu/fegetenv.c (fegetenv): Rename to __fegetenv and
define as weak alias of __fegetenv. Use libm_hidden_weak.
* sysdeps/powerpc/fpu/fegetenv.c (__fegetenv): Use
libm_hidden_def.
* sysdeps/powerpc/nofpu/fegetenv.c (__fegetenv): Likewise.
* sysdeps/powerpc/powerpc32/e500/nofpu/fegetenv.c (__fegetenv):
Likewise.
* sysdeps/s390/fpu/fegetenv.c (fegetenv): Rename to __fegetenv and
define as weak alias of __fegetenv. Use libm_hidden_weak.
* sysdeps/sh/sh4/fpu/fegetenv.c (fegetenv): Likewise.
* sysdeps/sparc/fpu/fegetenv.c (__fegetenv): Use libm_hidden_def.
* sysdeps/tile/math_private.h (__fegetenv): New inline function.
* sysdeps/x86_64/fpu/fegetenv.c (fegetenv): Rename to __fegetenv
and define as weak alias of __fegetenv. Use libm_hidden_weak.
* sysdeps/generic/math_private.h (libc_feholdsetround_ctx): Use
__fegetenv instead of fegetenv.
(libc_feholdsetround_noex_ctx): Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds a generic implementation of HAVE_RM_CTX using standard
fenv calls. As a result math functions using SET_RESTORE_ROUND* macros
do not suffer from a large slowdown on targets which do not implement
optimized libc_fe*_ctx inline functions. Most of the libc_fe* inline
functions are now unused and could be removed in the future (there are
a few math functions left which use a mixture of standard fenv calls
and libc_fe* inline functions - they could be updated to use
SET_RESTORE_ROUND or improved to avoid expensive fenv manipulations
across just a few FP instructions).
libc_feholdsetround*_noex_ctx is added to enable better optimization of
SET_RESTORE_ROUND_NOEX* implementations.
Performance measurements on ARM and x86 of sin() show significant gains
over the current default, fairly close to a highly optimized fenv_private:
ARM x86
no fenv_private : 100% 100%
generic HAVE_RM_CTX : 250% 350%
fenv_private (CTX) : 250% 450%
2014-06-23 Will Newton <will.newton@linaro.org>
Wilco <wdijkstr@arm.com>
* sysdeps/generic/math_private.h: Add generic HAVE_RM_CTX
implementation. Include get-rounding-mode.h.
[!HAVE_RM_CTX]: Define HAVE_RM_CTX to zero.
[!libc_feholdsetround_noex_ctx]: Define
libc_feholdsetround_noex_ctx.
[!libc_feholdsetround_noexf_ctx]: Define
libc_feholdsetround_noexf_ctx.
[!libc_feholdsetround_noexl_ctx]: Define
libc_feholdsetround_noexl_ctx.
(libc_feholdsetround_ctx): New function.
(libc_feresetround_ctx): New function.
(libc_feholdsetround_noex_ctx): New function.
(libc_feresetround_noex_ctx): New function.
|
|
|
|
| |
This reverts commit 9290130a8de3f30970f741c79dfe8459f798e05f.
|
|
|
|
|
|
|
|
|
|
| |
ChangeLog:
2014-03-17 Will Newton <will.newton@linaro.org>
* sysdeps/generic/math_private.h: Check whether
HAVE_RM_CTX is defined with #ifdef rather
than #if.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The most common use case of math functions is with default rounding
mode, i.e. rounding to nearest. Setting and restoring rounding mode
is an unnecessary overhead for this, so I've added support for a
context, which does the set/restore only if the FP status needs a
change. The code is written such that only x86 uses these. Other
architectures should be unaffected by it, but would definitely benefit
if the set/restore has as much overhead relative to the rest of the
code, as the x86 bits do.
Here's a summary of the performance improvement due to these
improvements; I've only mentioned functions that use the set/restore
and have benchmark inputs for x86_64:
Before:
cos(): ITERS:4.69335e+08: TOTAL:28884.6Mcy, MAX:4080.28cy, MIN:57.562cy, 16248.6 calls/Mcy
exp(): ITERS:4.47604e+08: TOTAL:28796.2Mcy, MAX:207.721cy, MIN:62.385cy, 15543.9 calls/Mcy
pow(): ITERS:1.63485e+08: TOTAL:28879.9Mcy, MAX:362.255cy, MIN:172.469cy, 5660.86 calls/Mcy
sin(): ITERS:3.89578e+08: TOTAL:28900Mcy, MAX:704.859cy, MIN:47.583cy, 13480.2 calls/Mcy
tan(): ITERS:7.0971e+07: TOTAL:28902.2Mcy, MAX:1357.79cy, MIN:388.58cy, 2455.55 calls/Mcy
After:
cos(): ITERS:6.0014e+08: TOTAL:28875.9Mcy, MAX:364.283cy, MIN:45.716cy, 20783.4 calls/Mcy
exp(): ITERS:5.48578e+08: TOTAL:28764.9Mcy, MAX:191.617cy, MIN:51.011cy, 19071.1 calls/Mcy
pow(): ITERS:1.70013e+08: TOTAL:28873.6Mcy, MAX:689.522cy, MIN:163.989cy, 5888.18 calls/Mcy
sin(): ITERS:4.64079e+08: TOTAL:28891.5Mcy, MAX:6959.3cy, MIN:36.189cy, 16062.8 calls/Mcy
tan(): ITERS:7.2354e+07: TOTAL:28898.9Mcy, MAX:1295.57cy, MIN:380.698cy, 2503.7 calls/Mcy
So the improvements are:
cos: 27.9089%
exp: 22.6919%
pow: 4.01564%
sin: 19.1585%
tan: 1.96086%
The downside of the change is that it will have an adverse performance
impact on non-default rounding modes, but I think the tradeoff is
justified.
|
|
|
|
|
| |
We only need to set/restore rounding mode to ensure correct
computation for non-default rounding modes.
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[BZ #6794]
Following Joseph comments about bug 6794, here is a proposed fix. It turned out
to be a large fix mainly because I had to move some file along to follow libm
files/names conventions.
Basically I have added wrappers (w_ilogb.c, w_ilogbf.c, w_ilogbl.c) that now calls
the symbol '__ieee754_ilogb'. The wrappers checks for '__ieee754_ilogb' output and
set the errno and raise exceptions as expected.
The '__ieee754_ilogb' is implemented in sysdeps. I have moved the 's_ilogb[f|l]' files
to e_ilogb[f|l] and renamed the '__ilogb[f|l]' to '__ieee754_ilogb[f|l]'.
I also found out a bug in i386 and x86-64 assembly coded ilogb implementation where
it raises a FE_DIVBYZERO when argument is '0.0'. I corrected this issue as well.
Finally I added the errno and FE_INVALID tests for 0.0, NaN and +-InF argument. Tested
on i386, x86-64, ppc32 and ppc64.
|
| |
|
|
|
|
|
| |
We can reduce the number of STMXCSR, and often we can avoid the
call to __feraiseexcept.
|
| |
|
| |
|
|
|
|
|
|
| |
For code generation to stay identical on x86_64, this requires that
we define the fp word manipulation macros before including the
generic header.
|
|
|
|
| |
so that double arithmetic in s_sin is done in 53 bit (without extend i386 double precision)
|
|
This reverts commit 60d6f5a6f50d838bcb4240fcc0223cac445c6c83.
|