| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
| |
The powerpc sqrt implementation is also simplified:
- the static constants are open coded within the implementation.
- for !USE_SQRT_BUILTIN the function is implemented directly on
__ieee754_sqrt (it avoid an superflous extra jump).
Checked on powerpc-linux-gnu and powerpc64le-linux-gnu.
|
|
|
|
|
|
|
|
|
|
|
| |
Each symbol definitions are moved on a separated file and it
cover all symbol type definitions (float, double, long double,
and float128).
It allows to set support for architectures without the boiler
place of copying default values.
Checked with a build on the affected ABIs.
|
|
|
|
|
|
|
|
|
| |
This defines the macro such that it should behave best on all
supported powerpc targets. Likewise, this allows us to remove the
ppc64le specific s_fmaf128.c.
I have verified powerpc64le multiarch and powerpc64le power9
no-multiarch builds continue to generate optimize fmaf128.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The build uses an undefined macro evaluation for fmaf128 build.
For now set USE_FMAL_BUILTIN and USE_FMAF128_BUILTIN to 0.
Checked with a build for:
powerpc64le-linux-gnu-power9-disable-multi-arch
powerpc64le-linux-gnu-power9
powerpc64le-linux-gnu
powerpc64-linux-gnu-power8
powerpc64-linux-gnu
powerpc-linux-gnu-power4
powerpc-linux-gnu
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Tested with build-many-glibcs for powerpc-linux-gnu
This is a non functional change and powerpc libm before/after was
byte invariant as compared below:
| cd /SCRATCH/vgupta/gnu/install-glibc-A-baseline
| for i in `find . -name libm-2.31.9000.so`; do
| echo $i; diff $i /SCRATCH/vgupta/gnu/install-glibc-C-reduce-scope/$i ;
| echo $?;
| done
| ./aarch64-linux-gnu/lib64/libm-2.31.9000.so
| 0
| ./arm-linux-gnueabi/lib/libm-2.31.9000.so
| 0
| ./x86_64-linux-gnu/lib64/libm-2.31.9000.so
| 0
| ./arm-linux-gnueabihf/lib/libm-2.31.9000.so
| 0
| ./riscv64-linux-gnu-rv64imac-lp64/lib64/lp64/libm-2.31.9000.so
| 0
| ./riscv64-linux-gnu-rv64imafdc-lp64/lib64/lp64/libm-2.31.9000.so
| 0
| ./powerpc-linux-gnu/lib/libm-2.31.9000.so
| 0
| ./microblaze-linux-gnu/lib/libm-2.31.9000.so
| 0
| ./nios2-linux-gnu/lib/libm-2.31.9000.so
| 0
| ./hppa-linux-gnu/lib/libm-2.31.9000.so
| 0
| ./s390x-linux-gnu/lib64/libm-2.31.9000.so
| 0
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On platforms where long double may have two different formats, i.e.: the
same format as double (64-bits) or something else (128-bits), building
with -mlong-double-128 is the default and function calls in the user
program match the name of the function in Glibc. When building with
-mlong-double-64, Glibc installed headers redirect such calls to the
appropriate function.
Likewise, the internals of glibc are now built against IEEE long double.
However, the only (minimally) notable usage of long double is difftime.
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
|
|
|
|
|
|
|
| |
There are 2 new input values that require to be marked as
xfail-rounding:ibm128-libgcc as they're known to fail because of libgcc
issues with different rounding modes.
Otherwise, the other tests just need an increase in ULP.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Similar to string2.h (18b10de7ce) and string3.h (09a596cc2c) this
patch removes the fenvinline.h on all architectures. Currently
only powerpc implements some optimizations. This kind of optimization
is better implemented by the compiler (which handles the architecture
ISA transparently).
Also, for the specific optimized powerpc implementation the code is
becoming convoluted and these micro-optimization are hardly wildly
used, even more being a possible hotspot in realword cases
(non-default rounding are used only on specific cases and exception
handling are done most likely only on errors path). Only x86
implements similar optimization (on fenv.h) also indicates that
these should no be on libc.
The math/test-fenv already covers all math/test-fenvinline tests,
so it is safe to remove it.
The powerpc fegetround optimization is moved to internal
fenv_libc.h.
The BZ#94193 [1] the corresponding GCC bug for adding replacements
for these on powerpc.
Checked on x86_64-linux-gnu and powerpc64le-linux-gnu.
[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94193
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With mathinline removal there is no need to keep building and testing
inline math tests.
The gen-libm-tests.py support to generate ULP_I_* is removed and all
libm-test-ulps files are updated to longer have the
i{float,double,ldouble} entries. The support for no-test-inline is
also removed from both gen-auto-libm-tests and the
auto-libm-test-out-* were regenerated.
Checked on x86_64-linux-gnu and i686-linux-gnu.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds a new macro, libm_alias_finite, to define all _finite
symbol. It sets all _finite symbol as compat symbol based on its first
version (obtained from the definition at built generated first-versions.h).
The <fn>f128_finite symbols were introduced in GLIBC 2.26 and so need
special treatment in code that is shared between long double and float128.
It is done by adding a list, similar to internal symbol redifinition,
on sysdeps/ieee754/float128/float128_private.h.
Alpha also needs some tricky changes to ensure we still emit 2 compat
symbols for sqrt(f).
Passes buildmanyglibc.
Co-authored-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
|
| |
|
|
|
|
|
|
|
|
|
| |
Since at least POWER8, there is no performance advantage to entering
"Ignore Exceptions Mode", and doing so conditionally requires
- the conditional logic, and
- a system call.
Make it a no-op for uses within glibc.
|
|
|
|
|
|
|
| |
fesetenv_mode is used variously to write the FPSCR exception enable
bits and rounding mode bits. These are referred to as the control
bits in the POWER ISA. Change the name to be reflective of its
current and expected use, and match up well with fegetenv_control.
|
|
|
|
|
|
|
|
|
|
|
|
| |
libc_feholdsetround_noex_ppc_ctx currently performs:
1. Read FPSCR, save to context.
2. Create new FPSCR value: clear enables and set new rounding mode.
3. Write new value to FPSCR.
Since other bits just pass through, there is no need to write them.
Instead, write just the changed values (enables and rounding mode),
which can be a bit more efficient.
|
|
|
|
|
|
|
|
|
|
|
| |
fegetenv_status is used variously to retrieve the FPSCR exception enable
bits, rounding mode bits, or both. These are referred to as the control
bits in the POWER ISA. FPSCR status bits are also returned by the
'mffs' and 'mffsl' instructions, but they are uniformly ignored by all
uses of fegetenv_status. Change the name to be reflective of its
current and expected use.
Reviewed-By: Paul E Murphy <murphyp@linux.ibm.com>
|
|
|
|
|
|
|
|
|
| |
On POWER9, use more efficient means to update the 2-bit rounding mode
via the 'mffscrn' instruction (instead of two 'mtfsb0/1' instructions
or one 'mtfsfi' instruction that modifies 4 bits).
Suggested-by: Paul E. Murphy <murphyp@linux.ibm.com>
Reviewed-By: Paul E Murphy <murphyp@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
ROUND_TO_ODD and a couple of other places use libc_feupdateenv_test to
restore the rounding mode and exception enables, preserve exception flags,
and test whether given exception(s) were generated.
If the exception flags haven't changed, then it is sufficient and a bit
more efficient to just restore the rounding mode and enables, rather than
writing the full Floating-Point Status and Control Register (FPSCR).
Reviewed-by: Paul E. Murphy <murphyp@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
| |
fenv_private.h includes unused functions, magic macro constants, and
some replicated common code fragments.
Remove unused functions, replace magic constants with constants from
fenv_libc.h, and refactor replicated code.
Suggested-by: Paul E. Murphy <murphyp@linux.ibm.com>
Reviewed-By: Paul E Murphy <murphyp@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
SET_RESTORE_ROUND brackets a block of code, temporarily setting and
restoring the rounding mode and letting everything else, including
exceptions generated within the block, pass through.
On powerpc, the current code clears the exception enables, which will hide
exceptions generated within the block. This issue was introduced by me
in commit e905212627350d54b58426214b5a54ddc852b0c9.
Fix this by not clearing exception enable bits in the prologue.
Also, since we are no longer changing the enable bits in either the
prologue or the epilogue, there is no need to test for entering/exiting
non-stop mode.
Also, optimize the prologue get/save/set rounding mode operations for
POWER9 and later by using 'mffscrn' when possible.
Suggested-by: Paul E. Murphy <murphyp@linux.ibm.com>
Reviewed-by: Paul E. Murphy <murphyp@linux.ibm.com>
Fixes: e905212627350d54b58426214b5a54ddc852b0c9
2019-09-19 Paul A. Clarke <pc@us.ibm.com>
* sysdeps/powerpc/fpu/fenv_libc.h (fegetenv_and_set_rn): New.
(__fe_mffscrn): New.
* sysdeps/powerpc/fpu/fenv_private.h (libc_feholdsetround_ppc_ctx):
Do not clear enable bits, remove obsolete code, use
fegetenv_and_set_rn.
(libc_feresetround_ppc): Remove obsolete code, use
fegetenv_and_set_rn.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Also, change sources.redhat.com to sourceware.org.
This patch was automatically generated by running the following shell
script, which uses GNU sed, and which avoids modifying files imported
from upstream:
sed -ri '
s,(http|ftp)(://(.*\.)?(gnu|fsf|sourceware)\.org($|[^.]|\.[^a-z])),https\2,g
s,(http|ftp)(://(.*\.)?)sources\.redhat\.com($|[^.]|\.[^a-z]),https\2sourceware.org\4,g
' \
$(find $(git ls-files) -prune -type f \
! -name '*.po' \
! -name 'ChangeLog*' \
! -path COPYING ! -path COPYING.LIB \
! -path manual/fdl-1.3.texi ! -path manual/lgpl-2.1.texi \
! -path manual/texinfo.tex ! -path scripts/config.guess \
! -path scripts/config.sub ! -path scripts/install-sh \
! -path scripts/mkinstalldirs ! -path scripts/move-if-change \
! -path INSTALL ! -path locale/programs/charmap-kw.h \
! -path po/libc.pot ! -path sysdeps/gnu/errlist.c \
! '(' -name configure \
-execdir test -f configure.ac -o -f configure.in ';' ')' \
! '(' -name preconfigure \
-execdir test -f preconfigure.ac ';' ')' \
-print)
and then by running 'make dist-prepare' to regenerate files built
from the altered files, and then executing the following to cleanup:
chmod a+x sysdeps/unix/sysv/linux/riscv/configure
# Omit irrelevant whitespace and comment-only changes,
# perhaps from a slightly-different Autoconf version.
git checkout -f \
sysdeps/csky/configure \
sysdeps/hppa/configure \
sysdeps/riscv/configure \
sysdeps/unix/sysv/linux/csky/configure
# Omit changes that caused a pre-commit check to fail like this:
# remote: *** error: sysdeps/powerpc/powerpc64/ppc-mcount.S: trailing lines
git checkout -f \
sysdeps/powerpc/powerpc64/ppc-mcount.S \
sysdeps/unix/sysv/linux/s390/s390-64/syscall.S
# Omit change that caused a pre-commit check to fail like this:
# remote: *** error: sysdeps/sparc/sparc64/multiarch/memcpy-ultra3.S: last line does not end in newline
git checkout -f sysdeps/sparc/sparc64/multiarch/memcpy-ultra3.S
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
fegetenv_status() wants to use the lighter weight instruction 'mffsl'
for reading the Floating-Point Status and Control Register (FPSCR).
It currently will use it directly if compiled '-mcpu=power9', and will
perform a runtime check (cpu_supports("arch_3_00")) otherwise.
Nicely, it turns out that the 'mffsl' instruction will decode to
'mffs' on architectures older than "arch_3_00" because the additional
bits set for 'mffsl' are "don't care" for 'mffs'. 'mffs' is a superset
of 'mffsl'.
So, just generate 'mffsl'.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
fesetenv() reads the current value of the Floating-Point Status and Control
Register (FPSCR) to determine the difference between the current state of
exception enables and the newly requested state. All of these bits are also
returned by the lighter weight 'mffsl' instruction used by fegetenv_status().
Use that instead.
Also, remove a local macro _FPU_MASK_ALL in favor of a common macro,
FPU_ENABLES_MASK from fenv_libc.h.
Finally, use a local variable ('new') in favor of a pointer dereference
('*envp').
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
SET_RESTORE_ROUND uses libc_feholdsetround_ppc_ctx and
libc_feresetround_ppc_ctx to bracket a block of code where the floating point
rounding mode must be set to a certain value.
For the *prologue*, libc_feholdsetround_ppc_ctx is used and performs:
1. Read/save FPSCR.
2. Create new value for FPSCR with new rounding mode and enables cleared.
3. If new value is different than current value,
a. If transitioning from a state where some exceptions enabled,
enter "ignore exceptions / non-stop" mode.
b. Write new value to FPSCR.
c. Put a mark on the wall indicating the FPSCR was changed.
(1) uses the 'mffs' instruction. On POWER9, the lighter weight 'mffsl'
instruction can be used, but it doesn't return all of the bits in the FPSCR.
fegetenv_status uses 'mffsl' on POWER9, 'mffs' otherwise, and can thus be
used instead of fegetenv_register.
(3b) uses 'mtfsf 0b11111111' to write the entire FPSCR, so it must
instead use 'mtfsf 0b00000011' to write just the enables and the mode,
because some of the rest of the bits are not valid if 'mffsl' was used.
fesetenv_mode uses 'mtfsf 0b00000011' on POWER9, 'mtfsf 0b11111111'
otherwise.
For the *epilogue*, libc_feresetround_ppc_ctx checks the mark on the wall, then
calls libc_feresetround_ppc, which just calls __libc_femergeenv_ppc with
parameters such that it performs:
1. Retreive saved value of FPSCR, saved in prologue above.
2. Read FPSCR.
3. Create new value of FPSCR where:
- Summary bits and exception indicators = current OR saved.
- Rounding mode and enables = saved.
- Status bits = current.
4. If transitioning from some exceptions enabled to none,
enter "ignore exceptions / non-stop" mode.
5. If transitioning from no exceptions enabled to some,
enter "catch exceptions" mode.
6. Write new value to FPSCR.
The summary bits are hardwired to the exception indicators, so there is no
need to restore any saved summary bits.
The exception indicator bits, which are sticky and remain set unless
explicitly cleared, would only need to be restored if the code block
might explicitly clear any of them. This is certainly not expected.
So, the only bits that need to be restored are the enables and the mode.
If it is the case that only those bits are to be restored, there is no need to
read the FPSCR. Steps (2) and (3) are unnecessary, and step (6) only needs to
write the bits being restored.
We know we are transitioning out of "ignore exceptions" mode, so step (4) is
unnecessary, and in step (6), we only need to check the state we are
entering.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since fe{en,dis}ableexcept() and fesetmode() read-modify-write just the
"mode" (exception enable and rounding mode) bits of the Floating Point Status
Control Register (FPSCR), the lighter weight 'mffsl' instruction can be used
to read the FPSCR (enables and rounding mode), and 'mtfsf 0b00000011' can be
used to write just those bits back to the FPSCR. The net is better performance.
In addition, fe{en,dis}ableexcept() read the FPSCR again after writing it, or
they determine that it doesn't need to be written because it is not changing.
In either case, the local variable holds the current values of the enable
bits in the FPSCR. This local variable can be used instead of again reading
the FPSCR.
Also, that value of the FPSCR which is read the second time is validated
against the requested enables. Since the write can't fail, this validation
step is unnecessary, and can be removed. Instead, the exceptions to be
enabled (or disabled) are transformed into available bits in the FPSCR,
then validated after being transformed back, to ensure that all requested
bits are actually being set. For example, FE_INVALID_SQRT can be
requested, but cannot actually be set. This bit is not mapped during the
transformations, so a test for that bit being set before and after
transformations will show the bit would not be set, and the function will
return -1 for failure.
Finally, convert the local macros in fesetmode.c to more generally useful
macros in fenv_libc.h.
|
|
|
|
|
|
|
|
|
|
|
| |
The exceptions passed to fe{en,dis}ableexcept() are defined in the ABI
as a bitmask, a combination of FE_INVALID, FE_OVERFLOW, etc.
Within the functions, these bits must be translated to/from the corresponding
enable bits in the Floating Point Status Control Register (FPSCR).
This translation is currently done bit-by-bit. The compiler generates
a series of conditional bit operations. Nicely, the "FE" exception
bits are all a uniform offset from the FPSCR enable bits, so the bit-by-bit
operation can instead be performed by a shift with appropriate masking.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Using __builtin_cpu_supports() requires support in GCC and Glibc.
My recent patch to fenv_libc.h added an unprotected use of
__builtin_cpu_supports(). Compilation of Glibc itself will fail
with a sufficiently new GCC and sufficiently old Glibc:
../sysdeps/powerpc/fpu/fegetexcept.c: In function ‘__fegetexcept’:
../sysdeps/powerpc/fpu/fenv_libc.h:52:20: error: builtin ‘__builtin_cpu_supports’ needs GLIBC (2.23 and newer) that exports hardware capability bits [-Werror]
Reviewed-by: Florian Weimer <fweimer@redhat.com>
Fixes 3db85a9814784a74536a1f0e7b7ddbfef7dc84bb.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The power7 logb implementation does not show a performance gain on
ISA 2.07+ chips with faster floating-point to GRP instructions
(currently POWER8 and POWER9).
This patch moves the POWER7 implementation to generic one and enables
it for POWER7. It also add some cleanup to use inline floating-point
number instead of define them using static const.
The performance difference is for POWER9:
- Without patch:
"logb": {
"subnormal": {
"duration": 4.99202e+09,
"iterations": 8.83662e+08,
"max": 75.194,
"min": 5.501,
"mean": 5.64925
},
"normal": {
"duration": 4.97063e+09,
"iterations": 9.97094e+08,
"max": 46.489,
"min": 4.956,
"mean": 4.98512
}
}
- With patch:
"logb": {
"subnormal": {
"duration": 4.97226e+09,
"iterations": 9.92036e+08,
"max": 77.209,
"min": 4.892,
"mean": 5.01218
},
"normal": {
"duration": 4.96192e+09,
"iterations": 1.07545e+09,
"max": 12.361,
"min": 4.593,
"mean": 4.61382
}
}
The ifunc implementation is also enabled only for powerpc64.
Checked on powerpc-linux-gnu (built without --with-cpu, with
--with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch),
powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+
and --disable-multi-arch).
* sysdeps/powerpc/power7/fpu/s_logb.c: Move to ...
* sysdeps/powerpc/fpu/s_logb.c: ... here. Use inline FP constants.
* sysdeps/powerpc/power7/fpu/s_logbf.c: Move to ...
* sysdeps/powerpc/fpu/s_logbf.c: ... here. Use inline FP constants.
* sysdeps/powerpc/power7/fpu/s_logbl.c: Move to ...
* sysdeps/powerpc/fpu/s_logbl.c: ... here. Use inline FP constants.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_logb-power7.c:
Adjust implementation path.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_logbf-power7.c:
Adjust implementation path.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_logbl-power7.c:
Adjust implementation path.
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/Makefile
(libm-sysdep_routines): Add s_log* objects.
(CFLAGS-s_logbf-power7.c, CFLAGS-s_logbl-power7.c,
CFLAGS-s_logb-power7.c): New fule.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_logb-power7.c: Move
to ...
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_logb-power7.c:
... here.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_logb-ppc64.c: Move
to ...
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_logb-ppc64.c:
... here.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_logb.c: Move to ...
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_logb.c: ... here.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_logbf-power7.c: Move
to ...
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_logbf-power7.c:
... here.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_logbf-ppc64.c: Move
to ...
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_logbf-ppc64.c:
... here.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_logbf.c: Move to ...
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_logbf.c: ... here.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_logbl-power7.c: Move
to ...
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_logbl-power7.c:
... here.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_logbl-ppc64.c: Move
to ...
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_logbl-ppc64.c:
... here.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_logbl.c: Move to ...
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_logbl.c: ... here.
* sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile: Remove file.
* sysdeps/powerpc/powerpc64/power7/fpu/s_logb.c: Remove file.
* sysdeps/powerpc/powerpc64/power7/fpu/s_logbf.c: Likewise.
* sysdeps/powerpc/powerpc64/power7/fpu/s_logbl.c: Likewise.
Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The modf{f} optimization is not an optimization for ISA 2.07+. This
patch move the IFUNC for powerpc64 only, move the power5+ to generic
location, and include the generic implementation for ISA 2.07+.
The performance changes are based on modf benchtests:
* POWER9 - ppc64
"modf": {
"": {
"duration": 4.97057e+09,
"iterations": 1.00688e+09,
"max": 28.76,
"min": 4.912,
"mean": 4.9366
}
}
* POWER9 - power5+
"modf": {
"": {
"duration": 4.98291e+09,
"iterations": 9.32818e+08,
"max": 15.058,
"min": 5.107,
"mean": 5.34178
}
}
* POWER8 - ppc64
"modf": {
"": {
"duration": 5.05329e+09,
"iterations": 8.38814e+08,
"max": 518.051,
"min": 5.79,
"mean": 6.02433
}
}
* POWER8 - power5+
"modf": {
"": {
"duration": 5.05573e+09,
"iterations": 8.35254e+08,
"max": 63.141,
"min": 5.873,
"mean": 6.05293
}
}
* POWER7 - ppc64
"modf": {
"": {
"duration": 4.89818e+09,
"iterations": 1.08408e+09,
"max": 57.556,
"min": 3.953,
"mean": 4.51827
}
}
* POWER7 - power5+
"modf": {
"": {
"duration": 4.83789e+09,
"iterations": 1.33409e+09,
"max": 46.608,
"min": 2.224,
"mean": 3.62636
}
}
Checked on powerpc-linux-gnu (built without --with-cpu, with
--with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch),
powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+
and --disable-multi-arch).
* sysdeps/powerpc/power5+/fpu/s_modf.c: Move to ...
* sysdeps/powerpc/fpu/s_modf.c: ... here. Add ISA 2.07 optimization.
* sysdeps/powerpc/power5+/fpu/s_modff.c: Move to ...
* sysdeps/powerpc/fpu/s_modff.c: ... here. Add ISA 2.07 optimization.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_modf-power5+.c:
Adjust include.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_modff-power5+.c:
Likewise.
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/Makefile (sysdep_calls,
sysdep_routines): Add s_modf* objects.
(CFLAGS-s_modf-power5+.c, CFLAGS-s_modff-power5+.c,
CFLAGS-s_modf-ppc64.c, CFLAGS-s_modff-ppc64.c): New rule.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_modf-power5+.c: Move
to ...
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_modf-power5+.c:
... here.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_modf-power5+.c: Movo
to ...
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_modf-power5+.c: Move
... here.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_modf.c: Move to ...
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_modf.c: ... here.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_modff-power5+.c: Move
to ...
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_modff-power5+.c:
... here.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_modff-ppc64.c: Move to ...
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_modff-ppc64.c:
... here.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_modff.c: Move to ...
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_modff.c: ... here.
Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The powerpc hypot is slight optimized by:
- Commit 8df4e219e43, both isnan and isinf are always inlined and thus
the check TEST_INF_NAN does not make sense anymore. The generic
check for POWER7 should be faster on all powerpc configuration.
- The redundant check 'y > two60factor && (x / y) > two60' is removed.
Both changes leads to unrequired ifunc especialization for power7 and
thus they are removed. Finally The code is also cleanup a bit by inlining
the constants floating points.
The performance changes using the hypot benchtests are:
- POWER9 without patch:
"hypot": {
"overflow": {
"duration": 4.98585e+09,
"iterations": 4.84932e+08,
"max": 46.551,
"min": 10.229,
"mean": 10.2815
},
"higher_two500": {
"duration": 5.00192e+09,
"iterations": 4.24843e+08,
"max": 33.319,
"min": 11.606,
"mean": 11.7736
},
"subnormal": {
"duration": 5.0075e+09,
"iterations": 4.06792e+08,
"max": 22.178,
"min": 12.15,
"mean": 12.3097
},
"less_two500": {
"duration": 5.00685e+09,
"iterations": 4.08772e+08,
"max": 22.784,
"min": 12.052,
"mean": 12.2485
},
"default": {
"duration": 5.06002e+09,
"iterations": 4.09894e+08,
"max": 20.648,
"min": 11.874,
"mean": 12.3447
}
}
- POWER9 with patch:
"hypot": {
"overflow": {
"duration": 4.91848e+09,
"iterations": 7.28039e+08,
"max": 47.958,
"min": 6.436,
"mean": 6.75579
},
"higher_two500": {
"duration": 4.9359e+09,
"iterations": 6.63376e+08,
"max": 20.783,
"min": 7.321,
"mean": 7.44057
},
"subnormal": {
"duration": 4.9479e+09,
"iterations": 6.19772e+08,
"max": 18.856,
"min": 7.817,
"mean": 7.98341
},
"less_two500": {
"duration": 4.94275e+09,
"iterations": 6.3889e+08,
"max": 17.452,
"min": 7.597,
"mean": 7.73647
},
"default": {
"duration": 5.03645e+09,
"iterations": 5.70718e+08,
"max": 18.904,
"min": 8.55,
"mean": 8.82476
}
}
- POWER7 without patch
"hypot": {
"overflow": {
"duration": 4.86637e+09,
"iterations": 6.43196e+08,
"max": 53.958,
"min": 7.328,
"mean": 7.56592
},
"higher_two500": {
"duration": 4.99842e+09,
"iterations": 3.11012e+08,
"max": 78.227,
"min": 15.696,
"mean": 16.0715
},
"subnormal": {
"duration": 4.99841e+09,
"iterations": 3.08935e+08,
"max": 51.392,
"min": 15.983,
"mean": 16.1795
},
"less_two500": {
"duration": 5.00108e+09,
"iterations": 2.99464e+08,
"max": 73.247,
"min": 16.416,
"mean": 16.7001
},
"default": {
"duration": 5.04645e+09,
"iterations": 3.52608e+08,
"max": 70.073,
"min": 13.38,
"mean": 14.3118
}
}
- POWER7 with patch
"hypot": {
"overflow": {
"duration": 4.80785e+09,
"iterations": 8.00001e+08,
"max": 66.262,
"min": 5.888,
"mean": 6.00981
},
"higher_two500": {
"duration": 4.9859e+09,
"iterations": 3.39449e+08,
"max": 5148.44,
"min": 14.539,
"mean": 14.6882
},
"subnormal": {
"duration": 4.9905e+09,
"iterations": 3.28874e+08,
"max": 64.905,
"min": 14.971,
"mean": 15.1745
},
"less_two500": {
"duration": 4.99494e+09,
"iterations": 3.19755e+08,
"max": 103.696,
"min": 14.972,
"mean": 15.6211
},
"default": {
"duration": 5.03951e+09,
"iterations": 4.02502e+08,
"max": 61.008,
"min": 12.368,
"mean": 12.5205
}
}
Checked on powerpc-linux-gnu (built without --with-cpu, with
--with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch),
powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+
and --disable-multi-arch).
* sysdeps/powerpc/fpu/e_hypot.c (two60, two500, two600, two1022,
twoM500, twoM600, two60factor, pdnum): Remove.
(TEST_INFO_NAN, GET_TW0_HIGH_WORD): Remove macro.
(__ieee754_hypot): Replace static variables with inline definition,
remove ununsed branches.
* sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile
(libm-sysdep_routines): Remove e_hypot-* objects.
(CFLAGS-e_hypot-power7.c, CFLAGS-e_hypotf-power7.c): Remove rule.
* sysdeps/powerpc/powerpc64/fpu/multiarch/e_hypot-power7.c: Remove
file.
* sysdeps/powerpc/powerpc64/fpu/multiarch/e_hypot-ppc64.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/e_hypot.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/e_hypotf-power7.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/e_hypotf-ppc64.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/e_hypotf.c: Likewise.
Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Using 'mffs' instruction to read the Floating Point Status Control Register
(FPSCR) can force a processor flush in some cases, with undesirable
performance impact. If the values of the bits in the FPSCR which force the
flush are not needed, an instruction that is new to POWER9 (ISA version 3.0),
'mffsl' can be used instead.
Cases included: get_rounding_mode, fegetround, fegetmode, fegetexcept.
* sysdeps/powerpc/bits/fenvinline.h (__fegetround): Use
__fegetround_ISA300() or __fegetround_ISA2() as appropriate.
(__fegetround_ISA300) New.
(__fegetround_ISA2) New.
* sysdeps/powerpc/fpu_control.h (IS_ISA300): New.
(_FPU_MFFS): Move implementation...
(_FPU_GETCW): Here.
(_FPU_MFFSL): Move implementation....
(_FPU_GET_RC_ISA300): Here. New.
(_FPU_GET_RC): Use _FPU_GET_RC_ISA300() or _FPU_GETCW() as appropriate.
* sysdeps/powerpc/fpu/fenv_libc.h (fegetenv_status_ISA300): New.
(fegetenv_status): New.
* sysdeps/powerpc/fpu/fegetmode.c (fegetmode): Use fegetenv_status()
instead of fegetenv_register().
* sysdeps/powerpc/fpu/fegetexcept.c (__fegetexcept): Likewise.
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add 'volatile' keyword to a few asm statements, to force the compiler
to generate the instructions therein.
Some instances were implicitly volatile, but adding keyword for consistency.
2019-06-19 Paul A. Clarke <pc@us.ibm.com>
* sysdeps/powerpc/fpu/fenv_libc.h (relax_fenv_state): Add 'volatile'.
* sysdeps/powerpc/fpu/fpu_control.h (__FPU_MFFS): Likewise.
(__FPU_MFFSL): Likewise.
(_FPU_SETCW): Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patches consolidates all the powerpc llrint{f} implementations on
the generic sysdeps/powerpc/powerpc32/fpu/s_llrint{f}.
Checked on powerpc-linux-gnu (built without --with-cpu, with
--with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch),
powerpc64-linux-gnu (built without --with-cpu and with --with-cpu=power5+
and --disable-multi-arch).
* sysdeps/powerpc/fpu/s_lrintf.S: Remove file.
* sysdeps/powerpc/powerpc64/fpu/s_lrintf.c: Move to ...
* sysdeps/powerpc/fpu/s_lrintf.c: ... here.
* sysdeps/powerpc/powerpc32/fpu/Makefile
[$(subdir) == math] (CFLAGS-s_lrint.c): New rule.
* sysdeps/powerpc/powerpc32/fpu/s_llrint.c (__llrint): Add power4
optimization.
* sysdeps/powerpc/powerpc32/fpu/s_llrintf.c (__llrintf): Likewise.
* sysdeps/powerpc/powerpc32/fpu/s_lrint.S: Remove file.
* sysdeps/powerpc/powerpc32/fpu/s_lrint.c: New file.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile
(CFLAGS-s_llrintf-power6.c, CFLAGS-s_llrintf-ppc32.c,
CFLAGS-s_llrint-power6.c, CFLAGS-s_llrint-ppc32.c,
CFLAGS-s_lrint-ppc32.c): New rule.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_llrint-power6.S:
Remove file.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_llrint-ppc32.S:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_llrintf-power6.S:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_llrintf-ppc32.S:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_lrint-ppc32.S:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/s_llrint.S: Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/s_llrintf.S: Likewise.
* sysdeps/powerpc/powerpc32/power6/fpu/s_llrint.S: Likewise.
* sysdeps/powerpc/powerpc32/power6/fpu/s_llrintf.S: Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_llrint-power6.c:
New file.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_llrint-ppc32.c:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_llrintf-power6.c:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_llrintf-ppc32.c:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_lrint-ppc32.c:
Likewise.
Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The powerpc isnan optimizations are not really a gain:
- GCC will call libm iff -fsignaling-nans is used. This usage pattern
is usually not performance oriented and for such calls PLT overhead
should dominate execution time.
- The power5, power6, and power6x are just micro-optimization to
improve the Load-Hit-Store hazards from floating-point to general
register transfer, and current GCC already has support to minimize
it by inserting either extra nops or group dispatch instructions.
- The power7 uses ftdiv to optimize for some input patterns, but at
cost of others. Comparing against generic C implementation built
for powerpc-linux-gnu-power4 (which uses the hp-timing support on
benchtests):
- Generic sysdeps/ieee754 implementation:
"isnan": {
"": {
"duration": 4.98415e+09,
"iterations": 2.34516e+09,
"max": 45.925,
"min": 2.052,
"mean": 2.12529
},
"INF": {
"duration": 4.74057e+09,
"iterations": 1.69761e+09,
"max": 91.01,
"min": 2.052,
"mean": 2.79249
},
"NAN": {
"duration": 4.74071e+09,
"iterations": 1.68768e+09,
"max": 282.343,
"min": 2.052,
"mean": 2.809
}
}
- power7 optimized one:
$ ./testrun.sh benchtests/bench-isnan
"isnan": {
"": {
"duration": 4.96842e+09,
"iterations": 2.56297e+09,
"max": 50.048,
"min": 1.872,
"mean": 1.93854
},
"INF": {
"duration": 4.76648e+09,
"iterations": 1.54213e+09,
"max": 373.408,
"min": 2.661,
"mean": 3.09084
},
"NAN": {
"duration": 4.76845e+09,
"iterations": 1.54515e+09,
"max": 51.016,
"min": 2.736,
"mean": 3.08607
}
}
So it basically optimizes marginally for normal numbers while
increasing the latency for other kind of FP.
- The generic implementation requires getting the floating point
status, disable the invalid operation bit, and restore the
floating-point status. Each operation is costly and requires
flushing the FP pipeline.
Using the same scenarion for the previous analysis:
"isnan": {
"": {
"duration": 5.08284e+09,
"iterations": 6.2898e+08,
"max": 41.844,
"min": 8.057,
"mean": 8.08108
},
"INF": {
"duration": 4.97904e+09,
"iterations": 6.16176e+08,
"max": 39.661,
"min": 8.057,
"mean": 8.08055
},
"NAN": {
"duration": 4.98695e+09,
"iterations": 5.95866e+08,
"max": 29.728,
"min": 8.345,
"mean": 8.36925
}
}
- The power8 implementation is just the generic implementation using
ISA 2.07 mfvsrd instruction (which GCC uses for generic implementation).
So generic implementation is the best option for powerpc64le.
Checked on powerpc-linux-gnu (built without --with-cpu, with
--with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch),
powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+
and --disable-multi-arch).
* sysdeps/powerpc/fpu/s_isnan.c: Remove file.
* sysdeps/powerpc/fpu/s_isnanf.S: Likewise.
* sysdeps/powerpc/powerpc32/fpu/s_isnan.S: Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile
(sysdeps_routines, libm-sysdep_routines): Remove s_isnan-* and
s_isnanf-* objects.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isnan-power5.S:
Remove file
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isnan-power6.S:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isnan-power7.S:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isnan-ppc32.S:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isnan.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isnanf-power5.S:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isnanf-power6.S:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isnanf.c: Likewise.
* sysdeps/powerpc/powerpc32/power5/fpu/s_isnan.S: Likewise.
* sysdeps/powerpc/powerpc32/power5/fpu/s_isnanf.S: Likewise.
* sysdeps/powerpc/powerpc32/power6/fpu/s_isnan.S: Likewise.
* sysdeps/powerpc/powerpc32/power6/fpu/s_isnanf.S: Likewise.
* sysdeps/powerpc/powerpc32/power7/fpu/s_isnan.S: Likewise.
* sysdeps/powerpc/powerpc32/power7/fpu/s_isnanf.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile (sysdep_calls):
Remove s_isnan-* and s_isnanf-* objects.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_isnan-power5.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_isnan-power6.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_isnan-power6x.S:
Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_isnan-power7.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_isnan-power8.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_isnan-ppc64.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_isnan.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_isnanf.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_isnan.S: Likewise.
* sysdeps/powerpc/powerpc64/power5/fpu/s_isnan.S: Likewise.
* sysdeps/powerpc/powerpc64/power6/fpu/s_isnan.S: Likewise.
* sysdeps/powerpc/powerpc64/power6x/fpu/s_isnan.S: Likewise.
* sysdeps/powerpc/powerpc64/power7/fpu/s_isnan.S: Likewise.
* sysdeps/powerpc/powerpc64/power7/fpu/s_isnanf.S: Likewise.
* sysdeps/powerpc/powerpc64/power8/fpu/s_isnan.S: Likewise.
* sysdeps/powerpc/powerpc64/power8/fpu/s_isnanf.S: Likewise.
Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
GCC always expand copysign{f} for all possible cpus, so calling the libm
is only done if user explicitly states to disable the builtin (which is
done usually not for performance reason). So to provide ifunc variant
for copysign is just unrequired complexity, since libm will be called
on non-performance critical code.
This patch removes both powerpc32 and powerpc64 ifunc variants and
consolidates the powerpc implementation on
sysdeps/powerpc/fpu/s_copysign{f}.c using compiler builtins.
Checked on powerpc-linux-gnu (built without --with-cpu, with
--with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch),
powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+
and --disable-multi-arch).
* sysdeps/powerpc/fpu/s_copysign.c: New file.
* sysdeps/powerpc/fpu/s_copysignf.c: Likewise.
* sysdeps/powerpc/powerpc32/fpu/s_copysign.S: Remove file.
* sysdeps/powerpc/powerpc32/fpu/s_copysignf.S: Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile
(sysdep_routines, libm-sysdep_routines): Remove s_copysign-power6 and
s_copysign-ppc32.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_copysign-power6.S:
Remove file.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_copysign-ppc32.S:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_copysign.c:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_copysignf.c:
Likewise.
* sysdeps/powerpc/powerpc32/power6/fpu/s_copysign.S: Likewise.
* sysdeps/powerpc/powerpc32/power6/fpu/s_copysignf.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile (sysdeps_calls):
Remove s_copysign-power6 s_copysign-ppc64.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_copysign-power6.S:
Remove file.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_copysign-ppc64.S:
Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_copysign.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_copysignf.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_copysign.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_copysignf.S: Likewise.
* sysdeps/powerpc/powerpc64/power6/fpu/s_copysign.S: Likewise.
* sysdeps/powerpc/powerpc64/power6/fpu/s_copysignf.S: Likewise.
Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patches consolidates all the powerpc rint{f} implementations on
the generic sysdeps/powerpc/fpu/s_rint{f}.
Checked on powerpc-linux-gnu (built without --with-cpu, with
--with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch),
powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+
and --disable-multi-arch).
* sysdeps/powerpc/fpu/round_to_integer.h (set_fenv_mode,
round_to_integer_float, round_mode): Add RINT handling.
(reset_fenv_mode): New symbol.
* sysdeps/powerpc/fpu/s_rint.c (__rint): Use generic implementation.
* sysdeps/powerpc/fpu/s_rintf.c (__rintf): Likewise.
* sysdeps/powerpc/powerpc32/fpu/s_rint.S: Remove file.
* sysdeps/powerpc/powerpc32/fpu/s_rintf.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_rint.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_rintf.S: Likewise.
Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
|
|
|
|
|
|
|
|
|
| |
Add support to use 'mffsl' instruction if compiled for POWER9 (or later).
Also, mask the result to avoid bleeding unrelated bits into the result of
_FPU_GET_RC().
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
| |
fegetexcept() included code which exactly duplicates the code in
fenv_reg_to_exceptions(). Replace with a call to that function.
2019-06-05 Paul A. Clarke <pc@us.ibm.com>
* sysdeps/powerpc/fpu/fegetexcept.c (__fegetexcept): Replace code
with call to equivalent function.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patches consolidates all the powerpc nearbyint{f} implementations
on the generic sysdeps/powerpc/fpu/s_nearbyint{f}.
* sysdeps/powerpc/fpu/round_to_integer.h (set_fenv_mode): Add
NEARBYINT handling.
* sysdeps/powerpc/fpu/s_nearbyint.c: New file.
* sysdeps/powerpc/fpu/s_nearbyintf.c: Likewise.
* sysdeps/powerpc/powerpc32/fpu/s_nearbyint.S: Remove file.
* sysdeps/powerpc/powerpc32/fpu/s_nearbyintf.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_nearbyint.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_nearbyintf.S: Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patches consolidates all the powerpc trunc{f} implementations on
the generic sysdeps/powerpc/fpu/s_trunc{f}. The generic implementation
uses either the compiler builts for ISA 2.03+ (which generates the
frim instruction) or a generic implementation which uses FP only
operations.
The IFUNC organization for powerpc64 is also change to be enabled only
for powerpc64 and not for powerpc64le (since minium ISA of 2.08 does not
require the fallback generic implementation).
Checked on powerpc-linux-gnu (built without --with-cpu, with
--with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch),
powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+
and --disable-multi-arch).
* sysdeps/powerpc/fpu/trunc_to_integer.h (set_fenv_mode): Add
TRUNC handling.
(round_mode): Add definition for TRUNC.
* sysdeps/powerpc/fpu/s_trunc.c: New file.
* sysdeps/powerpc/fpu/s_truncf.c: New file.
* sysdeps/powerpc/powerpc32/fpu/s_trunc.S: Remove file.
* sysdeps/powerpc/powerpc32/fpu/s_truncf.S: Likewise.
* sysdep/powerpc/powepc32/power4/fpu/multiarch/s_trunc-power5+.S:
Likewise.
* sysdep/powerpc/powepc32/power4/fpu/multiarch/s_trunc-ppc32.S:
Likewise.
* sysdep/powerpc/powepc32/power4/fpu/multiarch/s_truncf-power5+.S:
Likewise.
* sysdep/powerpc/powepc32/power4/fpu/multiarch/s_truncf-ppc32.S:
Likewise.
* sysdep/powerpc/powepc32/power4/fpu/multiarch/s_trunc-power5+.c: New
file.
* sysdep/powerpc/powepc32/power4/fpu/multiarch/s_trunc-ppc32.c:
Likewise.
* sysdep/powerpc/powepc32/power4/fpu/multiarch/s_truncf-power5+.c:
Likewise.
* sysdep/powerpc/powepc32/power4/fpu/multiarch/s_truncf-ppc32.c:
Likewise.
* sysdep/powerpc/powerpc32/power5+/fpu/s_trunc.S: Remove file.
* sysdep/powerpc/powerpc32/power5+/fpu/s_truncf.S: Likewise.
* sysdep/powerpc/powerpc64/be/fpu/multiarch/Makefile
(libm-sysdep_routines): Add s_trunc-power5+, s_trunc-ppc64,
s_truncf-power5+, and s_truncf-ppc64.
(CFLAGS-s_trunc-power5+.c, CFLAGS-s_truncf-power5+.c): New rule.
* sysdep/powerpc/powercp64/be/fpu/multiarch/s_trunc-power5+.c: New
file.
* sysdep/powerpc/powercp64/be/fpu/multiarch/s_trunc-ppc64.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_trunc.c: Move to ...
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_trunc.c: ... here.
* sysdep/powerpc/powercp64/be/fpu/multiarch/s_truncf-power5+.c: New
file.
* sysdep/powerpc/powercp64/be/fpu/multiarch/s_truncf-ppc64.c:
Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_truncf.c: Move to ...
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_truncf.c: ... here.
* sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile
(libm-sysdep_routines): Remove s_trunc-power5+, s_trunc-ppc64,
s_truncf-power5+, and s_truncf-ppc64.
* sysdep/powerpc/powerpc64/fpu/multiarch/s_trunc-power5+.S: Remove
file.
* sysdep/powerpc/powerpc64/fpu/multiarch/s_trunc-ppc64.S: Likewise.
* sysdep/powerpc/powerpc64/fpu/multiarch/s_truncf-power5+.S:
Likewise.
* sysdep/powerpc/powerpc64/fpu/multiarch/s_truncf-ppc64.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_trunc.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_truncf.S: Likewise.
* sysdep/powerpc/powerpc64/power5+/fpu/s_trunc.S: Likewise.
* sysdep/powerpc/powerpc64/power5+/fpu/s_truncf.S: Likewise.
Reviewed-by: Gabriel F. T. Gomes <gabriel@inconstante.eti.br>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patches consolidates all the powerpc round{f} implementations on
the generic sysdeps/powerpc/fpu/s_round{f}. The generic implementation
uses either the compiler builts for ISA 2.03+ (which generates the
frim instruction) or a generic implementation which uses FP only
operations.
The IFUNC organization for powerpc64 is also change to be enabled only
for powerpc64 and not for powerpc64le (since minium ISA of 2.08 does not
require the fallback generic implementation).
Checked on powerpc-linux-gnu (built without --with-cpu, with
--with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch),
powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+
and --disable-multi-arch).
* sysdeps/powerpc/fpu/round_to_integer.h (set_fenv_mode): Add
ROUND handling.
(round_mode): Add definition for ROUND.
(round_to_integer_float): Likewise.
* sysdeps/powerpc/fpu/s_round.c: New file.
* sysdeps/powerpc/fpu/s_roundf.c: New file.
* sysdeps/powerpc/powerpc32/fpu/s_round.S: Remove file.
* sysdeps/powerpc/powerpc32/fpu/s_roundf.S: Likewise.
* sysdep/powerpc/powepc32/power4/fpu/multiarch/s_round-power5+.S:
Likewise.
* sysdep/powerpc/powepc32/power4/fpu/multiarch/s_round-ppc32.S:
Likewise.
* sysdep/powerpc/powepc32/power4/fpu/multiarch/s_roundf-power5+.S:
Likewise.
* sysdep/powerpc/powepc32/power4/fpu/multiarch/s_roundf-ppc32.S:
Likewise.
* sysdep/powerpc/powepc32/power4/fpu/multiarch/s_round-power5+.c: New
file.
* sysdep/powerpc/powepc32/power4/fpu/multiarch/s_round-ppc32.c:
Likewise.
* sysdep/powerpc/powepc32/power4/fpu/multiarch/s_roundf-power5+.c:
Likewise.
* sysdep/powerpc/powepc32/power4/fpu/multiarch/s_roundf-ppc32.c:
Likewise.
* sysdep/powerpc/powerpc32/power5+/fpu/s_round.S: Remove file.
* sysdep/powerpc/powerpc32/power5+/fpu/s_roundf.S: Likewise.
* sysdep/powerpc/powerpc64/be/fpu/multiarch/Makefile
(libm-sysdep_routines): Add s_round-power5+, s_round-ppc64,
s_roundf-power5+, and s_roundf-ppc64.
(CFLAGS-s_round-power5+.c, CFLAGS-s_roundf-power5+.c): New rule.
* sysdep/powerpc/powercp64/be/fpu/multiarch/s_round-power5+.c: New
file.
* sysdep/powerpc/powercp64/be/fpu/multiarch/s_round-ppc64.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_round.c: Move to ...
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_round.c: ... here.
* sysdep/powerpc/powercp64/be/fpu/multiarch/s_roundf-power5+.c: New
file.
* sysdep/powerpc/powercp64/be/fpu/multiarch/s_roundf-ppc64.c:
Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_roundf.c: Move to ...
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_roundf.c: ... here.
* sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile
(libm-sysdep_routines): Remove s_round-power5+, s_round-ppc64,
s_roundf-power5+, and s_roundf-ppc64.
* sysdep/powerpc/powerpc64/fpu/multiarch/s_round-power5+.S: Remove
file.
* sysdep/powerpc/powerpc64/fpu/multiarch/s_round-ppc64.S: Likewise.
* sysdep/powerpc/powerpc64/fpu/multiarch/s_roundf-power5+.S:
Likewise.
* sysdep/powerpc/powerpc64/fpu/multiarch/s_roundf-ppc64.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_round.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_roundf.S: Likewise.
* sysdep/powerpc/powerpc64/power5+/fpu/s_round.S: Likewise.
* sysdep/powerpc/powerpc64/power5+/fpu/s_roundf.S: Likewise.
Reviewed-by: Gabriel F. T. Gomes <gabriel@inconstante.eti.br>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patches consolidates all the powerpc floor{f} implementations on
the generic sysdeps/powerpc/fpu/s_floor{f}. The generic implementation
uses either the compiler builts for ISA 2.03+ (which generates the
frim instruction) or a generic implementation which uses FP only
operations.
The IFUNC organization for powerpc64 is also change to be enabled only
for powerpc64 and not for powerpc64le (since minium ISA of 2.08 does not
require the fallback generic implementation).
Checked on powerpc-linux-gnu (built without --with-cpu, with
--with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch),
powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+
and --disable-multi-arch).
* sysdeps/powerpc/fpu/round_to_integer.h (set_fenv_mode):
Add FLOOR option.
(round_mode): Add definition for FLOOR.
* sysdeps/powerpc/fpu/s_floor.c: New file.
* sysdeps/powerpc/fpu/s_floorf.c: Likewise.
* sysdeps/powerpc/powerpc32/fpu/s_floor.S: Remove file.
* sysdeps/powerpc/powerpc32/fpu/s_floorf.S: Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_floor-power5+.S:
Remove file.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_floor-ppc32.S:
Likewise
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_floorf-power5+.S:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_floorf-ppc32.S:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_floor-power5+.c:
New file.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_floor-ppc32.c:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_floorf-power5+.c:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_floorf-ppc32.c:
Likewise.
* sysdeps/powerpc/powerpc32/power5+/fpu/s_floor.S: Remove file.
* sysdeps/powerpc/powerpc32/power5+/fpu/s_floorf.S: Remove file.
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/Makefile
(libm-sysdep_routines): Add s_floor-power5+, s_floor-ppc64,
s_floorf-power5+, and s_floorf-ppc64.
(CFLAGS-s_floor-power5+.c, CFLAGS-s_floorf-power5+.c): New rule.
* sysdep/powerpc/powerpc64/be/fpu/multiarch/s_floor-power5+.c: New
file.
* sysdep/powerpc/powerpc64/be/fpu/multiarch/s_floor-ppc64.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_floor.c: Move to ...
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_floor.c: ... here.
* sysdep/powerpc/powerpc64/be/fpu/multiarch/s_floorf-power5+.c: New
file.
* sysdep/powerpc/powerpc64/be/fpu/multiarch/s_floorf-ppc64.c:
Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_floorf.c: Move to ...
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_floorf.c: ... here.
* sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile
(libm-sysdep_routines): Remove s_floor-power5+, s_floor-ppc64,
s_floorf-power5+, and s_floorf-ppc64.
* sysdep/powerpc/powerpc64/fpu/multiarch/s_floor-power5+.S: Remove
file.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_floor-ppc64.S: Remove
file.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_floorf-power5+.S:
Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_floorf-ppc64.S:
Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_floor.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_floorf.S: Likewise.
* sysdeps/powerpc/powerpc64/power5+/fpu/s_floor.S: Likewise.
* sysdeps/powerpc/powerpc64/power5+/fpu/s_floorf.S: Likewise.
Reviewed-by: Gabriel F. T. Gomes <gabriel@inconstante.eti.br>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patches consolidates all the powerpc ceil{f} implementations on
the generic sysdeps/powerpc/fpu/s_ceil{f}. The generic implementation
uses either the compiler builts for ISA 2.03+ (which generates the frip
instruction) or a generic implementation which uses FP only operations.
It adds a generic implementation (round_to_integer.h) which is shared
with other rounding to integer routines. The resulting code should be
similar in term os performance to previous assembly one.
The IFUNC organization for powerpc64 is also change to be enabled only
for powerpc64 and not for powerpc64le (since minium ISA of 2.08 does not
require the fallback generic implementation).
Checked on powerpc-linux-gnu (built without --with-cpu, with
--with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch),
powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+
and --disable-multi-arch).
* sysdeps/powerpc/fpu/fenv_libc.h (__fesetround_inline_nocheck): New
function.
* sysdeps/powerpc/fpu/round_to_integer.h: New file.
* sysdeps/powerpc/fpu/s_ceil.c: Likewise.
* sysdeps/powerpc/fpu/s_ceilf.c: Likewise.
* sysdeps/powerpc/powerpc32/fpu/s_ceil.S: Remove file.
* sysdeps/powerpc/powerpc32/fpu/s_ceilf.S: Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile
(CFLAGS-s_ceil-power5+.c, CFLAGS-s_ceilf-power5+.c): New rule.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_ceil-power5+.S:
Remove file.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_ceil-ppc32.S:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_ceilf-power5+.S:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_ceilf-ppc32.S:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_ceil-power5+.c:
New file.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_ceil-ppc32.c:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_ceilf-power5+.c:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_ceilf-ppc32.c:
Likewise.
* sysdeps/powerpc/powerpc32/power5+/fpu/s_ceil.S: Remove file.
* sysdeps/powerpc/powerpc32/power5+/fpu/s_ceilf.S: Likewise.
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/Makefile: New file.
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_ceil-power5+.c:
Likewise.
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_ceil-ppc64.c:
Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_ceil.c: Move to ...
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_ceil.c: ... here.
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_ceilf-power5+.c: New
file.
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_ceilf-ppc64.c:
Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_ceilf.c: Move to ...
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_ceilf.c: ...
* here.
* sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile
(libm-sysdep_routines): Remove s_ceil-power5+, s_ceil-ppc64,
s_ceilf-power5+, and s_ceilf-ppc64.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_ceil-power5+.S: Remove
file.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_ceil-ppc64.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_ceilf-power5+.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_ceilf-ppc64.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_ceil.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_ceilf.S: Likewise.
* sysdeps/powerpc/powerpc64/power5+/fpu/s_ceil.S: Likewise.
* sysdeps/powerpc/powerpc64/power5+/fpu/s_ceilf.S: Likewise.
Reviewed-by: Gabriel F. T. Gomes <gabriel@inconstante.eti.br>
|
|
|
|
|
| |
* sysdeps/powerpc/fpu/s_fma.c: Fix format.
* sysdeps/powerpc/fpu/s_fmaf.c: Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch just refactor the assembly implementation to use compiler
builtins instead.
Checked on powerpc-linux-gnu (built without --with-cpu, with
--with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch),
powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+
and --disable-multi-arch).
* sysdeps/powerpc/fpu/s_fma.S: Remove file.
* sysdeps/powerpc/fpu/s_fmaf.S: Likewise.
* sysdeps/powerpc/fpu/s_fma.c: New file.
* sysdeps/powerpc/fpu/s_fmaf.c: Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since be2e25bbd78f9fdf the generic ieee754 implementation uses
compiler builtin which generates fabs{f} for all supported targets.
Checked on powerpc-linux-gnu (built without --with-cpu, with
--with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch),
powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+
and --disable-multi-arch).
* sysdeps/powerpc/fpu/s_fabs.S: Remove file.
* sysdeps/powerpc/fpu/s_fabsf.S: Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Replace inline asm uses of the "mffs" and "mtfsf" instructions with
the analogous GCC builtins.
__builtin_mffs and __builtin_mtfsf are both available in GCC 5 and above.
Given the minimum GCC level for GLibC is now GCC 6.2, it is safe to use
these builtins without restriction.
2019-03-29 Paul A. Clarke <pc@us.ibm.com>
* sysdeps/powerpc/fpu/fenv_libc.h (fegetenv_register): Replace inline
asm with builtin.
* sysdeps/powerpc/powerpc64/le/fpu/sfp-machine.h (FP_INIT_ROUNDMODE):
Likewise.
* sysdeps/powerpc/fpu/tst-setcontext-fpscr.c (_GET_DI_FPSCR): Likewise.
(_GET_SI_FPSCR): Likewise.
(_SET_SI_FPSCR): Likewise.
|
|
|
|
|
|
|
|
|
|
|
| |
This file is not used anywhere since removal of {k,e}_rem_pio2f.c
(commit ca3aac57efa89).
Checked with a build for powerpc-linux-gnu (with --with-cpu=power4
and --with-cpu=power7), powerpc64-linux-gnu (with --with-cpu=power4
and --with-cpu=power7), and powerpc64le-linux (with --with-cpu=power8).
* sysdeps/powerpc/fpu/s_float_bitwise.h: Remove file.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch fixes various places where a space should have been present
before '(' in accordance with the GNU Coding Standards. Most but not
all of the fixes in this patch are for calls to sizeof (but it's not
exhaustive regarding such calls that should be fixed).
Tested for x86_64, and with build-many-glibcs.py.
* benchtests/bench-strcpy.c (do_test): Use space before '('.
* benchtests/bench-string.h (cmdline_process_function): Likewise.
* benchtests/bench-strlen.c (do_test): Likewise.
(test_main): Likewise.
* catgets/gencat.c (read_old): Likewise.
* elf/cache.c (load_aux_cache): Likewise.
* iconvdata/bug-iconv8.c (do_test): Likewise.
* math/test-tgmath-ret.c (do_test): Likewise.
* nis/nis_call.c (rec_dirsearch): Likewise.
* nis/nis_findserv.c (__nis_findfastest_with_timeout): Likewise.
* nptl/tst-audit-threads.c (do_test): Likewise.
* nptl/tst-cancel4-common.h (set_socket_buffer): Likewise.
* nss/nss_test1.c (init): Likewise.
* nss/test-netdb.c (test_hosts): Likewise.
* posix/execvpe.c (maybe_script_execute): Likewise.
* stdio-common/tst-fmemopen4.c (do_test): Likewise.
* stdio-common/tst-printf.c (do_test): Likewise.
* stdio-common/vfscanf-internal.c (__vfscanf_internal): Likewise.
* stdlib/fmtmsg.c (NKEYWORDS): Likewise.
* stdlib/qsort.c (STACK_SIZE): Likewise.
* stdlib/test-canon.c (do_test): Likewise.
* stdlib/tst-swapcontext1.c (do_test): Likewise.
* string/memcmp.c (OPSIZ): Likewise.
* string/test-strcpy.c (do_test): Likewise.
(do_random_tests): Likewise.
* string/test-strlen.c (do_test): Likewise.
(test_main): Likewise.
* string/test-strrchr.c (do_test): Likewise.
(do_random_tests): Likewise.
* string/tester.c (test_memrchr): Likewise.
(test_memchr): Likewise.
* sysdeps/generic/memcopy.h (OPSIZ): Likewise.
* sysdeps/generic/unwind-dw2.c (execute_stack_op): Likewise.
* sysdeps/generic/unwind-pe.h (read_sleb128): Likewise.
(read_encoded_value_with_base): Likewise.
* sysdeps/hppa/dl-machine.h (elf_machine_runtime_setup): Likewise.
* sysdeps/hppa/fpu/feupdateenv.c (__feupdateenv): Likewise.
* sysdeps/ia64/fpu/sfp-machine.h (TI_BITS): Likewise.
* sysdeps/mach/hurd/spawni.c (__spawni): Likewise.
* sysdeps/posix/spawni.c (maybe_script_execute): Likewise.
* sysdeps/powerpc/fpu/tst-setcontext-fpscr.c (query_auxv):
Likewise.
* sysdeps/unix/sysv/linux/aarch64/bits/procfs.h (ELF_NGREG):
Likewise.
* sysdeps/unix/sysv/linux/arm/bits/procfs.h (ELF_NGREG): Likewise.
* sysdeps/unix/sysv/linux/arm/ioperm.c (init_iosys): Likewise.
* sysdeps/unix/sysv/linux/csky/bits/procfs.h (ELF_NGREG):
Likewise.
* sysdeps/unix/sysv/linux/m68k/bits/procfs.h (ELF_NGREG):
Likewise.
* sysdeps/unix/sysv/linux/nios2/bits/procfs.h (ELF_NGREG):
Likewise.
* sysdeps/unix/sysv/linux/spawni.c (maybe_script_execute):
Likewise.
* sysdeps/unix/sysv/linux/x86/bits/procfs.h (ELF_NGREG): Likewise.
* sysdeps/unix/sysv/linux/x86/bits/sigcontext.h
(FP_XSTATE_MAGIC2_SIZE): Likewise.
* sysdeps/x86/fpu/sfp-machine.h (TI_BITS): Likewise.
* time/test_time.c (main): Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The type used within e_sqrt.c(__slow_ieee754_sqrtf) was, unnecessarily and
likely inadvertently, double. float is not only appropriate, but also
more efficient, avoiding the need for the compiler to emit a
round-to-single-precision instruction.
This is the difference in compiled code:
0000000000000000 <__ieee754_sqrtf>:
0: 2c 08 20 ec fsqrts f1,f1
- 4: 18 08 20 fc frsp f1,f1
- 8: 20 00 80 4e blr
+ 4: 20 00 80 4e blr
(Found by Anton Blanchard.)
|
|
|
|
|
|
|
| |
* All files with FSF copyright notices: Update copyright dates
using scripts/update-copyrights.
* locale/programs/charmap-kw.h: Regenerated.
* locale/programs/locfile-kw.h: Likewise.
|