| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
fegetenv_status() wants to use the lighter weight instruction 'mffsl'
for reading the Floating-Point Status and Control Register (FPSCR).
It currently will use it directly if compiled '-mcpu=power9', and will
perform a runtime check (cpu_supports("arch_3_00")) otherwise.
Nicely, it turns out that the 'mffsl' instruction will decode to
'mffs' on architectures older than "arch_3_00" because the additional
bits set for 'mffsl' are "don't care" for 'mffs'. 'mffs' is a superset
of 'mffsl'.
So, just generate 'mffsl'.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
fesetenv() reads the current value of the Floating-Point Status and Control
Register (FPSCR) to determine the difference between the current state of
exception enables and the newly requested state. All of these bits are also
returned by the lighter weight 'mffsl' instruction used by fegetenv_status().
Use that instead.
Also, remove a local macro _FPU_MASK_ALL in favor of a common macro,
FPU_ENABLES_MASK from fenv_libc.h.
Finally, use a local variable ('new') in favor of a pointer dereference
('*envp').
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
SET_RESTORE_ROUND uses libc_feholdsetround_ppc_ctx and
libc_feresetround_ppc_ctx to bracket a block of code where the floating point
rounding mode must be set to a certain value.
For the *prologue*, libc_feholdsetround_ppc_ctx is used and performs:
1. Read/save FPSCR.
2. Create new value for FPSCR with new rounding mode and enables cleared.
3. If new value is different than current value,
a. If transitioning from a state where some exceptions enabled,
enter "ignore exceptions / non-stop" mode.
b. Write new value to FPSCR.
c. Put a mark on the wall indicating the FPSCR was changed.
(1) uses the 'mffs' instruction. On POWER9, the lighter weight 'mffsl'
instruction can be used, but it doesn't return all of the bits in the FPSCR.
fegetenv_status uses 'mffsl' on POWER9, 'mffs' otherwise, and can thus be
used instead of fegetenv_register.
(3b) uses 'mtfsf 0b11111111' to write the entire FPSCR, so it must
instead use 'mtfsf 0b00000011' to write just the enables and the mode,
because some of the rest of the bits are not valid if 'mffsl' was used.
fesetenv_mode uses 'mtfsf 0b00000011' on POWER9, 'mtfsf 0b11111111'
otherwise.
For the *epilogue*, libc_feresetround_ppc_ctx checks the mark on the wall, then
calls libc_feresetround_ppc, which just calls __libc_femergeenv_ppc with
parameters such that it performs:
1. Retreive saved value of FPSCR, saved in prologue above.
2. Read FPSCR.
3. Create new value of FPSCR where:
- Summary bits and exception indicators = current OR saved.
- Rounding mode and enables = saved.
- Status bits = current.
4. If transitioning from some exceptions enabled to none,
enter "ignore exceptions / non-stop" mode.
5. If transitioning from no exceptions enabled to some,
enter "catch exceptions" mode.
6. Write new value to FPSCR.
The summary bits are hardwired to the exception indicators, so there is no
need to restore any saved summary bits.
The exception indicator bits, which are sticky and remain set unless
explicitly cleared, would only need to be restored if the code block
might explicitly clear any of them. This is certainly not expected.
So, the only bits that need to be restored are the enables and the mode.
If it is the case that only those bits are to be restored, there is no need to
read the FPSCR. Steps (2) and (3) are unnecessary, and step (6) only needs to
write the bits being restored.
We know we are transitioning out of "ignore exceptions" mode, so step (4) is
unnecessary, and in step (6), we only need to check the state we are
entering.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since fe{en,dis}ableexcept() and fesetmode() read-modify-write just the
"mode" (exception enable and rounding mode) bits of the Floating Point Status
Control Register (FPSCR), the lighter weight 'mffsl' instruction can be used
to read the FPSCR (enables and rounding mode), and 'mtfsf 0b00000011' can be
used to write just those bits back to the FPSCR. The net is better performance.
In addition, fe{en,dis}ableexcept() read the FPSCR again after writing it, or
they determine that it doesn't need to be written because it is not changing.
In either case, the local variable holds the current values of the enable
bits in the FPSCR. This local variable can be used instead of again reading
the FPSCR.
Also, that value of the FPSCR which is read the second time is validated
against the requested enables. Since the write can't fail, this validation
step is unnecessary, and can be removed. Instead, the exceptions to be
enabled (or disabled) are transformed into available bits in the FPSCR,
then validated after being transformed back, to ensure that all requested
bits are actually being set. For example, FE_INVALID_SQRT can be
requested, but cannot actually be set. This bit is not mapped during the
transformations, so a test for that bit being set before and after
transformations will show the bit would not be set, and the function will
return -1 for failure.
Finally, convert the local macros in fesetmode.c to more generally useful
macros in fenv_libc.h.
|
|
|
|
|
|
|
|
|
|
|
| |
The exceptions passed to fe{en,dis}ableexcept() are defined in the ABI
as a bitmask, a combination of FE_INVALID, FE_OVERFLOW, etc.
Within the functions, these bits must be translated to/from the corresponding
enable bits in the Floating Point Status Control Register (FPSCR).
This translation is currently done bit-by-bit. The compiler generates
a series of conditional bit operations. Nicely, the "FE" exception
bits are all a uniform offset from the FPSCR enable bits, so the bit-by-bit
operation can instead be performed by a shift with appropriate masking.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
C2X adds the interfaces from TS 18661-1, and all except a handful in
Annex F are unconditionally visible in C2X rather than only visible
when __STDC_WANT_IEC_60559_BFP_EXT__ is defined. This patch updates
glibc headers accordingly: most uses of __GLIBC_USE
(IEC_60559_BFP_EXT) are changed to a new __GLIBC_USE
(IEC_60559_BFP_EXT_C2X). (Regarding totalorder and totalordermag, the
type-generic macros in tgmath.h will go away when the functions are
changed to take pointer arguments.)
* bits/libc-header-start.h (__GLIBC_USE_IEC_60559_BFP_EXT): Update
comment.
(__GLIBC_USE_IEC_60559_BFP_EXT_C2X): New macro.
* bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]: Change to
[__GLIBC_USE (IEC_60559_BFP_EXT_C2X)].
* include/limits.h [__GLIBC_USE (IEC_60559_BFP_EXT)]: Likewise.
* math/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]: Likewise.
* math/math.h [__GLIBC_USE (IEC_60559_BFP_EXT)]: Likewise.
* stdlib/bits/stdlib-ldbl.h [__GLIBC_USE (IEC_60559_BFP_EXT)]:
Likewise.
* stdlib/stdint.h [__GLIBC_USE (IEC_60559_BFP_EXT)]: Likewise.
* stdlib/stdlib.h [__GLIBC_USE (IEC_60559_BFP_EXT)]: Likewise.
* sysdeps/aarch64/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]:
Likewise.
* sysdeps/alpha/fpu/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]:
Likewise.
* sysdeps/arm/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]:
Likewise.
* sysdeps/csky/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]:
Likewise.
* sysdeps/hppa/fpu/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]:
Likewise.
* sysdeps/ia64/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]:
Likewise.
* sysdeps/m68k/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]:
Likewise.
* sysdeps/microblaze/bits/fenv.h [__GLIBC_USE
(IEC_60559_BFP_EXT)]: Likewise.
* sysdeps/mips/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]:
Likewise.
* sysdeps/nios2/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]:
Likewise.
* sysdeps/powerpc/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]:
Likewise.
* sysdeps/riscv/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]:
Likewise.
* sysdeps/s390/fpu/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]:
Likewise.
* sysdeps/sh/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]:
Likewise.
* sysdeps/sparc/fpu/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]:
Likewise.
* sysdeps/x86/fpu/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]:
Likewise.
* math/bits/mathcalls.h [__GLIBC_USE (IEC_60559_BFP_EXT)]:
Likewise, except for totalorder, totalordermag, getpayload,
setpayload and setpayloadsig.
* math/tgmath.h [__GLIBC_USE (IEC_60559_BFP_EXT)]: Likewise,
except for totalorder and totalordermag.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some implementations in sysdeps/powerpc/powerpc64/power8/*.S still had
pre power8 compatible binutils hardcoded macros and were not using
.machine power8.
This patch should not have semantic changes, in fact it should have the
same exact code generated.
Tested that generated stripped shared objects are identical when
using "strip --remove-section=.note.gnu.build-id".
Checked on:
- powerpc64le, power9, build-many-glibcs.py, gcc 6.4.1 20180104, binutils 2.26.2.20160726
- powerpc64le, power8, debian 9, gcc 6.3.0 20170516, binutils 2.28
- powerpc64le, power9, ubuntu 19.04, gcc 8.3.0, binutils 2.32
- powerpc64le, power9, opensuse tumbleweed, gcc 9.1.1 20190527, binutils 2.32
- powerpc64, power9, debian 10, gcc 8.3.0, binutils 2.31.1
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Using __builtin_cpu_supports() requires support in GCC and Glibc.
My recent patch to fenv_libc.h added an unprotected use of
__builtin_cpu_supports(). Compilation of Glibc itself will fail
with a sufficiently new GCC and sufficiently old Glibc:
../sysdeps/powerpc/fpu/fegetexcept.c: In function ‘__fegetexcept’:
../sysdeps/powerpc/fpu/fenv_libc.h:52:20: error: builtin ‘__builtin_cpu_supports’ needs GLIBC (2.23 and newer) that exports hardware capability bits [-Werror]
Reviewed-by: Florian Weimer <fweimer@redhat.com>
Fixes 3db85a9814784a74536a1f0e7b7ddbfef7dc84bb.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The power7 logb implementation does not show a performance gain on
ISA 2.07+ chips with faster floating-point to GRP instructions
(currently POWER8 and POWER9).
This patch moves the POWER7 implementation to generic one and enables
it for POWER7. It also add some cleanup to use inline floating-point
number instead of define them using static const.
The performance difference is for POWER9:
- Without patch:
"logb": {
"subnormal": {
"duration": 4.99202e+09,
"iterations": 8.83662e+08,
"max": 75.194,
"min": 5.501,
"mean": 5.64925
},
"normal": {
"duration": 4.97063e+09,
"iterations": 9.97094e+08,
"max": 46.489,
"min": 4.956,
"mean": 4.98512
}
}
- With patch:
"logb": {
"subnormal": {
"duration": 4.97226e+09,
"iterations": 9.92036e+08,
"max": 77.209,
"min": 4.892,
"mean": 5.01218
},
"normal": {
"duration": 4.96192e+09,
"iterations": 1.07545e+09,
"max": 12.361,
"min": 4.593,
"mean": 4.61382
}
}
The ifunc implementation is also enabled only for powerpc64.
Checked on powerpc-linux-gnu (built without --with-cpu, with
--with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch),
powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+
and --disable-multi-arch).
* sysdeps/powerpc/power7/fpu/s_logb.c: Move to ...
* sysdeps/powerpc/fpu/s_logb.c: ... here. Use inline FP constants.
* sysdeps/powerpc/power7/fpu/s_logbf.c: Move to ...
* sysdeps/powerpc/fpu/s_logbf.c: ... here. Use inline FP constants.
* sysdeps/powerpc/power7/fpu/s_logbl.c: Move to ...
* sysdeps/powerpc/fpu/s_logbl.c: ... here. Use inline FP constants.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_logb-power7.c:
Adjust implementation path.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_logbf-power7.c:
Adjust implementation path.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_logbl-power7.c:
Adjust implementation path.
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/Makefile
(libm-sysdep_routines): Add s_log* objects.
(CFLAGS-s_logbf-power7.c, CFLAGS-s_logbl-power7.c,
CFLAGS-s_logb-power7.c): New fule.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_logb-power7.c: Move
to ...
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_logb-power7.c:
... here.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_logb-ppc64.c: Move
to ...
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_logb-ppc64.c:
... here.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_logb.c: Move to ...
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_logb.c: ... here.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_logbf-power7.c: Move
to ...
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_logbf-power7.c:
... here.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_logbf-ppc64.c: Move
to ...
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_logbf-ppc64.c:
... here.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_logbf.c: Move to ...
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_logbf.c: ... here.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_logbl-power7.c: Move
to ...
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_logbl-power7.c:
... here.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_logbl-ppc64.c: Move
to ...
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_logbl-ppc64.c:
... here.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_logbl.c: Move to ...
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_logbl.c: ... here.
* sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile: Remove file.
* sysdeps/powerpc/powerpc64/power7/fpu/s_logb.c: Remove file.
* sysdeps/powerpc/powerpc64/power7/fpu/s_logbf.c: Likewise.
* sysdeps/powerpc/powerpc64/power7/fpu/s_logbl.c: Likewise.
Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The modf{f} optimization is not an optimization for ISA 2.07+. This
patch move the IFUNC for powerpc64 only, move the power5+ to generic
location, and include the generic implementation for ISA 2.07+.
The performance changes are based on modf benchtests:
* POWER9 - ppc64
"modf": {
"": {
"duration": 4.97057e+09,
"iterations": 1.00688e+09,
"max": 28.76,
"min": 4.912,
"mean": 4.9366
}
}
* POWER9 - power5+
"modf": {
"": {
"duration": 4.98291e+09,
"iterations": 9.32818e+08,
"max": 15.058,
"min": 5.107,
"mean": 5.34178
}
}
* POWER8 - ppc64
"modf": {
"": {
"duration": 5.05329e+09,
"iterations": 8.38814e+08,
"max": 518.051,
"min": 5.79,
"mean": 6.02433
}
}
* POWER8 - power5+
"modf": {
"": {
"duration": 5.05573e+09,
"iterations": 8.35254e+08,
"max": 63.141,
"min": 5.873,
"mean": 6.05293
}
}
* POWER7 - ppc64
"modf": {
"": {
"duration": 4.89818e+09,
"iterations": 1.08408e+09,
"max": 57.556,
"min": 3.953,
"mean": 4.51827
}
}
* POWER7 - power5+
"modf": {
"": {
"duration": 4.83789e+09,
"iterations": 1.33409e+09,
"max": 46.608,
"min": 2.224,
"mean": 3.62636
}
}
Checked on powerpc-linux-gnu (built without --with-cpu, with
--with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch),
powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+
and --disable-multi-arch).
* sysdeps/powerpc/power5+/fpu/s_modf.c: Move to ...
* sysdeps/powerpc/fpu/s_modf.c: ... here. Add ISA 2.07 optimization.
* sysdeps/powerpc/power5+/fpu/s_modff.c: Move to ...
* sysdeps/powerpc/fpu/s_modff.c: ... here. Add ISA 2.07 optimization.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_modf-power5+.c:
Adjust include.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_modff-power5+.c:
Likewise.
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/Makefile (sysdep_calls,
sysdep_routines): Add s_modf* objects.
(CFLAGS-s_modf-power5+.c, CFLAGS-s_modff-power5+.c,
CFLAGS-s_modf-ppc64.c, CFLAGS-s_modff-ppc64.c): New rule.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_modf-power5+.c: Move
to ...
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_modf-power5+.c:
... here.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_modf-power5+.c: Movo
to ...
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_modf-power5+.c: Move
... here.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_modf.c: Move to ...
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_modf.c: ... here.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_modff-power5+.c: Move
to ...
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_modff-power5+.c:
... here.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_modff-ppc64.c: Move to ...
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_modff-ppc64.c:
... here.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_modff.c: Move to ...
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_modff.c: ... here.
Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The powerpc hypot is slight optimized by:
- Commit 8df4e219e43, both isnan and isinf are always inlined and thus
the check TEST_INF_NAN does not make sense anymore. The generic
check for POWER7 should be faster on all powerpc configuration.
- The redundant check 'y > two60factor && (x / y) > two60' is removed.
Both changes leads to unrequired ifunc especialization for power7 and
thus they are removed. Finally The code is also cleanup a bit by inlining
the constants floating points.
The performance changes using the hypot benchtests are:
- POWER9 without patch:
"hypot": {
"overflow": {
"duration": 4.98585e+09,
"iterations": 4.84932e+08,
"max": 46.551,
"min": 10.229,
"mean": 10.2815
},
"higher_two500": {
"duration": 5.00192e+09,
"iterations": 4.24843e+08,
"max": 33.319,
"min": 11.606,
"mean": 11.7736
},
"subnormal": {
"duration": 5.0075e+09,
"iterations": 4.06792e+08,
"max": 22.178,
"min": 12.15,
"mean": 12.3097
},
"less_two500": {
"duration": 5.00685e+09,
"iterations": 4.08772e+08,
"max": 22.784,
"min": 12.052,
"mean": 12.2485
},
"default": {
"duration": 5.06002e+09,
"iterations": 4.09894e+08,
"max": 20.648,
"min": 11.874,
"mean": 12.3447
}
}
- POWER9 with patch:
"hypot": {
"overflow": {
"duration": 4.91848e+09,
"iterations": 7.28039e+08,
"max": 47.958,
"min": 6.436,
"mean": 6.75579
},
"higher_two500": {
"duration": 4.9359e+09,
"iterations": 6.63376e+08,
"max": 20.783,
"min": 7.321,
"mean": 7.44057
},
"subnormal": {
"duration": 4.9479e+09,
"iterations": 6.19772e+08,
"max": 18.856,
"min": 7.817,
"mean": 7.98341
},
"less_two500": {
"duration": 4.94275e+09,
"iterations": 6.3889e+08,
"max": 17.452,
"min": 7.597,
"mean": 7.73647
},
"default": {
"duration": 5.03645e+09,
"iterations": 5.70718e+08,
"max": 18.904,
"min": 8.55,
"mean": 8.82476
}
}
- POWER7 without patch
"hypot": {
"overflow": {
"duration": 4.86637e+09,
"iterations": 6.43196e+08,
"max": 53.958,
"min": 7.328,
"mean": 7.56592
},
"higher_two500": {
"duration": 4.99842e+09,
"iterations": 3.11012e+08,
"max": 78.227,
"min": 15.696,
"mean": 16.0715
},
"subnormal": {
"duration": 4.99841e+09,
"iterations": 3.08935e+08,
"max": 51.392,
"min": 15.983,
"mean": 16.1795
},
"less_two500": {
"duration": 5.00108e+09,
"iterations": 2.99464e+08,
"max": 73.247,
"min": 16.416,
"mean": 16.7001
},
"default": {
"duration": 5.04645e+09,
"iterations": 3.52608e+08,
"max": 70.073,
"min": 13.38,
"mean": 14.3118
}
}
- POWER7 with patch
"hypot": {
"overflow": {
"duration": 4.80785e+09,
"iterations": 8.00001e+08,
"max": 66.262,
"min": 5.888,
"mean": 6.00981
},
"higher_two500": {
"duration": 4.9859e+09,
"iterations": 3.39449e+08,
"max": 5148.44,
"min": 14.539,
"mean": 14.6882
},
"subnormal": {
"duration": 4.9905e+09,
"iterations": 3.28874e+08,
"max": 64.905,
"min": 14.971,
"mean": 15.1745
},
"less_two500": {
"duration": 4.99494e+09,
"iterations": 3.19755e+08,
"max": 103.696,
"min": 14.972,
"mean": 15.6211
},
"default": {
"duration": 5.03951e+09,
"iterations": 4.02502e+08,
"max": 61.008,
"min": 12.368,
"mean": 12.5205
}
}
Checked on powerpc-linux-gnu (built without --with-cpu, with
--with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch),
powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+
and --disable-multi-arch).
* sysdeps/powerpc/fpu/e_hypot.c (two60, two500, two600, two1022,
twoM500, twoM600, two60factor, pdnum): Remove.
(TEST_INFO_NAN, GET_TW0_HIGH_WORD): Remove macro.
(__ieee754_hypot): Replace static variables with inline definition,
remove ununsed branches.
* sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile
(libm-sysdep_routines): Remove e_hypot-* objects.
(CFLAGS-e_hypot-power7.c, CFLAGS-e_hypotf-power7.c): Remove rule.
* sysdeps/powerpc/powerpc64/fpu/multiarch/e_hypot-power7.c: Remove
file.
* sysdeps/powerpc/powerpc64/fpu/multiarch/e_hypot-ppc64.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/e_hypot.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/e_hypotf-power7.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/e_hypotf-ppc64.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/e_hypotf.c: Likewise.
Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Using 'mffs' instruction to read the Floating Point Status Control Register
(FPSCR) can force a processor flush in some cases, with undesirable
performance impact. If the values of the bits in the FPSCR which force the
flush are not needed, an instruction that is new to POWER9 (ISA version 3.0),
'mffsl' can be used instead.
Cases included: get_rounding_mode, fegetround, fegetmode, fegetexcept.
* sysdeps/powerpc/bits/fenvinline.h (__fegetround): Use
__fegetround_ISA300() or __fegetround_ISA2() as appropriate.
(__fegetround_ISA300) New.
(__fegetround_ISA2) New.
* sysdeps/powerpc/fpu_control.h (IS_ISA300): New.
(_FPU_MFFS): Move implementation...
(_FPU_GETCW): Here.
(_FPU_MFFSL): Move implementation....
(_FPU_GET_RC_ISA300): Here. New.
(_FPU_GET_RC): Use _FPU_GET_RC_ISA300() or _FPU_GETCW() as appropriate.
* sysdeps/powerpc/fpu/fenv_libc.h (fegetenv_status_ISA300): New.
(fegetenv_status): New.
* sysdeps/powerpc/fpu/fegetmode.c (fegetmode): Use fegetenv_status()
instead of fegetenv_register().
* sysdeps/powerpc/fpu/fegetexcept.c (__fegetexcept): Likewise.
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Generic implementation is faster on both power8 and power9:
POWER9:
- sysdeps/ieee754/flt-32/e_expf.c
"expf": {
"workload-spec2017.wrf": {
"duration": 5.1236e+09,
"iterations": 7.53344e+08,
"reciprocal-throughput": 5.9436,
"latency": 7.65869,
"max-throughput": 1.68248e+08,
"min-throughput": 1.30571e+08
}
}
- sysdeps/powerpc/powerpc64/power8/fpu/e_expf.S
"expf": {
"workload-spec2017.wrf": {
"duration": 5.14429e+09,
"iterations": 5.29248e+08,
"reciprocal-throughput": 8.05372,
"latency": 11.3863,
"max-throughput": 1.24166e+08,
"min-throughput": 8.78249e+07
}
}
Checked on powerpc-linux-gnu (built without --with-cpu, with
--with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch),
powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+
and --disable-multi-arch).
* sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile
(libm-sysdep_routines): Remove e_expf-power8 and expf-ppc64.
* sysdeps/powerpc/powerpc64/fpu/multiarch/e_expf-power8.S: Remove
file.
* sysdeps/powerpc/powerpc64/fpu/multiarch/e_expf-ppc64.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/e_expf.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/w_expf.c: Likewise.
* sysdeps/powerpc/powerpc64/power8/fpu/e_expf.S: Likewise.
* sysdeps/powerpc/powerpc64/power8/fpu/w_expf.c: Likewise.
Reviewed-by: Gabriel F. T. Gomes <gabriel@inconstante.eti.br>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patches consolidates all the powerpc llround{f} implementations on
the generic sysdeps/powerpc/powerpc32/fpu/s_llround{f}.
Checked on powerpc-linux-gnu (built without --with-cpu, with
--with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch),
powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+
and --disable-multi-arch).
* sysdeps/powerpc/powerpc32/fpu/Makefile
[$(subdir) == math] (CFLAGS-s_lround.c): New rule.
* sysdeps/powerpc/powerpc32/fpu/s_llround.c (__llround): Add power5+
and fctidz optimization.
* sysdeps/powerpc/powerpc32/fpu/s_lround.S: Remove file.
* sysdeps/powerpc/powerpc32/fpu/s_lround.c: New file.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile
(CFLAGS-s_llround-power6.c, CFLAGS-s_llround-power5+.c,
CFLAGS-s_llround-ppc32.c, CFLAGS-s_lround-ppc32.c,
CFLAGS-s_lround-power5+.c): New rule.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_llround-power5+.c:
New file.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_llround-power6.c:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_llround-ppc32.c:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_lround-power5+.c:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_lround-ppc32.c:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_llround-power5+.S:
Remove file.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_llround-power6.S:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_llround-ppc32.S:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_lround-power5+.S:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_lround-ppc32.S:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/s_llround.S: Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/s_llroundf.S: Likewise.
* sysdeps/powerpc/powerpc32/power5+/fpu/s_llround.S: Likewise.
* sysdeps/powerpc/powerpc32/power5+/fpu/s_llroundf.S: Likewise.
* sysdeps/powerpc/powerpc32/power5+/fpu/s_lround.S: Likewise.
* sysdeps/powerpc/powerpc32/power6/fpu/s_llround.S: Likewise.
* sysdeps/powerpc/powerpc32/power6/fpu/s_llroundf.S: Likewise.
Reviewed-by: Gabriel F. T. Gomes <gabriel@inconstante.eti.br>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add 'volatile' keyword to a few asm statements, to force the compiler
to generate the instructions therein.
Some instances were implicitly volatile, but adding keyword for consistency.
2019-06-19 Paul A. Clarke <pc@us.ibm.com>
* sysdeps/powerpc/fpu/fenv_libc.h (relax_fenv_state): Add 'volatile'.
* sysdeps/powerpc/fpu/fpu_control.h (__FPU_MFFS): Likewise.
(__FPU_MFFSL): Likewise.
(_FPU_SETCW): Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patches consolidates all the powerpc {l}lround{f} implementations
on the generic sysdeps/powerpc/fpu/s_{l}lround{f}.c.
The IFUNC support is also moved only to powerpc64 only, since for
powerpc64le generic implementation resulting in optimized code.
Checked on powerpc-linux-gnu (built without --with-cpu, with
--with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch),
powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+
and --disable-multi-arch).
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/Makefile
(libm-sysdep_routines): Add s_llround-power8, s_llround-power6x,
s_llround-power5+, s_llround-ppc64, and s_llroundf-ppc64.
(CFLAGS-s_llround-power8.c, CFLAGS-s_llround-power6x.c,
CFLAGS-s_llround-power5+.c): New rule.
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_llround-power5+.c:
New file.
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_llround-power6x.c:
Likewise.
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_llround-power8.c:
Likewise.
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_llround-ppc64.c:
Likewise.
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_llroundf-ppc64.c:
Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_llround.c: Move to ...
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_llround.c: ... here.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_llroundf.c: Move to ...
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_llroundf.c: ... here.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_lround.c: Move to ...
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_lround.c: ... here.
* sysdeps/powerpc/powerpc64/fpu/Makefile
[$(subdir) == math] (CFLAGS-s_llround.c): New rule.
* sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile
(libm-sysdep_routines): Remove s_llround-* objects.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_llround-power5+.S: Remove
file.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_llround-power6x.S:
Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_llround-power8.S:
Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_llround-ppc64.S:
Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_llroundf-ppc64.S:
Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_llround.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_llroundf.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_lround.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_lroundf.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_llround.c: New file.
* sysdeps/powerpc/powerpc64/fpu/s_llroundf.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_lround.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_lroundf.c: Likewise.
* sysdeps/powerpc/powerpc64/power5+/fpu/s_llround.S: Likewise.
* sysdeps/powerpc/powerpc64/power5+/fpu/s_llroundf.S: Likewise.
* sysdeps/powerpc/powerpc64/power6x/fpu/s_llround.S: Likewise.
* sysdeps/powerpc/powerpc64/power6x/fpu/s_llroundf.S: Likewise.
* sysdeps/powerpc/powerpc64/power8/fpu/s_llround.S: Likewise.
* sysdeps/powerpc/powerpc64/power8/fpu/s_llroundf.S: Likewise.
Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patches consolidates all the powerpc llrint{f} implementations on
the generic sysdeps/powerpc/powerpc32/fpu/s_llrint{f}.
Checked on powerpc-linux-gnu (built without --with-cpu, with
--with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch),
powerpc64-linux-gnu (built without --with-cpu and with --with-cpu=power5+
and --disable-multi-arch).
* sysdeps/powerpc/fpu/s_lrintf.S: Remove file.
* sysdeps/powerpc/powerpc64/fpu/s_lrintf.c: Move to ...
* sysdeps/powerpc/fpu/s_lrintf.c: ... here.
* sysdeps/powerpc/powerpc32/fpu/Makefile
[$(subdir) == math] (CFLAGS-s_lrint.c): New rule.
* sysdeps/powerpc/powerpc32/fpu/s_llrint.c (__llrint): Add power4
optimization.
* sysdeps/powerpc/powerpc32/fpu/s_llrintf.c (__llrintf): Likewise.
* sysdeps/powerpc/powerpc32/fpu/s_lrint.S: Remove file.
* sysdeps/powerpc/powerpc32/fpu/s_lrint.c: New file.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile
(CFLAGS-s_llrintf-power6.c, CFLAGS-s_llrintf-ppc32.c,
CFLAGS-s_llrint-power6.c, CFLAGS-s_llrint-ppc32.c,
CFLAGS-s_lrint-ppc32.c): New rule.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_llrint-power6.S:
Remove file.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_llrint-ppc32.S:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_llrintf-power6.S:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_llrintf-ppc32.S:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_lrint-ppc32.S:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/s_llrint.S: Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/s_llrintf.S: Likewise.
* sysdeps/powerpc/powerpc32/power6/fpu/s_llrint.S: Likewise.
* sysdeps/powerpc/powerpc32/power6/fpu/s_llrintf.S: Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_llrint-power6.c:
New file.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_llrint-ppc32.c:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_llrintf-power6.c:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_llrintf-ppc32.c:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_lrint-ppc32.c:
Likewise.
Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patches consolidates all the powerpc llrint{f} implementations on
the generic sysdeps/powerpc/fpu/s_llrint{f}.
The IFUNC support is also moved only to powerpc64 only, since for
powerpc64le generic implementation resulting in optimized code.
Checked on powerpc-linux-gnu (built without --with-cpu, with
--with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch),
powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+
and --disable-multi-arch).
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/Makefile
(libm-sysdep_routines): Add s_llrint-power8, s_llrint-power6x, and
s_llrint-ppc64.
(CFLAGS-s_llrint-power8.c, CFLAGS-s_llrint-power6x.c): New rule.
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_llrint-power6x.c: New
file.
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_llrint-power8.c:
Likewise.
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_llrint-ppc64.c:
Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_lrint.c: Move to ...
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_lrint.c: ... here.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_llrint.c: Move to ...
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_llrint.c: ... here.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_llrintf.c: Move to ...
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_llrintf.c: ... here.
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_lrint.c: New file.
* sysdeps/powerpc/powerpc64/fpu/Makefile: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile
(libm-sysdep_routines): Remove s_llrint-* objects.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_llrint-power6x.S: Remove
file.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_llrint-power8.S:
Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_llrint-ppc64.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_llrint.c: New file.
* sysdeps/powerpc/powerpc64/fpu/s_llrintf.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_lrint.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_lrintf.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_llrint.S: Remove file.
* sysdeps/powerpc/powerpc64/fpu/s_llrintf.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_lrint.S: Likewise.
* sysdeps/powerpc/powerpc64/power6x/fpu/s_llrint.S: Likewise.
* sysdeps/powerpc/powerpc64/power8/fpu/s_llrint.S: Likewise.
Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The powerpc finite optimization do not show much gain:
- GCC will call libm iff -fsignaling-nans is used. This usage pattern
is usually not performance oriented and for such calls PLT overhead
should dominate execution time.
- The power7 uses ftdiv to optimize for some input patterns, but at
cost of others. Comparing against generic C implementation built
for powerpc64-linux-gnu-power7 (--with-cpu=power7):
- Generic sysdeps/ieee754 implementation:
"isfinite": {
"": {
"duration": 5.0082e+09,
"iterations": 2.45299e+09,
"max": 43.824,
"min": 2.008,
"mean": 2.04167
},
"INF": {
"duration": 4.66554e+09,
"iterations": 2.28288e+09,
"max": 35.73,
"min": 2.008,
"mean": 2.04371
},
"NAN": {
"duration": 4.66274e+09,
"iterations": 2.28716e+09,
"max": 34.161,
"min": 2.009,
"mean": 2.03866
}
}
- power7 optimized one:
"isfinite": {
"": {
"duration": 4.99111e+09,
"iterations": 2.65566e+09,
"max": 25.015,
"min": 1.716,
"mean": 1.87942
},
"INF": {
"duration": 4.6783e+09,
"iterations": 2.0999e+09,
"max": 35.264,
"min": 1.868,
"mean": 2.22787
},
"NAN": {
"duration": 4.67915e+09,
"iterations": 2.08678e+09,
"max": 38.099,
"min": 1.869,
"mean": 2.24228
}
}
So it basically optimizes marginally for normal numbers while
increasing the latency for other kind of FP.
- The power8 implementation is just the generic implementation using
ISA 2.07 mfvsrd instruction (which GCC uses for generic implementation).
So generic implementation is the best option for powerpc64le.
Checked on powerpc-linux-gnu (built without --with-cpu, with
--with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch),
powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+
and --disable-multi-arch).
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile
(sysdeps_routines, libm-sysdep_routines): Remove s_finite*
objects.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_finite-power7.S:
Remove file.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_finite-ppc32.c:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_finite.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_finitef-ppc32.c:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_finitef.c: Likewise.
* sysdeps/powerpc/powerpc32/power7/fpu/s_finite.S: Likewise.
* sysdeps/powerpc/powerpc32/power7/fpu/s_finitef.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile (sysdep_call):
Remove s_finite* objects.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_finite-power7.S: Remove file.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_finite-power8.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_finite-ppc64.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_finite.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_finitef-ppc64.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_finitef.c: Likewise.
* sysdeps/powerpc/powerpc64/power7/fpu/s_finite.S: Likewise.
* sysdeps/powerpc/powerpc64/power7/fpu/s_finitef.S: Likewise.
* sysdeps/powerpc/powerpc64/power8/fpu/s_finite.S: Likewise.
* sysdeps/powerpc/powerpc64/power8/fpu/s_finitef.S: Likewise.
Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The powerpc isinf optimizations onyl adds complexity:
- GCC will call libm iff -fsignaling-nans is used. This usage pattern
is usually not performance oriented and for such calls PLT overhead
should dominate execution time.
- The power7 uses ftdiv to optimize for some input pattern and branch
implementation for INF and denormal that does:
return (ix & UINT64_C (0x7fffffffffffffff)) == UINT64_C (0x7ff0000000000000)
Although it does show slight better latency than generic algorithm
(as below), it is only for power7 and requires it to override it
for power8.
- The power8 implementation is just the generic implementation using
ISA 2.07 mfvsrd instruction (which GCC uses for generic implementation).
So generic implementation is the best option for powerpc64le.
Checked on powerpc-linux-gnu (built without --with-cpu, with
--with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch),
powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+
and --disable-multi-arch).
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile
(sysdeps_routines, libm-sysdep_routines): Remove s_isinf* and s_isinf*
objects.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isinf-power7.S:
Remove file.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isinf-ppc32.c:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isinf.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isinff-ppc32.c:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isinff.c: Likewise.
* sysdeps/powerpc/powerpc32/power7/fpu/s_isinf.S: Likewise.
* sysdeps/powerpc/powerpc32/power7/fpu/s_isinff.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile (sysdep_call):
Remove s_isinf* and s_isinf* objects.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_isinf-power7.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_isinf-power8.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_isinf-ppc64.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_isinf.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_isinff-ppc64.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_isinff.c: Likewise.
* sysdeps/powerpc/powerpc64/power7/fpu/s_isinf.S: Likewise.
* sysdeps/powerpc/powerpc64/power7/fpu/s_isinff.S: Likewise.
* sysdeps/powerpc/powerpc64/power8/fpu/s_isinf.S: Likewise.
* sysdeps/powerpc/powerpc64/power8/fpu/s_isinff.S: Likewise.
Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The powerpc isnan optimizations are not really a gain:
- GCC will call libm iff -fsignaling-nans is used. This usage pattern
is usually not performance oriented and for such calls PLT overhead
should dominate execution time.
- The power5, power6, and power6x are just micro-optimization to
improve the Load-Hit-Store hazards from floating-point to general
register transfer, and current GCC already has support to minimize
it by inserting either extra nops or group dispatch instructions.
- The power7 uses ftdiv to optimize for some input patterns, but at
cost of others. Comparing against generic C implementation built
for powerpc-linux-gnu-power4 (which uses the hp-timing support on
benchtests):
- Generic sysdeps/ieee754 implementation:
"isnan": {
"": {
"duration": 4.98415e+09,
"iterations": 2.34516e+09,
"max": 45.925,
"min": 2.052,
"mean": 2.12529
},
"INF": {
"duration": 4.74057e+09,
"iterations": 1.69761e+09,
"max": 91.01,
"min": 2.052,
"mean": 2.79249
},
"NAN": {
"duration": 4.74071e+09,
"iterations": 1.68768e+09,
"max": 282.343,
"min": 2.052,
"mean": 2.809
}
}
- power7 optimized one:
$ ./testrun.sh benchtests/bench-isnan
"isnan": {
"": {
"duration": 4.96842e+09,
"iterations": 2.56297e+09,
"max": 50.048,
"min": 1.872,
"mean": 1.93854
},
"INF": {
"duration": 4.76648e+09,
"iterations": 1.54213e+09,
"max": 373.408,
"min": 2.661,
"mean": 3.09084
},
"NAN": {
"duration": 4.76845e+09,
"iterations": 1.54515e+09,
"max": 51.016,
"min": 2.736,
"mean": 3.08607
}
}
So it basically optimizes marginally for normal numbers while
increasing the latency for other kind of FP.
- The generic implementation requires getting the floating point
status, disable the invalid operation bit, and restore the
floating-point status. Each operation is costly and requires
flushing the FP pipeline.
Using the same scenarion for the previous analysis:
"isnan": {
"": {
"duration": 5.08284e+09,
"iterations": 6.2898e+08,
"max": 41.844,
"min": 8.057,
"mean": 8.08108
},
"INF": {
"duration": 4.97904e+09,
"iterations": 6.16176e+08,
"max": 39.661,
"min": 8.057,
"mean": 8.08055
},
"NAN": {
"duration": 4.98695e+09,
"iterations": 5.95866e+08,
"max": 29.728,
"min": 8.345,
"mean": 8.36925
}
}
- The power8 implementation is just the generic implementation using
ISA 2.07 mfvsrd instruction (which GCC uses for generic implementation).
So generic implementation is the best option for powerpc64le.
Checked on powerpc-linux-gnu (built without --with-cpu, with
--with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch),
powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+
and --disable-multi-arch).
* sysdeps/powerpc/fpu/s_isnan.c: Remove file.
* sysdeps/powerpc/fpu/s_isnanf.S: Likewise.
* sysdeps/powerpc/powerpc32/fpu/s_isnan.S: Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile
(sysdeps_routines, libm-sysdep_routines): Remove s_isnan-* and
s_isnanf-* objects.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isnan-power5.S:
Remove file
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isnan-power6.S:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isnan-power7.S:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isnan-ppc32.S:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isnan.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isnanf-power5.S:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isnanf-power6.S:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isnanf.c: Likewise.
* sysdeps/powerpc/powerpc32/power5/fpu/s_isnan.S: Likewise.
* sysdeps/powerpc/powerpc32/power5/fpu/s_isnanf.S: Likewise.
* sysdeps/powerpc/powerpc32/power6/fpu/s_isnan.S: Likewise.
* sysdeps/powerpc/powerpc32/power6/fpu/s_isnanf.S: Likewise.
* sysdeps/powerpc/powerpc32/power7/fpu/s_isnan.S: Likewise.
* sysdeps/powerpc/powerpc32/power7/fpu/s_isnanf.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile (sysdep_calls):
Remove s_isnan-* and s_isnanf-* objects.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_isnan-power5.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_isnan-power6.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_isnan-power6x.S:
Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_isnan-power7.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_isnan-power8.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_isnan-ppc64.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_isnan.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_isnanf.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_isnan.S: Likewise.
* sysdeps/powerpc/powerpc64/power5/fpu/s_isnan.S: Likewise.
* sysdeps/powerpc/powerpc64/power6/fpu/s_isnan.S: Likewise.
* sysdeps/powerpc/powerpc64/power6x/fpu/s_isnan.S: Likewise.
* sysdeps/powerpc/powerpc64/power7/fpu/s_isnan.S: Likewise.
* sysdeps/powerpc/powerpc64/power7/fpu/s_isnanf.S: Likewise.
* sysdeps/powerpc/powerpc64/power8/fpu/s_isnan.S: Likewise.
* sysdeps/powerpc/powerpc64/power8/fpu/s_isnanf.S: Likewise.
Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
GCC always expand copysign{f} for all possible cpus, so calling the libm
is only done if user explicitly states to disable the builtin (which is
done usually not for performance reason). So to provide ifunc variant
for copysign is just unrequired complexity, since libm will be called
on non-performance critical code.
This patch removes both powerpc32 and powerpc64 ifunc variants and
consolidates the powerpc implementation on
sysdeps/powerpc/fpu/s_copysign{f}.c using compiler builtins.
Checked on powerpc-linux-gnu (built without --with-cpu, with
--with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch),
powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+
and --disable-multi-arch).
* sysdeps/powerpc/fpu/s_copysign.c: New file.
* sysdeps/powerpc/fpu/s_copysignf.c: Likewise.
* sysdeps/powerpc/powerpc32/fpu/s_copysign.S: Remove file.
* sysdeps/powerpc/powerpc32/fpu/s_copysignf.S: Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile
(sysdep_routines, libm-sysdep_routines): Remove s_copysign-power6 and
s_copysign-ppc32.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_copysign-power6.S:
Remove file.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_copysign-ppc32.S:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_copysign.c:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_copysignf.c:
Likewise.
* sysdeps/powerpc/powerpc32/power6/fpu/s_copysign.S: Likewise.
* sysdeps/powerpc/powerpc32/power6/fpu/s_copysignf.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile (sysdeps_calls):
Remove s_copysign-power6 s_copysign-ppc64.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_copysign-power6.S:
Remove file.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_copysign-ppc64.S:
Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_copysign.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_copysignf.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_copysign.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_copysignf.S: Likewise.
* sysdeps/powerpc/powerpc64/power6/fpu/s_copysign.S: Likewise.
* sysdeps/powerpc/powerpc64/power6/fpu/s_copysignf.S: Likewise.
Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patches consolidates all the powerpc rint{f} implementations on
the generic sysdeps/powerpc/fpu/s_rint{f}.
Checked on powerpc-linux-gnu (built without --with-cpu, with
--with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch),
powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+
and --disable-multi-arch).
* sysdeps/powerpc/fpu/round_to_integer.h (set_fenv_mode,
round_to_integer_float, round_mode): Add RINT handling.
(reset_fenv_mode): New symbol.
* sysdeps/powerpc/fpu/s_rint.c (__rint): Use generic implementation.
* sysdeps/powerpc/fpu/s_rintf.c (__rintf): Likewise.
* sysdeps/powerpc/powerpc32/fpu/s_rint.S: Remove file.
* sysdeps/powerpc/powerpc32/fpu/s_rintf.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_rint.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_rintf.S: Likewise.
Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
|
|
|
|
|
|
|
|
|
| |
Add support to use 'mffsl' instruction if compiled for POWER9 (or later).
Also, mask the result to avoid bleeding unrelated bits into the result of
_FPU_GET_RC().
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
| |
fegetexcept() included code which exactly duplicates the code in
fenv_reg_to_exceptions(). Replace with a call to that function.
2019-06-05 Paul A. Clarke <pc@us.ibm.com>
* sysdeps/powerpc/fpu/fegetexcept.c (__fegetexcept): Replace code
with call to equivalent function.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since GCC commit 271500 (svn), also known as the following commit on the
git mirror:
commit 61edec870f9fdfb5df3fa4e40f28cbaede28a5b1
Author: amodra <amodra@138bc75d-0d04-0410-961f-82ee72b054a4>
Date: Wed May 22 04:34:26 2019 +0000
[RS6000] Don't pass -many to the assembler
glibc builds are failing when an assembly implementation does not
declare the correct '.machine' directive, or when no such directive is
declared at all. For example, when a POWER6 instruction is used, but
'.machine power6' is not declared, the assembler will fail with an error
similar to the following:
../sysdeps/powerpc/powerpc64/power8/strcmp.S: Assembler messages:
24 ../sysdeps/powerpc/powerpc64/power8/strcmp.S:55: Error: unrecognized opcode: `cmpb'
This patch adds '.machine powerN' directives where none existed, as well
as it updates '.machine power7' directives on POWER8 files, because the
minimum binutils version required to build glibc (binutils 2.25) now
provides this machine version. It also adds '-many' to the assembler
command used to build tst-set_ppr.c.
Tested for powerpc, powerpc64, and powerpc64le, as well as with
build-many-glibcs.py for powerpc targets.
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patches consolidates all the powerpc nearbyint{f} implementations
on the generic sysdeps/powerpc/fpu/s_nearbyint{f}.
* sysdeps/powerpc/fpu/round_to_integer.h (set_fenv_mode): Add
NEARBYINT handling.
* sysdeps/powerpc/fpu/s_nearbyint.c: New file.
* sysdeps/powerpc/fpu/s_nearbyintf.c: Likewise.
* sysdeps/powerpc/powerpc32/fpu/s_nearbyint.S: Remove file.
* sysdeps/powerpc/powerpc32/fpu/s_nearbyintf.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_nearbyint.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_nearbyintf.S: Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
GCC 9 dropped support for the SPE extensions to PowerPC, which means
powerpc*-*-*gnuspe* configurations are no longer buildable with that
compiler. This ISA extension was peculiar to the “e500” line of
embedded PowerPC chips, which, as far as I can tell, are no longer
being manufactured, so I think we should follow suit.
This patch was developed by grepping for “e500”, “__SPE__”, and
“__NO_FPRS__”, and may not eliminate every vestige of SPE support.
Most uses of __NO_FPRS__ are left alone, as they are relevant to
normal embedded PowerPC with soft-float.
* sysdeps/powerpc/preconfigure: Error out on powerpc-*-*gnuspe*
host type.
* scripts/build-many-glibcs.py: Remove powerpc-*-linux-gnuspe
and powerpc-*-linux-gnuspe-e500v1 from list of build configurations.
* sysdeps/powerpc/powerpc32/e500: Recursively delete.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/e500: Recursively delete.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/context-e500.h:
Delete.
* sysdeps/powerpc/fpu_control.h: Remove SPE variant.
Issue an #error if used with a compiler in SPE-float mode.
* sysdeps/powerpc/powerpc32/__longjmp_common.S
* sysdeps/powerpc/powerpc32/setjmp_common.S
* sysdeps/unix/sysv/linux/powerpc/powerpc32/getcontext-common.S
* sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/getcontext.S
* sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/setcontext.S
* sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/swapcontext.S
* sysdeps/unix/sysv/linux/powerpc/powerpc32/setcontext-common.S
* sysdeps/unix/sysv/linux/powerpc/powerpc32/swapcontext-common.S:
Remove code to preserve SPE register state.
* sysdeps/unix/sysv/linux/powerpc/elision-lock.c
* sysdeps/unix/sysv/linux/powerpc/elision-trylock.c
* sysdeps/unix/sysv/linux/powerpc/elision-unlock.c
Remove __SPE__ ifndefs.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patches consolidates all the powerpc trunc{f} implementations on
the generic sysdeps/powerpc/fpu/s_trunc{f}. The generic implementation
uses either the compiler builts for ISA 2.03+ (which generates the
frim instruction) or a generic implementation which uses FP only
operations.
The IFUNC organization for powerpc64 is also change to be enabled only
for powerpc64 and not for powerpc64le (since minium ISA of 2.08 does not
require the fallback generic implementation).
Checked on powerpc-linux-gnu (built without --with-cpu, with
--with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch),
powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+
and --disable-multi-arch).
* sysdeps/powerpc/fpu/trunc_to_integer.h (set_fenv_mode): Add
TRUNC handling.
(round_mode): Add definition for TRUNC.
* sysdeps/powerpc/fpu/s_trunc.c: New file.
* sysdeps/powerpc/fpu/s_truncf.c: New file.
* sysdeps/powerpc/powerpc32/fpu/s_trunc.S: Remove file.
* sysdeps/powerpc/powerpc32/fpu/s_truncf.S: Likewise.
* sysdep/powerpc/powepc32/power4/fpu/multiarch/s_trunc-power5+.S:
Likewise.
* sysdep/powerpc/powepc32/power4/fpu/multiarch/s_trunc-ppc32.S:
Likewise.
* sysdep/powerpc/powepc32/power4/fpu/multiarch/s_truncf-power5+.S:
Likewise.
* sysdep/powerpc/powepc32/power4/fpu/multiarch/s_truncf-ppc32.S:
Likewise.
* sysdep/powerpc/powepc32/power4/fpu/multiarch/s_trunc-power5+.c: New
file.
* sysdep/powerpc/powepc32/power4/fpu/multiarch/s_trunc-ppc32.c:
Likewise.
* sysdep/powerpc/powepc32/power4/fpu/multiarch/s_truncf-power5+.c:
Likewise.
* sysdep/powerpc/powepc32/power4/fpu/multiarch/s_truncf-ppc32.c:
Likewise.
* sysdep/powerpc/powerpc32/power5+/fpu/s_trunc.S: Remove file.
* sysdep/powerpc/powerpc32/power5+/fpu/s_truncf.S: Likewise.
* sysdep/powerpc/powerpc64/be/fpu/multiarch/Makefile
(libm-sysdep_routines): Add s_trunc-power5+, s_trunc-ppc64,
s_truncf-power5+, and s_truncf-ppc64.
(CFLAGS-s_trunc-power5+.c, CFLAGS-s_truncf-power5+.c): New rule.
* sysdep/powerpc/powercp64/be/fpu/multiarch/s_trunc-power5+.c: New
file.
* sysdep/powerpc/powercp64/be/fpu/multiarch/s_trunc-ppc64.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_trunc.c: Move to ...
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_trunc.c: ... here.
* sysdep/powerpc/powercp64/be/fpu/multiarch/s_truncf-power5+.c: New
file.
* sysdep/powerpc/powercp64/be/fpu/multiarch/s_truncf-ppc64.c:
Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_truncf.c: Move to ...
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_truncf.c: ... here.
* sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile
(libm-sysdep_routines): Remove s_trunc-power5+, s_trunc-ppc64,
s_truncf-power5+, and s_truncf-ppc64.
* sysdep/powerpc/powerpc64/fpu/multiarch/s_trunc-power5+.S: Remove
file.
* sysdep/powerpc/powerpc64/fpu/multiarch/s_trunc-ppc64.S: Likewise.
* sysdep/powerpc/powerpc64/fpu/multiarch/s_truncf-power5+.S:
Likewise.
* sysdep/powerpc/powerpc64/fpu/multiarch/s_truncf-ppc64.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_trunc.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_truncf.S: Likewise.
* sysdep/powerpc/powerpc64/power5+/fpu/s_trunc.S: Likewise.
* sysdep/powerpc/powerpc64/power5+/fpu/s_truncf.S: Likewise.
Reviewed-by: Gabriel F. T. Gomes <gabriel@inconstante.eti.br>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patches consolidates all the powerpc round{f} implementations on
the generic sysdeps/powerpc/fpu/s_round{f}. The generic implementation
uses either the compiler builts for ISA 2.03+ (which generates the
frim instruction) or a generic implementation which uses FP only
operations.
The IFUNC organization for powerpc64 is also change to be enabled only
for powerpc64 and not for powerpc64le (since minium ISA of 2.08 does not
require the fallback generic implementation).
Checked on powerpc-linux-gnu (built without --with-cpu, with
--with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch),
powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+
and --disable-multi-arch).
* sysdeps/powerpc/fpu/round_to_integer.h (set_fenv_mode): Add
ROUND handling.
(round_mode): Add definition for ROUND.
(round_to_integer_float): Likewise.
* sysdeps/powerpc/fpu/s_round.c: New file.
* sysdeps/powerpc/fpu/s_roundf.c: New file.
* sysdeps/powerpc/powerpc32/fpu/s_round.S: Remove file.
* sysdeps/powerpc/powerpc32/fpu/s_roundf.S: Likewise.
* sysdep/powerpc/powepc32/power4/fpu/multiarch/s_round-power5+.S:
Likewise.
* sysdep/powerpc/powepc32/power4/fpu/multiarch/s_round-ppc32.S:
Likewise.
* sysdep/powerpc/powepc32/power4/fpu/multiarch/s_roundf-power5+.S:
Likewise.
* sysdep/powerpc/powepc32/power4/fpu/multiarch/s_roundf-ppc32.S:
Likewise.
* sysdep/powerpc/powepc32/power4/fpu/multiarch/s_round-power5+.c: New
file.
* sysdep/powerpc/powepc32/power4/fpu/multiarch/s_round-ppc32.c:
Likewise.
* sysdep/powerpc/powepc32/power4/fpu/multiarch/s_roundf-power5+.c:
Likewise.
* sysdep/powerpc/powepc32/power4/fpu/multiarch/s_roundf-ppc32.c:
Likewise.
* sysdep/powerpc/powerpc32/power5+/fpu/s_round.S: Remove file.
* sysdep/powerpc/powerpc32/power5+/fpu/s_roundf.S: Likewise.
* sysdep/powerpc/powerpc64/be/fpu/multiarch/Makefile
(libm-sysdep_routines): Add s_round-power5+, s_round-ppc64,
s_roundf-power5+, and s_roundf-ppc64.
(CFLAGS-s_round-power5+.c, CFLAGS-s_roundf-power5+.c): New rule.
* sysdep/powerpc/powercp64/be/fpu/multiarch/s_round-power5+.c: New
file.
* sysdep/powerpc/powercp64/be/fpu/multiarch/s_round-ppc64.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_round.c: Move to ...
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_round.c: ... here.
* sysdep/powerpc/powercp64/be/fpu/multiarch/s_roundf-power5+.c: New
file.
* sysdep/powerpc/powercp64/be/fpu/multiarch/s_roundf-ppc64.c:
Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_roundf.c: Move to ...
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_roundf.c: ... here.
* sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile
(libm-sysdep_routines): Remove s_round-power5+, s_round-ppc64,
s_roundf-power5+, and s_roundf-ppc64.
* sysdep/powerpc/powerpc64/fpu/multiarch/s_round-power5+.S: Remove
file.
* sysdep/powerpc/powerpc64/fpu/multiarch/s_round-ppc64.S: Likewise.
* sysdep/powerpc/powerpc64/fpu/multiarch/s_roundf-power5+.S:
Likewise.
* sysdep/powerpc/powerpc64/fpu/multiarch/s_roundf-ppc64.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_round.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_roundf.S: Likewise.
* sysdep/powerpc/powerpc64/power5+/fpu/s_round.S: Likewise.
* sysdep/powerpc/powerpc64/power5+/fpu/s_roundf.S: Likewise.
Reviewed-by: Gabriel F. T. Gomes <gabriel@inconstante.eti.br>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patches consolidates all the powerpc floor{f} implementations on
the generic sysdeps/powerpc/fpu/s_floor{f}. The generic implementation
uses either the compiler builts for ISA 2.03+ (which generates the
frim instruction) or a generic implementation which uses FP only
operations.
The IFUNC organization for powerpc64 is also change to be enabled only
for powerpc64 and not for powerpc64le (since minium ISA of 2.08 does not
require the fallback generic implementation).
Checked on powerpc-linux-gnu (built without --with-cpu, with
--with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch),
powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+
and --disable-multi-arch).
* sysdeps/powerpc/fpu/round_to_integer.h (set_fenv_mode):
Add FLOOR option.
(round_mode): Add definition for FLOOR.
* sysdeps/powerpc/fpu/s_floor.c: New file.
* sysdeps/powerpc/fpu/s_floorf.c: Likewise.
* sysdeps/powerpc/powerpc32/fpu/s_floor.S: Remove file.
* sysdeps/powerpc/powerpc32/fpu/s_floorf.S: Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_floor-power5+.S:
Remove file.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_floor-ppc32.S:
Likewise
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_floorf-power5+.S:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_floorf-ppc32.S:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_floor-power5+.c:
New file.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_floor-ppc32.c:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_floorf-power5+.c:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_floorf-ppc32.c:
Likewise.
* sysdeps/powerpc/powerpc32/power5+/fpu/s_floor.S: Remove file.
* sysdeps/powerpc/powerpc32/power5+/fpu/s_floorf.S: Remove file.
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/Makefile
(libm-sysdep_routines): Add s_floor-power5+, s_floor-ppc64,
s_floorf-power5+, and s_floorf-ppc64.
(CFLAGS-s_floor-power5+.c, CFLAGS-s_floorf-power5+.c): New rule.
* sysdep/powerpc/powerpc64/be/fpu/multiarch/s_floor-power5+.c: New
file.
* sysdep/powerpc/powerpc64/be/fpu/multiarch/s_floor-ppc64.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_floor.c: Move to ...
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_floor.c: ... here.
* sysdep/powerpc/powerpc64/be/fpu/multiarch/s_floorf-power5+.c: New
file.
* sysdep/powerpc/powerpc64/be/fpu/multiarch/s_floorf-ppc64.c:
Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_floorf.c: Move to ...
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_floorf.c: ... here.
* sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile
(libm-sysdep_routines): Remove s_floor-power5+, s_floor-ppc64,
s_floorf-power5+, and s_floorf-ppc64.
* sysdep/powerpc/powerpc64/fpu/multiarch/s_floor-power5+.S: Remove
file.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_floor-ppc64.S: Remove
file.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_floorf-power5+.S:
Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_floorf-ppc64.S:
Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_floor.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_floorf.S: Likewise.
* sysdeps/powerpc/powerpc64/power5+/fpu/s_floor.S: Likewise.
* sysdeps/powerpc/powerpc64/power5+/fpu/s_floorf.S: Likewise.
Reviewed-by: Gabriel F. T. Gomes <gabriel@inconstante.eti.br>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patches consolidates all the powerpc ceil{f} implementations on
the generic sysdeps/powerpc/fpu/s_ceil{f}. The generic implementation
uses either the compiler builts for ISA 2.03+ (which generates the frip
instruction) or a generic implementation which uses FP only operations.
It adds a generic implementation (round_to_integer.h) which is shared
with other rounding to integer routines. The resulting code should be
similar in term os performance to previous assembly one.
The IFUNC organization for powerpc64 is also change to be enabled only
for powerpc64 and not for powerpc64le (since minium ISA of 2.08 does not
require the fallback generic implementation).
Checked on powerpc-linux-gnu (built without --with-cpu, with
--with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch),
powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+
and --disable-multi-arch).
* sysdeps/powerpc/fpu/fenv_libc.h (__fesetround_inline_nocheck): New
function.
* sysdeps/powerpc/fpu/round_to_integer.h: New file.
* sysdeps/powerpc/fpu/s_ceil.c: Likewise.
* sysdeps/powerpc/fpu/s_ceilf.c: Likewise.
* sysdeps/powerpc/powerpc32/fpu/s_ceil.S: Remove file.
* sysdeps/powerpc/powerpc32/fpu/s_ceilf.S: Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile
(CFLAGS-s_ceil-power5+.c, CFLAGS-s_ceilf-power5+.c): New rule.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_ceil-power5+.S:
Remove file.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_ceil-ppc32.S:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_ceilf-power5+.S:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_ceilf-ppc32.S:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_ceil-power5+.c:
New file.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_ceil-ppc32.c:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_ceilf-power5+.c:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_ceilf-ppc32.c:
Likewise.
* sysdeps/powerpc/powerpc32/power5+/fpu/s_ceil.S: Remove file.
* sysdeps/powerpc/powerpc32/power5+/fpu/s_ceilf.S: Likewise.
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/Makefile: New file.
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_ceil-power5+.c:
Likewise.
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_ceil-ppc64.c:
Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_ceil.c: Move to ...
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_ceil.c: ... here.
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_ceilf-power5+.c: New
file.
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_ceilf-ppc64.c:
Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_ceilf.c: Move to ...
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_ceilf.c: ...
* here.
* sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile
(libm-sysdep_routines): Remove s_ceil-power5+, s_ceil-ppc64,
s_ceilf-power5+, and s_ceilf-ppc64.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_ceil-power5+.S: Remove
file.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_ceil-ppc64.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_ceilf-power5+.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_ceilf-ppc64.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_ceil.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_ceilf.S: Likewise.
* sysdeps/powerpc/powerpc64/power5+/fpu/s_ceil.S: Likewise.
* sysdeps/powerpc/powerpc64/power5+/fpu/s_ceilf.S: Likewise.
Reviewed-by: Gabriel F. T. Gomes <gabriel@inconstante.eti.br>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch removes the POWER4 optimized mpa optimization used currently
on all powerpc targets. In fact for newer chips, GCC generates *worse*
code than generic implementation as below. One possibilty would to
add ifunc variants for the mpa routines (as x86_64), but it will add
complexity only for older chips (and one would need to check if
power5, power5+, and power6 do benefict from this optimization),
and only for specific implementation (since most used one such
as sin, cos, exp, pow where optimized to avoid calling the slow
multiprecision path).
* POWER9 patched
$ ./testrun.sh benchtests/bench-atan
"atan": {
"": {
"duration": 5.12565e+09,
"iterations": 1.552e+08,
"max": 100.552,
"min": 7.799,
"mean": 33.0261
},
"144bits": {
"duration": 5.12745e+09,
"iterations": 825000,
"max": 7517.17,
"min": 6186.3,
"mean": 6215.09
}
}
$ ./testrun.sh benchtests/bench-acos
"acos": {
"": {
"duration": 5.21741e+09,
"iterations": 1.269e+08,
"max": 191.738,
"min": 7.931,
"mean": 41.1144
},
"slow": {
"duration": 5.25999e+09,
"iterations": 198000,
"max": 26681.7,
"min": 26463.6,
"mean": 26565.6
}
}
* POWER9 master
$ ./testrun.sh benchtests/bench-atan
"atan": {
"": {
"duration": 5.12815e+09,
"iterations": 1.552e+08,
"max": 134.788,
"min": 7.803,
"mean": 33.0422
},
"144bits": {
"duration": 5.1209e+09,
"iterations": 447000,
"max": 11615.8,
"min": 11301.8,
"mean": 11456.2
}
}
$ ./testrun.sh benchtests/bench-acos
"acos": {
"": {
"duration": 5.22272e+09,
"iterations": 1.269e+08,
"max": 115.981,
"min": 7.931,
"mean": 41.1562
},
"slow": {
"duration": 5.28723e+09,
"iterations": 96000,
"max": 55434.1,
"min": 54820.6,
"mean": 55075.3
}
}
* POWER8 patched
$ taskset -c 16 ./testrun.sh benchtests/bench-acos
"acos": {
"": {
"duration": 5.16398e+09,
"iterations": 9.99e+07,
"max": 174.408,
"min": 8.645,
"mean": 51.6915
},
"slow": {
"duration": 5.16982e+09,
"iterations": 96000,
"max": 54830.5,
"min": 53703.8,
"mean": 53852.3
}
}
* POWER8 master
$ taskset -c 16 ./testrun.sh benchtests/bench-acos
"acos": {
"": {
"duration": 5.17019e+09,
"iterations": 9.99e+07,
"max": 186.127,
"min": 8.633,
"mean": 51.7537
},
"slow": {
"duration": 5.34225e+09,
"iterations": 90000,
"max": 60353.2,
"min": 59155.3,
"mean": 59358.4
}
}
* POWER7 patched
$ taskset -c 16 benchtests/bench-asin
"asin": {
"": {
"duration": 5.15559e+09,
"iterations": 6.5e+07,
"max": 193.335,
"min": 12.227,
"mean": 79.3168
},
"slow": {
"duration": 5.20538e+09,
"iterations": 80000,
"max": 65705.2,
"min": 64299.4,
"mean": 65067.3
}
}
* POWER7 master
$ taskset -c 16 benchtests/bench-asin
"asin": {
"": {
"duration": 5.15446e+09,
"iterations": 6.5e+07,
"max": 184.575,
"min": 12.226,
"mean": 79.2994
},
"slow": {
"duration": 5.20616e+09,
"iterations": 80000,
"max": 65705.1,
"min": 64336.6,
"mean": 65076.9
}
}
Checked on powerpc-linux-gnu (built without --with-cpu, with
--with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch),
powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+
and --disable-multi-arch).
* sysdeps/powerpc/power4/fpu/Makefile: Remove file.
* sysdeps/powerpc/power4/fpu/mpa-arch.h: Likewise.
* sysdeps/powerpc/power4/fpu/mpa.c: Likewise.
Reviewed-by: Gabriel F. T. Gomes <gabriel@inconstante.eti.br>
|
|
|
|
|
| |
* sysdeps/powerpc/fpu/s_fma.c: Fix format.
* sysdeps/powerpc/fpu/s_fmaf.c: Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch just refactor the assembly implementation to use compiler
builtins instead.
Checked on powerpc-linux-gnu (built without --with-cpu, with
--with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch),
powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+
and --disable-multi-arch).
* sysdeps/powerpc/fpu/s_fma.S: Remove file.
* sysdeps/powerpc/fpu/s_fmaf.S: Likewise.
* sysdeps/powerpc/fpu/s_fma.c: New file.
* sysdeps/powerpc/fpu/s_fmaf.c: Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since be2e25bbd78f9fdf the generic ieee754 implementation uses
compiler builtin which generates fabs{f} for all supported targets.
Checked on powerpc-linux-gnu (built without --with-cpu, with
--with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch),
powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+
and --disable-multi-arch).
* sysdeps/powerpc/fpu/s_fabs.S: Remove file.
* sysdeps/powerpc/fpu/s_fabsf.S: Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch removes the power6 wcsrchr optimization and uses generic
implementation instead. Currently, both power6 and power7 IFUNC variant
resulting binary are essentially the same and the generic implementation
with unrolling loop set to 8 also results in similar performance.
Checked on powerpc64-linux-gnu.
* sysdeps/powerpc/Makefile [$(subdir) == wcsmbs] (CFLAGS-wcsrchr.c):
New rule.
* sysdeps/powerpc/power6/wcsrchr.c: Remove file.
* sysdeps/powerpc/powerpc32/power4/multiarch/wcsrchr-power6.c:
Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/wcsrchr-power7.c:
Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/wcsrchr-ppc32.c:
Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/wcsrchr.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/wcsrchr-power6.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/wcsrchr-power7.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/wcsrchr-ppc64.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/wcsrchr.c: Likewise.
* sysdeps/powerpc/powerpc64/power6/wcsrchr.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/Makefile
[$(subdir) == wcsmbs] (sysdeps_routines): Remove wcsrchr-power6 and
wcsrchr-power7.
(CFLAGS-wcsrchr-power7.c, CFLAGS-wcsrchr-power6.c): Remove rule.
* sysdeps/powerpc/powerpc64/multiarch/Makefile: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/ifunc-impl-list.c:
Remove wcsrchr optimizations.
* sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c: Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch removes the power6 wcschr optimization and uses generic
implementation instead. Currently, both power6 and power7 IFUNC variant
resulting binary are essentially the same and the generic implementation
with unrolling loop set to 8 also results in similar performance.
Checked on powerpc64-linux-gnu.
* sysdeps/powerpc/Makefile [$(subdir) == wcsmbs] (CFLAGS-wcschr.c):
New rule.
* sysdeps/powerpc/power6/wcschr.c: Remove file.
* sysdeps/powerpc/powerpc32/power4/multiarch/wcschr-power6.c:
Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/wcschr-power7.c:
Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/wcschr-ppc32.c:
Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/wcschr.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/wcschr-power6.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/wcschr-power7.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/wcschr-ppc64.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/wcschr.c: Likewise.
* sysdeps/powerpc/powerpc64/power6/wcschr.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/Makefile
[$(subdir) == wcsmbs] (sysdeps_routines): Remove wcschr-power6 and
wcschr-power7.
(CFLAGS-wcschr-power7.c, CFLAGS-wcschr-power6.c): Remove rule.
* sysdeps/powerpc/powerpc64/multiarch/Makefile: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/ifunc-impl-list.c:
Remove wcschr optimizations.
* sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c: Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch removes the power6 wcscpy optimization and uses generic
implementation instead. Currently, both power6 and power7 IFUNC variant
resulting binary are essentially the same and the generic implementation
with unrolling loop set to 8 also results in similar performance.
Checked on powerpc64-linux-gnu.
* sysdeps/powerpc/Makefile [$(subdir) == wcsmbs] (CFLAGS-wcscpy.c):
New rule.
* sysdeps/powerpc/power6/wcscpy.c: Remove file.
* sysdeps/powerpc/powerpc32/power4/multiarch/wcscpy-power6.c:
Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/wcscpy-power7.c:
Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/wcscpy-ppc32.c:
Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/wcscpy.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/wcscpy-power6.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/wcscpy-power7.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/wcscpy-ppc64.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/wcscpy.c: Likewise.
* sysdeps/powerpc/powerpc64/power6/wcscpy.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/Makefile
[$(subdir) == wcsmbs] (sysdeps_routines): Remove wcscpy-power6 and
wcscpy-power7.
(CFLAGS-wcscpy-power7.c, CFLAGS-wcscpy-power6.c): Remove rule.
* sysdeps/powerpc/powerpc64/multiarch/Makefile: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/ifunc-impl-list.c:
Remove wcscpy optimizations.
* sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c: Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Replace inline asm uses of the "mffs" and "mtfsf" instructions with
the analogous GCC builtins.
__builtin_mffs and __builtin_mtfsf are both available in GCC 5 and above.
Given the minimum GCC level for GLibC is now GCC 6.2, it is safe to use
these builtins without restriction.
2019-03-29 Paul A. Clarke <pc@us.ibm.com>
* sysdeps/powerpc/fpu/fenv_libc.h (fegetenv_register): Replace inline
asm with builtin.
* sysdeps/powerpc/powerpc64/le/fpu/sfp-machine.h (FP_INIT_ROUNDMODE):
Likewise.
* sysdeps/powerpc/fpu/tst-setcontext-fpscr.c (_GET_DI_FPSCR): Likewise.
(_GET_SI_FPSCR): Likewise.
(_SET_SI_FPSCR): Likewise.
|
|
|
|
|
|
|
|
|
|
|
| |
This file is not used anywhere since removal of {k,e}_rem_pio2f.c
(commit ca3aac57efa89).
Checked with a build for powerpc-linux-gnu (with --with-cpu=power4
and --with-cpu=power7), powerpc64-linux-gnu (with --with-cpu=power4
and --with-cpu=power7), and powerpc64le-linux (with --with-cpu=power8).
* sysdeps/powerpc/fpu/s_float_bitwise.h: Remove file.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch refactor how hp-timing is used on loader code for statistics
report. The HP_TIMING_AVAIL and HP_SMALL_TIMING_AVAIL are removed and
HP_TIMING_INLINE is used instead to check for hp-timing avaliability.
For alpha, which only defines HP_SMALL_TIMING_AVAIL, the HP_TIMING_INLINE
is set iff for IS_IN(rtld).
Checked on aarch64-linux-gnu, x86_64-linux-gnu, and i686-linux-gnu. I also
checked the builds for all afected ABIs.
* benchtests/bench-timing.h: Replace HP_TIMING_AVAIL with
HP_TIMING_INLINE.
* nptl/descr.h: Likewise.
* elf/rtld.c (RLTD_TIMING_DECLARE, RTLD_TIMING_NOW, RTLD_TIMING_DIFF,
RTLD_TIMING_ACCUM_NT, RTLD_TIMING_SET): Define.
(dl_start_final_info, _dl_start_final, dl_main, print_statistics):
Abstract hp-timing usage with RTLD_* macros.
* sysdeps/alpha/hp-timing.h (HP_TIMING_INLINE): Define iff IS_IN(rtld).
(HP_TIMING_AVAIL, HP_SMALL_TIMING_AVAIL): Remove.
* sysdeps/generic/hp-timing.h (HP_TIMING_AVAIL, HP_SMALL_TIMING_AVAIL,
HP_TIMING_NONAVAIL): Likewise.
* sysdeps/ia64/hp-timing.h (HP_TIMING_AVAIL, HP_SMALL_TIMING_AVAIL):
Likewise.
* sysdeps/powerpc/powerpc32/power4/hp-timing.h (HP_TIMING_AVAIL,
HP_SMALL_TIMING_AVAIL): Likewise.
* sysdeps/powerpc/powerpc64/hp-timing.h (HP_TIMING_AVAIL,
HP_SMALL_TIMING_AVAIL): Likewise.
* sysdeps/sparc/sparc32/sparcv9/hp-timing.h (HP_TIMING_AVAIL,
HP_SMALL_TIMING_AVAIL): Likewise.
* sysdeps/sparc/sparc64/hp-timing.h (HP_TIMING_AVAIL,
HP_SMALL_TIMING_AVAIL): Likewise.
* sysdeps/x86/hp-timing.h (HP_TIMING_AVAIL, HP_SMALL_TIMING_AVAIL):
Likewise.
* sysdeps/generic/hp-timing-common.h: Update comment with
HP_TIMING_AVAIL removal.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Starting with commit 1616d034b61622836d3a36af53dcfca7624c844e
the output was corrupted on some platforms as _dl_procinfo
was called for every auxv entry and on some architectures like s390
all entries were represented as "AT_HWCAP".
This patch is removing the condition and let _dl_procinfo decide if
an entry is printed in a platform specific or generic way.
This patch also adjusts all _dl_procinfo implementations which assumed
that they are only called for AT_HWCAP or AT_HWCAP2. They are now just
returning a non-zero-value for entries which are not handled platform
specifc.
ChangeLog:
* elf/dl-sysdep.c (_dl_show_auxv): Remove condition and always
call _dl_procinfo.
* sysdeps/unix/sysv/linux/s390/dl-procinfo.h (_dl_procinfo):
Ignore types other than AT_HWCAP.
* sysdeps/sparc/dl-procinfo.h (_dl_procinfo): Likewise.
* sysdeps/unix/sysv/linux/i386/dl-procinfo.h (_dl_procinfo):
Likewise.
* sysdeps/powerpc/dl-procinfo.h (_dl_procinfo): Adjust comment
in the case of falling back to generic output mechanism.
* sysdeps/unix/sysv/linux/arm/dl-procinfo.h (_dl_procinfo):
Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch fixes further coding style issues where code should have
broken lines before operators in accordance with the GNU Coding
Standards but instead was breaking lines after them.
Tested for x86_64, and with build-many-glibcs.py.
* stdio-common/vfscanf-internal.c (ARG): Break lines before rather
than after operators.
* sysdeps/mach/hurd/setitimer.c (timer_thread): Likewise.
(setitimer_locked): Likewise.
* sysdeps/mach/hurd/sigaction.c (__sigaction): Likewise.
* sysdeps/mach/hurd/sigaltstack.c (__sigaltstack): Likewise.
* sysdeps/mach/pagecopy.h (PAGE_COPY_FWD): Likewise.
* sysdeps/mach/thread_state.h (machine_get_basic_state): Likewise.
* sysdeps/powerpc/powerpc64/tst-ucontext-ppc64-vscr.c
(PPC_CPU_SUPPORTED): Likewise.
* sysdeps/unix/sysv/linux/alpha/a.out.h (N_TXTOFF): Likewise.
* sysdeps/unix/sysv/linux/generic/wordsize-32/overflow.h
(stat_overflow): Likewise.
(statfs_overflow): Likewise.
* sysdeps/unix/sysv/linux/tst-personality.c (do_test): Likewise.
* sysdeps/unix/sysv/linux/tst-ttyname.c (eq_ttyname): Likewise.
(eq_ttyname_r): Likewise.
(run_chroot_tests): Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since the commit
commit 81a14439417552324ec6ca71f65ddf8e7cdd51c7
Author: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Date: Tue Feb 5 17:35:12 2019 -0200
wcsmbs: optimize wcscat
powerpc64 and powerpc64le builds fail when configured with
--disable-multi-arch and --with-cpu=power6 (or newer), due to an
undefined reference to __GI___wcscpy. This patch fixes this on
sysdeps/powerpc/powerpc64/power6/wcscpy.c, which is only used when
multi-arch is disabled.
This patch does nothing for the failures on 32-bits powerpc builds,
because the file is under the powerpc64 subdirectory, however, powerpc
builds were already failing with --disable-multi-arch, with multiple
error messages, even before the aforementioned commit.
Tested for powerpc, powerpc64, and powerpc64le with multi-arch enabled
(all pass) and disabled (powerpc still fails as explained above).
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch fixes more places where a space should have been present
before '(' in accordance with the GNU Coding Standards (as with the
previous patch, mainly for calls to sizeof).
Tested with build-many-glibcs.py.
* sysdeps/powerpc/powerpc32/dl-machine.c
(__elf_machine_fixup_plt): Use space before '('.
(__process_machine_rela): Likewise.
* sysdeps/powerpc/powerpc32/register-dump.h (register_dump):
Likewise.
* sysdeps/powerpc/powerpc64/le/fpu/sfp-machine.h (TI_BITS):
Likewise.
* sysdeps/powerpc/powerpc64/register-dump.h (register_dump):
Likewise.
* sysdeps/powerpc/test-arith.c (union_t): Likewise.
(pattern): Likewise.
(delta): Likewise.
(check_result): Likewise.
(check_excepts): Likewise.
(check_op): Likewise.
(fail_xr): Likewise.
* sysdeps/unix/alpha/sysdep.h (syscall_promote): Likewise.
* sysdeps/unix/sysv/linux/alpha/a.out.h (AOUTHSZ): Likewise.
(SCNHSZ): Likewise.
* sysdeps/unix/sysv/linux/hppa/makecontext.c (FRAME_SIZE_BYTES):
Likewise.
(ARGS): Likewise.
(__makecontext): Likewise.
* sysdeps/unix/sysv/linux/powerpc/sys/ucontext.h (ucontext_t):
Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch fixes the linknamespace issues add on wcscpy refactor
for powerpc-linux-gnu-power4 as shown by the tests:
FAIL: conform/POSIX/fnmatch.h/linknamespace
FAIL: conform/POSIX/glob.h/linknamespace
FAIL: conform/POSIX/wordexp.h/linknamespace
FAIL: conform/XPG4/fnmatch.h/linknamespace
FAIL: conform/XPG4/glob.h/linknamespace
FAIL: conform/XPG4/wordexp.h/linknamespace
FAIL: conform/XPG42/fnmatch.h/linknamespace
FAIL: conform/XPG42/glob.h/linknamespace
FAIL: conform/XPG42/wordexp.h/linknamespace
[initial] wordexp -> [libc.a(wordexp.o)] fnmatch -> [libc.a(fnmatch.o)] __wcscat -> [libc.a(wcscat.o)] __wcscpy -> [libc.a(wcscpy.o)] wcscpy
Checked on powerpc-linux-gnu-power4.
* sysdeps/powerpc/powerpc32/power4/multiarch/wcscpy.c: Define ifunc
symbol as __wcspcy instead of wcscpy.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch fixes various places where a space should have been present
before '(' in accordance with the GNU Coding Standards. Most but not
all of the fixes in this patch are for calls to sizeof (but it's not
exhaustive regarding such calls that should be fixed).
Tested for x86_64, and with build-many-glibcs.py.
* benchtests/bench-strcpy.c (do_test): Use space before '('.
* benchtests/bench-string.h (cmdline_process_function): Likewise.
* benchtests/bench-strlen.c (do_test): Likewise.
(test_main): Likewise.
* catgets/gencat.c (read_old): Likewise.
* elf/cache.c (load_aux_cache): Likewise.
* iconvdata/bug-iconv8.c (do_test): Likewise.
* math/test-tgmath-ret.c (do_test): Likewise.
* nis/nis_call.c (rec_dirsearch): Likewise.
* nis/nis_findserv.c (__nis_findfastest_with_timeout): Likewise.
* nptl/tst-audit-threads.c (do_test): Likewise.
* nptl/tst-cancel4-common.h (set_socket_buffer): Likewise.
* nss/nss_test1.c (init): Likewise.
* nss/test-netdb.c (test_hosts): Likewise.
* posix/execvpe.c (maybe_script_execute): Likewise.
* stdio-common/tst-fmemopen4.c (do_test): Likewise.
* stdio-common/tst-printf.c (do_test): Likewise.
* stdio-common/vfscanf-internal.c (__vfscanf_internal): Likewise.
* stdlib/fmtmsg.c (NKEYWORDS): Likewise.
* stdlib/qsort.c (STACK_SIZE): Likewise.
* stdlib/test-canon.c (do_test): Likewise.
* stdlib/tst-swapcontext1.c (do_test): Likewise.
* string/memcmp.c (OPSIZ): Likewise.
* string/test-strcpy.c (do_test): Likewise.
(do_random_tests): Likewise.
* string/test-strlen.c (do_test): Likewise.
(test_main): Likewise.
* string/test-strrchr.c (do_test): Likewise.
(do_random_tests): Likewise.
* string/tester.c (test_memrchr): Likewise.
(test_memchr): Likewise.
* sysdeps/generic/memcopy.h (OPSIZ): Likewise.
* sysdeps/generic/unwind-dw2.c (execute_stack_op): Likewise.
* sysdeps/generic/unwind-pe.h (read_sleb128): Likewise.
(read_encoded_value_with_base): Likewise.
* sysdeps/hppa/dl-machine.h (elf_machine_runtime_setup): Likewise.
* sysdeps/hppa/fpu/feupdateenv.c (__feupdateenv): Likewise.
* sysdeps/ia64/fpu/sfp-machine.h (TI_BITS): Likewise.
* sysdeps/mach/hurd/spawni.c (__spawni): Likewise.
* sysdeps/posix/spawni.c (maybe_script_execute): Likewise.
* sysdeps/powerpc/fpu/tst-setcontext-fpscr.c (query_auxv):
Likewise.
* sysdeps/unix/sysv/linux/aarch64/bits/procfs.h (ELF_NGREG):
Likewise.
* sysdeps/unix/sysv/linux/arm/bits/procfs.h (ELF_NGREG): Likewise.
* sysdeps/unix/sysv/linux/arm/ioperm.c (init_iosys): Likewise.
* sysdeps/unix/sysv/linux/csky/bits/procfs.h (ELF_NGREG):
Likewise.
* sysdeps/unix/sysv/linux/m68k/bits/procfs.h (ELF_NGREG):
Likewise.
* sysdeps/unix/sysv/linux/nios2/bits/procfs.h (ELF_NGREG):
Likewise.
* sysdeps/unix/sysv/linux/spawni.c (maybe_script_execute):
Likewise.
* sysdeps/unix/sysv/linux/x86/bits/procfs.h (ELF_NGREG): Likewise.
* sysdeps/unix/sysv/linux/x86/bits/sigcontext.h
(FP_XSTATE_MAGIC2_SIZE): Likewise.
* sysdeps/x86/fpu/sfp-machine.h (TI_BITS): Likewise.
* time/test_time.c (main): Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch rewrites wcscat using wcslen and wcscpy. This is similar to
the optimization done on strcat by 6e46de42fe.
The strcpy changes are mainly to add the internal alias to avoid PLT
calls.
Checked on x86_64-linux-gnu and a build against the affected
architectures.
* include/wchar.h (__wcscpy): New prototype.
* sysdeps/powerpc/powerpc32/power4/multiarch/wcscpy-ppc32.c
(__wcscpy): Route internal symbol to generic implementation.
* sysdeps/powerpc/powerpc32/power4/multiarch/wcscpy.c (wcscpy):
Add internal __wcscpy alias.
* sysdeps/powerpc/powerpc64/multiarch/wcscpy.c (wcscpy): Likewise.
* sysdeps/s390/wcscpy.c (wcscpy): Likewise.
* sysdeps/x86_64/multiarch/wcscpy.c (wcscpy): Likewise.
* wcsmbs/wcscpy.c (wcscpy): Add
* sysdeps/x86_64/multiarch/wcscpy-c.c (WCSCPY): Adjust macro to
use generic implementation.
* wcsmbs/wcscat.c (wcscat): Rewrite using wcslen and wcscpy.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch fixes -Wimplicit-fallthrough warnings in system-specific
code that show up building glibc with -Wextra, by adding fall-through
comments, or moving existing such comments to the place required for
them to work (immediately before the case label being fallen through).
Tested with build-many-glibcs.py.
* sysdeps/i386/dl-machine.h (elf_machine_rela): Add fall-through
comments.
* sysdeps/m68k/m680x0/fpu/s_cexp_template.c (s(__cexp)): Likewise.
* sysdeps/m68k/memcopy.h (WORD_COPY_FWD): Likewise.
(WORD_COPY_BWD): Likewise.
* sysdeps/mach/hurd/ioctl.c (__ioctl): Likewise.
* sysdeps/powerpc/powerpc64/dl-machine.h (elf_machine_rela):
Likewise.
* sysdeps/s390/iso-8859-1_cp037_z900.c (TR_LOOP): Likewise.
* sysdeps/mips/dl-machine.h (elf_machine_reloc): Move fall-through
comment.
* sysdeps/mips/dl-trampoline.c (__dl_runtime_resolve): Likewise.
|