about summary refs log tree commit diff
path: root/sysdeps/powerpc/powerpc64
Commit message (Collapse)AuthorAgeFilesLines
* powerpc: Cleanup: use actual power8 assembly mnemonicsRaoni Fassina Firmino2019-08-0111-187/+87
| | | | | | | | | | | | | | | | | | | | | | Some implementations in sysdeps/powerpc/powerpc64/power8/*.S still had pre power8 compatible binutils hardcoded macros and were not using .machine power8. This patch should not have semantic changes, in fact it should have the same exact code generated. Tested that generated stripped shared objects are identical when using "strip --remove-section=.note.gnu.build-id". Checked on: - powerpc64le, power9, build-many-glibcs.py, gcc 6.4.1 20180104, binutils 2.26.2.20160726 - powerpc64le, power8, debian 9, gcc 6.3.0 20170516, binutils 2.28 - powerpc64le, power9, ubuntu 19.04, gcc 8.3.0, binutils 2.32 - powerpc64le, power9, opensuse tumbleweed, gcc 9.1.1 20190527, binutils 2.32 - powerpc64, power9, debian 10, gcc 8.3.0, binutils 2.31.1 Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
* powerpc: refactor logb{f,l}Adhemerval Zanella2019-07-0814-17/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The power7 logb implementation does not show a performance gain on ISA 2.07+ chips with faster floating-point to GRP instructions (currently POWER8 and POWER9). This patch moves the POWER7 implementation to generic one and enables it for POWER7. It also add some cleanup to use inline floating-point number instead of define them using static const. The performance difference is for POWER9: - Without patch: "logb": { "subnormal": { "duration": 4.99202e+09, "iterations": 8.83662e+08, "max": 75.194, "min": 5.501, "mean": 5.64925 }, "normal": { "duration": 4.97063e+09, "iterations": 9.97094e+08, "max": 46.489, "min": 4.956, "mean": 4.98512 } } - With patch: "logb": { "subnormal": { "duration": 4.97226e+09, "iterations": 9.92036e+08, "max": 77.209, "min": 4.892, "mean": 5.01218 }, "normal": { "duration": 4.96192e+09, "iterations": 1.07545e+09, "max": 12.361, "min": 4.593, "mean": 4.61382 } } The ifunc implementation is also enabled only for powerpc64. Checked on powerpc-linux-gnu (built without --with-cpu, with --with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch), powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+ and --disable-multi-arch). * sysdeps/powerpc/power7/fpu/s_logb.c: Move to ... * sysdeps/powerpc/fpu/s_logb.c: ... here. Use inline FP constants. * sysdeps/powerpc/power7/fpu/s_logbf.c: Move to ... * sysdeps/powerpc/fpu/s_logbf.c: ... here. Use inline FP constants. * sysdeps/powerpc/power7/fpu/s_logbl.c: Move to ... * sysdeps/powerpc/fpu/s_logbl.c: ... here. Use inline FP constants. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_logb-power7.c: Adjust implementation path. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_logbf-power7.c: Adjust implementation path. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_logbl-power7.c: Adjust implementation path. * sysdeps/powerpc/powerpc64/be/fpu/multiarch/Makefile (libm-sysdep_routines): Add s_log* objects. (CFLAGS-s_logbf-power7.c, CFLAGS-s_logbl-power7.c, CFLAGS-s_logb-power7.c): New fule. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_logb-power7.c: Move to ... * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_logb-power7.c: ... here. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_logb-ppc64.c: Move to ... * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_logb-ppc64.c: ... here. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_logb.c: Move to ... * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_logb.c: ... here. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_logbf-power7.c: Move to ... * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_logbf-power7.c: ... here. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_logbf-ppc64.c: Move to ... * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_logbf-ppc64.c: ... here. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_logbf.c: Move to ... * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_logbf.c: ... here. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_logbl-power7.c: Move to ... * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_logbl-power7.c: ... here. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_logbl-ppc64.c: Move to ... * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_logbl-ppc64.c: ... here. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_logbl.c: Move to ... * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_logbl.c: ... here. * sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile: Remove file. * sysdeps/powerpc/powerpc64/power7/fpu/s_logb.c: Remove file. * sysdeps/powerpc/powerpc64/power7/fpu/s_logbf.c: Likewise. * sysdeps/powerpc/powerpc64/power7/fpu/s_logbl.c: Likewise. Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
* powerpc: Refactor modf{f}Adhemerval Zanella2019-07-088-16/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The modf{f} optimization is not an optimization for ISA 2.07+. This patch move the IFUNC for powerpc64 only, move the power5+ to generic location, and include the generic implementation for ISA 2.07+. The performance changes are based on modf benchtests: * POWER9 - ppc64 "modf": { "": { "duration": 4.97057e+09, "iterations": 1.00688e+09, "max": 28.76, "min": 4.912, "mean": 4.9366 } } * POWER9 - power5+ "modf": { "": { "duration": 4.98291e+09, "iterations": 9.32818e+08, "max": 15.058, "min": 5.107, "mean": 5.34178 } } * POWER8 - ppc64 "modf": { "": { "duration": 5.05329e+09, "iterations": 8.38814e+08, "max": 518.051, "min": 5.79, "mean": 6.02433 } } * POWER8 - power5+ "modf": { "": { "duration": 5.05573e+09, "iterations": 8.35254e+08, "max": 63.141, "min": 5.873, "mean": 6.05293 } } * POWER7 - ppc64 "modf": { "": { "duration": 4.89818e+09, "iterations": 1.08408e+09, "max": 57.556, "min": 3.953, "mean": 4.51827 } } * POWER7 - power5+ "modf": { "": { "duration": 4.83789e+09, "iterations": 1.33409e+09, "max": 46.608, "min": 2.224, "mean": 3.62636 } } Checked on powerpc-linux-gnu (built without --with-cpu, with --with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch), powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+ and --disable-multi-arch). * sysdeps/powerpc/power5+/fpu/s_modf.c: Move to ... * sysdeps/powerpc/fpu/s_modf.c: ... here. Add ISA 2.07 optimization. * sysdeps/powerpc/power5+/fpu/s_modff.c: Move to ... * sysdeps/powerpc/fpu/s_modff.c: ... here. Add ISA 2.07 optimization. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_modf-power5+.c: Adjust include. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_modff-power5+.c: Likewise. * sysdeps/powerpc/powerpc64/be/fpu/multiarch/Makefile (sysdep_calls, sysdep_routines): Add s_modf* objects. (CFLAGS-s_modf-power5+.c, CFLAGS-s_modff-power5+.c, CFLAGS-s_modf-ppc64.c, CFLAGS-s_modff-ppc64.c): New rule. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_modf-power5+.c: Move to ... * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_modf-power5+.c: ... here. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_modf-power5+.c: Movo to ... * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_modf-power5+.c: Move ... here. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_modf.c: Move to ... * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_modf.c: ... here. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_modff-power5+.c: Move to ... * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_modff-power5+.c: ... here. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_modff-ppc64.c: Move to ... * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_modff-ppc64.c: ... here. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_modff.c: Move to ... * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_modff.c: ... here. Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
* powerpc: hypot refactor and optimizationAdhemerval Zanella2019-07-087-160/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The powerpc hypot is slight optimized by: - Commit 8df4e219e43, both isnan and isinf are always inlined and thus the check TEST_INF_NAN does not make sense anymore. The generic check for POWER7 should be faster on all powerpc configuration. - The redundant check 'y > two60factor && (x / y) > two60' is removed. Both changes leads to unrequired ifunc especialization for power7 and thus they are removed. Finally The code is also cleanup a bit by inlining the constants floating points. The performance changes using the hypot benchtests are: - POWER9 without patch: "hypot": { "overflow": { "duration": 4.98585e+09, "iterations": 4.84932e+08, "max": 46.551, "min": 10.229, "mean": 10.2815 }, "higher_two500": { "duration": 5.00192e+09, "iterations": 4.24843e+08, "max": 33.319, "min": 11.606, "mean": 11.7736 }, "subnormal": { "duration": 5.0075e+09, "iterations": 4.06792e+08, "max": 22.178, "min": 12.15, "mean": 12.3097 }, "less_two500": { "duration": 5.00685e+09, "iterations": 4.08772e+08, "max": 22.784, "min": 12.052, "mean": 12.2485 }, "default": { "duration": 5.06002e+09, "iterations": 4.09894e+08, "max": 20.648, "min": 11.874, "mean": 12.3447 } } - POWER9 with patch: "hypot": { "overflow": { "duration": 4.91848e+09, "iterations": 7.28039e+08, "max": 47.958, "min": 6.436, "mean": 6.75579 }, "higher_two500": { "duration": 4.9359e+09, "iterations": 6.63376e+08, "max": 20.783, "min": 7.321, "mean": 7.44057 }, "subnormal": { "duration": 4.9479e+09, "iterations": 6.19772e+08, "max": 18.856, "min": 7.817, "mean": 7.98341 }, "less_two500": { "duration": 4.94275e+09, "iterations": 6.3889e+08, "max": 17.452, "min": 7.597, "mean": 7.73647 }, "default": { "duration": 5.03645e+09, "iterations": 5.70718e+08, "max": 18.904, "min": 8.55, "mean": 8.82476 } } - POWER7 without patch "hypot": { "overflow": { "duration": 4.86637e+09, "iterations": 6.43196e+08, "max": 53.958, "min": 7.328, "mean": 7.56592 }, "higher_two500": { "duration": 4.99842e+09, "iterations": 3.11012e+08, "max": 78.227, "min": 15.696, "mean": 16.0715 }, "subnormal": { "duration": 4.99841e+09, "iterations": 3.08935e+08, "max": 51.392, "min": 15.983, "mean": 16.1795 }, "less_two500": { "duration": 5.00108e+09, "iterations": 2.99464e+08, "max": 73.247, "min": 16.416, "mean": 16.7001 }, "default": { "duration": 5.04645e+09, "iterations": 3.52608e+08, "max": 70.073, "min": 13.38, "mean": 14.3118 } } - POWER7 with patch "hypot": { "overflow": { "duration": 4.80785e+09, "iterations": 8.00001e+08, "max": 66.262, "min": 5.888, "mean": 6.00981 }, "higher_two500": { "duration": 4.9859e+09, "iterations": 3.39449e+08, "max": 5148.44, "min": 14.539, "mean": 14.6882 }, "subnormal": { "duration": 4.9905e+09, "iterations": 3.28874e+08, "max": 64.905, "min": 14.971, "mean": 15.1745 }, "less_two500": { "duration": 4.99494e+09, "iterations": 3.19755e+08, "max": 103.696, "min": 14.972, "mean": 15.6211 }, "default": { "duration": 5.03951e+09, "iterations": 4.02502e+08, "max": 61.008, "min": 12.368, "mean": 12.5205 } } Checked on powerpc-linux-gnu (built without --with-cpu, with --with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch), powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+ and --disable-multi-arch). * sysdeps/powerpc/fpu/e_hypot.c (two60, two500, two600, two1022, twoM500, twoM600, two60factor, pdnum): Remove. (TEST_INFO_NAN, GET_TW0_HIGH_WORD): Remove macro. (__ieee754_hypot): Replace static variables with inline definition, remove ununsed branches. * sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile (libm-sysdep_routines): Remove e_hypot-* objects. (CFLAGS-e_hypot-power7.c, CFLAGS-e_hypotf-power7.c): Remove rule. * sysdeps/powerpc/powerpc64/fpu/multiarch/e_hypot-power7.c: Remove file. * sysdeps/powerpc/powerpc64/fpu/multiarch/e_hypot-ppc64.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/e_hypot.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/e_hypotf-power7.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/e_hypotf-ppc64.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/e_hypotf.c: Likewise. Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
* powerpc: Use generic e_expfAdhemerval Zanella2019-06-267-383/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Generic implementation is faster on both power8 and power9: POWER9: - sysdeps/ieee754/flt-32/e_expf.c "expf": { "workload-spec2017.wrf": { "duration": 5.1236e+09, "iterations": 7.53344e+08, "reciprocal-throughput": 5.9436, "latency": 7.65869, "max-throughput": 1.68248e+08, "min-throughput": 1.30571e+08 } } - sysdeps/powerpc/powerpc64/power8/fpu/e_expf.S "expf": { "workload-spec2017.wrf": { "duration": 5.14429e+09, "iterations": 5.29248e+08, "reciprocal-throughput": 8.05372, "latency": 11.3863, "max-throughput": 1.24166e+08, "min-throughput": 8.78249e+07 } } Checked on powerpc-linux-gnu (built without --with-cpu, with --with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch), powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+ and --disable-multi-arch). * sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile (libm-sysdep_routines): Remove e_expf-power8 and expf-ppc64. * sysdeps/powerpc/powerpc64/fpu/multiarch/e_expf-power8.S: Remove file. * sysdeps/powerpc/powerpc64/fpu/multiarch/e_expf-ppc64.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/e_expf.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/w_expf.c: Likewise. * sysdeps/powerpc/powerpc64/power8/fpu/e_expf.S: Likewise. * sysdeps/powerpc/powerpc64/power8/fpu/w_expf.c: Likewise. Reviewed-by: Gabriel F. T. Gomes <gabriel@inconstante.eti.br>
* powerpc: Refactor powerpc64 lround/lroundf/llround/llroundfAdhemerval Zanella2019-06-1730-490/+190
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patches consolidates all the powerpc {l}lround{f} implementations on the generic sysdeps/powerpc/fpu/s_{l}lround{f}.c. The IFUNC support is also moved only to powerpc64 only, since for powerpc64le generic implementation resulting in optimized code. Checked on powerpc-linux-gnu (built without --with-cpu, with --with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch), powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+ and --disable-multi-arch). * sysdeps/powerpc/powerpc64/be/fpu/multiarch/Makefile (libm-sysdep_routines): Add s_llround-power8, s_llround-power6x, s_llround-power5+, s_llround-ppc64, and s_llroundf-ppc64. (CFLAGS-s_llround-power8.c, CFLAGS-s_llround-power6x.c, CFLAGS-s_llround-power5+.c): New rule. * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_llround-power5+.c: New file. * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_llround-power6x.c: Likewise. * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_llround-power8.c: Likewise. * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_llround-ppc64.c: Likewise. * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_llroundf-ppc64.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_llround.c: Move to ... * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_llround.c: ... here. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_llroundf.c: Move to ... * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_llroundf.c: ... here. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_lround.c: Move to ... * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_lround.c: ... here. * sysdeps/powerpc/powerpc64/fpu/Makefile [$(subdir) == math] (CFLAGS-s_llround.c): New rule. * sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile (libm-sysdep_routines): Remove s_llround-* objects. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_llround-power5+.S: Remove file. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_llround-power6x.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_llround-power8.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_llround-ppc64.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_llroundf-ppc64.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_llround.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_llroundf.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_lround.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_lroundf.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_llround.c: New file. * sysdeps/powerpc/powerpc64/fpu/s_llroundf.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_lround.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_lroundf.c: Likewise. * sysdeps/powerpc/powerpc64/power5+/fpu/s_llround.S: Likewise. * sysdeps/powerpc/powerpc64/power5+/fpu/s_llroundf.S: Likewise. * sysdeps/powerpc/powerpc64/power6x/fpu/s_llround.S: Likewise. * sysdeps/powerpc/powerpc64/power6x/fpu/s_llroundf.S: Likewise. * sysdeps/powerpc/powerpc64/power8/fpu/s_llround.S: Likewise. * sysdeps/powerpc/powerpc64/power8/fpu/s_llroundf.S: Likewise. Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
* powerpc: Refactor powerpc32 lrint/lrintf/llrint/llrintfAdhemerval Zanella2019-06-171-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patches consolidates all the powerpc llrint{f} implementations on the generic sysdeps/powerpc/powerpc32/fpu/s_llrint{f}. Checked on powerpc-linux-gnu (built without --with-cpu, with --with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch), powerpc64-linux-gnu (built without --with-cpu and with --with-cpu=power5+ and --disable-multi-arch). * sysdeps/powerpc/fpu/s_lrintf.S: Remove file. * sysdeps/powerpc/powerpc64/fpu/s_lrintf.c: Move to ... * sysdeps/powerpc/fpu/s_lrintf.c: ... here. * sysdeps/powerpc/powerpc32/fpu/Makefile [$(subdir) == math] (CFLAGS-s_lrint.c): New rule. * sysdeps/powerpc/powerpc32/fpu/s_llrint.c (__llrint): Add power4 optimization. * sysdeps/powerpc/powerpc32/fpu/s_llrintf.c (__llrintf): Likewise. * sysdeps/powerpc/powerpc32/fpu/s_lrint.S: Remove file. * sysdeps/powerpc/powerpc32/fpu/s_lrint.c: New file. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile (CFLAGS-s_llrintf-power6.c, CFLAGS-s_llrintf-ppc32.c, CFLAGS-s_llrint-power6.c, CFLAGS-s_llrint-ppc32.c, CFLAGS-s_lrint-ppc32.c): New rule. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_llrint-power6.S: Remove file. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_llrint-ppc32.S: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_llrintf-power6.S: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_llrintf-ppc32.S: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_lrint-ppc32.S: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/s_llrint.S: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/s_llrintf.S: Likewise. * sysdeps/powerpc/powerpc32/power6/fpu/s_llrint.S: Likewise. * sysdeps/powerpc/powerpc32/power6/fpu/s_llrintf.S: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_llrint-power6.c: New file. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_llrint-ppc32.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_llrintf-power6.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_llrintf-ppc32.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_lrint-ppc32.c: Likewise. Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
* powerpc: refactor powerpc64 lrint/lrintf/llrint/llrintfAdhemerval Zanella2019-06-1721-225/+76
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patches consolidates all the powerpc llrint{f} implementations on the generic sysdeps/powerpc/fpu/s_llrint{f}. The IFUNC support is also moved only to powerpc64 only, since for powerpc64le generic implementation resulting in optimized code. Checked on powerpc-linux-gnu (built without --with-cpu, with --with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch), powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+ and --disable-multi-arch). * sysdeps/powerpc/powerpc64/be/fpu/multiarch/Makefile (libm-sysdep_routines): Add s_llrint-power8, s_llrint-power6x, and s_llrint-ppc64. (CFLAGS-s_llrint-power8.c, CFLAGS-s_llrint-power6x.c): New rule. * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_llrint-power6x.c: New file. * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_llrint-power8.c: Likewise. * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_llrint-ppc64.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_lrint.c: Move to ... * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_lrint.c: ... here. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_llrint.c: Move to ... * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_llrint.c: ... here. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_llrintf.c: Move to ... * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_llrintf.c: ... here. * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_lrint.c: New file. * sysdeps/powerpc/powerpc64/fpu/Makefile: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile (libm-sysdep_routines): Remove s_llrint-* objects. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_llrint-power6x.S: Remove file. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_llrint-power8.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_llrint-ppc64.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_llrint.c: New file. * sysdeps/powerpc/powerpc64/fpu/s_llrintf.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_lrint.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_lrintf.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_llrint.S: Remove file. * sysdeps/powerpc/powerpc64/fpu/s_llrintf.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_lrint.S: Likewise. * sysdeps/powerpc/powerpc64/power6x/fpu/s_llrint.S: Likewise. * sysdeps/powerpc/powerpc64/power8/fpu/s_llrint.S: Likewise. Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
* powerpc: Remove optimized finiteAdhemerval Zanella2019-06-1211-367/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The powerpc finite optimization do not show much gain: - GCC will call libm iff -fsignaling-nans is used. This usage pattern is usually not performance oriented and for such calls PLT overhead should dominate execution time. - The power7 uses ftdiv to optimize for some input patterns, but at cost of others. Comparing against generic C implementation built for powerpc64-linux-gnu-power7 (--with-cpu=power7): - Generic sysdeps/ieee754 implementation: "isfinite": { "": { "duration": 5.0082e+09, "iterations": 2.45299e+09, "max": 43.824, "min": 2.008, "mean": 2.04167 }, "INF": { "duration": 4.66554e+09, "iterations": 2.28288e+09, "max": 35.73, "min": 2.008, "mean": 2.04371 }, "NAN": { "duration": 4.66274e+09, "iterations": 2.28716e+09, "max": 34.161, "min": 2.009, "mean": 2.03866 } } - power7 optimized one: "isfinite": { "": { "duration": 4.99111e+09, "iterations": 2.65566e+09, "max": 25.015, "min": 1.716, "mean": 1.87942 }, "INF": { "duration": 4.6783e+09, "iterations": 2.0999e+09, "max": 35.264, "min": 1.868, "mean": 2.22787 }, "NAN": { "duration": 4.67915e+09, "iterations": 2.08678e+09, "max": 38.099, "min": 1.869, "mean": 2.24228 } } So it basically optimizes marginally for normal numbers while increasing the latency for other kind of FP. - The power8 implementation is just the generic implementation using ISA 2.07 mfvsrd instruction (which GCC uses for generic implementation). So generic implementation is the best option for powerpc64le. Checked on powerpc-linux-gnu (built without --with-cpu, with --with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch), powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+ and --disable-multi-arch). * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile (sysdeps_routines, libm-sysdep_routines): Remove s_finite* objects. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_finite-power7.S: Remove file. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_finite-ppc32.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_finite.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_finitef-ppc32.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_finitef.c: Likewise. * sysdeps/powerpc/powerpc32/power7/fpu/s_finite.S: Likewise. * sysdeps/powerpc/powerpc32/power7/fpu/s_finitef.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile (sysdep_call): Remove s_finite* objects. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_finite-power7.S: Remove file. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_finite-power8.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_finite-ppc64.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_finite.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_finitef-ppc64.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_finitef.c: Likewise. * sysdeps/powerpc/powerpc64/power7/fpu/s_finite.S: Likewise. * sysdeps/powerpc/powerpc64/power7/fpu/s_finitef.S: Likewise. * sysdeps/powerpc/powerpc64/power8/fpu/s_finite.S: Likewise. * sysdeps/powerpc/powerpc64/power8/fpu/s_finitef.S: Likewise. Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
* powerpc: Remove optimized isinfAdhemerval Zanella2019-06-1211-362/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The powerpc isinf optimizations onyl adds complexity: - GCC will call libm iff -fsignaling-nans is used. This usage pattern is usually not performance oriented and for such calls PLT overhead should dominate execution time. - The power7 uses ftdiv to optimize for some input pattern and branch implementation for INF and denormal that does: return (ix & UINT64_C (0x7fffffffffffffff)) == UINT64_C (0x7ff0000000000000) Although it does show slight better latency than generic algorithm (as below), it is only for power7 and requires it to override it for power8. - The power8 implementation is just the generic implementation using ISA 2.07 mfvsrd instruction (which GCC uses for generic implementation). So generic implementation is the best option for powerpc64le. Checked on powerpc-linux-gnu (built without --with-cpu, with --with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch), powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+ and --disable-multi-arch). * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile (sysdeps_routines, libm-sysdep_routines): Remove s_isinf* and s_isinf* objects. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isinf-power7.S: Remove file. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isinf-ppc32.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isinf.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isinff-ppc32.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isinff.c: Likewise. * sysdeps/powerpc/powerpc32/power7/fpu/s_isinf.S: Likewise. * sysdeps/powerpc/powerpc32/power7/fpu/s_isinff.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile (sysdep_call): Remove s_isinf* and s_isinf* objects. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_isinf-power7.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_isinf-power8.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_isinf-ppc64.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_isinf.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_isinff-ppc64.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_isinff.c: Likewise. * sysdeps/powerpc/powerpc64/power7/fpu/s_isinf.S: Likewise. * sysdeps/powerpc/powerpc64/power7/fpu/s_isinff.S: Likewise. * sysdeps/powerpc/powerpc64/power8/fpu/s_isinf.S: Likewise. * sysdeps/powerpc/powerpc64/power8/fpu/s_isinff.S: Likewise. Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
* powerpc: Remove optimized isnanAdhemerval Zanella2019-06-1217-672/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The powerpc isnan optimizations are not really a gain: - GCC will call libm iff -fsignaling-nans is used. This usage pattern is usually not performance oriented and for such calls PLT overhead should dominate execution time. - The power5, power6, and power6x are just micro-optimization to improve the Load-Hit-Store hazards from floating-point to general register transfer, and current GCC already has support to minimize it by inserting either extra nops or group dispatch instructions. - The power7 uses ftdiv to optimize for some input patterns, but at cost of others. Comparing against generic C implementation built for powerpc-linux-gnu-power4 (which uses the hp-timing support on benchtests): - Generic sysdeps/ieee754 implementation: "isnan": { "": { "duration": 4.98415e+09, "iterations": 2.34516e+09, "max": 45.925, "min": 2.052, "mean": 2.12529 }, "INF": { "duration": 4.74057e+09, "iterations": 1.69761e+09, "max": 91.01, "min": 2.052, "mean": 2.79249 }, "NAN": { "duration": 4.74071e+09, "iterations": 1.68768e+09, "max": 282.343, "min": 2.052, "mean": 2.809 } } - power7 optimized one: $ ./testrun.sh benchtests/bench-isnan "isnan": { "": { "duration": 4.96842e+09, "iterations": 2.56297e+09, "max": 50.048, "min": 1.872, "mean": 1.93854 }, "INF": { "duration": 4.76648e+09, "iterations": 1.54213e+09, "max": 373.408, "min": 2.661, "mean": 3.09084 }, "NAN": { "duration": 4.76845e+09, "iterations": 1.54515e+09, "max": 51.016, "min": 2.736, "mean": 3.08607 } } So it basically optimizes marginally for normal numbers while increasing the latency for other kind of FP. - The generic implementation requires getting the floating point status, disable the invalid operation bit, and restore the floating-point status. Each operation is costly and requires flushing the FP pipeline. Using the same scenarion for the previous analysis: "isnan": { "": { "duration": 5.08284e+09, "iterations": 6.2898e+08, "max": 41.844, "min": 8.057, "mean": 8.08108 }, "INF": { "duration": 4.97904e+09, "iterations": 6.16176e+08, "max": 39.661, "min": 8.057, "mean": 8.08055 }, "NAN": { "duration": 4.98695e+09, "iterations": 5.95866e+08, "max": 29.728, "min": 8.345, "mean": 8.36925 } } - The power8 implementation is just the generic implementation using ISA 2.07 mfvsrd instruction (which GCC uses for generic implementation). So generic implementation is the best option for powerpc64le. Checked on powerpc-linux-gnu (built without --with-cpu, with --with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch), powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+ and --disable-multi-arch). * sysdeps/powerpc/fpu/s_isnan.c: Remove file. * sysdeps/powerpc/fpu/s_isnanf.S: Likewise. * sysdeps/powerpc/powerpc32/fpu/s_isnan.S: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile (sysdeps_routines, libm-sysdep_routines): Remove s_isnan-* and s_isnanf-* objects. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isnan-power5.S: Remove file * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isnan-power6.S: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isnan-power7.S: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isnan-ppc32.S: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isnan.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isnanf-power5.S: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isnanf-power6.S: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isnanf.c: Likewise. * sysdeps/powerpc/powerpc32/power5/fpu/s_isnan.S: Likewise. * sysdeps/powerpc/powerpc32/power5/fpu/s_isnanf.S: Likewise. * sysdeps/powerpc/powerpc32/power6/fpu/s_isnan.S: Likewise. * sysdeps/powerpc/powerpc32/power6/fpu/s_isnanf.S: Likewise. * sysdeps/powerpc/powerpc32/power7/fpu/s_isnan.S: Likewise. * sysdeps/powerpc/powerpc32/power7/fpu/s_isnanf.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile (sysdep_calls): Remove s_isnan-* and s_isnanf-* objects. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_isnan-power5.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_isnan-power6.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_isnan-power6x.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_isnan-power7.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_isnan-power8.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_isnan-ppc64.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_isnan.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_isnanf.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_isnan.S: Likewise. * sysdeps/powerpc/powerpc64/power5/fpu/s_isnan.S: Likewise. * sysdeps/powerpc/powerpc64/power6/fpu/s_isnan.S: Likewise. * sysdeps/powerpc/powerpc64/power6x/fpu/s_isnan.S: Likewise. * sysdeps/powerpc/powerpc64/power7/fpu/s_isnan.S: Likewise. * sysdeps/powerpc/powerpc64/power7/fpu/s_isnanf.S: Likewise. * sysdeps/powerpc/powerpc64/power8/fpu/s_isnan.S: Likewise. * sysdeps/powerpc/powerpc64/power8/fpu/s_isnanf.S: Likewise. Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
* powerpc: copysign cleanupAdhemerval Zanella2019-06-129-251/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | GCC always expand copysign{f} for all possible cpus, so calling the libm is only done if user explicitly states to disable the builtin (which is done usually not for performance reason). So to provide ifunc variant for copysign is just unrequired complexity, since libm will be called on non-performance critical code. This patch removes both powerpc32 and powerpc64 ifunc variants and consolidates the powerpc implementation on sysdeps/powerpc/fpu/s_copysign{f}.c using compiler builtins. Checked on powerpc-linux-gnu (built without --with-cpu, with --with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch), powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+ and --disable-multi-arch). * sysdeps/powerpc/fpu/s_copysign.c: New file. * sysdeps/powerpc/fpu/s_copysignf.c: Likewise. * sysdeps/powerpc/powerpc32/fpu/s_copysign.S: Remove file. * sysdeps/powerpc/powerpc32/fpu/s_copysignf.S: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile (sysdep_routines, libm-sysdep_routines): Remove s_copysign-power6 and s_copysign-ppc32. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_copysign-power6.S: Remove file. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_copysign-ppc32.S: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_copysign.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_copysignf.c: Likewise. * sysdeps/powerpc/powerpc32/power6/fpu/s_copysign.S: Likewise. * sysdeps/powerpc/powerpc32/power6/fpu/s_copysignf.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile (sysdeps_calls): Remove s_copysign-power6 s_copysign-ppc64. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_copysign-power6.S: Remove file. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_copysign-ppc64.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_copysign.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_copysignf.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_copysign.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_copysignf.S: Likewise. * sysdeps/powerpc/powerpc64/power6/fpu/s_copysign.S: Likewise. * sysdeps/powerpc/powerpc64/power6/fpu/s_copysignf.S: Likewise. Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
* powerpc: consolidate rintAdhemerval Zanella2019-06-122-115/+0
| | | | | | | | | | | | | | | | | | | | | | This patches consolidates all the powerpc rint{f} implementations on the generic sysdeps/powerpc/fpu/s_rint{f}. Checked on powerpc-linux-gnu (built without --with-cpu, with --with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch), powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+ and --disable-multi-arch). * sysdeps/powerpc/fpu/round_to_integer.h (set_fenv_mode, round_to_integer_float, round_mode): Add RINT handling. (reset_fenv_mode): New symbol. * sysdeps/powerpc/fpu/s_rint.c (__rint): Use generic implementation. * sysdeps/powerpc/fpu/s_rintf.c (__rintf): Likewise. * sysdeps/powerpc/powerpc32/fpu/s_rint.S: Remove file. * sysdeps/powerpc/powerpc32/fpu/s_rintf.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_rint.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_rintf.S: Likewise. Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
* powerpc: Fix build failures with current GCCGabriel F. T. Gomes2019-05-306-37/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since GCC commit 271500 (svn), also known as the following commit on the git mirror: commit 61edec870f9fdfb5df3fa4e40f28cbaede28a5b1 Author: amodra <amodra@138bc75d-0d04-0410-961f-82ee72b054a4> Date: Wed May 22 04:34:26 2019 +0000 [RS6000] Don't pass -many to the assembler glibc builds are failing when an assembly implementation does not declare the correct '.machine' directive, or when no such directive is declared at all. For example, when a POWER6 instruction is used, but '.machine power6' is not declared, the assembler will fail with an error similar to the following: ../sysdeps/powerpc/powerpc64/power8/strcmp.S: Assembler messages: 24 ../sysdeps/powerpc/powerpc64/power8/strcmp.S:55: Error: unrecognized opcode: `cmpb' This patch adds '.machine powerN' directives where none existed, as well as it updates '.machine power7' directives on POWER8 files, because the minimum binutils version required to build glibc (binutils 2.25) now provides this machine version. It also adds '-many' to the assembler command used to build tst-set_ppr.c. Tested for powerpc, powerpc64, and powerpc64le, as well as with build-many-glibcs.py for powerpc targets. Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
* powerpc: generic nearbyint/nearbyintfAdhemerval Zanella2019-05-282-137/+0
| | | | | | | | | | | | | | This patches consolidates all the powerpc nearbyint{f} implementations on the generic sysdeps/powerpc/fpu/s_nearbyint{f}. * sysdeps/powerpc/fpu/round_to_integer.h (set_fenv_mode): Add NEARBYINT handling. * sysdeps/powerpc/fpu/s_nearbyint.c: New file. * sysdeps/powerpc/fpu/s_nearbyintf.c: Likewise. * sysdeps/powerpc/powerpc32/fpu/s_nearbyint.S: Remove file. * sysdeps/powerpc/powerpc32/fpu/s_nearbyintf.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_nearbyint.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_nearbyintf.S: Likewise.
* powerpc: trunc/truncf refactorAdhemerval Zanella2019-05-0917-322/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patches consolidates all the powerpc trunc{f} implementations on the generic sysdeps/powerpc/fpu/s_trunc{f}. The generic implementation uses either the compiler builts for ISA 2.03+ (which generates the frim instruction) or a generic implementation which uses FP only operations. The IFUNC organization for powerpc64 is also change to be enabled only for powerpc64 and not for powerpc64le (since minium ISA of 2.08 does not require the fallback generic implementation). Checked on powerpc-linux-gnu (built without --with-cpu, with --with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch), powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+ and --disable-multi-arch). * sysdeps/powerpc/fpu/trunc_to_integer.h (set_fenv_mode): Add TRUNC handling. (round_mode): Add definition for TRUNC. * sysdeps/powerpc/fpu/s_trunc.c: New file. * sysdeps/powerpc/fpu/s_truncf.c: New file. * sysdeps/powerpc/powerpc32/fpu/s_trunc.S: Remove file. * sysdeps/powerpc/powerpc32/fpu/s_truncf.S: Likewise. * sysdep/powerpc/powepc32/power4/fpu/multiarch/s_trunc-power5+.S: Likewise. * sysdep/powerpc/powepc32/power4/fpu/multiarch/s_trunc-ppc32.S: Likewise. * sysdep/powerpc/powepc32/power4/fpu/multiarch/s_truncf-power5+.S: Likewise. * sysdep/powerpc/powepc32/power4/fpu/multiarch/s_truncf-ppc32.S: Likewise. * sysdep/powerpc/powepc32/power4/fpu/multiarch/s_trunc-power5+.c: New file. * sysdep/powerpc/powepc32/power4/fpu/multiarch/s_trunc-ppc32.c: Likewise. * sysdep/powerpc/powepc32/power4/fpu/multiarch/s_truncf-power5+.c: Likewise. * sysdep/powerpc/powepc32/power4/fpu/multiarch/s_truncf-ppc32.c: Likewise. * sysdep/powerpc/powerpc32/power5+/fpu/s_trunc.S: Remove file. * sysdep/powerpc/powerpc32/power5+/fpu/s_truncf.S: Likewise. * sysdep/powerpc/powerpc64/be/fpu/multiarch/Makefile (libm-sysdep_routines): Add s_trunc-power5+, s_trunc-ppc64, s_truncf-power5+, and s_truncf-ppc64. (CFLAGS-s_trunc-power5+.c, CFLAGS-s_truncf-power5+.c): New rule. * sysdep/powerpc/powercp64/be/fpu/multiarch/s_trunc-power5+.c: New file. * sysdep/powerpc/powercp64/be/fpu/multiarch/s_trunc-ppc64.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_trunc.c: Move to ... * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_trunc.c: ... here. * sysdep/powerpc/powercp64/be/fpu/multiarch/s_truncf-power5+.c: New file. * sysdep/powerpc/powercp64/be/fpu/multiarch/s_truncf-ppc64.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_truncf.c: Move to ... * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_truncf.c: ... here. * sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile (libm-sysdep_routines): Remove s_trunc-power5+, s_trunc-ppc64, s_truncf-power5+, and s_truncf-ppc64. * sysdep/powerpc/powerpc64/fpu/multiarch/s_trunc-power5+.S: Remove file. * sysdep/powerpc/powerpc64/fpu/multiarch/s_trunc-ppc64.S: Likewise. * sysdep/powerpc/powerpc64/fpu/multiarch/s_truncf-power5+.S: Likewise. * sysdep/powerpc/powerpc64/fpu/multiarch/s_truncf-ppc64.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_trunc.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_truncf.S: Likewise. * sysdep/powerpc/powerpc64/power5+/fpu/s_trunc.S: Likewise. * sysdep/powerpc/powerpc64/power5+/fpu/s_truncf.S: Likewise. Reviewed-by: Gabriel F. T. Gomes <gabriel@inconstante.eti.br>
* powerpc: round/roundf refactorAdhemerval Zanella2019-05-0916-334/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patches consolidates all the powerpc round{f} implementations on the generic sysdeps/powerpc/fpu/s_round{f}. The generic implementation uses either the compiler builts for ISA 2.03+ (which generates the frim instruction) or a generic implementation which uses FP only operations. The IFUNC organization for powerpc64 is also change to be enabled only for powerpc64 and not for powerpc64le (since minium ISA of 2.08 does not require the fallback generic implementation). Checked on powerpc-linux-gnu (built without --with-cpu, with --with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch), powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+ and --disable-multi-arch). * sysdeps/powerpc/fpu/round_to_integer.h (set_fenv_mode): Add ROUND handling. (round_mode): Add definition for ROUND. (round_to_integer_float): Likewise. * sysdeps/powerpc/fpu/s_round.c: New file. * sysdeps/powerpc/fpu/s_roundf.c: New file. * sysdeps/powerpc/powerpc32/fpu/s_round.S: Remove file. * sysdeps/powerpc/powerpc32/fpu/s_roundf.S: Likewise. * sysdep/powerpc/powepc32/power4/fpu/multiarch/s_round-power5+.S: Likewise. * sysdep/powerpc/powepc32/power4/fpu/multiarch/s_round-ppc32.S: Likewise. * sysdep/powerpc/powepc32/power4/fpu/multiarch/s_roundf-power5+.S: Likewise. * sysdep/powerpc/powepc32/power4/fpu/multiarch/s_roundf-ppc32.S: Likewise. * sysdep/powerpc/powepc32/power4/fpu/multiarch/s_round-power5+.c: New file. * sysdep/powerpc/powepc32/power4/fpu/multiarch/s_round-ppc32.c: Likewise. * sysdep/powerpc/powepc32/power4/fpu/multiarch/s_roundf-power5+.c: Likewise. * sysdep/powerpc/powepc32/power4/fpu/multiarch/s_roundf-ppc32.c: Likewise. * sysdep/powerpc/powerpc32/power5+/fpu/s_round.S: Remove file. * sysdep/powerpc/powerpc32/power5+/fpu/s_roundf.S: Likewise. * sysdep/powerpc/powerpc64/be/fpu/multiarch/Makefile (libm-sysdep_routines): Add s_round-power5+, s_round-ppc64, s_roundf-power5+, and s_roundf-ppc64. (CFLAGS-s_round-power5+.c, CFLAGS-s_roundf-power5+.c): New rule. * sysdep/powerpc/powercp64/be/fpu/multiarch/s_round-power5+.c: New file. * sysdep/powerpc/powercp64/be/fpu/multiarch/s_round-ppc64.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_round.c: Move to ... * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_round.c: ... here. * sysdep/powerpc/powercp64/be/fpu/multiarch/s_roundf-power5+.c: New file. * sysdep/powerpc/powercp64/be/fpu/multiarch/s_roundf-ppc64.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_roundf.c: Move to ... * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_roundf.c: ... here. * sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile (libm-sysdep_routines): Remove s_round-power5+, s_round-ppc64, s_roundf-power5+, and s_roundf-ppc64. * sysdep/powerpc/powerpc64/fpu/multiarch/s_round-power5+.S: Remove file. * sysdep/powerpc/powerpc64/fpu/multiarch/s_round-ppc64.S: Likewise. * sysdep/powerpc/powerpc64/fpu/multiarch/s_roundf-power5+.S: Likewise. * sysdep/powerpc/powerpc64/fpu/multiarch/s_roundf-ppc64.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_round.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_roundf.S: Likewise. * sysdep/powerpc/powerpc64/power5+/fpu/s_round.S: Likewise. * sysdep/powerpc/powerpc64/power5+/fpu/s_roundf.S: Likewise. Reviewed-by: Gabriel F. T. Gomes <gabriel@inconstante.eti.br>
* powerpc: floor/floorf refactorAdhemerval Zanella2019-05-0916-304/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patches consolidates all the powerpc floor{f} implementations on the generic sysdeps/powerpc/fpu/s_floor{f}. The generic implementation uses either the compiler builts for ISA 2.03+ (which generates the frim instruction) or a generic implementation which uses FP only operations. The IFUNC organization for powerpc64 is also change to be enabled only for powerpc64 and not for powerpc64le (since minium ISA of 2.08 does not require the fallback generic implementation). Checked on powerpc-linux-gnu (built without --with-cpu, with --with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch), powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+ and --disable-multi-arch). * sysdeps/powerpc/fpu/round_to_integer.h (set_fenv_mode): Add FLOOR option. (round_mode): Add definition for FLOOR. * sysdeps/powerpc/fpu/s_floor.c: New file. * sysdeps/powerpc/fpu/s_floorf.c: Likewise. * sysdeps/powerpc/powerpc32/fpu/s_floor.S: Remove file. * sysdeps/powerpc/powerpc32/fpu/s_floorf.S: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_floor-power5+.S: Remove file. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_floor-ppc32.S: Likewise * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_floorf-power5+.S: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_floorf-ppc32.S: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_floor-power5+.c: New file. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_floor-ppc32.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_floorf-power5+.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_floorf-ppc32.c: Likewise. * sysdeps/powerpc/powerpc32/power5+/fpu/s_floor.S: Remove file. * sysdeps/powerpc/powerpc32/power5+/fpu/s_floorf.S: Remove file. * sysdeps/powerpc/powerpc64/be/fpu/multiarch/Makefile (libm-sysdep_routines): Add s_floor-power5+, s_floor-ppc64, s_floorf-power5+, and s_floorf-ppc64. (CFLAGS-s_floor-power5+.c, CFLAGS-s_floorf-power5+.c): New rule. * sysdep/powerpc/powerpc64/be/fpu/multiarch/s_floor-power5+.c: New file. * sysdep/powerpc/powerpc64/be/fpu/multiarch/s_floor-ppc64.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_floor.c: Move to ... * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_floor.c: ... here. * sysdep/powerpc/powerpc64/be/fpu/multiarch/s_floorf-power5+.c: New file. * sysdep/powerpc/powerpc64/be/fpu/multiarch/s_floorf-ppc64.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_floorf.c: Move to ... * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_floorf.c: ... here. * sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile (libm-sysdep_routines): Remove s_floor-power5+, s_floor-ppc64, s_floorf-power5+, and s_floorf-ppc64. * sysdep/powerpc/powerpc64/fpu/multiarch/s_floor-power5+.S: Remove file. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_floor-ppc64.S: Remove file. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_floorf-power5+.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_floorf-ppc64.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_floor.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_floorf.S: Likewise. * sysdeps/powerpc/powerpc64/power5+/fpu/s_floor.S: Likewise. * sysdeps/powerpc/powerpc64/power5+/fpu/s_floorf.S: Likewise. Reviewed-by: Gabriel F. T. Gomes <gabriel@inconstante.eti.br>
* powerpc: ceil/ceilf refactorAdhemerval Zanella2019-04-2916-309/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patches consolidates all the powerpc ceil{f} implementations on the generic sysdeps/powerpc/fpu/s_ceil{f}. The generic implementation uses either the compiler builts for ISA 2.03+ (which generates the frip instruction) or a generic implementation which uses FP only operations. It adds a generic implementation (round_to_integer.h) which is shared with other rounding to integer routines. The resulting code should be similar in term os performance to previous assembly one. The IFUNC organization for powerpc64 is also change to be enabled only for powerpc64 and not for powerpc64le (since minium ISA of 2.08 does not require the fallback generic implementation). Checked on powerpc-linux-gnu (built without --with-cpu, with --with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch), powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+ and --disable-multi-arch). * sysdeps/powerpc/fpu/fenv_libc.h (__fesetround_inline_nocheck): New function. * sysdeps/powerpc/fpu/round_to_integer.h: New file. * sysdeps/powerpc/fpu/s_ceil.c: Likewise. * sysdeps/powerpc/fpu/s_ceilf.c: Likewise. * sysdeps/powerpc/powerpc32/fpu/s_ceil.S: Remove file. * sysdeps/powerpc/powerpc32/fpu/s_ceilf.S: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile (CFLAGS-s_ceil-power5+.c, CFLAGS-s_ceilf-power5+.c): New rule. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_ceil-power5+.S: Remove file. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_ceil-ppc32.S: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_ceilf-power5+.S: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_ceilf-ppc32.S: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_ceil-power5+.c: New file. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_ceil-ppc32.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_ceilf-power5+.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_ceilf-ppc32.c: Likewise. * sysdeps/powerpc/powerpc32/power5+/fpu/s_ceil.S: Remove file. * sysdeps/powerpc/powerpc32/power5+/fpu/s_ceilf.S: Likewise. * sysdeps/powerpc/powerpc64/be/fpu/multiarch/Makefile: New file. * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_ceil-power5+.c: Likewise. * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_ceil-ppc64.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_ceil.c: Move to ... * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_ceil.c: ... here. * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_ceilf-power5+.c: New file. * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_ceilf-ppc64.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_ceilf.c: Move to ... * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_ceilf.c: ... * here. * sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile (libm-sysdep_routines): Remove s_ceil-power5+, s_ceil-ppc64, s_ceilf-power5+, and s_ceilf-ppc64. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_ceil-power5+.S: Remove file. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_ceil-ppc64.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_ceilf-power5+.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_ceilf-ppc64.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_ceil.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_ceilf.S: Likewise. * sysdeps/powerpc/powerpc64/power5+/fpu/s_ceil.S: Likewise. * sysdeps/powerpc/powerpc64/power5+/fpu/s_ceilf.S: Likewise. Reviewed-by: Gabriel F. T. Gomes <gabriel@inconstante.eti.br>
* powerpc: Use generic wcsrchr optimizationAdhemerval Zanella2019-04-047-111/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch removes the power6 wcsrchr optimization and uses generic implementation instead. Currently, both power6 and power7 IFUNC variant resulting binary are essentially the same and the generic implementation with unrolling loop set to 8 also results in similar performance. Checked on powerpc64-linux-gnu. * sysdeps/powerpc/Makefile [$(subdir) == wcsmbs] (CFLAGS-wcsrchr.c): New rule. * sysdeps/powerpc/power6/wcsrchr.c: Remove file. * sysdeps/powerpc/powerpc32/power4/multiarch/wcsrchr-power6.c: Likewise. * sysdeps/powerpc/powerpc32/power4/multiarch/wcsrchr-power7.c: Likewise. * sysdeps/powerpc/powerpc32/power4/multiarch/wcsrchr-ppc32.c: Likewise. * sysdeps/powerpc/powerpc32/power4/multiarch/wcsrchr.c: Likewise. * sysdeps/powerpc/powerpc64/multiarch/wcsrchr-power6.c: Likewise. * sysdeps/powerpc/powerpc64/multiarch/wcsrchr-power7.c: Likewise. * sysdeps/powerpc/powerpc64/multiarch/wcsrchr-ppc64.c: Likewise. * sysdeps/powerpc/powerpc64/multiarch/wcsrchr.c: Likewise. * sysdeps/powerpc/powerpc64/power6/wcsrchr.c: Likewise. * sysdeps/powerpc/powerpc32/power4/multiarch/Makefile [$(subdir) == wcsmbs] (sysdeps_routines): Remove wcsrchr-power6 and wcsrchr-power7. (CFLAGS-wcsrchr-power7.c, CFLAGS-wcsrchr-power6.c): Remove rule. * sysdeps/powerpc/powerpc64/multiarch/Makefile: Likewise. * sysdeps/powerpc/powerpc32/power4/multiarch/ifunc-impl-list.c: Remove wcsrchr optimizations. * sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c: Likewise.
* powerpc: Use generic wcschr optimizationAdhemerval Zanella2019-04-047-115/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch removes the power6 wcschr optimization and uses generic implementation instead. Currently, both power6 and power7 IFUNC variant resulting binary are essentially the same and the generic implementation with unrolling loop set to 8 also results in similar performance. Checked on powerpc64-linux-gnu. * sysdeps/powerpc/Makefile [$(subdir) == wcsmbs] (CFLAGS-wcschr.c): New rule. * sysdeps/powerpc/power6/wcschr.c: Remove file. * sysdeps/powerpc/powerpc32/power4/multiarch/wcschr-power6.c: Likewise. * sysdeps/powerpc/powerpc32/power4/multiarch/wcschr-power7.c: Likewise. * sysdeps/powerpc/powerpc32/power4/multiarch/wcschr-ppc32.c: Likewise. * sysdeps/powerpc/powerpc32/power4/multiarch/wcschr.c: Likewise. * sysdeps/powerpc/powerpc64/multiarch/wcschr-power6.c: Likewise. * sysdeps/powerpc/powerpc64/multiarch/wcschr-power7.c: Likewise. * sysdeps/powerpc/powerpc64/multiarch/wcschr-ppc64.c: Likewise. * sysdeps/powerpc/powerpc64/multiarch/wcschr.c: Likewise. * sysdeps/powerpc/powerpc64/power6/wcschr.c: Likewise. * sysdeps/powerpc/powerpc32/power4/multiarch/Makefile [$(subdir) == wcsmbs] (sysdeps_routines): Remove wcschr-power6 and wcschr-power7. (CFLAGS-wcschr-power7.c, CFLAGS-wcschr-power6.c): Remove rule. * sysdeps/powerpc/powerpc64/multiarch/Makefile: Likewise. * sysdeps/powerpc/powerpc32/power4/multiarch/ifunc-impl-list.c: Remove wcschr optimizations. * sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c: Likewise.
* powerpc: Use generic wcscpy optimizationAdhemerval Zanella2019-04-047-110/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch removes the power6 wcscpy optimization and uses generic implementation instead. Currently, both power6 and power7 IFUNC variant resulting binary are essentially the same and the generic implementation with unrolling loop set to 8 also results in similar performance. Checked on powerpc64-linux-gnu. * sysdeps/powerpc/Makefile [$(subdir) == wcsmbs] (CFLAGS-wcscpy.c): New rule. * sysdeps/powerpc/power6/wcscpy.c: Remove file. * sysdeps/powerpc/powerpc32/power4/multiarch/wcscpy-power6.c: Likewise. * sysdeps/powerpc/powerpc32/power4/multiarch/wcscpy-power7.c: Likewise. * sysdeps/powerpc/powerpc32/power4/multiarch/wcscpy-ppc32.c: Likewise. * sysdeps/powerpc/powerpc32/power4/multiarch/wcscpy.c: Likewise. * sysdeps/powerpc/powerpc64/multiarch/wcscpy-power6.c: Likewise. * sysdeps/powerpc/powerpc64/multiarch/wcscpy-power7.c: Likewise. * sysdeps/powerpc/powerpc64/multiarch/wcscpy-ppc64.c: Likewise. * sysdeps/powerpc/powerpc64/multiarch/wcscpy.c: Likewise. * sysdeps/powerpc/powerpc64/power6/wcscpy.c: Likewise. * sysdeps/powerpc/powerpc32/power4/multiarch/Makefile [$(subdir) == wcsmbs] (sysdeps_routines): Remove wcscpy-power6 and wcscpy-power7. (CFLAGS-wcscpy-power7.c, CFLAGS-wcscpy-power6.c): Remove rule. * sysdeps/powerpc/powerpc64/multiarch/Makefile: Likewise. * sysdeps/powerpc/powerpc32/power4/multiarch/ifunc-impl-list.c: Remove wcscpy optimizations. * sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c: Likewise.
* [powerpc] Use __builtin_{mffs,mtfsf}Paul A. Clarke2019-03-291-2/+1
| | | | | | | | | | | | | | | | | | | Replace inline asm uses of the "mffs" and "mtfsf" instructions with the analogous GCC builtins. __builtin_mffs and __builtin_mtfsf are both available in GCC 5 and above. Given the minimum GCC level for GLibC is now GCC 6.2, it is safe to use these builtins without restriction. 2019-03-29 Paul A. Clarke <pc@us.ibm.com> * sysdeps/powerpc/fpu/fenv_libc.h (fegetenv_register): Replace inline asm with builtin. * sysdeps/powerpc/powerpc64/le/fpu/sfp-machine.h (FP_INIT_ROUNDMODE): Likewise. * sysdeps/powerpc/fpu/tst-setcontext-fpscr.c (_GET_DI_FPSCR): Likewise. (_GET_SI_FPSCR): Likewise. (_SET_SI_FPSCR): Likewise.
* Refactor hp-timing rtld usageAdhemerval Zanella2019-03-221-4/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch refactor how hp-timing is used on loader code for statistics report. The HP_TIMING_AVAIL and HP_SMALL_TIMING_AVAIL are removed and HP_TIMING_INLINE is used instead to check for hp-timing avaliability. For alpha, which only defines HP_SMALL_TIMING_AVAIL, the HP_TIMING_INLINE is set iff for IS_IN(rtld). Checked on aarch64-linux-gnu, x86_64-linux-gnu, and i686-linux-gnu. I also checked the builds for all afected ABIs. * benchtests/bench-timing.h: Replace HP_TIMING_AVAIL with HP_TIMING_INLINE. * nptl/descr.h: Likewise. * elf/rtld.c (RLTD_TIMING_DECLARE, RTLD_TIMING_NOW, RTLD_TIMING_DIFF, RTLD_TIMING_ACCUM_NT, RTLD_TIMING_SET): Define. (dl_start_final_info, _dl_start_final, dl_main, print_statistics): Abstract hp-timing usage with RTLD_* macros. * sysdeps/alpha/hp-timing.h (HP_TIMING_INLINE): Define iff IS_IN(rtld). (HP_TIMING_AVAIL, HP_SMALL_TIMING_AVAIL): Remove. * sysdeps/generic/hp-timing.h (HP_TIMING_AVAIL, HP_SMALL_TIMING_AVAIL, HP_TIMING_NONAVAIL): Likewise. * sysdeps/ia64/hp-timing.h (HP_TIMING_AVAIL, HP_SMALL_TIMING_AVAIL): Likewise. * sysdeps/powerpc/powerpc32/power4/hp-timing.h (HP_TIMING_AVAIL, HP_SMALL_TIMING_AVAIL): Likewise. * sysdeps/powerpc/powerpc64/hp-timing.h (HP_TIMING_AVAIL, HP_SMALL_TIMING_AVAIL): Likewise. * sysdeps/sparc/sparc32/sparcv9/hp-timing.h (HP_TIMING_AVAIL, HP_SMALL_TIMING_AVAIL): Likewise. * sysdeps/sparc/sparc64/hp-timing.h (HP_TIMING_AVAIL, HP_SMALL_TIMING_AVAIL): Likewise. * sysdeps/x86/hp-timing.h (HP_TIMING_AVAIL, HP_SMALL_TIMING_AVAIL): Likewise. * sysdeps/generic/hp-timing-common.h: Update comment with HP_TIMING_AVAIL removal.
* Break lines before not after operators, batch 4.Joseph Myers2019-03-071-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes further coding style issues where code should have broken lines before operators in accordance with the GNU Coding Standards but instead was breaking lines after them. Tested for x86_64, and with build-many-glibcs.py. * stdio-common/vfscanf-internal.c (ARG): Break lines before rather than after operators. * sysdeps/mach/hurd/setitimer.c (timer_thread): Likewise. (setitimer_locked): Likewise. * sysdeps/mach/hurd/sigaction.c (__sigaction): Likewise. * sysdeps/mach/hurd/sigaltstack.c (__sigaltstack): Likewise. * sysdeps/mach/pagecopy.h (PAGE_COPY_FWD): Likewise. * sysdeps/mach/thread_state.h (machine_get_basic_state): Likewise. * sysdeps/powerpc/powerpc64/tst-ucontext-ppc64-vscr.c (PPC_CPU_SUPPORTED): Likewise. * sysdeps/unix/sysv/linux/alpha/a.out.h (N_TXTOFF): Likewise. * sysdeps/unix/sysv/linux/generic/wordsize-32/overflow.h (stat_overflow): Likewise. (statfs_overflow): Likewise. * sysdeps/unix/sysv/linux/tst-personality.c (do_test): Likewise. * sysdeps/unix/sysv/linux/tst-ttyname.c (eq_ttyname): Likewise. (eq_ttyname_r): Likewise. (run_chroot_tests): Likewise.
* powerpc: Fix build of wcscpy with --disable-multi-archGabriel F. T. Gomes2019-03-051-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | Since the commit commit 81a14439417552324ec6ca71f65ddf8e7cdd51c7 Author: Adhemerval Zanella <adhemerval.zanella@linaro.org> Date: Tue Feb 5 17:35:12 2019 -0200 wcsmbs: optimize wcscat powerpc64 and powerpc64le builds fail when configured with --disable-multi-arch and --with-cpu=power6 (or newer), due to an undefined reference to __GI___wcscpy. This patch fixes this on sysdeps/powerpc/powerpc64/power6/wcscpy.c, which is only used when multi-arch is disabled. This patch does nothing for the failures on 32-bits powerpc builds, because the file is under the powerpc64 subdirectory, however, powerpc builds were already failing with --disable-multi-arch, with multiple error messages, even before the aforementioned commit. Tested for powerpc, powerpc64, and powerpc64le with multi-arch enabled (all pass) and disabled (powerpc still fails as explained above).
* Add more spaces before '('.Joseph Myers2019-02-282-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes more places where a space should have been present before '(' in accordance with the GNU Coding Standards (as with the previous patch, mainly for calls to sizeof). Tested with build-many-glibcs.py. * sysdeps/powerpc/powerpc32/dl-machine.c (__elf_machine_fixup_plt): Use space before '('. (__process_machine_rela): Likewise. * sysdeps/powerpc/powerpc32/register-dump.h (register_dump): Likewise. * sysdeps/powerpc/powerpc64/le/fpu/sfp-machine.h (TI_BITS): Likewise. * sysdeps/powerpc/powerpc64/register-dump.h (register_dump): Likewise. * sysdeps/powerpc/test-arith.c (union_t): Likewise. (pattern): Likewise. (delta): Likewise. (check_result): Likewise. (check_excepts): Likewise. (check_op): Likewise. (fail_xr): Likewise. * sysdeps/unix/alpha/sysdep.h (syscall_promote): Likewise. * sysdeps/unix/sysv/linux/alpha/a.out.h (AOUTHSZ): Likewise. (SCNHSZ): Likewise. * sysdeps/unix/sysv/linux/hppa/makecontext.c (FRAME_SIZE_BYTES): Likewise. (ARGS): Likewise. (__makecontext): Likewise. * sysdeps/unix/sysv/linux/powerpc/sys/ucontext.h (ucontext_t): Likewise.
* wcsmbs: optimize wcscatAdhemerval Zanella2019-02-271-13/+12
| | | | | | | | | | | | | | | | | | | | | | | | This patch rewrites wcscat using wcslen and wcscpy. This is similar to the optimization done on strcat by 6e46de42fe. The strcpy changes are mainly to add the internal alias to avoid PLT calls. Checked on x86_64-linux-gnu and a build against the affected architectures. * include/wchar.h (__wcscpy): New prototype. * sysdeps/powerpc/powerpc32/power4/multiarch/wcscpy-ppc32.c (__wcscpy): Route internal symbol to generic implementation. * sysdeps/powerpc/powerpc32/power4/multiarch/wcscpy.c (wcscpy): Add internal __wcscpy alias. * sysdeps/powerpc/powerpc64/multiarch/wcscpy.c (wcscpy): Likewise. * sysdeps/s390/wcscpy.c (wcscpy): Likewise. * sysdeps/x86_64/multiarch/wcscpy.c (wcscpy): Likewise. * wcsmbs/wcscpy.c (wcscpy): Add * sysdeps/x86_64/multiarch/wcscpy-c.c (WCSCPY): Adjust macro to use generic implementation. * wcsmbs/wcscat.c (wcscat): Rewrite using wcslen and wcscpy.
* Add and move fall-through comments in system-specific code.Joseph Myers2019-02-261-0/+2
| | | | | | | | | | | | | | | | | | | | | | This patch fixes -Wimplicit-fallthrough warnings in system-specific code that show up building glibc with -Wextra, by adding fall-through comments, or moving existing such comments to the place required for them to work (immediately before the case label being fallen through). Tested with build-many-glibcs.py. * sysdeps/i386/dl-machine.h (elf_machine_rela): Add fall-through comments. * sysdeps/m68k/m680x0/fpu/s_cexp_template.c (s(__cexp)): Likewise. * sysdeps/m68k/memcopy.h (WORD_COPY_FWD): Likewise. (WORD_COPY_BWD): Likewise. * sysdeps/mach/hurd/ioctl.c (__ioctl): Likewise. * sysdeps/powerpc/powerpc64/dl-machine.h (elf_machine_rela): Likewise. * sysdeps/s390/iso-8859-1_cp037_z900.c (TR_LOOP): Likewise. * sysdeps/mips/dl-machine.h (elf_machine_reloc): Move fall-through comment. * sysdeps/mips/dl-trampoline.c (__dl_runtime_resolve): Likewise.
* powerpc64le: Remove test for GCC 6.2Gabriel F. T. Gomes2019-02-202-69/+1
| | | | | | | | | | | | | | | | | | | | | | The configure fragment for powerpc64le contains a test for the presence of several compiler builtins and of the __float128 type, which are provided by GCC 6.2 for powerpc64le. Since this configure test was added, the compiler version required to build glibc for powerpc64le was different than that required for the other architectures. Now that glibc requires GCC 6.2 globally (since commit ID 4dcbbc3b28aa), this patch removes the powerpc64le-specific test. Even tough the configure test checks for compiler features rather than compiler version, the intent of the test was to stop build attempts at early stages, if they had been configured with a too old compiler. It was not the intention of the test to detect compiler breakage (such as the removal of the required compiler features in future GCC versions), and glibc is not the place to test for compiler regressions, anyway. Tested for powerpc64le with GCC 6.2 (built with build-many-glibcs.py). Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
* powerpc: Fix tiny bug in strncmp.cPaul Clarke2019-01-161-1/+1
| | | | | | | | | | | A single underscore was omitted in sysdeps/powerpc/powerpc64/multiarch/strncmp.c, resulting in use of power8 version of strncmp instead of power9 version, with significant performance degradation. * sysdeps/powerpc/powerpc64/multiarch/strncmp.c: Fix #ifdef. Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
* powerpc: fix tst-ucontext-ppc64-vscr test for POWER 5/6.Rogerio Alves2019-01-151-2/+2
| | | | | | | | | | | | | | An error "impossible register constraint in 'asm'" was raised on POWER 5 and due to __vector __int128_t being used as operands without passing the option -msvx to gcc. This patch replaces "__vector __int128_t" with "__vector unsigned int" which requires only -maltivec, available since POWER ISA 2.03, and which is already passed to the compiler. * sysdeps/powerpc/powerpc64/tst-ucontext-ppc64-vscr.c: (do_test): Changed __vector __int128_t to __vector unsigned int. Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
* powerpc: Fix VSCR position in ucontext (bug 24088)Rogerio Alves2019-01-112-0/+89
| | | | | | | | | | | | | | | This patch fix VSCR position on ucontext. VSCR was read in the wrong position on ucontext structure because it was ignoring the machine endianess. [BZ #24088] * sysdeps/unix/sysv/linux/powerpc/sys/ucontext.h (vscr_t): Added ifdef to fix read of VSCR. * sysdeps/powerpc/powerpc64/Makefile [$subdir == stdlib]: Add tst-ucontext-ppc64-vscr.c to test list. * sysdeps/powerpc/powerpc64/tst-ucontext-ppc64-vscr.c: New test file. Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
* Update copyright dates with scripts/update-copyrights.Joseph Myers2019-01-01354-354/+354
| | | | | | | * All files with FSF copyright notices: Update copyright dates using scripts/update-copyrights. * locale/programs/charmap-kw.h: Regenerated. * locale/programs/locfile-kw.h: Likewise.
* powerpc: Add missing CFI register information (bug #23614)Tulio Magno Quites Machado Filho2018-12-123-18/+38
| | | | | | | | | | | | | | Add CFI information about the offset of registers stored in the stack frame. [BZ #23614] * sysdeps/powerpc/powerpc64/addmul_1.S (FUNC): Add CFI offset for registers saved in the stack frame. * sysdeps/powerpc/powerpc64/lshift.S (__mpn_lshift): Likewise. * sysdeps/powerpc/powerpc64/mul_1.S (__mpn_mul_1): Likewise. Signed-off-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com> Reviewed-by: Gabriel F. T. Gomes <gabriel@inconstante.eti.br>
* Use copysign functions not __copysign functions in glibc libm.Joseph Myers2018-09-272-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Continuing the move to use, within libm, public names for libm functions that can be inlined as built-in functions on many architectures, this patch moves calls to __copysign functions to call the corresponding copysign names instead, with asm redirection to __copysign when the calls are not inlined (all cases are inlined except for IBM long double for powerpc soft-float / e500v1). This eliminates the need for an inline function defining __copysign in terms of __builtin_copysign. Tested for x86_64, and with build-many-glibcs.py. * include/math.h [!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (MATH_REDIRECT_BINARY_ARGS): New macro. [!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (copysign): Redirect using MATH_REDIRECT. * sysdeps/alpha/fpu/s_copysign.c: Define NO_MATH_REDIRECT before header inclusion. * sysdeps/alpha/fpu/s_copysignf.c: Likewise. * sysdeps/ieee754/dbl-64/s_copysign.c: Likewise. * sysdeps/ieee754/float128/s_copysignf128.c: Likewise. * sysdeps/ieee754/flt-32/s_copysignf.c: Likewise. * sysdeps/ieee754/ldbl-128/s_copysignl.c: Likewise. * sysdeps/ieee754/ldbl-128ibm/s_copysignl.c: Likewise. * sysdeps/ieee754/ldbl-96/s_copysignl.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_copysign.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_copysignf.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_copysign.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_copysignf.c: Likewise. * sysdeps/riscv/rvd/s_copysign.c: Likewise. * sysdeps/riscv/rvf/s_copysignf.c: Likewise. * sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_copysign.c: Likewise. * sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_copysignf.c: Likewise. * sysdeps/generic/math_private_calls.h [!__MATH_DECLARING_LONG_DOUBLE || !NO_LONG_DOUBLE] (__copysign): Do not declare and define as an inline function. * math/divtc3.c (__divtc3): Use copysign functions instead of __copysign variants. * math/multc3.c (__multc3): Likewise. * sysdeps/generic/math-type-macros.h (M_COPYSIGN): Likewise. * sysdeps/ieee754/dbl-64/e_atan2.c (signArctan2): Likewise. * sysdeps/ieee754/dbl-64/e_atanh.c (__ieee754_atanh): Likewise. * sysdeps/ieee754/dbl-64/e_gamma_r.c (__ieee754_gamma_r): Likewise. * sysdeps/ieee754/dbl-64/e_jn.c (__ieee754_jn): Likewise. (__ieee754_yn): Likewise. * sysdeps/ieee754/dbl-64/s_asinh.c (__asinh): Likewise. * sysdeps/ieee754/dbl-64/s_atan.c (__signArctan): Likewise. * sysdeps/ieee754/dbl-64/s_scalbln.c (__scalbln): Likewise. * sysdeps/ieee754/dbl-64/s_scalbn.c (__scalbn): Likewise. * sysdeps/ieee754/dbl-64/s_sin.c (do_sin): Likewise. (__sin): Likewise. * sysdeps/ieee754/dbl-64/s_sincos.c (__sincos): Likewise. * sysdeps/ieee754/dbl-64/wordsize-64/s_nearbyint.c (__nearbyint): Likewise. * sysdeps/ieee754/dbl-64/wordsize-64/s_scalbln.c (__scalbln): Likewise. * sysdeps/ieee754/dbl-64/wordsize-64/s_scalbn.c (__scalbn): Likewise. * sysdeps/ieee754/flt-32/e_atanhf.c (__ieee754_atanhf): Likewise. * sysdeps/ieee754/flt-32/e_gammaf_r.c (__ieee754_gammaf_r): Likewise. * sysdeps/ieee754/flt-32/e_jnf.c (__ieee754_jnf): Likewise. (__ieee754_ynf): Likewise. * sysdeps/ieee754/flt-32/s_asinhf.c (__asinhf): Likewise. * sysdeps/ieee754/flt-32/s_scalbnf.c (__scalbnf): Likewise. * sysdeps/ieee754/k_standard.c (__kernel_standard): Likewise. * sysdeps/ieee754/ldbl-128/e_gammal_r.c (__ieee754_gammal_r): Likewise. * sysdeps/ieee754/ldbl-128/e_jnl.c (__ieee754_jnl): Likewise. (__ieee754_ynl): Likewise. * sysdeps/ieee754/ldbl-128/s_scalblnl.c (__scalblnl): Likewise. * sysdeps/ieee754/ldbl-128/s_scalbnl.c (__scalbnl): Likewise. * sysdeps/ieee754/ldbl-128ibm/e_gammal_r.c (__ieee754_gammal_r): Likewise. * sysdeps/ieee754/ldbl-128ibm/e_jnl.c (__ieee754_jnl): Likewise. (__ieee754_ynl): Likewise. * sysdeps/ieee754/ldbl-128ibm/s_fmal.c (__fmal): Likewise. * sysdeps/ieee754/ldbl-128ibm/s_scalblnl.c (__scalblnl): Likewise. * sysdeps/ieee754/ldbl-128ibm/s_scalbnl.c (__scalbnl): Likewise. * sysdeps/ieee754/ldbl-96/e_gammal_r.c (__ieee754_gammal_r): Likewise. * sysdeps/ieee754/ldbl-96/e_jnl.c (__ieee754_jnl): Likewise. (__ieee754_ynl) * sysdeps/ieee754/ldbl-96/s_asinhl.c (__asinhl): Likewise. * sysdeps/ieee754/ldbl-96/s_scalblnl.c (__scalblnl): Likewise. * sysdeps/ieee754/ldbl-opt/nldbl-copysign.c (copysignl): Likewise. * sysdeps/powerpc/power5+/fpu/s_modf.c (__modf): Likewise. * sysdeps/powerpc/power5+/fpu/s_modff.c (__modff): Likewise.
* Use round functions not __round functions in glibc libm.Joseph Myers2018-09-272-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Continuing the move to use, within libm, public names for libm functions that can be inlined as built-in functions on many architectures, this patch moves calls to __round functions to call the corresponding round names instead, with asm redirection to __round when the calls are not inlined. An additional complication arises in sysdeps/ieee754/ldbl-128ibm/e_expl.c, where a call to roundl, with the result converted to int, gets converted by the compiler to call lroundl in the case of 32-bit long, so resulting in localplt test failures. It's logically correct to let the compiler make such an optimization; an appropriate asm redirection of lroundl to __lroundl is thus added to that file (it's not needed anywhere else). Tested for x86_64, and with build-many-glibcs.py. * include/math.h [!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (round): Redirect using MATH_REDIRECT. * sysdeps/aarch64/fpu/s_round.c: Define NO_MATH_REDIRECT before header inclusion. * sysdeps/aarch64/fpu/s_roundf.c: Likewise. * sysdeps/ieee754/dbl-64/s_round.c: Likewise. * sysdeps/ieee754/dbl-64/wordsize-64/s_round.c: Likewise. * sysdeps/ieee754/float128/s_roundf128.c: Likewise. * sysdeps/ieee754/flt-32/s_roundf.c: Likewise. * sysdeps/ieee754/ldbl-128/s_roundl.c: Likewise. * sysdeps/ieee754/ldbl-96/s_roundl.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_round.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_roundf.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_round.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_roundf.c: Likewise. * sysdeps/riscv/rv64/rvd/s_round.c: Likewise. * sysdeps/riscv/rvf/s_roundf.c: Likewise. * sysdeps/ieee754/ldbl-128ibm/s_roundl.c: Likewise. (round): Redirect to __round. (__roundl): Call round instead of __round. * sysdeps/powerpc/fpu/math_private.h [_ARCH_PWR5X] (__round): Remove macro. [_ARCH_PWR5X] (__roundf): Likewise. * sysdeps/ieee754/dbl-64/e_gamma_r.c (gamma_positive): Use round functions instead of __round variants. * sysdeps/ieee754/flt-32/e_gammaf_r.c (gammaf_positive): Likewise. * sysdeps/ieee754/ldbl-128/e_gammal_r.c (gammal_positive): Likewise. * sysdeps/ieee754/ldbl-128ibm/e_gammal_r.c (gammal_positive): Likewise. * sysdeps/ieee754/ldbl-96/e_gammal_r.c (gammal_positive): Likewise. * sysdeps/x86/fpu/powl_helper.c (__powl_helper): Likewise. * sysdeps/ieee754/ldbl-128ibm/e_expl.c (lroundl): Redirect to __lroundl. (__ieee754_expl): Call roundl instead of __roundl.
* powerpc: Only enable TLE with PPC_FEATURE2_HTM_NOSCAdhemerval Zanella2018-09-211-17/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Linux from 3.9 through 4.2 does not abort HTM transaction on syscalls, instead it suspend and resume it when leaving the kernel. The side-effects of the syscall will always remain visible, even if the transaction is aborted. This is an issue when transaction is used along with futex syscall, on pthread_cond_wait for instance, where the futex call might succeed but the transaction is rolled back leading the pthread_cond object in an inconsistent state. Glibc used to prevent it by always aborting a transaction before issuing a syscall. Linux 4.2 also decided to abort active transaction in syscalls which makes the glibc workaround superfluous. Worse, glibc transaction abortion leads to a performance issue on recent kernels where the HTM state is saved/restore lazily (v4.9). By aborting a transaction on every syscalls, regardless whether a transaction has being initiated before, GLIBS makes the kernel always save/restore HTM state (it can not even lazily disable it after a certain number of syscall iterations). Because of this shortcoming, Transactional Lock Elision is just enabled when it has been explicitly set (either by tunables of by a configure switch) and if kernel aborts HTM transactions on syscalls (PPC_FEATURE2_HTM_NOSC). It is reported that using simple benchmark [1], the context-switch is about 5% faster by not issuing a tabort in every syscall in newer kernels. Checked on powerpc64le-linux-gnu with 4.4.0 kernel (Ubuntu 16.04). * NEWS: Add note about new TLE support on powerpc64le. * sysdeps/powerpc/nptl/tcb-offsets.sym (TM_CAPABLE): Remove. * sysdeps/powerpc/nptl/tls.h (tcbhead_t): Rename tm_capable to __ununsed1. (TLS_INIT_TP, TLS_DEFINE_INIT_TP): Remove tm_capable setup. (THREAD_GET_TM_CAPABLE, THREAD_SET_TM_CAPABLE): Remove macros. * sysdeps/powerpc/powerpc32/sysdep.h, sysdeps/powerpc/powerpc64/sysdep.h (ABORT_TRANSACTION_IMPL, ABORT_TRANSACTION): Remove macros. * sysdeps/powerpc/sysdep.h (ABORT_TRANSACTION): Likewise. * sysdeps/unix/sysv/linux/powerpc/elision-conf.c (elision_init): Set __pthread_force_elision iff PPC_FEATURE2_HTM_NOSC is set. * sysdeps/unix/sysv/linux/powerpc/powerpc32/sysdep.h, sysdeps/unix/sysv/linux/powerpc/powerpc64/sysdep.h sysdeps/unix/sysv/linux/powerpc/syscall.S (ABORT_TRANSACTION): Remove usage. * sysdeps/unix/sysv/linux/powerpc/not-errno.h: Remove file. Reported-by: Breno Leitão <leitao@debian.org>
* Use trunc functions not __trunc functions in glibc libm.Joseph Myers2018-09-202-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Continuing the move to use, within libm, public names for libm functions that can be inlined as built-in functions on many architectures, this patch moves calls to __trunc functions to call the corresponding trunc names instead, with asm redirection to __trunc when the calls are not inlined. Tested for x86_64, and with build-many-glibcs.py. * include/math.h [!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (trunc): Redirect using MATH_REDIRECT. * sysdeps/aarch64/fpu/s_trunc.c: Define NO_MATH_REDIRECT before header inclusion. * sysdeps/aarch64/fpu/s_truncf.c: Likewise. * sysdeps/ieee754/dbl-64/wordsize-64/s_trunc.c: Likewise. * sysdeps/ieee754/float128/s_truncf128.c: Likewise. * sysdeps/ieee754/dbl-64/s_trunc.c: Likewise. * sysdeps/ieee754/flt-32/s_truncf.c: Likewise. * sysdeps/ieee754/ldbl-128/s_truncl.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_trunc.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_truncf.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_trunc.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_truncf.c: Likewise. * sysdeps/riscv/rv64/rvd/s_trunc.c: Likewise. * sysdeps/riscv/rvf/s_truncf.c: Likewise. * sysdeps/sparc/sparc64/fpu/multiarch/s_trunc.c: Likewise. * sysdeps/sparc/sparc64/fpu/multiarch/s_truncf.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_trunc.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_truncf.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_trunc_template.c: Likewise. * sysdeps/ieee754/ldbl-128ibm/s_truncl.c: Likewise. (ceil): Redirect to __ceil. (floor): Redirect to __floor. (trunc): Redirect to __trunc. (__truncl): Call trunc instead of __trunc. * sysdeps/powerpc/fpu/math_private.h [_ARCH_PWR5X] (__trunc): Remove macro. [_ARCH_PWR5X] (__truncf): Likewise. * sysdeps/ieee754/dbl-64/e_gamma_r.c (__ieee754_gamma_r): Use trunc functions instead of __trunc variants. * sysdeps/ieee754/flt-32/e_gammaf_r.c (__ieee754_gammaf_r): Likewise. * sysdeps/ieee754/ldbl-128/e_gammal_r.c (__ieee754_gammal_r): Likewise. * sysdeps/ieee754/ldbl-128ibm/e_gammal_r.c (__ieee754_gammal_r): Likewise. * sysdeps/ieee754/ldbl-96/e_gammal_r.c (__ieee754_gammal_r): Likewise.
* Use ceil functions not __ceil functions in glibc libm.Joseph Myers2018-09-172-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Continuing the move to use, within libm, public names for libm functions that can be inlined as built-in functions on many architectures, this patch moves calls to __ceil functions to call the corresponding ceil names instead, with asm redirection to __ceil when the calls are not inlined. Tested for x86_64, and with build-many-glibcs.py. * include/math.h [!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (ceil): Redirect using MATH_REDIRECT. * sysdeps/aarch64/fpu/s_ceil.c: Define NO_MATH_REDIRECT before header inclusion. * sysdeps/aarch64/fpu/s_ceilf.c: Likewise. * sysdeps/ieee754/dbl-64/s_ceil.c: Likewise. * sysdeps/ieee754/dbl-64/wordsize-64/s_ceil.c: Likewise. * sysdeps/ieee754/float128/s_ceilf128.c: Likewise. * sysdeps/ieee754/flt-32/s_ceilf.c: Likewise. * sysdeps/ieee754/ldbl-128/s_ceill.c: Likewise. * sysdeps/ieee754/ldbl-128ibm/s_ceill.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_ceil_template.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_ceil.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_ceilf.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_ceil.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_ceilf.c: Likewise. * sysdeps/riscv/rv64/rvd/s_ceil.c: Likewise. * sysdeps/riscv/rvf/s_ceilf.c: Likewise. * sysdeps/sparc/sparc64/fpu/multiarch/s_ceil.c: Likewise. * sysdeps/sparc/sparc64/fpu/multiarch/s_ceilf.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_ceil.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_ceilf.c: Likewise. * sysdeps/powerpc/fpu/math_private.h [_ARCH_PWR5X] (__ceil): Remove macro. * sysdeps/ieee754/dbl-64/e_gamma_r.c (gamma_positive): Use ceil functions instead of __ceil variants. * sysdeps/ieee754/flt-32/e_gammaf_r.c (gammaf_positive): Likewise. * sysdeps/ieee754/ldbl-128/e_gammal_r.c (gammal_positive): Likewise. * sysdeps/ieee754/ldbl-128ibm/e_gammal_r.c (gammal_positive): Likewise. * sysdeps/ieee754/ldbl-128ibm/s_truncl.c (__truncl): Likewise. * sysdeps/ieee754/ldbl-96/e_gammal_r.c (gammal_positive): Likewise. * sysdeps/powerpc/power5+/fpu/s_modf.c (__modf): Likewise. * sysdeps/powerpc/power5+/fpu/s_modff.c (__modff): Likewise.
* Use floor functions not __floor functions in glibc libm.Joseph Myers2018-09-142-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Similar to the changes that were made to call sqrt functions directly in glibc, instead of __ieee754_sqrt variants, so that the compiler could inline them automatically without needing special inline definitions in lots of math_private.h headers, this patch makes libm code call floor functions directly instead of __floor variants, removing the inlines / macros for x86_64 (SSE4.1) and powerpc (POWER5). The redirection used to ensure that __ieee754_sqrt does still get called when the compiler doesn't inline a built-in function expansion is refactored so it can be applied to other functions; the refactoring is arranged so it's not limited to unary functions either (it would be reasonable to use this mechanism for copysign - removing the inline in math_private_calls.h but also eliminating unnecessary local PLT entry use in the cases (powerpc soft-float and e500v1, for IBM long double) where copysign calls don't get inlined). The point of this change is that more architectures can get floor calls inlined where they weren't previously (AArch64, for example), without needing special inline definitions in their math_private.h, and existing such definitions in math_private.h headers can be removed. Note that it's possible that in some cases an inline may be used where an IFUNC call was previously used - this is the case on x86_64, for example. I think the direct calls to floor are still appropriate; if there's any significant performance cost from inline SSE2 floor instead of an IFUNC call ending up with SSE4.1 floor, that indicates that either the function should be doing something else that's faster than using floor at all, or it should itself have IFUNC variants, or that the compiler choice of inlining for generic tuning should change to allow for the possibility that, by not inlining, an SSE4.1 IFUNC might be called at runtime - but not that glibc should avoid calling floor internally. (After all, all the same considerations would apply to any user program calling floor, where it might either be inlined or left as an out-of-line call allowing for a possible IFUNC.) Tested for x86_64, and with build-many-glibcs.py. * include/math.h [!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (MATH_REDIRECT): New macro. [!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (MATH_REDIRECT_LDBL): Likewise. [!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (MATH_REDIRECT_F128): Likewise. [!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (MATH_REDIRECT_UNARY_ARGS): Likewise. [!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (sqrt): Redirect using MATH_REDIRECT. [!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (floor): Likewise. * sysdeps/aarch64/fpu/s_floor.c: Define NO_MATH_REDIRECT before header inclusion. * sysdeps/aarch64/fpu/s_floorf.c: Likewise. * sysdeps/ieee754/dbl-64/s_floor.c: Likewise. * sysdeps/ieee754/dbl-64/wordsize-64/s_floor.c: Likewise. * sysdeps/ieee754/float128/s_floorf128.c: Likewise. * sysdeps/ieee754/flt-32/s_floorf.c: Likewise. * sysdeps/ieee754/ldbl-128/s_floorl.c: Likewise. * sysdeps/ieee754/ldbl-128ibm/s_floorl.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_floor_template.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_floor.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_floorf.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_floor.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_floorf.c: Likewise. * sysdeps/riscv/rv64/rvd/s_floor.c: Likewise. * sysdeps/riscv/rvf/s_floorf.c: Likewise. * sysdeps/sparc/sparc64/fpu/multiarch/s_floor.c: Likewise. * sysdeps/sparc/sparc64/fpu/multiarch/s_floorf.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_floor.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_floorf.c: Likewise. * sysdeps/powerpc/fpu/math_private.h [_ARCH_PWR5X] (__floor): Remove macro. [_ARCH_PWR5X] (__floorf): Likewise. * sysdeps/x86_64/fpu/math_private.h [__SSE4_1__] (__floor): Remove inline function. [__SSE4_1__] (__floorf): Likewise. * math/w_lgamma_main.c (LGFUNC (__lgamma)): Use floor functions instead of __floor variants. * math/w_lgamma_r_compat.c (__lgamma_r): Likewise. * math/w_lgammaf_main.c (LGFUNC (__lgammaf)): Likewise. * math/w_lgammaf_r_compat.c (__lgammaf_r): Likewise. * math/w_lgammal_main.c (LGFUNC (__lgammal)): Likewise. * math/w_lgammal_r_compat.c (__lgammal_r): Likewise. * math/w_tgamma_compat.c (__tgamma): Likewise. * math/w_tgamma_template.c (M_DECL_FUNC (__tgamma)): Likewise. * math/w_tgammaf_compat.c (__tgammaf): Likewise. * math/w_tgammal_compat.c (__tgammal): Likewise. * sysdeps/ieee754/dbl-64/e_lgamma_r.c (sin_pi): Likewise. * sysdeps/ieee754/dbl-64/k_rem_pio2.c (__kernel_rem_pio2): Likewise. * sysdeps/ieee754/dbl-64/lgamma_neg.c (__lgamma_neg): Likewise. * sysdeps/ieee754/flt-32/e_lgammaf_r.c (sin_pif): Likewise. * sysdeps/ieee754/flt-32/lgamma_negf.c (__lgamma_negf): Likewise. * sysdeps/ieee754/ldbl-128/e_lgammal_r.c (__ieee754_lgammal_r): Likewise. * sysdeps/ieee754/ldbl-128/e_powl.c (__ieee754_powl): Likewise. * sysdeps/ieee754/ldbl-128/lgamma_negl.c (__lgamma_negl): Likewise. * sysdeps/ieee754/ldbl-128/s_expm1l.c (__expm1l): Likewise. * sysdeps/ieee754/ldbl-128ibm/e_lgammal_r.c (__ieee754_lgammal_r): Likewise. * sysdeps/ieee754/ldbl-128ibm/e_powl.c (__ieee754_powl): Likewise. * sysdeps/ieee754/ldbl-128ibm/lgamma_negl.c (__lgamma_negl): Likewise. * sysdeps/ieee754/ldbl-128ibm/s_expm1l.c (__expm1l): Likewise. * sysdeps/ieee754/ldbl-128ibm/s_truncl.c (__truncl): Likewise. * sysdeps/ieee754/ldbl-96/e_lgammal_r.c (sin_pi): Likewise. * sysdeps/ieee754/ldbl-96/lgamma_negl.c (__lgamma_negl): Likewise. * sysdeps/powerpc/power5+/fpu/s_modf.c (__modf): Likewise. * sysdeps/powerpc/power5+/fpu/s_modff.c (__modff): Likewise.
* [BZ #20271] Add newlines in __libc_fatal calls.Paul Pluzhnikov2018-08-311-1/+1
|
* powerpc: Remove powerpc specific sinf and cosf optimizationRajalakshmi Srinivasaraghavan2018-08-209-1191/+0
| | | | | | | New generic optimization of sinf and cosf introduced by commit 599cf3976679e1b345307d9c02057f02aa95528f shows improvement compared to powerpc specific assembly version. Hence removing the powerpc assembly versions to make use of generic code.
* powerpc: Rearrange little endian specific filesRajalakshmi Srinivasaraghavan2018-08-168-24/+31
| | | | | | This patch moves little endian specific POWER9 optimization files to sysdeps/powerpc/powerpc64/le and creates POWER9 ifunc functions only for little endian.
* powerpc64: Always restore TOC on longjmp [BZ #21895]Rogerio Alves2018-07-164-4/+139
| | | | | | | | | | | | | | | | | | | This patch changes longjmp to always restore the TOC pointer (r2 register) to the caller frame on powerpc64 and powerpc64le. This is related to bug 21895 that reports a situation where you have a static longjmp to a shared object file. [BZ #21895] * sysdeps/powerpc/powerpc64/__longjmp-common.S: Remove condition code for restoring r2 in longjmp. * sysdeps/powerpc/powerpc64/Makefile: Added tst-setjmp-bug21895-static to test list. Added rules to build test tst-setjmp-bug21895-static. Added module setjmp-bug21895 and rules to build a shared object from it. * sysdeps/powerpc/powerpc64/setjmp-bug21895.c: New test file. * sysdeps/powerpc/powerpc64/tst-setjmp-bug21895-static.c: New test file. Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
* Fix powerpc64le build of nan-sign tests (bug 23303).Joseph Myers2018-06-181-1/+4
| | | | | | | | | | | | | | | | My recent nan-sign tests fail to build for powerpc64le with GCC 8 because of the special compile / link options needed there for any test using _Float128. This patch arranges for these tests to be handled on powerpc64le similarly to other such tests. Tested with build-many-glibcs.py for powerpc64le. [BZ #23303] * sysdeps/powerpc/powerpc64/le/Makefile (CFLAGS-tst-strtod-nan-sign.c): Add -mfloat128. (CFLAGS-tst-wcstod-nan-sign.c): Likewise. (gnulib-tests): Also add $(f128-loader-link) for tst-strtod-nan-sign abd tst-wcstod-nan-sign.
* Mark _init and _fini as hidden [BZ #23145]H.J. Lu2018-06-081-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | _init and _fini are special functions provided by glibc for linker to define DT_INIT and DT_FINI in executable and shared library. They should never be put in dynamic symbol table. This patch marks them as hidden to remove them from dynamic symbol table. Tested with build-many-glibcs.py. [BZ #23145] * elf/Makefile (tests-special): Add $(objpfx)check-initfini.out. ($(all-built-dso:=.dynsym): New target. (common-generated): Add $(all-built-dso:$(common-objpfx)%=%.dynsym). ($(objpfx)check-initfini.out): New target. (generated): Add check-initfini.out. * scripts/check-initfini.awk: New file. * sysdeps/aarch64/crti.S (_init): Mark as hidden. (_fini): Likewise. * sysdeps/alpha/crti.S (_init): Mark as hidden. (_fini): Likewise. * sysdeps/arm/crti.S (_init): Mark as hidden. (_fini): Likewise. * sysdeps/hppa/crti.S (_init): Mark as hidden. (_fini): Likewise. * sysdeps/i386/crti.S (_init): Mark as hidden. (_fini): Likewise. * sysdeps/ia64/crti.S (_init): Mark as hidden. (_fini): Likewise. * sysdeps/m68k/crti.S (_init): Mark as hidden. (_fini): Likewise. * sysdeps/microblaze/crti.S (_init): Mark as hidden. (_fini): Likewise. * sysdeps/mips/mips32/crti.S (_init): Mark as hidden. (_fini): Likewise. * sysdeps/mips/mips64/n32/crti.S (_init): Mark as hidden. (_fini): Likewise. * sysdeps/mips/mips64/n64/crti.S (_init): Mark as hidden. (_fini): Likewise. * sysdeps/nios2/crti.S (_init): Mark as hidden. (_fini): Likewise. * sysdeps/powerpc/powerpc32/crti.S (_init): Mark as hidden. (_fini): Likewise. * sysdeps/powerpc/powerpc64/crti.S (_init): Mark as hidden. (_fini): Likewise. * sysdeps/s390/s390-32/crti.S (_init): Mark as hidden. (_fini): Likewise. * sysdeps/s390/s390-64/crti.S (_init): Mark as hidden. (_fini): Likewise. * sysdeps/sh/crti.S (_init): Mark as hidden. (_fini): Likewise. * sysdeps/sparc/crti.S (_init): Mark as hidden. (_fini): Likewise. * sysdeps/x86_64/crti.S (_init): Mark as hidden. (_fini): Likewise.
* powerpc64le: Fix TFtype in sqrtf128 when using -mabi=ieeelongdoubleTulio Magno Quites Machado Filho2018-06-061-3/+8
| | | | | | | | | | | | | | | | When building with -mlong-double-128 or -mabi=ibmlongdouble, TFtype represents the IBM 128-bit extended floating point type, while KFtype represents the IEEE 128-bit floating point type. The soft float implementation of e_sqrtf128 had to redefine TFtype and TF in order to workaround this issue. However, this behavior changes when -mabi=ieeelongdouble is used and the macros are not necessary. * sysdeps/powerpc/powerpc64/le/fpu/e_sqrtf128.c [__HAVE_FLOAT128_UNLIKE_LDBL] (TFtype, TF): Restrict TFtype and TF redirection to KFtype and KF only when the default long double type is not the IEEE 128-bit floating point type. Signed-off-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
* powerpc: Add multiarch sqrtf128 for ppc64leRajalakshmi Srinivasaraghavan2018-05-304-0/+107
| | | | | | | This patch creates ifunc for sqrtf128() to make use of new xssqrtqp instruction for POWER9 when --enable-multi-arch and --with-cpu=power8 options are used on power9 system. This is achieved by explicitly adding -mcpu=power9 flag for sqrtf128-power9.
* powerpc: Move around math-related ImpliesTulio Magno Quites Machado Filho2018-05-242-0/+10
| | | | | | | | | | | | | | | Currently, powerpc, powerpc64, and powerpc64le imply the same set of subdirectories from sysdeps/ieee754: flt-32, dbl-64, ldbl-128ibm, and ldbl-opt. In preparation for the transition of the long double format - from IBM Extended Precision to IEEE 754 128-bits floating-point - on powerpc64le, this patch splits the shared Implies file into three separate files (one for each of the powerpc architectures), without changing their contents. Future patches will modify powerpc64le. * sysdeps/powerpc/Implies: Removed. Previous contents copied to... * sysdeps/powerpc/powerpc32/Implies-after: ... here. * sysdeps/powerpc/powerpc64/be/Implies-after: ... here. * sysdeps/powerpc/powerpc64/le/Implies-before: ... and here.