about summary refs log tree commit diff
path: root/sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile
Commit message (Collapse)AuthorAgeFilesLines
* powerpc: refactor logb{f,l}Adhemerval Zanella2019-07-081-11/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The power7 logb implementation does not show a performance gain on ISA 2.07+ chips with faster floating-point to GRP instructions (currently POWER8 and POWER9). This patch moves the POWER7 implementation to generic one and enables it for POWER7. It also add some cleanup to use inline floating-point number instead of define them using static const. The performance difference is for POWER9: - Without patch: "logb": { "subnormal": { "duration": 4.99202e+09, "iterations": 8.83662e+08, "max": 75.194, "min": 5.501, "mean": 5.64925 }, "normal": { "duration": 4.97063e+09, "iterations": 9.97094e+08, "max": 46.489, "min": 4.956, "mean": 4.98512 } } - With patch: "logb": { "subnormal": { "duration": 4.97226e+09, "iterations": 9.92036e+08, "max": 77.209, "min": 4.892, "mean": 5.01218 }, "normal": { "duration": 4.96192e+09, "iterations": 1.07545e+09, "max": 12.361, "min": 4.593, "mean": 4.61382 } } The ifunc implementation is also enabled only for powerpc64. Checked on powerpc-linux-gnu (built without --with-cpu, with --with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch), powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+ and --disable-multi-arch). * sysdeps/powerpc/power7/fpu/s_logb.c: Move to ... * sysdeps/powerpc/fpu/s_logb.c: ... here. Use inline FP constants. * sysdeps/powerpc/power7/fpu/s_logbf.c: Move to ... * sysdeps/powerpc/fpu/s_logbf.c: ... here. Use inline FP constants. * sysdeps/powerpc/power7/fpu/s_logbl.c: Move to ... * sysdeps/powerpc/fpu/s_logbl.c: ... here. Use inline FP constants. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_logb-power7.c: Adjust implementation path. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_logbf-power7.c: Adjust implementation path. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_logbl-power7.c: Adjust implementation path. * sysdeps/powerpc/powerpc64/be/fpu/multiarch/Makefile (libm-sysdep_routines): Add s_log* objects. (CFLAGS-s_logbf-power7.c, CFLAGS-s_logbl-power7.c, CFLAGS-s_logb-power7.c): New fule. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_logb-power7.c: Move to ... * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_logb-power7.c: ... here. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_logb-ppc64.c: Move to ... * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_logb-ppc64.c: ... here. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_logb.c: Move to ... * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_logb.c: ... here. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_logbf-power7.c: Move to ... * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_logbf-power7.c: ... here. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_logbf-ppc64.c: Move to ... * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_logbf-ppc64.c: ... here. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_logbf.c: Move to ... * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_logbf.c: ... here. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_logbl-power7.c: Move to ... * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_logbl-power7.c: ... here. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_logbl-ppc64.c: Move to ... * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_logbl-ppc64.c: ... here. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_logbl.c: Move to ... * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_logbl.c: ... here. * sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile: Remove file. * sysdeps/powerpc/powerpc64/power7/fpu/s_logb.c: Remove file. * sysdeps/powerpc/powerpc64/power7/fpu/s_logbf.c: Likewise. * sysdeps/powerpc/powerpc64/power7/fpu/s_logbl.c: Likewise. Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
* powerpc: Refactor modf{f}Adhemerval Zanella2019-07-081-13/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The modf{f} optimization is not an optimization for ISA 2.07+. This patch move the IFUNC for powerpc64 only, move the power5+ to generic location, and include the generic implementation for ISA 2.07+. The performance changes are based on modf benchtests: * POWER9 - ppc64 "modf": { "": { "duration": 4.97057e+09, "iterations": 1.00688e+09, "max": 28.76, "min": 4.912, "mean": 4.9366 } } * POWER9 - power5+ "modf": { "": { "duration": 4.98291e+09, "iterations": 9.32818e+08, "max": 15.058, "min": 5.107, "mean": 5.34178 } } * POWER8 - ppc64 "modf": { "": { "duration": 5.05329e+09, "iterations": 8.38814e+08, "max": 518.051, "min": 5.79, "mean": 6.02433 } } * POWER8 - power5+ "modf": { "": { "duration": 5.05573e+09, "iterations": 8.35254e+08, "max": 63.141, "min": 5.873, "mean": 6.05293 } } * POWER7 - ppc64 "modf": { "": { "duration": 4.89818e+09, "iterations": 1.08408e+09, "max": 57.556, "min": 3.953, "mean": 4.51827 } } * POWER7 - power5+ "modf": { "": { "duration": 4.83789e+09, "iterations": 1.33409e+09, "max": 46.608, "min": 2.224, "mean": 3.62636 } } Checked on powerpc-linux-gnu (built without --with-cpu, with --with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch), powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+ and --disable-multi-arch). * sysdeps/powerpc/power5+/fpu/s_modf.c: Move to ... * sysdeps/powerpc/fpu/s_modf.c: ... here. Add ISA 2.07 optimization. * sysdeps/powerpc/power5+/fpu/s_modff.c: Move to ... * sysdeps/powerpc/fpu/s_modff.c: ... here. Add ISA 2.07 optimization. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_modf-power5+.c: Adjust include. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_modff-power5+.c: Likewise. * sysdeps/powerpc/powerpc64/be/fpu/multiarch/Makefile (sysdep_calls, sysdep_routines): Add s_modf* objects. (CFLAGS-s_modf-power5+.c, CFLAGS-s_modff-power5+.c, CFLAGS-s_modf-ppc64.c, CFLAGS-s_modff-ppc64.c): New rule. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_modf-power5+.c: Move to ... * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_modf-power5+.c: ... here. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_modf-power5+.c: Movo to ... * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_modf-power5+.c: Move ... here. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_modf.c: Move to ... * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_modf.c: ... here. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_modff-power5+.c: Move to ... * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_modff-power5+.c: ... here. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_modff-ppc64.c: Move to ... * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_modff-ppc64.c: ... here. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_modff.c: Move to ... * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_modff.c: ... here. Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
* powerpc: hypot refactor and optimizationAdhemerval Zanella2019-07-081-4/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The powerpc hypot is slight optimized by: - Commit 8df4e219e43, both isnan and isinf are always inlined and thus the check TEST_INF_NAN does not make sense anymore. The generic check for POWER7 should be faster on all powerpc configuration. - The redundant check 'y > two60factor && (x / y) > two60' is removed. Both changes leads to unrequired ifunc especialization for power7 and thus they are removed. Finally The code is also cleanup a bit by inlining the constants floating points. The performance changes using the hypot benchtests are: - POWER9 without patch: "hypot": { "overflow": { "duration": 4.98585e+09, "iterations": 4.84932e+08, "max": 46.551, "min": 10.229, "mean": 10.2815 }, "higher_two500": { "duration": 5.00192e+09, "iterations": 4.24843e+08, "max": 33.319, "min": 11.606, "mean": 11.7736 }, "subnormal": { "duration": 5.0075e+09, "iterations": 4.06792e+08, "max": 22.178, "min": 12.15, "mean": 12.3097 }, "less_two500": { "duration": 5.00685e+09, "iterations": 4.08772e+08, "max": 22.784, "min": 12.052, "mean": 12.2485 }, "default": { "duration": 5.06002e+09, "iterations": 4.09894e+08, "max": 20.648, "min": 11.874, "mean": 12.3447 } } - POWER9 with patch: "hypot": { "overflow": { "duration": 4.91848e+09, "iterations": 7.28039e+08, "max": 47.958, "min": 6.436, "mean": 6.75579 }, "higher_two500": { "duration": 4.9359e+09, "iterations": 6.63376e+08, "max": 20.783, "min": 7.321, "mean": 7.44057 }, "subnormal": { "duration": 4.9479e+09, "iterations": 6.19772e+08, "max": 18.856, "min": 7.817, "mean": 7.98341 }, "less_two500": { "duration": 4.94275e+09, "iterations": 6.3889e+08, "max": 17.452, "min": 7.597, "mean": 7.73647 }, "default": { "duration": 5.03645e+09, "iterations": 5.70718e+08, "max": 18.904, "min": 8.55, "mean": 8.82476 } } - POWER7 without patch "hypot": { "overflow": { "duration": 4.86637e+09, "iterations": 6.43196e+08, "max": 53.958, "min": 7.328, "mean": 7.56592 }, "higher_two500": { "duration": 4.99842e+09, "iterations": 3.11012e+08, "max": 78.227, "min": 15.696, "mean": 16.0715 }, "subnormal": { "duration": 4.99841e+09, "iterations": 3.08935e+08, "max": 51.392, "min": 15.983, "mean": 16.1795 }, "less_two500": { "duration": 5.00108e+09, "iterations": 2.99464e+08, "max": 73.247, "min": 16.416, "mean": 16.7001 }, "default": { "duration": 5.04645e+09, "iterations": 3.52608e+08, "max": 70.073, "min": 13.38, "mean": 14.3118 } } - POWER7 with patch "hypot": { "overflow": { "duration": 4.80785e+09, "iterations": 8.00001e+08, "max": 66.262, "min": 5.888, "mean": 6.00981 }, "higher_two500": { "duration": 4.9859e+09, "iterations": 3.39449e+08, "max": 5148.44, "min": 14.539, "mean": 14.6882 }, "subnormal": { "duration": 4.9905e+09, "iterations": 3.28874e+08, "max": 64.905, "min": 14.971, "mean": 15.1745 }, "less_two500": { "duration": 4.99494e+09, "iterations": 3.19755e+08, "max": 103.696, "min": 14.972, "mean": 15.6211 }, "default": { "duration": 5.03951e+09, "iterations": 4.02502e+08, "max": 61.008, "min": 12.368, "mean": 12.5205 } } Checked on powerpc-linux-gnu (built without --with-cpu, with --with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch), powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+ and --disable-multi-arch). * sysdeps/powerpc/fpu/e_hypot.c (two60, two500, two600, two1022, twoM500, twoM600, two60factor, pdnum): Remove. (TEST_INFO_NAN, GET_TW0_HIGH_WORD): Remove macro. (__ieee754_hypot): Replace static variables with inline definition, remove ununsed branches. * sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile (libm-sysdep_routines): Remove e_hypot-* objects. (CFLAGS-e_hypot-power7.c, CFLAGS-e_hypotf-power7.c): Remove rule. * sysdeps/powerpc/powerpc64/fpu/multiarch/e_hypot-power7.c: Remove file. * sysdeps/powerpc/powerpc64/fpu/multiarch/e_hypot-ppc64.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/e_hypot.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/e_hypotf-power7.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/e_hypotf-ppc64.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/e_hypotf.c: Likewise. Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
* powerpc: Use generic e_expfAdhemerval Zanella2019-06-261-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Generic implementation is faster on both power8 and power9: POWER9: - sysdeps/ieee754/flt-32/e_expf.c "expf": { "workload-spec2017.wrf": { "duration": 5.1236e+09, "iterations": 7.53344e+08, "reciprocal-throughput": 5.9436, "latency": 7.65869, "max-throughput": 1.68248e+08, "min-throughput": 1.30571e+08 } } - sysdeps/powerpc/powerpc64/power8/fpu/e_expf.S "expf": { "workload-spec2017.wrf": { "duration": 5.14429e+09, "iterations": 5.29248e+08, "reciprocal-throughput": 8.05372, "latency": 11.3863, "max-throughput": 1.24166e+08, "min-throughput": 8.78249e+07 } } Checked on powerpc-linux-gnu (built without --with-cpu, with --with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch), powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+ and --disable-multi-arch). * sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile (libm-sysdep_routines): Remove e_expf-power8 and expf-ppc64. * sysdeps/powerpc/powerpc64/fpu/multiarch/e_expf-power8.S: Remove file. * sysdeps/powerpc/powerpc64/fpu/multiarch/e_expf-ppc64.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/e_expf.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/w_expf.c: Likewise. * sysdeps/powerpc/powerpc64/power8/fpu/e_expf.S: Likewise. * sysdeps/powerpc/powerpc64/power8/fpu/w_expf.c: Likewise. Reviewed-by: Gabriel F. T. Gomes <gabriel@inconstante.eti.br>
* powerpc: Refactor powerpc64 lround/lroundf/llround/llroundfAdhemerval Zanella2019-06-171-4/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patches consolidates all the powerpc {l}lround{f} implementations on the generic sysdeps/powerpc/fpu/s_{l}lround{f}.c. The IFUNC support is also moved only to powerpc64 only, since for powerpc64le generic implementation resulting in optimized code. Checked on powerpc-linux-gnu (built without --with-cpu, with --with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch), powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+ and --disable-multi-arch). * sysdeps/powerpc/powerpc64/be/fpu/multiarch/Makefile (libm-sysdep_routines): Add s_llround-power8, s_llround-power6x, s_llround-power5+, s_llround-ppc64, and s_llroundf-ppc64. (CFLAGS-s_llround-power8.c, CFLAGS-s_llround-power6x.c, CFLAGS-s_llround-power5+.c): New rule. * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_llround-power5+.c: New file. * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_llround-power6x.c: Likewise. * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_llround-power8.c: Likewise. * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_llround-ppc64.c: Likewise. * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_llroundf-ppc64.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_llround.c: Move to ... * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_llround.c: ... here. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_llroundf.c: Move to ... * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_llroundf.c: ... here. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_lround.c: Move to ... * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_lround.c: ... here. * sysdeps/powerpc/powerpc64/fpu/Makefile [$(subdir) == math] (CFLAGS-s_llround.c): New rule. * sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile (libm-sysdep_routines): Remove s_llround-* objects. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_llround-power5+.S: Remove file. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_llround-power6x.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_llround-power8.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_llround-ppc64.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_llroundf-ppc64.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_llround.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_llroundf.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_lround.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_lroundf.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_llround.c: New file. * sysdeps/powerpc/powerpc64/fpu/s_llroundf.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_lround.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_lroundf.c: Likewise. * sysdeps/powerpc/powerpc64/power5+/fpu/s_llround.S: Likewise. * sysdeps/powerpc/powerpc64/power5+/fpu/s_llroundf.S: Likewise. * sysdeps/powerpc/powerpc64/power6x/fpu/s_llround.S: Likewise. * sysdeps/powerpc/powerpc64/power6x/fpu/s_llroundf.S: Likewise. * sysdeps/powerpc/powerpc64/power8/fpu/s_llround.S: Likewise. * sysdeps/powerpc/powerpc64/power8/fpu/s_llroundf.S: Likewise. Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
* powerpc: refactor powerpc64 lrint/lrintf/llrint/llrintfAdhemerval Zanella2019-06-171-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patches consolidates all the powerpc llrint{f} implementations on the generic sysdeps/powerpc/fpu/s_llrint{f}. The IFUNC support is also moved only to powerpc64 only, since for powerpc64le generic implementation resulting in optimized code. Checked on powerpc-linux-gnu (built without --with-cpu, with --with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch), powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+ and --disable-multi-arch). * sysdeps/powerpc/powerpc64/be/fpu/multiarch/Makefile (libm-sysdep_routines): Add s_llrint-power8, s_llrint-power6x, and s_llrint-ppc64. (CFLAGS-s_llrint-power8.c, CFLAGS-s_llrint-power6x.c): New rule. * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_llrint-power6x.c: New file. * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_llrint-power8.c: Likewise. * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_llrint-ppc64.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_lrint.c: Move to ... * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_lrint.c: ... here. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_llrint.c: Move to ... * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_llrint.c: ... here. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_llrintf.c: Move to ... * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_llrintf.c: ... here. * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_lrint.c: New file. * sysdeps/powerpc/powerpc64/fpu/Makefile: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile (libm-sysdep_routines): Remove s_llrint-* objects. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_llrint-power6x.S: Remove file. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_llrint-power8.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_llrint-ppc64.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_llrint.c: New file. * sysdeps/powerpc/powerpc64/fpu/s_llrintf.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_lrint.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_lrintf.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_llrint.S: Remove file. * sysdeps/powerpc/powerpc64/fpu/s_llrintf.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_lrint.S: Likewise. * sysdeps/powerpc/powerpc64/power6x/fpu/s_llrint.S: Likewise. * sysdeps/powerpc/powerpc64/power8/fpu/s_llrint.S: Likewise. Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
* powerpc: Remove optimized finiteAdhemerval Zanella2019-06-121-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The powerpc finite optimization do not show much gain: - GCC will call libm iff -fsignaling-nans is used. This usage pattern is usually not performance oriented and for such calls PLT overhead should dominate execution time. - The power7 uses ftdiv to optimize for some input patterns, but at cost of others. Comparing against generic C implementation built for powerpc64-linux-gnu-power7 (--with-cpu=power7): - Generic sysdeps/ieee754 implementation: "isfinite": { "": { "duration": 5.0082e+09, "iterations": 2.45299e+09, "max": 43.824, "min": 2.008, "mean": 2.04167 }, "INF": { "duration": 4.66554e+09, "iterations": 2.28288e+09, "max": 35.73, "min": 2.008, "mean": 2.04371 }, "NAN": { "duration": 4.66274e+09, "iterations": 2.28716e+09, "max": 34.161, "min": 2.009, "mean": 2.03866 } } - power7 optimized one: "isfinite": { "": { "duration": 4.99111e+09, "iterations": 2.65566e+09, "max": 25.015, "min": 1.716, "mean": 1.87942 }, "INF": { "duration": 4.6783e+09, "iterations": 2.0999e+09, "max": 35.264, "min": 1.868, "mean": 2.22787 }, "NAN": { "duration": 4.67915e+09, "iterations": 2.08678e+09, "max": 38.099, "min": 1.869, "mean": 2.24228 } } So it basically optimizes marginally for normal numbers while increasing the latency for other kind of FP. - The power8 implementation is just the generic implementation using ISA 2.07 mfvsrd instruction (which GCC uses for generic implementation). So generic implementation is the best option for powerpc64le. Checked on powerpc-linux-gnu (built without --with-cpu, with --with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch), powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+ and --disable-multi-arch). * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile (sysdeps_routines, libm-sysdep_routines): Remove s_finite* objects. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_finite-power7.S: Remove file. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_finite-ppc32.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_finite.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_finitef-ppc32.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_finitef.c: Likewise. * sysdeps/powerpc/powerpc32/power7/fpu/s_finite.S: Likewise. * sysdeps/powerpc/powerpc32/power7/fpu/s_finitef.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile (sysdep_call): Remove s_finite* objects. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_finite-power7.S: Remove file. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_finite-power8.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_finite-ppc64.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_finite.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_finitef-ppc64.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_finitef.c: Likewise. * sysdeps/powerpc/powerpc64/power7/fpu/s_finite.S: Likewise. * sysdeps/powerpc/powerpc64/power7/fpu/s_finitef.S: Likewise. * sysdeps/powerpc/powerpc64/power8/fpu/s_finite.S: Likewise. * sysdeps/powerpc/powerpc64/power8/fpu/s_finitef.S: Likewise. Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
* powerpc: Remove optimized isinfAdhemerval Zanella2019-06-121-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The powerpc isinf optimizations onyl adds complexity: - GCC will call libm iff -fsignaling-nans is used. This usage pattern is usually not performance oriented and for such calls PLT overhead should dominate execution time. - The power7 uses ftdiv to optimize for some input pattern and branch implementation for INF and denormal that does: return (ix & UINT64_C (0x7fffffffffffffff)) == UINT64_C (0x7ff0000000000000) Although it does show slight better latency than generic algorithm (as below), it is only for power7 and requires it to override it for power8. - The power8 implementation is just the generic implementation using ISA 2.07 mfvsrd instruction (which GCC uses for generic implementation). So generic implementation is the best option for powerpc64le. Checked on powerpc-linux-gnu (built without --with-cpu, with --with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch), powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+ and --disable-multi-arch). * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile (sysdeps_routines, libm-sysdep_routines): Remove s_isinf* and s_isinf* objects. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isinf-power7.S: Remove file. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isinf-ppc32.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isinf.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isinff-ppc32.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isinff.c: Likewise. * sysdeps/powerpc/powerpc32/power7/fpu/s_isinf.S: Likewise. * sysdeps/powerpc/powerpc32/power7/fpu/s_isinff.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile (sysdep_call): Remove s_isinf* and s_isinf* objects. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_isinf-power7.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_isinf-power8.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_isinf-ppc64.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_isinf.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_isinff-ppc64.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_isinff.c: Likewise. * sysdeps/powerpc/powerpc64/power7/fpu/s_isinf.S: Likewise. * sysdeps/powerpc/powerpc64/power7/fpu/s_isinff.S: Likewise. * sysdeps/powerpc/powerpc64/power8/fpu/s_isinf.S: Likewise. * sysdeps/powerpc/powerpc64/power8/fpu/s_isinff.S: Likewise. Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
* powerpc: Remove optimized isnanAdhemerval Zanella2019-06-121-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The powerpc isnan optimizations are not really a gain: - GCC will call libm iff -fsignaling-nans is used. This usage pattern is usually not performance oriented and for such calls PLT overhead should dominate execution time. - The power5, power6, and power6x are just micro-optimization to improve the Load-Hit-Store hazards from floating-point to general register transfer, and current GCC already has support to minimize it by inserting either extra nops or group dispatch instructions. - The power7 uses ftdiv to optimize for some input patterns, but at cost of others. Comparing against generic C implementation built for powerpc-linux-gnu-power4 (which uses the hp-timing support on benchtests): - Generic sysdeps/ieee754 implementation: "isnan": { "": { "duration": 4.98415e+09, "iterations": 2.34516e+09, "max": 45.925, "min": 2.052, "mean": 2.12529 }, "INF": { "duration": 4.74057e+09, "iterations": 1.69761e+09, "max": 91.01, "min": 2.052, "mean": 2.79249 }, "NAN": { "duration": 4.74071e+09, "iterations": 1.68768e+09, "max": 282.343, "min": 2.052, "mean": 2.809 } } - power7 optimized one: $ ./testrun.sh benchtests/bench-isnan "isnan": { "": { "duration": 4.96842e+09, "iterations": 2.56297e+09, "max": 50.048, "min": 1.872, "mean": 1.93854 }, "INF": { "duration": 4.76648e+09, "iterations": 1.54213e+09, "max": 373.408, "min": 2.661, "mean": 3.09084 }, "NAN": { "duration": 4.76845e+09, "iterations": 1.54515e+09, "max": 51.016, "min": 2.736, "mean": 3.08607 } } So it basically optimizes marginally for normal numbers while increasing the latency for other kind of FP. - The generic implementation requires getting the floating point status, disable the invalid operation bit, and restore the floating-point status. Each operation is costly and requires flushing the FP pipeline. Using the same scenarion for the previous analysis: "isnan": { "": { "duration": 5.08284e+09, "iterations": 6.2898e+08, "max": 41.844, "min": 8.057, "mean": 8.08108 }, "INF": { "duration": 4.97904e+09, "iterations": 6.16176e+08, "max": 39.661, "min": 8.057, "mean": 8.08055 }, "NAN": { "duration": 4.98695e+09, "iterations": 5.95866e+08, "max": 29.728, "min": 8.345, "mean": 8.36925 } } - The power8 implementation is just the generic implementation using ISA 2.07 mfvsrd instruction (which GCC uses for generic implementation). So generic implementation is the best option for powerpc64le. Checked on powerpc-linux-gnu (built without --with-cpu, with --with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch), powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+ and --disable-multi-arch). * sysdeps/powerpc/fpu/s_isnan.c: Remove file. * sysdeps/powerpc/fpu/s_isnanf.S: Likewise. * sysdeps/powerpc/powerpc32/fpu/s_isnan.S: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile (sysdeps_routines, libm-sysdep_routines): Remove s_isnan-* and s_isnanf-* objects. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isnan-power5.S: Remove file * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isnan-power6.S: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isnan-power7.S: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isnan-ppc32.S: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isnan.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isnanf-power5.S: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isnanf-power6.S: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isnanf.c: Likewise. * sysdeps/powerpc/powerpc32/power5/fpu/s_isnan.S: Likewise. * sysdeps/powerpc/powerpc32/power5/fpu/s_isnanf.S: Likewise. * sysdeps/powerpc/powerpc32/power6/fpu/s_isnan.S: Likewise. * sysdeps/powerpc/powerpc32/power6/fpu/s_isnanf.S: Likewise. * sysdeps/powerpc/powerpc32/power7/fpu/s_isnan.S: Likewise. * sysdeps/powerpc/powerpc32/power7/fpu/s_isnanf.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile (sysdep_calls): Remove s_isnan-* and s_isnanf-* objects. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_isnan-power5.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_isnan-power6.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_isnan-power6x.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_isnan-power7.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_isnan-power8.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_isnan-ppc64.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_isnan.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_isnanf.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_isnan.S: Likewise. * sysdeps/powerpc/powerpc64/power5/fpu/s_isnan.S: Likewise. * sysdeps/powerpc/powerpc64/power6/fpu/s_isnan.S: Likewise. * sysdeps/powerpc/powerpc64/power6x/fpu/s_isnan.S: Likewise. * sysdeps/powerpc/powerpc64/power7/fpu/s_isnan.S: Likewise. * sysdeps/powerpc/powerpc64/power7/fpu/s_isnanf.S: Likewise. * sysdeps/powerpc/powerpc64/power8/fpu/s_isnan.S: Likewise. * sysdeps/powerpc/powerpc64/power8/fpu/s_isnanf.S: Likewise. Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
* powerpc: copysign cleanupAdhemerval Zanella2019-06-121-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | GCC always expand copysign{f} for all possible cpus, so calling the libm is only done if user explicitly states to disable the builtin (which is done usually not for performance reason). So to provide ifunc variant for copysign is just unrequired complexity, since libm will be called on non-performance critical code. This patch removes both powerpc32 and powerpc64 ifunc variants and consolidates the powerpc implementation on sysdeps/powerpc/fpu/s_copysign{f}.c using compiler builtins. Checked on powerpc-linux-gnu (built without --with-cpu, with --with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch), powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+ and --disable-multi-arch). * sysdeps/powerpc/fpu/s_copysign.c: New file. * sysdeps/powerpc/fpu/s_copysignf.c: Likewise. * sysdeps/powerpc/powerpc32/fpu/s_copysign.S: Remove file. * sysdeps/powerpc/powerpc32/fpu/s_copysignf.S: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile (sysdep_routines, libm-sysdep_routines): Remove s_copysign-power6 and s_copysign-ppc32. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_copysign-power6.S: Remove file. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_copysign-ppc32.S: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_copysign.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_copysignf.c: Likewise. * sysdeps/powerpc/powerpc32/power6/fpu/s_copysign.S: Likewise. * sysdeps/powerpc/powerpc32/power6/fpu/s_copysignf.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile (sysdeps_calls): Remove s_copysign-power6 s_copysign-ppc64. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_copysign-power6.S: Remove file. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_copysign-ppc64.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_copysign.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_copysignf.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_copysign.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_copysignf.S: Likewise. * sysdeps/powerpc/powerpc64/power6/fpu/s_copysign.S: Likewise. * sysdeps/powerpc/powerpc64/power6/fpu/s_copysignf.S: Likewise. Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
* powerpc: trunc/truncf refactorAdhemerval Zanella2019-05-091-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patches consolidates all the powerpc trunc{f} implementations on the generic sysdeps/powerpc/fpu/s_trunc{f}. The generic implementation uses either the compiler builts for ISA 2.03+ (which generates the frim instruction) or a generic implementation which uses FP only operations. The IFUNC organization for powerpc64 is also change to be enabled only for powerpc64 and not for powerpc64le (since minium ISA of 2.08 does not require the fallback generic implementation). Checked on powerpc-linux-gnu (built without --with-cpu, with --with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch), powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+ and --disable-multi-arch). * sysdeps/powerpc/fpu/trunc_to_integer.h (set_fenv_mode): Add TRUNC handling. (round_mode): Add definition for TRUNC. * sysdeps/powerpc/fpu/s_trunc.c: New file. * sysdeps/powerpc/fpu/s_truncf.c: New file. * sysdeps/powerpc/powerpc32/fpu/s_trunc.S: Remove file. * sysdeps/powerpc/powerpc32/fpu/s_truncf.S: Likewise. * sysdep/powerpc/powepc32/power4/fpu/multiarch/s_trunc-power5+.S: Likewise. * sysdep/powerpc/powepc32/power4/fpu/multiarch/s_trunc-ppc32.S: Likewise. * sysdep/powerpc/powepc32/power4/fpu/multiarch/s_truncf-power5+.S: Likewise. * sysdep/powerpc/powepc32/power4/fpu/multiarch/s_truncf-ppc32.S: Likewise. * sysdep/powerpc/powepc32/power4/fpu/multiarch/s_trunc-power5+.c: New file. * sysdep/powerpc/powepc32/power4/fpu/multiarch/s_trunc-ppc32.c: Likewise. * sysdep/powerpc/powepc32/power4/fpu/multiarch/s_truncf-power5+.c: Likewise. * sysdep/powerpc/powepc32/power4/fpu/multiarch/s_truncf-ppc32.c: Likewise. * sysdep/powerpc/powerpc32/power5+/fpu/s_trunc.S: Remove file. * sysdep/powerpc/powerpc32/power5+/fpu/s_truncf.S: Likewise. * sysdep/powerpc/powerpc64/be/fpu/multiarch/Makefile (libm-sysdep_routines): Add s_trunc-power5+, s_trunc-ppc64, s_truncf-power5+, and s_truncf-ppc64. (CFLAGS-s_trunc-power5+.c, CFLAGS-s_truncf-power5+.c): New rule. * sysdep/powerpc/powercp64/be/fpu/multiarch/s_trunc-power5+.c: New file. * sysdep/powerpc/powercp64/be/fpu/multiarch/s_trunc-ppc64.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_trunc.c: Move to ... * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_trunc.c: ... here. * sysdep/powerpc/powercp64/be/fpu/multiarch/s_truncf-power5+.c: New file. * sysdep/powerpc/powercp64/be/fpu/multiarch/s_truncf-ppc64.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_truncf.c: Move to ... * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_truncf.c: ... here. * sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile (libm-sysdep_routines): Remove s_trunc-power5+, s_trunc-ppc64, s_truncf-power5+, and s_truncf-ppc64. * sysdep/powerpc/powerpc64/fpu/multiarch/s_trunc-power5+.S: Remove file. * sysdep/powerpc/powerpc64/fpu/multiarch/s_trunc-ppc64.S: Likewise. * sysdep/powerpc/powerpc64/fpu/multiarch/s_truncf-power5+.S: Likewise. * sysdep/powerpc/powerpc64/fpu/multiarch/s_truncf-ppc64.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_trunc.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_truncf.S: Likewise. * sysdep/powerpc/powerpc64/power5+/fpu/s_trunc.S: Likewise. * sysdep/powerpc/powerpc64/power5+/fpu/s_truncf.S: Likewise. Reviewed-by: Gabriel F. T. Gomes <gabriel@inconstante.eti.br>
* powerpc: round/roundf refactorAdhemerval Zanella2019-05-091-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patches consolidates all the powerpc round{f} implementations on the generic sysdeps/powerpc/fpu/s_round{f}. The generic implementation uses either the compiler builts for ISA 2.03+ (which generates the frim instruction) or a generic implementation which uses FP only operations. The IFUNC organization for powerpc64 is also change to be enabled only for powerpc64 and not for powerpc64le (since minium ISA of 2.08 does not require the fallback generic implementation). Checked on powerpc-linux-gnu (built without --with-cpu, with --with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch), powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+ and --disable-multi-arch). * sysdeps/powerpc/fpu/round_to_integer.h (set_fenv_mode): Add ROUND handling. (round_mode): Add definition for ROUND. (round_to_integer_float): Likewise. * sysdeps/powerpc/fpu/s_round.c: New file. * sysdeps/powerpc/fpu/s_roundf.c: New file. * sysdeps/powerpc/powerpc32/fpu/s_round.S: Remove file. * sysdeps/powerpc/powerpc32/fpu/s_roundf.S: Likewise. * sysdep/powerpc/powepc32/power4/fpu/multiarch/s_round-power5+.S: Likewise. * sysdep/powerpc/powepc32/power4/fpu/multiarch/s_round-ppc32.S: Likewise. * sysdep/powerpc/powepc32/power4/fpu/multiarch/s_roundf-power5+.S: Likewise. * sysdep/powerpc/powepc32/power4/fpu/multiarch/s_roundf-ppc32.S: Likewise. * sysdep/powerpc/powepc32/power4/fpu/multiarch/s_round-power5+.c: New file. * sysdep/powerpc/powepc32/power4/fpu/multiarch/s_round-ppc32.c: Likewise. * sysdep/powerpc/powepc32/power4/fpu/multiarch/s_roundf-power5+.c: Likewise. * sysdep/powerpc/powepc32/power4/fpu/multiarch/s_roundf-ppc32.c: Likewise. * sysdep/powerpc/powerpc32/power5+/fpu/s_round.S: Remove file. * sysdep/powerpc/powerpc32/power5+/fpu/s_roundf.S: Likewise. * sysdep/powerpc/powerpc64/be/fpu/multiarch/Makefile (libm-sysdep_routines): Add s_round-power5+, s_round-ppc64, s_roundf-power5+, and s_roundf-ppc64. (CFLAGS-s_round-power5+.c, CFLAGS-s_roundf-power5+.c): New rule. * sysdep/powerpc/powercp64/be/fpu/multiarch/s_round-power5+.c: New file. * sysdep/powerpc/powercp64/be/fpu/multiarch/s_round-ppc64.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_round.c: Move to ... * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_round.c: ... here. * sysdep/powerpc/powercp64/be/fpu/multiarch/s_roundf-power5+.c: New file. * sysdep/powerpc/powercp64/be/fpu/multiarch/s_roundf-ppc64.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_roundf.c: Move to ... * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_roundf.c: ... here. * sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile (libm-sysdep_routines): Remove s_round-power5+, s_round-ppc64, s_roundf-power5+, and s_roundf-ppc64. * sysdep/powerpc/powerpc64/fpu/multiarch/s_round-power5+.S: Remove file. * sysdep/powerpc/powerpc64/fpu/multiarch/s_round-ppc64.S: Likewise. * sysdep/powerpc/powerpc64/fpu/multiarch/s_roundf-power5+.S: Likewise. * sysdep/powerpc/powerpc64/fpu/multiarch/s_roundf-ppc64.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_round.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_roundf.S: Likewise. * sysdep/powerpc/powerpc64/power5+/fpu/s_round.S: Likewise. * sysdep/powerpc/powerpc64/power5+/fpu/s_roundf.S: Likewise. Reviewed-by: Gabriel F. T. Gomes <gabriel@inconstante.eti.br>
* powerpc: floor/floorf refactorAdhemerval Zanella2019-05-091-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patches consolidates all the powerpc floor{f} implementations on the generic sysdeps/powerpc/fpu/s_floor{f}. The generic implementation uses either the compiler builts for ISA 2.03+ (which generates the frim instruction) or a generic implementation which uses FP only operations. The IFUNC organization for powerpc64 is also change to be enabled only for powerpc64 and not for powerpc64le (since minium ISA of 2.08 does not require the fallback generic implementation). Checked on powerpc-linux-gnu (built without --with-cpu, with --with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch), powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+ and --disable-multi-arch). * sysdeps/powerpc/fpu/round_to_integer.h (set_fenv_mode): Add FLOOR option. (round_mode): Add definition for FLOOR. * sysdeps/powerpc/fpu/s_floor.c: New file. * sysdeps/powerpc/fpu/s_floorf.c: Likewise. * sysdeps/powerpc/powerpc32/fpu/s_floor.S: Remove file. * sysdeps/powerpc/powerpc32/fpu/s_floorf.S: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_floor-power5+.S: Remove file. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_floor-ppc32.S: Likewise * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_floorf-power5+.S: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_floorf-ppc32.S: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_floor-power5+.c: New file. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_floor-ppc32.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_floorf-power5+.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_floorf-ppc32.c: Likewise. * sysdeps/powerpc/powerpc32/power5+/fpu/s_floor.S: Remove file. * sysdeps/powerpc/powerpc32/power5+/fpu/s_floorf.S: Remove file. * sysdeps/powerpc/powerpc64/be/fpu/multiarch/Makefile (libm-sysdep_routines): Add s_floor-power5+, s_floor-ppc64, s_floorf-power5+, and s_floorf-ppc64. (CFLAGS-s_floor-power5+.c, CFLAGS-s_floorf-power5+.c): New rule. * sysdep/powerpc/powerpc64/be/fpu/multiarch/s_floor-power5+.c: New file. * sysdep/powerpc/powerpc64/be/fpu/multiarch/s_floor-ppc64.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_floor.c: Move to ... * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_floor.c: ... here. * sysdep/powerpc/powerpc64/be/fpu/multiarch/s_floorf-power5+.c: New file. * sysdep/powerpc/powerpc64/be/fpu/multiarch/s_floorf-ppc64.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_floorf.c: Move to ... * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_floorf.c: ... here. * sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile (libm-sysdep_routines): Remove s_floor-power5+, s_floor-ppc64, s_floorf-power5+, and s_floorf-ppc64. * sysdep/powerpc/powerpc64/fpu/multiarch/s_floor-power5+.S: Remove file. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_floor-ppc64.S: Remove file. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_floorf-power5+.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_floorf-ppc64.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_floor.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_floorf.S: Likewise. * sysdeps/powerpc/powerpc64/power5+/fpu/s_floor.S: Likewise. * sysdeps/powerpc/powerpc64/power5+/fpu/s_floorf.S: Likewise. Reviewed-by: Gabriel F. T. Gomes <gabriel@inconstante.eti.br>
* powerpc: ceil/ceilf refactorAdhemerval Zanella2019-04-291-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patches consolidates all the powerpc ceil{f} implementations on the generic sysdeps/powerpc/fpu/s_ceil{f}. The generic implementation uses either the compiler builts for ISA 2.03+ (which generates the frip instruction) or a generic implementation which uses FP only operations. It adds a generic implementation (round_to_integer.h) which is shared with other rounding to integer routines. The resulting code should be similar in term os performance to previous assembly one. The IFUNC organization for powerpc64 is also change to be enabled only for powerpc64 and not for powerpc64le (since minium ISA of 2.08 does not require the fallback generic implementation). Checked on powerpc-linux-gnu (built without --with-cpu, with --with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch), powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+ and --disable-multi-arch). * sysdeps/powerpc/fpu/fenv_libc.h (__fesetround_inline_nocheck): New function. * sysdeps/powerpc/fpu/round_to_integer.h: New file. * sysdeps/powerpc/fpu/s_ceil.c: Likewise. * sysdeps/powerpc/fpu/s_ceilf.c: Likewise. * sysdeps/powerpc/powerpc32/fpu/s_ceil.S: Remove file. * sysdeps/powerpc/powerpc32/fpu/s_ceilf.S: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile (CFLAGS-s_ceil-power5+.c, CFLAGS-s_ceilf-power5+.c): New rule. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_ceil-power5+.S: Remove file. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_ceil-ppc32.S: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_ceilf-power5+.S: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_ceilf-ppc32.S: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_ceil-power5+.c: New file. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_ceil-ppc32.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_ceilf-power5+.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_ceilf-ppc32.c: Likewise. * sysdeps/powerpc/powerpc32/power5+/fpu/s_ceil.S: Remove file. * sysdeps/powerpc/powerpc32/power5+/fpu/s_ceilf.S: Likewise. * sysdeps/powerpc/powerpc64/be/fpu/multiarch/Makefile: New file. * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_ceil-power5+.c: Likewise. * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_ceil-ppc64.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_ceil.c: Move to ... * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_ceil.c: ... here. * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_ceilf-power5+.c: New file. * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_ceilf-ppc64.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_ceilf.c: Move to ... * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_ceilf.c: ... * here. * sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile (libm-sysdep_routines): Remove s_ceil-power5+, s_ceil-ppc64, s_ceilf-power5+, and s_ceilf-ppc64. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_ceil-power5+.S: Remove file. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_ceil-ppc64.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_ceilf-power5+.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_ceilf-ppc64.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_ceil.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_ceilf.S: Likewise. * sysdeps/powerpc/powerpc64/power5+/fpu/s_ceil.S: Likewise. * sysdeps/powerpc/powerpc64/power5+/fpu/s_ceilf.S: Likewise. Reviewed-by: Gabriel F. T. Gomes <gabriel@inconstante.eti.br>
* powerpc: Remove powerpc specific sinf and cosf optimizationRajalakshmi Srinivasaraghavan2018-08-201-2/+0
| | | | | | | New generic optimization of sinf and cosf introduced by commit 599cf3976679e1b345307d9c02057f02aa95528f shows improvement compared to powerpc specific assembly version. Hence removing the powerpc assembly versions to make use of generic code.
* [BZ #21745] powerpc: build some IFUNC math functions for libc and libmTulio Magno Quites Machado Filho2017-09-151-17/+19
| | | | | | | | | | | | | | | | | | | | | | | Some math functions have to be distributed in libc because they're required by printf. libc and libm require their own builds of these functions, e.g. libc functions have to call __stack_chk_fail_local in order to bypass the PLT, while libm functions have to call __stack_chk_fail. While math/Makefile treat the generic cases, i.e. s_isinff, the multiarch Makefile has to treat its own files, i.e. s_isinff-ppc64. [BZ #21745] * sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile: [$(subdir) = math] (sysdep_calls): New variable. Has the previous contents of sysdep_routines, but re-sorted.. [$(subdir) = math] (sysdep_routines): Re-use the contents from sysdep_calls. [$(subdir) = math] (libm-sysdep_routines): Remove the functions defined in sysdep_calls and replace by the respective m_* names. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_isnan-ppc64.S: (compat_symbol): Undefine to avoid duplicated compat symbols in libc.
* powerpc: Add optimized version of [l]lroundfRajalakshmi Srinivasaraghavan2017-06-231-1/+1
| | | | | This patch makes use of optimized double version of llround for single precision as both the versions return [long] long type.
* powerpc: Add a POWER8-optimized version of cosf()Paul Clarke2017-05-171-1/+2
| | | | | | | | | | | | | This implementation is based on the one already used at sysdeps/powerpc/powerpc64/fpu/multiarch/s_sinf-power8.S. * sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile [$(subdir) = math] (libm-sysdep_routines): Add s_cosf-power8 and s_cosf-ppc64. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_cosf-power8.S: New file. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_cosf-ppc64.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_cosf.c: Likewise. * sysdeps/powerpc/powerpc64/power8/fpu/s_cosf.S: Likewise.
* ppc: Fix modf (sNaN) for pre-POWER5+ CPU (bug 20240).Aurelien Jarno2016-07-081-0/+5
| | | | | | | | | | | | | | | | Commit a6a4395d fixed modf implementation by compiling s_modf.c and s_modff.c with -fsignaling-nans. However these files are also included from the pre-POWER5+ implementation, and thus these files should also be compiled with -fsignaling-nans. Changelog: [BZ #20240] * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile (CFLAGS-s_modf-ppc32.c): New variable. (CFLAGS-s_modff-ppc32.c): Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile (CFLAGS-s_modf-ppc64.c): Likewise. (CFLAGS-s_modff-ppc64.c): Likewise.
* powerpc: Add a POWER8-optimized version of sinf()Anton Blanchard2016-06-301-1/+2
| | | | | This uses the implementation of sinf() in sysdeps/x86_64/fpu/s_sinf.S as inspiration.
* powerpc: Add a POWER8-optimized version of expf()Tulio Magno Quites Machado Filho2016-06-301-1/+2
| | | | | | | | This implementation is based on the one already used at sysdeps/x86_64/fpu/e_expf.S. This implementation improves the performance by ~14% on average in synthetic benchmarks at the cost of decreasing accuracy to 1 ULP.
* PowerPC: llround/llroundf POWER8 optimizationAdhemerval Zanella2014-02-271-1/+1
| | | | | | This patch add a optimized llround/llroundf implementation for POWER8 using the new Move From VSR Doubleword instruction to gains some cycles from FP to GRP register move.
* PowerPC: llrint/llrintf POWER8 optimizationAdhemerval Zanella2014-02-271-1/+2
| | | | | | This patch add a optimized llrint/llrintf implementation for POWER8 using the new Move From VSR Doubleword instruction to gains some cycles from FP to GRP register move.
* PowerPC: Optimized finite/finitef for POWER8Adhemerval Zanella2014-02-271-2/+2
| | | | | | This patch add a optimized finite/finitef implementation for POWER8 using the new Move From VSR Doubleword instruction to gains some cycles from FP to GRP register move.
* PowerPC: Optimized isinf/isinff for POWER8Adhemerval Zanella2014-02-271-2/+3
| | | | | | This patch add a optimized isinf/isinff implementation for POWER8 using the new Move From VSR Doubleword instruction to gains some cycles from FP to GRP register move.
* PowerPC: Optimized isnan/isnanf for POWER8Adhemerval Zanella2014-02-271-2/+3
| | | | | | This patch add a optimized isnan/isnanf implementation for POWER8 using the new Move From VSR Doubleword instruction to gains some cycles from FP to GRP register move.
* PowerPC: multiarch hypot/hypotf for PowerPC64Adhemerval Zanella2013-12-131-1/+4
|
* PowerPC: multiarch modf/modff for PowerPC64Adhemerval Zanella2013-12-131-2/+6
|
* PowerPC: multiarch logb/logbl/logbf for PowerPC64Adhemerval Zanella2013-12-131-1/+7
|
* PowerPC: multiarch isinf/isinff for PowerPC64Adhemerval Zanella2013-12-131-2/+4
|
* PowerPC: multiarch finite/finitef for PowerPC64Adhemerval Zanella2013-12-131-2/+4
|
* PowerPC: multiarch llrint/lrint for PowerPC64Adhemerval Zanella2013-12-131-1/+2
|
* PowerPC: multiarch copysign/copysignf for PowerPC64Adhemerval Zanella2013-12-131-2/+4
|
* PowerPC: multiarch trunc/truncf for PowerPC64Adhemerval Zanella2013-12-131-1/+2
|
* PowerPC: multiarch round/roundf for PowerPC64Adhemerval Zanella2013-12-131-1/+2
|
* PowerPC: multiarch floor/floorf for PowerPC64Adhemerval Zanella2013-12-131-1/+3
|
* PowerPC: multiarch ceil/ceilf for PowerPC64Adhemerval Zanella2013-12-131-1/+2
|
* PowerPC: multiarch llround/lround for PowerPC64Adhemerval Zanella2013-12-131-1/+2
|
* PowerPC: multiarch isnan/isnanf for PowerPC64Adhemerval Zanella2013-12-131-0/+7