about summary refs log tree commit diff
path: root/sysdeps/powerpc/powerpc64
Commit message (Collapse)AuthorAgeFilesLines
* Update copyright dates with scripts/update-copyrightsPaul Eggert2021-01-02265-265/+265
| | | | | | | | | | | | | | | | I used these shell commands: ../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright (cd ../glibc && git commit -am"[this commit message]") and then ignored the output, which consisted lines saying "FOO: warning: copyright statement not found" for each of 6694 files FOO. I then removed trailing white space from benchtests/bench-pthread-locks.c and iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c, to work around this diagnostic from Savannah: remote: *** pre-commit check failed ... remote: *** error: lines with trailing whitespace found remote: error: hook declined to update refs/heads/master
* powerpc: Runtime selection between sc and scv for syscallsMatheus Castanho2020-12-301-6/+114
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Linux kernel v5.9 added support for system calls using the scv instruction for POWER9 and later. The new codepath provides better performance (see below) if compared to using sc. For the foreseeable future, both sc and scv mechanisms will co-exist, so this patch enables glibc to do a runtime check and use scv when it is available. Before issuing the system call to the kernel, we check hwcap2 in the TCB for PPC_FEATURE2_SCV to see if scv is supported by the kernel. If not, we fallback to sc and keep the old behavior. The kernel implements a different error return convention for scv, so when returning from a system call we need to handle the return value differently depending on the instruction we used to enter the kernel. For syscalls implemented in ASM, entry and exit are implemented by different macros (PSEUDO and PSEUDO_RET, resp.), which may be used in sequence (e.g. for templated syscalls) or with other instructions in between (e.g. clone). To avoid accessing the TCB a second time on PSEUDO_RET to check which instruction we used, the value read from hwcap2 is cached on a non-volatile register. This is not needed when using INTERNAL_SYSCALL macro, since entry and exit are bundled into the same inline asm directive. The dynamic loader may issue syscalls before the TCB has been setup so it always uses sc with no extra checks. For the static case, there is no compile-time way to determine if we are inside startup code, so we also check the value of the thread pointer before effectively accessing the TCB. For such situations in which the availability of scv cannot be determined, sc is always used. Support for scv in syscalls implemented in their own ASM file (clone and vfork) will be added later. For now simply use sc as before. Average performance over 1M calls for each syscall "type": - stat: C wrapper calling INTERNAL_SYSCALL - getpid: templated ASM syscall - syscall: call to gettid using syscall function Standard: stat : 1.573445 us / ~3619 cycles getpid : 0.164986 us / ~379 cycles syscall : 0.162743 us / ~374 cycles With scv: stat : 1.537049 us / ~3535 cycles <~ -84 cycles / -2.32% getpid : 0.109923 us / ~253 cycles <~ -126 cycles / -33.25% syscall : 0.116410 us / ~268 cycles <~ -106 cycles / -28.34% Tested on powerpc, powerpc64, powerpc64le (with and without scv) Tested-by: Lucas A. M. Magalhães <lamm@linux.ibm.com> Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
* powerpc64le: Add glibc-hwcaps supportFlorian Weimer2020-12-043-0/+128
| | | | | The "power10" and "power9" subdirectories are selected in a way that matches the -mcpu=power10 and -mcpu=power9 options of GCC.
* powerpc64le: ifunc select *f128 routines in multiarch modePaul E. Murphy2020-11-3016-197/+817
| | | | | | | | | | | | | | | | | | | | | | | | | Programatically generate simple wrappers for interesting libm *f128 objects. Selected functions are transcendental functions or those with trivial compiler builtins. This can result in a 2-3x speedup (e.g logf128 and expf128). A second set of implementation files are generated which include the first implementation encountered along the search path. This usually works, except when a wrapper is overriden and makefile search order slightly diverges from include order. Likewise, wrapper object files are created for each generated file. These hold the ifunc selection routines which export ABI. Next, several shared headers are intercepted to control renaming of asm function redirects are used first, and sometimes macro renames if the former is impractical. Notably, if the request machine supports hardware IEEE128 (i.e POWER9 and newer) this ifunc machinery is disabled. Likewise existing ifunc support for float128 is consolidated into this (e.g sqrtf128 and fmaf128). Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
* powerpc: Eliminate UP macro conditionalsFlorian Weimer2020-11-131-3/+1
| | | | | | The macro is never defined. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* powerpc: Add optimized stpncpy for POWER9Raphael M Zinsly2020-11-126-2/+135
| | | | | | | Add stpncpy support into the POWER9 strncpy. Reviewed-by: Matheus Castanho <msc@linux.ibm.com> Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
* powerpc: Add optimized strncpy for POWER9Raphael M Zinsly2020-11-125-1/+391
| | | | | | | | Similar to the strcpy P9 optimization, this version uses VSX to improve performance. Reviewed-by: Matheus Castanho <msc@linux.ibm.com> Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
* powerpc: fix ifunc implementation list for POWER9 strlen and stpcpyRaphael Moreira Zinsly2020-09-171-2/+2
| | | | | __strlen_power9 and __stpcpy_power9 were added to their ifunc lists using the wrong function names.
* powerpc64le: guarantee a .gnu.attributes section [BZ #26220]Paul E. Murphy2020-07-211-0/+8
| | | | | | | | | | Upstream GCC 11 development is now building the ibm128 runtime support (in libgcc) without a .gnu.attributes section on ppc64le. Ensure we have one to replace by building one ibm128 file in libc and libm with attributes. Reviewed-by: Carlos O'Donell <carlos@redhat.com> Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
* powerpc64: Fix calls when r2 is not used [BZ #26173]Tulio Magno Quites Machado Filho2020-07-105-3/+48
| | | | | | | | | Teach the linker that __mcount_internal, __sigjmp_save_symbol, __syscall_error and __GI_exit do not use r2, so that it does not need to recover r2 after the call. Test at configure time if the assembler supports @notoc and define USE_PPC64_NOTOC.
* powerpc: Add support for POWER10Tulio Magno Quites Machado Filho2020-06-298-0/+10
| | | | | | | | 1. Add the directories to hold POWER10 files. 2. Add support to select POWER10 libraries based on AT_PLATFORM. 3. Let submachine=power10 be set automatically.
* powerpc64le: refactor e_sqrtf128.cPaul E. Murphy2020-06-162-39/+7
| | | | | Combine both implementations into a single file to allow building twice with appropriate multiarch support when possible.
* powerpc64le: add optimized strlen for P9Paul E. Murphy2020-06-056-1/+226
| | | | | | | | | | | | | | | | This started as a trivial change to Anton's rawmemchr. I got carried away. This is a hybrid between P8's asympotically faster 64B checks with extremely efficient small string checks e.g <64B (and sometimes a little bit more depending on alignment). The second trick is to align to 64B by running a 48B checking loop 16B at a time until we naturally align to 64B (i.e checking 48/96/144 bytes/iteration based on the alignment after the first 5 comparisons). This allieviates the need to check page boundaries. Finally, explicly use the P7 strlen with the runtime loader when building P9. We need to be cautious about vector/vsx extensions here on P9 only builds.
* powerpc64le: use common fmaf128 implementationPaul E. Murphy2020-06-052-37/+3
| | | | | | | | | This defines the macro such that it should behave best on all supported powerpc targets. Likewise, this allows us to remove the ppc64le specific s_fmaf128.c. I have verified powerpc64le multiarch and powerpc64le power9 no-multiarch builds continue to generate optimize fmaf128.
* powerpc: Optimized rawmemchr for POWER9Anton Blanchard2020-05-185-3/+145
| | | | | | | | | | | This version uses vector instructions and is up to 60% faster on medium matches and up to 90% faster on long matches, compared to the POWER7 version. A few examples: __rawmemchr_power9 __rawmemchr_power7 Length 32, alignment 0: 2.27566 3.77765 Length 64, alignment 2: 2.46231 3.51064 Length 1024, alignment 0: 17.3059 32.6678
* powerpc: Optimized stpcpy for POWER9Anton Blanchard via Libc-alpha2020-05-186-21/+123
| | | | | | | | | | Add stpcpy support to the POWER9 strcpy. This is up to 40% faster on small strings and up to 90% faster on long relatively unaligned strings, compared to the POWER8 version. A few examples: __stpcpy_power9 __stpcpy_power8 Length 20, alignments in bytes 4/ 4: 2.58246 4.8788 Length 1024, alignments in bytes 1/ 6: 24.8186 47.8528
* powerpc: Optimized strcpy for POWER9Anton Blanchard via Libc-alpha2020-05-185-1/+182
| | | | | | | | | | This version uses VSX store vector with length instructions and is significantly faster on small strings and relatively unaligned large strings, compared to the POWER8 version. A few examples: __strcpy_power9 __strcpy_power8 Length 16, alignments in bytes 0/ 0: 2.52454 4.62695 Length 412, alignments in bytes 4/ 0: 11.6 22.9185
* powerpc64le/power9: guard power9 strcmp against rtld usage [BZ# 25905]Paul E. Murphy2020-05-041-0/+2
| | | | | | | | | | | | | | | | strcmp is used while resolving PLT references. Vector registers should not be used during this. The P9 strcmp makes heavy use of vector registers, so it should be avoided in rtld. This prevents quiet vector register corruption when glibc is configured with --disable-multi-arch and --with-cpu=power9. This can be seen with test-float64x-compat_totalordermag during the first call into totalordermagf64x@GLIBC_2.27. Add a guard to fallback to the power8 implementation when building power9 strcmp for libraries other than libc. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* powerpc64le: Enable support for IEEE long doubleGabriel F. T. Gomes2020-04-302-0/+5
| | | | | | | | | | | | | | On platforms where long double may have two different formats, i.e.: the same format as double (64-bits) or something else (128-bits), building with -mlong-double-128 is the default and function calls in the user program match the name of the function in Glibc. When building with -mlong-double-64, Glibc installed headers redirect such calls to the appropriate function. Likewise, the internals of glibc are now built against IEEE long double. However, the only (minimally) notable usage of long double is difftime. Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
* powerpc64le: blacklist broken GCC compilers (e.g GCC 7.5.0)Paul E. Murphy2020-04-302-0/+42
| | | | | | | | | | | GCC 7.5.0 (PR94200) will refuse to compile if both -mabi=% and -mlong-double-128 are passed on the command line. Surprisingly, it will work happily if the latter is not. For the sake of maintaining status quo, test for and blacklist such compilers. Tested with a GCC 8.3.1 and GCC 7.5.0 compiler for ppc64le. Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
* powerpc64le: bump binutils version requirement to >= 2.26Paul E. Murphy2020-04-302-0/+69
| | | | | | | This is a small step up from 2.25 which brings in support for rewriting the .gnu.attributes section of libc/libm.so. Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
* powerpc64le: raise GCC requirement to 7.4 for long double transitionPaul E. Murphy2020-04-302-0/+94
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add compiler feature tests to ensure we can build ieee128 long double. These test for -mabi=ieeelongdouble, -mno-gnu-attribute, and -Wno-psabi. Likewise, verify some compiler bugs have been addressed. These aren't helpful for building glibc, but may cause test failures when testing the new long double. See notes below from Raji. On powerpc64le, some older compiler versions give error for the function signbit() for 128-bit floating point types. This is fixed by PR83862 in gcc 8.0 and backported to gcc6 and gcc7. This patch adds a test to check compiler version to avoid compiler errors during make check. Likewise, test for -mno-gnu-attribute support which was On powerpc64le, a few files are built on IEEE long double mode (-mabi=ieeelongdouble), whereas most are built on IBM long double mode (-mabi=ibmlongdouble, the default for -mlong-double-128). Since binutils 2.31, linking object files with different long double modes causes errors similar to: ld: libc_pic.a(s_isinfl.os) uses IBM long double, libc_pic.a(ieee128-qefgcvt.os) uses IEEE long double. collect2: error: ld returned 1 exit status make[2]: *** [../Makerules:649: libc_pic.os] Error 1 The warnings are fair and correct, but in order for glibc to have support for both long double modes on powerpc64le, they have to be ignored. This can be accomplished with the use of -mno-gnu-attribute option when building the few files that require IEEE long double mode. However, -mno-gnu-attribute is not available in GCC 6, the minimum version required to build glibc, so this patch adds a test for this feature in powerpc64le builds, and fails early if it's not available. Co-Authored-By: Rajalakshmi Srinivasaraghavan <raji@linux.vnet.ibm.com> Co-Authored-By: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com> Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
* powerpc64le: enforce non-specific long double in .gnu.attributes sectionPaul E. Murphy2020-04-062-1/+56
| | | | | | | | | | | | | | | | | | | | | | | We turn off this feature to avoid polluting our shared libary with a specific value. However, static libgcc is not under our control, and has enabled this for ibm128 routines. This pollutes the resulting shared libraries with it. Attach a post-linking hook to replace this section with one crafted as hard-float + indeterminate ldbl. This allows IEEE ldbl users to avoid having to disable the gnu attributes feature which should protect them from linking ibm ldbl libraries using the gnu attributes feature. Currently, this only replaces libc and libm which support both ldbl formats and rely on application code to explicitly determine which is to be used. Strictly speaking, the section could be deleted with minimal lost value. However correctly set attributes could prove useful for some future change, and similarly missing attributes. Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
* powerpc64le: workaround ieee long double / _Float128 stdc++ bugPaul E. Murphy2020-04-061-0/+11
| | | | | | | | | | -mabi=ieeelongdouble triggers the stdc++ libraries _Float128 support, which then breaks if algorithm is included. For now, explicitly disable _Float128 for such tests. I have opened up GCC BZ 94080 to track this. Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
* powerpc64le: Enforce -mabi=ibmlongdouble when -mfloat128 usedPaul E. Murphy2020-04-062-43/+53
| | | | | | | | | | | | | | | | | | I have observed a bug on 7.4.0 whereby __mulkc3 calls are swapped with __multc3 depending on ABI selection. For the sake of being overly cautious, build all _Float128 files with ibm128 to workaround these compilers. This has been noted in GCC BZ 84914, and will not be fixed for GCC 7. Likewise, non-math files built with _Float128 are assumed to have ibm long double. Explicilty preserve this assumption. Finally, add some bootstrapping code to avoid applying these options until IEEE long double is enabled as they require GCC 7 and above. Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
* powerpc64le/multiarch: don't generate strong aliases for fmaf128-ppc64Paul E. Murphy2020-04-061-0/+2
| | | | | This prevents generating a second alias for __fmaieee128 when compiling with ldouble == ieee128 redirects.
* powerpc: Add support for fmaf128() in hardwareRaphael Moreira Zinsly2020-03-305-1/+127
| | | | | | | | Adds a POWER9 version of fmaf128 that uses the xsmaddqp instruction. Co-authored-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* powerpc64: apply -mabi=ibmlongdouble to special filesPaul E. Murphy2020-03-253-2/+13
| | | | | | | | | | Some of these files depend on the avoidance of using the various register sets of POWER. When enabling the IEEE 128 long double, we must be sure to disable this ABI as some compilers will refuse to compile if -mno-vsx and -mabi=ieeelongdouble are both present. Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
* powerpc64le: add -mno-gnu-attribute to *f128 objects and difftimePaul E. Murphy2020-03-252-9/+16
| | | | | | | | | | | | | In practice, this flag should be applied globally, but it makes a good sanity check to ensure ibm128 and ieee128 long double files are not getting mismatched. _Float128 files use no long double, thus are always safe to use this option. Similarly, when investigating the linker complaints, difftime makes trivial, self contained, usage of long double, so thus it is also explicitly marked as such. Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
* Makeconfig: sandwich gnulib-tests between libc/ld linking of testsPaul E. Murphy2020-03-251-17/+1
| | | | | | | This better resembles the default linking process with the gnulibs, and also resolves the increasingly difficult to maintain f128-loader-link usage on powerpc64le as some libgcc symbols are dependent on those found in the loader (ld).
* powerpc64le: Ensure correct ldouble compiler flags are usedGabriel F. T. Gomes2020-03-251-0/+58
| | | | | | | | | | | | | Ensure the correct ldouble abi flags are applied to ibm128 files and nldbl files. Remove the IEEE options if used, and apply the flags used to build ldouble files which are ibm128 abi. nldbl tests are a little tricky. To use the support, we must remove all ldouble abi flags, and ensure -mlong-double-64 is used. Co-authored-by: Rajalakshmi Srinivasaraghavan <raji@linux.vnet.ibm.com> Co-authored-by: Tulio Magno Quites Machado Filho <tuliom@linux.vnet.ibm.com> Co-authored-by: Paul E. Murphy <murphyp@linux.vnet.ibm.com>
* math: Remove inline math testsAdhemerval Zanella2020-03-191-3/+1
| | | | | | | | | | | | | With mathinline removal there is no need to keep building and testing inline math tests. The gen-libm-tests.py support to generate ULP_I_* is removed and all libm-test-ulps files are updated to longer have the i{float,double,ldouble} entries. The support for no-test-inline is also removed from both gen-auto-libm-tests and the auto-libm-test-out-* were regenerated. Checked on x86_64-linux-gnu and i686-linux-gnu.
* Fix array overflow in backtrace on PowerPC (bug 25423)Andreas Schwab2020-01-211-0/+2
| | | | | | When unwinding through a signal frame the backtrace function on PowerPC didn't check array bounds when storing the frame address. Fixes commit d400dcac5e ("PowerPC: fix backtrace to handle signal trampolines").
* powerpc: Move cache line size to rtld_global_roTulio Magno Quites Machado Filho2020-01-173-9/+39
| | | | | | | | | | | | | | GCC 10.0 enabled -fno-common by default and this started to point that __cache_line_size had been implemented in 2 different places: loader and libc. In order to avoid this duplication, the libc variable has been removed and the loader variable is moved to rtld_global_ro. File sysdeps/unix/sysv/linux/powerpc/dl-auxv.h has been added in order to reuse code for both static and dynamic linking scenarios. Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* elf: Move vDSO setup to rtld (BZ#24967)Adhemerval Zanella2020-01-031-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | This patch moves the vDSO setup from libc to loader code, just after the vDSO link_map setup. For static case the initialization is moved to _dl_non_dynamic_init instead. Instead of using the mangled pointer, the vDSO data is set as attribute_relro (on _rtld_global_ro for shared or _dl_vdso_* for static). It is read-only even with partial relro. It fixes BZ#24967 now that the vDSO pointer is setup earlier than malloc interposition is called. Also, vDSO calls should not be a problem for static dlopen as indicated by BZ#20802. The vDSO pointer would be zero-initialized and the syscall will be issued instead. Checked on x86_64-linux-gnu, i686-linux-gnu, aarch64-linux-gnu, arm-linux-gnueabihf, powerpc64le-linux-gnu, powerpc64-linux-gnu, powerpc-linux-gnu, s390x-linux-gnu, sparc64-linux-gnu, and sparcv9-linux-gnu. I also run some tests on mips. Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
* Add libm_alias_finite for _finite symbolsWilco Dijkstra2020-01-032-2/+5
| | | | | | | | | | | | | | | | | | | This patch adds a new macro, libm_alias_finite, to define all _finite symbol. It sets all _finite symbol as compat symbol based on its first version (obtained from the definition at built generated first-versions.h). The <fn>f128_finite symbols were introduced in GLIBC 2.26 and so need special treatment in code that is shared between long double and float128. It is done by adding a list, similar to internal symbol redifinition, on sysdeps/ieee754/float128/float128_private.h. Alpha also needs some tricky changes to ensure we still emit 2 compat symbols for sqrt(f). Passes buildmanyglibc. Co-authored-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
* Update copyright dates with scripts/update-copyrights.Joseph Myers2020-01-01248-248/+248
|
* Expand $(as-needed) and $(no-as-needed) throughout the build systemFlorian Weimer2019-12-031-1/+1
| | | | | | | | | Since commit a3cc4f48e94f32c9532ee36982ac00eb1e5719b0 ("Remove --as-needed configure test."), --as-needed support is no longer optional. The macros are not much shorter and do not provide documentary value, either, so this commit removes them.
* Refactor vDSO initialization codeAdhemerval Zanella2019-09-171-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Linux vDSO initialization code the internal function pointers require a lot of duplicated boilerplate over different architectures. This patch aims to simplify not only the code but the required definition to enable a vDSO symbol. The changes are: 1. Consolidate all init-first.c on only one implementation and enable the symbol based on HAVE_*_VSYSCALL existence. 2. Set the HAVE_*_VSYSCALL to the architecture expected names string. 3. Add a new internal implementation, get_vdso_mangle_symbol, which returns a mangled function pointer. Currently the clock_gettime, clock_getres, gettimeofday, getcpu, and time are handled in an arch-independent way, powerpc still uses some arch-specific vDSO symbol handled in a specific init-first implementation. Checked on aarch64-linux-gnu, arm-linux-gnueabihf, i386-linux-gnu, mips64-linux-gnu, powerpc64le-linux-gnu, s390x-linux-gnu, sparc64-linux-gnu, and x86_64-linux-gnu. * sysdeps/powerpc/powerpc32/backtrace.c (is_sigtramp_address, is_sigtramp_address_rt): Use HAVE_SIGTRAMP_{RT}32 instead of SHARED. * sysdeps/powerpc/powerpc64/backtrace.c (is_sigtramp_address): Likewise. * sysdeps/unix/sysv/linux/aarch64/init-first.c: Remove file. * sysdeps/unix/sysv/linux/aarch64/libc-vdso.h: Likewise. * sysdeps/unix/sysv/linux/arm/init-first.c: Likewise. * sysdeps/unix/sysv/linux/arm/libc-vdso.h: Likewise. * sysdeps/unix/sysv/linux/mips/init-first.c: Likewise. * sysdeps/unix/sysv/linux/mips/libc-vdso.h: Likewise. * sysdeps/unix/sysv/linux/i386/init-first.c: Likewise. * sysdeps/unix/sysv/linux/riscv/init-first.c: Likewise. * sysdeps/unix/sysv/linux/riscv/libc-vdso.h: Likewise. * sysdeps/unix/sysv/linux/s390/init-first.c: Likewise. * sysdeps/unix/sysv/linux/s390/libc-vdso.h: Likewise. * sysdeps/unix/sysv/linux/sparc/init-first.c: Likewise. * sysdeps/unix/sysv/linux/sparc/libc-vdso.h: Likewise. * sysdeps/unix/sysv/linux/x86/libc-vdso.h: Likewise. * sysdeps/unix/sysv/linux/x86_64/init-first.c: Likewise. * sysdeps/unix/sysv/linux/aarch64/sysdep.h (HAVE_CLOCK_GETRES_VSYSCALL, HAVE_CLOCK_GETTIME_VSYSCALL, HAVE_GETTIMEOFDAY_VSYSCALL): Define value based on kernel exported name. * sysdeps/unix/sysv/linux/arm/sysdep.h (HAVE_CLOCK_GETTIME_VSYSCALL, HAVE_GETTIMEOFDAY_VSYSCALL): Likewise. * sysdeps/unix/sysv/linux/i386/sysdep.h (HAVE_CLOCK_GETTIME_VSYSCALL, HAVE_GETTIMEOFDAY_VSYSCALL): Likewise. * sysdeps/unix/sysv/linux/mips/sysdep.h (HAVE_CLOCK_GETTIME_VSYSCALL, HAVE_GETTIMEOFDAY_VSYSCALL): Likewise. * sysdeps/unix/sysv/linux/powerpc/sysdep.h (HAVE_CLOCK_GETRES_VSYSCALL, HAVE_CLOCK_GETTIME_VSYSCALL, HAVE_GETCPU_VSYSCALL, HAVE_TIME_VSYSCALL, HAVE_GET_TBFREQ, HAVE_SIGTRAMP_RT64, HAVE_SIGTRAMP_32, HAVE_SIGTRAMP_RT32i, HAVE_GETTIMEOFDAY_VSYSCALL): Likewise. * sysdeps/unix/sysv/linux/riscv/sysdep.h (HAVE_CLOCK_GETRES_VSYSCALL, HAVE_CLOCK_GETTIME_VSYSCALL, HAVE_GETTIMEOFDAY_VSYSCALL, HAVE_GETCPU_VSYSCALL): Likewise. * sysdeps/unix/sysv/linux/s390/sysdep.h (HAVE_CLOCK_GETRES_VSYSCALL, HAVE_CLOCK_GETTIME_VSYSCALL, HAVE_GETTIMEOFDAY_VSYSCALL, HAVE_GETCPU_VSYSCALL): Likewise. * sysdeps/unix/sysv/linux/sparc/sysdep.h (HAVE_CLOCK_GETTIME_VSYSCALL, HAVE_GETTIMEOFDAY_VSYSCALL): Likewise. * sysdeps/unix/sysv/linux/x86_64/sysdep.h (HAVE_CLOCK_GETTIME_VSYSCALL, HAVE_GETTIMEOFDAY_VSYSCALL, HAVE_GETCPU_VSYSCALL): Likewise. * sysdeps/unix/sysv/linux/dl-vdso.h (VDSO_NAME, VDSO_HASH): Define to invalid names if architecture does not define them. (get_vdso_mangle_symbol): New symbol. * sysdeps/unix/sysv/linux/init-first.c: New file. * sysdeps/unix/sysv/linux/libc-vdso.h: Likewise. * sysdeps/unix/sysv/linux/powerpc/init-first.c (gettimeofday, clock_gettime, clock_getres, getcpu, time): Remove declaration. (__libc_vdso_platform_setup_arch): Likewise and use get_vdso_mangle_symbol to setup vDSO symbols. (sigtramp_rt64, sigtramp32, sigtramp_rt32, get_tbfreq): Add attribute_hidden. * sysdeps/unix/sysv/linux/powerpc/libc-vdso.h: Likewise. * sysdeps/unix/sysv/linux/sysdep-vdso.h (VDSO_SYMBOL): Remove definition.
* Fix three GNU license URLs, along with trailing-newline issues.Paul Eggert2019-09-071-2/+1
|
* Prefer https to http for gnu.org and fsf.org URLsPaul Eggert2019-09-07247-247/+247
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Also, change sources.redhat.com to sourceware.org. This patch was automatically generated by running the following shell script, which uses GNU sed, and which avoids modifying files imported from upstream: sed -ri ' s,(http|ftp)(://(.*\.)?(gnu|fsf|sourceware)\.org($|[^.]|\.[^a-z])),https\2,g s,(http|ftp)(://(.*\.)?)sources\.redhat\.com($|[^.]|\.[^a-z]),https\2sourceware.org\4,g ' \ $(find $(git ls-files) -prune -type f \ ! -name '*.po' \ ! -name 'ChangeLog*' \ ! -path COPYING ! -path COPYING.LIB \ ! -path manual/fdl-1.3.texi ! -path manual/lgpl-2.1.texi \ ! -path manual/texinfo.tex ! -path scripts/config.guess \ ! -path scripts/config.sub ! -path scripts/install-sh \ ! -path scripts/mkinstalldirs ! -path scripts/move-if-change \ ! -path INSTALL ! -path locale/programs/charmap-kw.h \ ! -path po/libc.pot ! -path sysdeps/gnu/errlist.c \ ! '(' -name configure \ -execdir test -f configure.ac -o -f configure.in ';' ')' \ ! '(' -name preconfigure \ -execdir test -f preconfigure.ac ';' ')' \ -print) and then by running 'make dist-prepare' to regenerate files built from the altered files, and then executing the following to cleanup: chmod a+x sysdeps/unix/sysv/linux/riscv/configure # Omit irrelevant whitespace and comment-only changes, # perhaps from a slightly-different Autoconf version. git checkout -f \ sysdeps/csky/configure \ sysdeps/hppa/configure \ sysdeps/riscv/configure \ sysdeps/unix/sysv/linux/csky/configure # Omit changes that caused a pre-commit check to fail like this: # remote: *** error: sysdeps/powerpc/powerpc64/ppc-mcount.S: trailing lines git checkout -f \ sysdeps/powerpc/powerpc64/ppc-mcount.S \ sysdeps/unix/sysv/linux/s390/s390-64/syscall.S # Omit change that caused a pre-commit check to fail like this: # remote: *** error: sysdeps/sparc/sparc64/multiarch/memcpy-ultra3.S: last line does not end in newline git checkout -f sysdeps/sparc/sparc64/multiarch/memcpy-ultra3.S
* powerpc: Cleanup: use actual power8 assembly mnemonicsRaoni Fassina Firmino2019-08-0111-187/+87
| | | | | | | | | | | | | | | | | | | | | | Some implementations in sysdeps/powerpc/powerpc64/power8/*.S still had pre power8 compatible binutils hardcoded macros and were not using .machine power8. This patch should not have semantic changes, in fact it should have the same exact code generated. Tested that generated stripped shared objects are identical when using "strip --remove-section=.note.gnu.build-id". Checked on: - powerpc64le, power9, build-many-glibcs.py, gcc 6.4.1 20180104, binutils 2.26.2.20160726 - powerpc64le, power8, debian 9, gcc 6.3.0 20170516, binutils 2.28 - powerpc64le, power9, ubuntu 19.04, gcc 8.3.0, binutils 2.32 - powerpc64le, power9, opensuse tumbleweed, gcc 9.1.1 20190527, binutils 2.32 - powerpc64, power9, debian 10, gcc 8.3.0, binutils 2.31.1 Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
* powerpc: refactor logb{f,l}Adhemerval Zanella2019-07-0814-17/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The power7 logb implementation does not show a performance gain on ISA 2.07+ chips with faster floating-point to GRP instructions (currently POWER8 and POWER9). This patch moves the POWER7 implementation to generic one and enables it for POWER7. It also add some cleanup to use inline floating-point number instead of define them using static const. The performance difference is for POWER9: - Without patch: "logb": { "subnormal": { "duration": 4.99202e+09, "iterations": 8.83662e+08, "max": 75.194, "min": 5.501, "mean": 5.64925 }, "normal": { "duration": 4.97063e+09, "iterations": 9.97094e+08, "max": 46.489, "min": 4.956, "mean": 4.98512 } } - With patch: "logb": { "subnormal": { "duration": 4.97226e+09, "iterations": 9.92036e+08, "max": 77.209, "min": 4.892, "mean": 5.01218 }, "normal": { "duration": 4.96192e+09, "iterations": 1.07545e+09, "max": 12.361, "min": 4.593, "mean": 4.61382 } } The ifunc implementation is also enabled only for powerpc64. Checked on powerpc-linux-gnu (built without --with-cpu, with --with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch), powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+ and --disable-multi-arch). * sysdeps/powerpc/power7/fpu/s_logb.c: Move to ... * sysdeps/powerpc/fpu/s_logb.c: ... here. Use inline FP constants. * sysdeps/powerpc/power7/fpu/s_logbf.c: Move to ... * sysdeps/powerpc/fpu/s_logbf.c: ... here. Use inline FP constants. * sysdeps/powerpc/power7/fpu/s_logbl.c: Move to ... * sysdeps/powerpc/fpu/s_logbl.c: ... here. Use inline FP constants. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_logb-power7.c: Adjust implementation path. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_logbf-power7.c: Adjust implementation path. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_logbl-power7.c: Adjust implementation path. * sysdeps/powerpc/powerpc64/be/fpu/multiarch/Makefile (libm-sysdep_routines): Add s_log* objects. (CFLAGS-s_logbf-power7.c, CFLAGS-s_logbl-power7.c, CFLAGS-s_logb-power7.c): New fule. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_logb-power7.c: Move to ... * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_logb-power7.c: ... here. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_logb-ppc64.c: Move to ... * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_logb-ppc64.c: ... here. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_logb.c: Move to ... * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_logb.c: ... here. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_logbf-power7.c: Move to ... * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_logbf-power7.c: ... here. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_logbf-ppc64.c: Move to ... * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_logbf-ppc64.c: ... here. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_logbf.c: Move to ... * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_logbf.c: ... here. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_logbl-power7.c: Move to ... * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_logbl-power7.c: ... here. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_logbl-ppc64.c: Move to ... * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_logbl-ppc64.c: ... here. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_logbl.c: Move to ... * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_logbl.c: ... here. * sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile: Remove file. * sysdeps/powerpc/powerpc64/power7/fpu/s_logb.c: Remove file. * sysdeps/powerpc/powerpc64/power7/fpu/s_logbf.c: Likewise. * sysdeps/powerpc/powerpc64/power7/fpu/s_logbl.c: Likewise. Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
* powerpc: Refactor modf{f}Adhemerval Zanella2019-07-088-16/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The modf{f} optimization is not an optimization for ISA 2.07+. This patch move the IFUNC for powerpc64 only, move the power5+ to generic location, and include the generic implementation for ISA 2.07+. The performance changes are based on modf benchtests: * POWER9 - ppc64 "modf": { "": { "duration": 4.97057e+09, "iterations": 1.00688e+09, "max": 28.76, "min": 4.912, "mean": 4.9366 } } * POWER9 - power5+ "modf": { "": { "duration": 4.98291e+09, "iterations": 9.32818e+08, "max": 15.058, "min": 5.107, "mean": 5.34178 } } * POWER8 - ppc64 "modf": { "": { "duration": 5.05329e+09, "iterations": 8.38814e+08, "max": 518.051, "min": 5.79, "mean": 6.02433 } } * POWER8 - power5+ "modf": { "": { "duration": 5.05573e+09, "iterations": 8.35254e+08, "max": 63.141, "min": 5.873, "mean": 6.05293 } } * POWER7 - ppc64 "modf": { "": { "duration": 4.89818e+09, "iterations": 1.08408e+09, "max": 57.556, "min": 3.953, "mean": 4.51827 } } * POWER7 - power5+ "modf": { "": { "duration": 4.83789e+09, "iterations": 1.33409e+09, "max": 46.608, "min": 2.224, "mean": 3.62636 } } Checked on powerpc-linux-gnu (built without --with-cpu, with --with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch), powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+ and --disable-multi-arch). * sysdeps/powerpc/power5+/fpu/s_modf.c: Move to ... * sysdeps/powerpc/fpu/s_modf.c: ... here. Add ISA 2.07 optimization. * sysdeps/powerpc/power5+/fpu/s_modff.c: Move to ... * sysdeps/powerpc/fpu/s_modff.c: ... here. Add ISA 2.07 optimization. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_modf-power5+.c: Adjust include. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_modff-power5+.c: Likewise. * sysdeps/powerpc/powerpc64/be/fpu/multiarch/Makefile (sysdep_calls, sysdep_routines): Add s_modf* objects. (CFLAGS-s_modf-power5+.c, CFLAGS-s_modff-power5+.c, CFLAGS-s_modf-ppc64.c, CFLAGS-s_modff-ppc64.c): New rule. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_modf-power5+.c: Move to ... * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_modf-power5+.c: ... here. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_modf-power5+.c: Movo to ... * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_modf-power5+.c: Move ... here. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_modf.c: Move to ... * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_modf.c: ... here. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_modff-power5+.c: Move to ... * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_modff-power5+.c: ... here. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_modff-ppc64.c: Move to ... * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_modff-ppc64.c: ... here. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_modff.c: Move to ... * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_modff.c: ... here. Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
* powerpc: hypot refactor and optimizationAdhemerval Zanella2019-07-087-160/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The powerpc hypot is slight optimized by: - Commit 8df4e219e43, both isnan and isinf are always inlined and thus the check TEST_INF_NAN does not make sense anymore. The generic check for POWER7 should be faster on all powerpc configuration. - The redundant check 'y > two60factor && (x / y) > two60' is removed. Both changes leads to unrequired ifunc especialization for power7 and thus they are removed. Finally The code is also cleanup a bit by inlining the constants floating points. The performance changes using the hypot benchtests are: - POWER9 without patch: "hypot": { "overflow": { "duration": 4.98585e+09, "iterations": 4.84932e+08, "max": 46.551, "min": 10.229, "mean": 10.2815 }, "higher_two500": { "duration": 5.00192e+09, "iterations": 4.24843e+08, "max": 33.319, "min": 11.606, "mean": 11.7736 }, "subnormal": { "duration": 5.0075e+09, "iterations": 4.06792e+08, "max": 22.178, "min": 12.15, "mean": 12.3097 }, "less_two500": { "duration": 5.00685e+09, "iterations": 4.08772e+08, "max": 22.784, "min": 12.052, "mean": 12.2485 }, "default": { "duration": 5.06002e+09, "iterations": 4.09894e+08, "max": 20.648, "min": 11.874, "mean": 12.3447 } } - POWER9 with patch: "hypot": { "overflow": { "duration": 4.91848e+09, "iterations": 7.28039e+08, "max": 47.958, "min": 6.436, "mean": 6.75579 }, "higher_two500": { "duration": 4.9359e+09, "iterations": 6.63376e+08, "max": 20.783, "min": 7.321, "mean": 7.44057 }, "subnormal": { "duration": 4.9479e+09, "iterations": 6.19772e+08, "max": 18.856, "min": 7.817, "mean": 7.98341 }, "less_two500": { "duration": 4.94275e+09, "iterations": 6.3889e+08, "max": 17.452, "min": 7.597, "mean": 7.73647 }, "default": { "duration": 5.03645e+09, "iterations": 5.70718e+08, "max": 18.904, "min": 8.55, "mean": 8.82476 } } - POWER7 without patch "hypot": { "overflow": { "duration": 4.86637e+09, "iterations": 6.43196e+08, "max": 53.958, "min": 7.328, "mean": 7.56592 }, "higher_two500": { "duration": 4.99842e+09, "iterations": 3.11012e+08, "max": 78.227, "min": 15.696, "mean": 16.0715 }, "subnormal": { "duration": 4.99841e+09, "iterations": 3.08935e+08, "max": 51.392, "min": 15.983, "mean": 16.1795 }, "less_two500": { "duration": 5.00108e+09, "iterations": 2.99464e+08, "max": 73.247, "min": 16.416, "mean": 16.7001 }, "default": { "duration": 5.04645e+09, "iterations": 3.52608e+08, "max": 70.073, "min": 13.38, "mean": 14.3118 } } - POWER7 with patch "hypot": { "overflow": { "duration": 4.80785e+09, "iterations": 8.00001e+08, "max": 66.262, "min": 5.888, "mean": 6.00981 }, "higher_two500": { "duration": 4.9859e+09, "iterations": 3.39449e+08, "max": 5148.44, "min": 14.539, "mean": 14.6882 }, "subnormal": { "duration": 4.9905e+09, "iterations": 3.28874e+08, "max": 64.905, "min": 14.971, "mean": 15.1745 }, "less_two500": { "duration": 4.99494e+09, "iterations": 3.19755e+08, "max": 103.696, "min": 14.972, "mean": 15.6211 }, "default": { "duration": 5.03951e+09, "iterations": 4.02502e+08, "max": 61.008, "min": 12.368, "mean": 12.5205 } } Checked on powerpc-linux-gnu (built without --with-cpu, with --with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch), powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+ and --disable-multi-arch). * sysdeps/powerpc/fpu/e_hypot.c (two60, two500, two600, two1022, twoM500, twoM600, two60factor, pdnum): Remove. (TEST_INFO_NAN, GET_TW0_HIGH_WORD): Remove macro. (__ieee754_hypot): Replace static variables with inline definition, remove ununsed branches. * sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile (libm-sysdep_routines): Remove e_hypot-* objects. (CFLAGS-e_hypot-power7.c, CFLAGS-e_hypotf-power7.c): Remove rule. * sysdeps/powerpc/powerpc64/fpu/multiarch/e_hypot-power7.c: Remove file. * sysdeps/powerpc/powerpc64/fpu/multiarch/e_hypot-ppc64.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/e_hypot.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/e_hypotf-power7.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/e_hypotf-ppc64.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/e_hypotf.c: Likewise. Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
* powerpc: Use generic e_expfAdhemerval Zanella2019-06-267-383/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Generic implementation is faster on both power8 and power9: POWER9: - sysdeps/ieee754/flt-32/e_expf.c "expf": { "workload-spec2017.wrf": { "duration": 5.1236e+09, "iterations": 7.53344e+08, "reciprocal-throughput": 5.9436, "latency": 7.65869, "max-throughput": 1.68248e+08, "min-throughput": 1.30571e+08 } } - sysdeps/powerpc/powerpc64/power8/fpu/e_expf.S "expf": { "workload-spec2017.wrf": { "duration": 5.14429e+09, "iterations": 5.29248e+08, "reciprocal-throughput": 8.05372, "latency": 11.3863, "max-throughput": 1.24166e+08, "min-throughput": 8.78249e+07 } } Checked on powerpc-linux-gnu (built without --with-cpu, with --with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch), powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+ and --disable-multi-arch). * sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile (libm-sysdep_routines): Remove e_expf-power8 and expf-ppc64. * sysdeps/powerpc/powerpc64/fpu/multiarch/e_expf-power8.S: Remove file. * sysdeps/powerpc/powerpc64/fpu/multiarch/e_expf-ppc64.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/e_expf.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/w_expf.c: Likewise. * sysdeps/powerpc/powerpc64/power8/fpu/e_expf.S: Likewise. * sysdeps/powerpc/powerpc64/power8/fpu/w_expf.c: Likewise. Reviewed-by: Gabriel F. T. Gomes <gabriel@inconstante.eti.br>
* powerpc: Refactor powerpc64 lround/lroundf/llround/llroundfAdhemerval Zanella2019-06-1730-490/+190
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patches consolidates all the powerpc {l}lround{f} implementations on the generic sysdeps/powerpc/fpu/s_{l}lround{f}.c. The IFUNC support is also moved only to powerpc64 only, since for powerpc64le generic implementation resulting in optimized code. Checked on powerpc-linux-gnu (built without --with-cpu, with --with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch), powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+ and --disable-multi-arch). * sysdeps/powerpc/powerpc64/be/fpu/multiarch/Makefile (libm-sysdep_routines): Add s_llround-power8, s_llround-power6x, s_llround-power5+, s_llround-ppc64, and s_llroundf-ppc64. (CFLAGS-s_llround-power8.c, CFLAGS-s_llround-power6x.c, CFLAGS-s_llround-power5+.c): New rule. * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_llround-power5+.c: New file. * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_llround-power6x.c: Likewise. * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_llround-power8.c: Likewise. * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_llround-ppc64.c: Likewise. * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_llroundf-ppc64.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_llround.c: Move to ... * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_llround.c: ... here. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_llroundf.c: Move to ... * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_llroundf.c: ... here. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_lround.c: Move to ... * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_lround.c: ... here. * sysdeps/powerpc/powerpc64/fpu/Makefile [$(subdir) == math] (CFLAGS-s_llround.c): New rule. * sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile (libm-sysdep_routines): Remove s_llround-* objects. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_llround-power5+.S: Remove file. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_llround-power6x.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_llround-power8.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_llround-ppc64.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_llroundf-ppc64.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_llround.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_llroundf.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_lround.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_lroundf.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_llround.c: New file. * sysdeps/powerpc/powerpc64/fpu/s_llroundf.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_lround.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_lroundf.c: Likewise. * sysdeps/powerpc/powerpc64/power5+/fpu/s_llround.S: Likewise. * sysdeps/powerpc/powerpc64/power5+/fpu/s_llroundf.S: Likewise. * sysdeps/powerpc/powerpc64/power6x/fpu/s_llround.S: Likewise. * sysdeps/powerpc/powerpc64/power6x/fpu/s_llroundf.S: Likewise. * sysdeps/powerpc/powerpc64/power8/fpu/s_llround.S: Likewise. * sysdeps/powerpc/powerpc64/power8/fpu/s_llroundf.S: Likewise. Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
* powerpc: Refactor powerpc32 lrint/lrintf/llrint/llrintfAdhemerval Zanella2019-06-171-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patches consolidates all the powerpc llrint{f} implementations on the generic sysdeps/powerpc/powerpc32/fpu/s_llrint{f}. Checked on powerpc-linux-gnu (built without --with-cpu, with --with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch), powerpc64-linux-gnu (built without --with-cpu and with --with-cpu=power5+ and --disable-multi-arch). * sysdeps/powerpc/fpu/s_lrintf.S: Remove file. * sysdeps/powerpc/powerpc64/fpu/s_lrintf.c: Move to ... * sysdeps/powerpc/fpu/s_lrintf.c: ... here. * sysdeps/powerpc/powerpc32/fpu/Makefile [$(subdir) == math] (CFLAGS-s_lrint.c): New rule. * sysdeps/powerpc/powerpc32/fpu/s_llrint.c (__llrint): Add power4 optimization. * sysdeps/powerpc/powerpc32/fpu/s_llrintf.c (__llrintf): Likewise. * sysdeps/powerpc/powerpc32/fpu/s_lrint.S: Remove file. * sysdeps/powerpc/powerpc32/fpu/s_lrint.c: New file. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile (CFLAGS-s_llrintf-power6.c, CFLAGS-s_llrintf-ppc32.c, CFLAGS-s_llrint-power6.c, CFLAGS-s_llrint-ppc32.c, CFLAGS-s_lrint-ppc32.c): New rule. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_llrint-power6.S: Remove file. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_llrint-ppc32.S: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_llrintf-power6.S: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_llrintf-ppc32.S: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_lrint-ppc32.S: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/s_llrint.S: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/s_llrintf.S: Likewise. * sysdeps/powerpc/powerpc32/power6/fpu/s_llrint.S: Likewise. * sysdeps/powerpc/powerpc32/power6/fpu/s_llrintf.S: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_llrint-power6.c: New file. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_llrint-ppc32.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_llrintf-power6.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_llrintf-ppc32.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_lrint-ppc32.c: Likewise. Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
* powerpc: refactor powerpc64 lrint/lrintf/llrint/llrintfAdhemerval Zanella2019-06-1721-225/+76
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patches consolidates all the powerpc llrint{f} implementations on the generic sysdeps/powerpc/fpu/s_llrint{f}. The IFUNC support is also moved only to powerpc64 only, since for powerpc64le generic implementation resulting in optimized code. Checked on powerpc-linux-gnu (built without --with-cpu, with --with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch), powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+ and --disable-multi-arch). * sysdeps/powerpc/powerpc64/be/fpu/multiarch/Makefile (libm-sysdep_routines): Add s_llrint-power8, s_llrint-power6x, and s_llrint-ppc64. (CFLAGS-s_llrint-power8.c, CFLAGS-s_llrint-power6x.c): New rule. * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_llrint-power6x.c: New file. * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_llrint-power8.c: Likewise. * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_llrint-ppc64.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_lrint.c: Move to ... * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_lrint.c: ... here. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_llrint.c: Move to ... * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_llrint.c: ... here. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_llrintf.c: Move to ... * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_llrintf.c: ... here. * sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_lrint.c: New file. * sysdeps/powerpc/powerpc64/fpu/Makefile: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile (libm-sysdep_routines): Remove s_llrint-* objects. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_llrint-power6x.S: Remove file. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_llrint-power8.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_llrint-ppc64.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_llrint.c: New file. * sysdeps/powerpc/powerpc64/fpu/s_llrintf.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_lrint.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_lrintf.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_llrint.S: Remove file. * sysdeps/powerpc/powerpc64/fpu/s_llrintf.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_lrint.S: Likewise. * sysdeps/powerpc/powerpc64/power6x/fpu/s_llrint.S: Likewise. * sysdeps/powerpc/powerpc64/power8/fpu/s_llrint.S: Likewise. Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
* powerpc: Remove optimized finiteAdhemerval Zanella2019-06-1211-367/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The powerpc finite optimization do not show much gain: - GCC will call libm iff -fsignaling-nans is used. This usage pattern is usually not performance oriented and for such calls PLT overhead should dominate execution time. - The power7 uses ftdiv to optimize for some input patterns, but at cost of others. Comparing against generic C implementation built for powerpc64-linux-gnu-power7 (--with-cpu=power7): - Generic sysdeps/ieee754 implementation: "isfinite": { "": { "duration": 5.0082e+09, "iterations": 2.45299e+09, "max": 43.824, "min": 2.008, "mean": 2.04167 }, "INF": { "duration": 4.66554e+09, "iterations": 2.28288e+09, "max": 35.73, "min": 2.008, "mean": 2.04371 }, "NAN": { "duration": 4.66274e+09, "iterations": 2.28716e+09, "max": 34.161, "min": 2.009, "mean": 2.03866 } } - power7 optimized one: "isfinite": { "": { "duration": 4.99111e+09, "iterations": 2.65566e+09, "max": 25.015, "min": 1.716, "mean": 1.87942 }, "INF": { "duration": 4.6783e+09, "iterations": 2.0999e+09, "max": 35.264, "min": 1.868, "mean": 2.22787 }, "NAN": { "duration": 4.67915e+09, "iterations": 2.08678e+09, "max": 38.099, "min": 1.869, "mean": 2.24228 } } So it basically optimizes marginally for normal numbers while increasing the latency for other kind of FP. - The power8 implementation is just the generic implementation using ISA 2.07 mfvsrd instruction (which GCC uses for generic implementation). So generic implementation is the best option for powerpc64le. Checked on powerpc-linux-gnu (built without --with-cpu, with --with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch), powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+ and --disable-multi-arch). * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile (sysdeps_routines, libm-sysdep_routines): Remove s_finite* objects. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_finite-power7.S: Remove file. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_finite-ppc32.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_finite.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_finitef-ppc32.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_finitef.c: Likewise. * sysdeps/powerpc/powerpc32/power7/fpu/s_finite.S: Likewise. * sysdeps/powerpc/powerpc32/power7/fpu/s_finitef.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile (sysdep_call): Remove s_finite* objects. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_finite-power7.S: Remove file. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_finite-power8.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_finite-ppc64.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_finite.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_finitef-ppc64.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_finitef.c: Likewise. * sysdeps/powerpc/powerpc64/power7/fpu/s_finite.S: Likewise. * sysdeps/powerpc/powerpc64/power7/fpu/s_finitef.S: Likewise. * sysdeps/powerpc/powerpc64/power8/fpu/s_finite.S: Likewise. * sysdeps/powerpc/powerpc64/power8/fpu/s_finitef.S: Likewise. Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>