about summary refs log tree commit diff
path: root/sysdeps/powerpc/powerpc64/multiarch
Commit message (Collapse)AuthorAgeFilesLines
* Add bounds check to __libc_ifunc_impl_listWilco Dijkstra2022-06-101-7/+2
| | | | | | | | | | | | Add a proper bounds check to __libc_ifunc_impl_list. This makes MAX_IFUNC redundant and fixes several targets that will write outside the array. To avoid unnecessary large diffs, pass the maximum in the argument 'i' to IFUNC_IMPL_ADD - 'max' can be used in new ifunc definitions and existing ones can be updated if desired. Passes buildmanyglibc. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* powerpc: Remove powerpc64 bzero optimizationsAdhemerval Zanella2022-02-238-104/+1
| | | | | The symbol is not present in current POSIX specification and compiler already generates memset call.
* powerpc: Remove bcopy optimizationsAdhemerval Zanella2022-02-236-85/+1
| | | | | The symbols is not present in current POSIX specification and compiler already generates memmove call.
* Update copyright dates with scripts/update-copyrightsPaul Eggert2022-01-01134-134/+134
| | | | | | | | | | | | | | | | | | | | | | | I used these shell commands: ../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright (cd ../glibc && git commit -am"[this commit message]") and then ignored the output, which consisted lines saying "FOO: warning: copyright statement not found" for each of 7061 files FOO. I then removed trailing white space from math/tgmath.h, support/tst-support-open-dev-null-range.c, and sysdeps/x86_64/multiarch/strlen-vec.S, to work around the following obscure pre-commit check failure diagnostics from Savannah. I don't know why I run into these diagnostics whereas others evidently do not. remote: *** 912-#endif remote: *** 913: remote: *** 914- remote: *** error: lines with trailing whitespace found ... remote: *** error: sysdeps/unix/sysv/linux/statx_cp.c: trailing lines
* String: Add hidden defs for __memcmpeq() to enable internal usageNoah Goldstein2021-10-264-0/+8
| | | | | | | | No bug. This commit adds hidden defs for all declarations of __memcmpeq. This enables usage of __memcmpeq without the PLT for usage internal to GLIBC.
* String: Add support for __memcmpeq() ABI on all targetsNoah Goldstein2021-10-265-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | | No bug. This commit adds support for __memcmpeq() as a new ABI for all targets. In this commit __memcmpeq() is implemented only as an alias to the corresponding targets memcmp() implementation. __memcmpeq() is added as a new symbol starting with GLIBC_2.35 and defined in string.h with comments explaining its behavior. Basic tests that it is callable and works where added in string/tester.c As discussed in the proposal "Add new ABI '__memcmpeq()' to libc" __memcmpeq() is essentially a reserved namespace for bcmp(). The means is shares the same specifications as memcmp() except the return value for non-equal byte sequences is any non-zero value. This is less strict than memcmp()'s return value specification and can be better optimized when a boolean return is all that is needed. __memcmpeq() is meant to only be called by compilers if they can prove that the return value of a memcmp() call is only used for its boolean value. All tests in string/tester.c passed. As well build succeeds on x86_64-linux-gnu target.
* powerpc64: Add checks for Altivec and VSX in ifunc selectionAnton Blanchard2021-08-0626-68/+139
| | | | | | | We'd like to support processors without Altivec or VSX, so check the relevant hwcap bits before selecting them. Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
* powerpc64: Check cacheline size before using optimised memset routinesAnton Blanchard2021-08-062-10/+23
| | | | | | | A number of optimised memset routines assume the cacheline size is 128B, so we better check before using them. Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
* powerpc64: Replace some PPC_FEATURE_HAS_VSX with PPC_FEATURE_ARCH_2_06Anton Blanchard2021-08-0620-38/+38
| | | | | | | | We use PPC_FEATURE_HAS_VSX to select a number of POWER7 optimised functions. These functions don't use any VSX instructions, so PPC_FEATURE_ARCH_2_06 seems like a better fit. Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
* powerpc64: Remove strcspn ifunc from the loaderTulio Magno Quites Machado Filho2021-07-081-0/+18
| | | | | | | | | | 5 years ago, commit 8f1b841e452dbb083112fd036033b7f4af506ba0 unintentionally added an ifunc to the loader. That modification has not caused any harm so far, but it doesn't add any value either, because the hwcap information is available later during libc initialization. Suggested-by: Anton Blanchard <anton@ozlabs.org>
* powerpc: Optimized memcmp for power10Lucas A. M. Magalhaes2021-05-314-1/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch was based on the __memcmp_power8 and the recent __strlen_power10. Improvements from __memcmp_power8: 1. Don't need alignment code. On POWER10 lxvp and lxvl do not generate alignment interrupts, so they are safe for use on caching-inhibited memory. Notice that the comparison on the main loop will wait for both VSR to be ready. Therefore aligning one of the input address does not improve performance. In order to align both registers a vperm is necessary which add too much overhead. 2. Uses new POWER10 instructions This code uses lxvp to decrease contention on load by loading 32 bytes per instruction. The vextractbm is used to have a smaller tail code for calculating the return value. 3. Performance improvement This version has around 35% better performance on average. I saw no performance regressions for any length or alignment. Thanks Matheus for helping me out with some details. Co-authored-by: Matheus Castanho <msc@linux.ibm.com> Reviewed-by: Raphael M Zinsly <rzinsly@linux.ibm.com>
* powerpc: Add optimized rawmemchr for POWER10Matheus Castanho2021-05-174-2/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Reuse code for optimized strlen to implement a faster version of rawmemchr. This takes advantage of the same benefits provided by the strlen implementation, but needs some extra steps. __strlen_power10 code should be unchanged after this change. rawmemchr returns a pointer to the char found, while strlen returns only the length, so we have to take that into account when preparing the return value. To quickly check 64B, the loop on __strlen_power10 merges the whole block into 16B by using unsigned minimum vector operations (vminub) and checks if there are any \0 on the resulting vector. The same code is used by rawmemchr if the char c is 0. However, this approach does not work when c != 0. We first need to subtract each byte by c, so that the value we are looking for is converted to a 0, then taking the minimum and checking for nulls works again. The new code branches after it has compared ~256 bytes and chooses which of the two strategies above will be used in the main loop, based on the char c. This extra branch adds some overhead (~5%) for length ~256, but is quickly amortized by the faster loop for larger sizes. Compared to __rawmemchr_power9, this version is ~20% faster for length < 256. Because of the optimized main loop, the improvement becomes ~35% for c != 0 and ~50% for c = 0 for strings longer than 256. Reviewed-by: Lucas A. M. Magalhaes <lamm@linux.ibm.com> Reviewed-by: Raphael M Zinsly <rzinsly@linux.ibm.com>
* powerpc64le: Fix ifunc selection for memset, memmove, bzero and bcopyRaoni Fassina Firmino2021-05-075-20/+22
| | | | | | | | | The hwcap2 check for the aforementioned functions should check for both PPC_FEATURE2_ARCH_3_1 and PPC_FEATURE2_HAS_ISEL but was mistakenly checking for any one of them, enabling isa 3.1 version of the functions in incompatible processors, like POWER8. Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
* powerpc64le: Optimize memset for POWER10Raoni Fassina Firmino2021-04-305-1/+58
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This implementation is based on __memset_power8 and integrates a lot of suggestions from Anton Blanchard. The biggest difference is that it makes extensive use of stxvl to alignment and tail code to avoid branches and small stores. It has three main execution paths: a) "Short lengths" for lengths up to 64 bytes, avoiding as many branches as possible. b) "General case" for larger lengths, it has an alignment section using stxvl to avoid branches, a 128 bytes loop and then a tail code, again using stxvl with few branches. c) "Zeroing cache blocks" for lengths from 256 bytes upwards and set value being zero. It is mostly the __memset_power8 code but the alignment phase was simplified because, at this point, address is already 16-bytes aligned and also changed to use vector stores. The tail code was also simplified to reuse the general case tail. All unaligned stores use stxvl instructions that do not generate alignment interrupts on POWER10, making it safe to use on caching-inhibited memory. On average, this implementation provides something around 30% improvement when compared to __memset_power8. Reviewed-by: Matheus Castanho <msc@linux.ibm.com> Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
* powerpc64le: Optimize memcpy for POWER10Tulio Magno Quites Machado Filho2021-04-304-1/+40
| | | | | | | | | | | | | | | | | | This implementation is based on __memcpy_power8_cached and integrates suggestions from Anton Blanchard. It benefits from loads and stores with length for short lengths and for tail code, simplifying the code. All unaligned memory accesses use instructions that do not generate alignment interrupts on POWER10, making it safe to use on caching-inhibited memory. The main loop has also been modified in order to increase instruction throughput by reducing the dependency on updates from previous iterations. On average, this implementation provides around 30% improvement when compared to __memcpy_power7 and 10% improvement in comparison to __memcpy_power8_cached.
* powerpc64le: Optimized memmove for POWER10Lucas A. M. Magalhaes2021-04-306-7/+66
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch was initially based on the __memmove_power7 with some ideas from strncpy implementation for Power 9. Improvements from __memmove_power7: 1. Use lxvl/stxvl for alignment code. The code for Power 7 uses branches when the input is not naturally aligned to the width of a vector. The new implementation uses lxvl/stxvl instead which reduces pressure on GPRs. It also allows the removal of branch instructions, implicitly removing branch stalls and mispredictions. 2. Use of lxv/stxv and lxvl/stxvl pair is safe to use on Cache Inhibited memory. On Power 10 vector load and stores are safe to use on CI memory for addresses unaligned to 16B. This code takes advantage of this to do unaligned loads. The unaligned loads don't have a significant performance impact by themselves. However doing so decreases register pressure on GPRs and interdependence stalls on load/store pairs. This also improved readability as there are now less code paths for different alignments. Finally this reduces the overall code size. 3. Improved performance. This version runs on average about 30% better than memmove_power7 for lengths larger than 8KB. For input lengths shorter than 8KB the improvement is smaller, it has on average about 17% better performance. This version has a degradation of about 50% for input lengths in the 0 to 31 bytes range when dest is unaligned. Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
* powerpc: Add optimized strlen for POWER10Matheus Castanho2021-04-224-1/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Improvements compared to POWER9 version: 1. Take into account first 16B comparison for aligned strings The previous version compares the first 16B and increments r4 by the number of bytes until the address is 16B-aligned, then starts doing aligned loads at that address. For aligned strings, this causes the first 16B to be compared twice, because the increment is 0. Here we calculate the next 16B-aligned address differently, which avoids that issue. 2. Use simple comparisons for the first ~192 bytes The main loop is good for big strings, but comparing 16B each time is better for smaller strings. So after aligning the address to 16 Bytes, we check more 176B in 16B chunks. There may be some overlaps with the main loop for unaligned strings, but we avoid using the more aggressive strategy too soon, and also allow the loop to start at a 64B-aligned address. This greatly benefits smaller strings and avoids overlapping checks if the string is already aligned at a 64B boundary. 3. Reduce dependencies between load blocks caused by address calculation on loop Doing a precise time tracing on the code showed many loads in the loop were stalled waiting for updates to r4 from previous code blocks. This implementation avoids that as much as possible by using 2 registers (r4 and r5) to hold addresses to be used by different parts of the code. Also, the previous code aligned the address to 16B, then to 64B by doing a few 48B loops (if needed) until the address was aligned. The main loop could not start until that 48B loop had finished and r4 was updated with the current address. Here we calculate the address used by the loop very early, so it can start sooner. The main loop now uses 2 pointers 128B apart to make pointer updates less frequent, and also unrolls 1 iteration to guarantee there is enough time between iterations to update the pointers, reducing stalled cycles. 4. Use new P10 instructions lxvp is used to load 32B with a single instruction, reducing contention in the load queue. vextractbm allows simplifying the tail code for the loop, replacing vbpermq and avoiding having to generate a permute control vector. Reviewed-by: Paul E Murphy <murphyp@linux.ibm.com> Reviewed-by: Raphael M Zinsly <rzinsly@linux.ibm.com> Reviewed-by: Lucas A. M. Magalhaes <lamm@linux.ibm.com>
* Update copyright dates with scripts/update-copyrightsPaul Eggert2021-01-02128-128/+128
| | | | | | | | | | | | | | | | I used these shell commands: ../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright (cd ../glibc && git commit -am"[this commit message]") and then ignored the output, which consisted lines saying "FOO: warning: copyright statement not found" for each of 6694 files FOO. I then removed trailing white space from benchtests/bench-pthread-locks.c and iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c, to work around this diagnostic from Savannah: remote: *** pre-commit check failed ... remote: *** error: lines with trailing whitespace found remote: error: hook declined to update refs/heads/master
* powerpc: Add optimized stpncpy for POWER9Raphael M Zinsly2020-11-124-1/+44
| | | | | | | Add stpncpy support into the POWER9 strncpy. Reviewed-by: Matheus Castanho <msc@linux.ibm.com> Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
* powerpc: Add optimized strncpy for POWER9Raphael M Zinsly2020-11-124-1/+47
| | | | | | | | Similar to the strcpy P9 optimization, this version uses VSX to improve performance. Reviewed-by: Matheus Castanho <msc@linux.ibm.com> Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
* powerpc: fix ifunc implementation list for POWER9 strlen and stpcpyRaphael Moreira Zinsly2020-09-171-2/+2
| | | | | __strlen_power9 and __stpcpy_power9 were added to their ifunc lists using the wrong function names.
* powerpc64le: add optimized strlen for P9Paul E. Murphy2020-06-054-1/+12
| | | | | | | | | | | | | | | | This started as a trivial change to Anton's rawmemchr. I got carried away. This is a hybrid between P8's asympotically faster 64B checks with extremely efficient small string checks e.g <64B (and sometimes a little bit more depending on alignment). The second trick is to align to 64B by running a 48B checking loop 16B at a time until we naturally align to 64B (i.e checking 48/96/144 bytes/iteration based on the alignment after the first 5 comparisons). This allieviates the need to check page boundaries. Finally, explicly use the P7 strlen with the runtime loader when building P9. We need to be cautious about vector/vsx extensions here on P9 only builds.
* powerpc: Optimized rawmemchr for POWER9Anton Blanchard2020-05-184-3/+38
| | | | | | | | | | | This version uses vector instructions and is up to 60% faster on medium matches and up to 90% faster on long matches, compared to the POWER7 version. A few examples: __rawmemchr_power9 __rawmemchr_power7 Length 32, alignment 0: 2.27566 3.77765 Length 64, alignment 2: 2.46231 3.51064 Length 1024, alignment 0: 17.3059 32.6678
* powerpc: Optimized stpcpy for POWER9Anton Blanchard via Libc-alpha2020-05-184-6/+41
| | | | | | | | | | Add stpcpy support to the POWER9 strcpy. This is up to 40% faster on small strings and up to 90% faster on long relatively unaligned strings, compared to the POWER8 version. A few examples: __stpcpy_power9 __stpcpy_power8 Length 20, alignments in bytes 4/ 4: 2.58246 4.8788 Length 1024, alignments in bytes 1/ 6: 24.8186 47.8528
* powerpc: Optimized strcpy for POWER9Anton Blanchard via Libc-alpha2020-05-184-1/+38
| | | | | | | | | | This version uses VSX store vector with length instructions and is significantly faster on small strings and relatively unaligned large strings, compared to the POWER8 version. A few examples: __strcpy_power9 __strcpy_power8 Length 16, alignments in bytes 0/ 0: 2.52454 4.62695 Length 412, alignments in bytes 4/ 0: 11.6 22.9185
* Update copyright dates with scripts/update-copyrights.Joseph Myers2020-01-01123-123/+123
|
* Prefer https to http for gnu.org and fsf.org URLsPaul Eggert2019-09-07123-123/+123
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Also, change sources.redhat.com to sourceware.org. This patch was automatically generated by running the following shell script, which uses GNU sed, and which avoids modifying files imported from upstream: sed -ri ' s,(http|ftp)(://(.*\.)?(gnu|fsf|sourceware)\.org($|[^.]|\.[^a-z])),https\2,g s,(http|ftp)(://(.*\.)?)sources\.redhat\.com($|[^.]|\.[^a-z]),https\2sourceware.org\4,g ' \ $(find $(git ls-files) -prune -type f \ ! -name '*.po' \ ! -name 'ChangeLog*' \ ! -path COPYING ! -path COPYING.LIB \ ! -path manual/fdl-1.3.texi ! -path manual/lgpl-2.1.texi \ ! -path manual/texinfo.tex ! -path scripts/config.guess \ ! -path scripts/config.sub ! -path scripts/install-sh \ ! -path scripts/mkinstalldirs ! -path scripts/move-if-change \ ! -path INSTALL ! -path locale/programs/charmap-kw.h \ ! -path po/libc.pot ! -path sysdeps/gnu/errlist.c \ ! '(' -name configure \ -execdir test -f configure.ac -o -f configure.in ';' ')' \ ! '(' -name preconfigure \ -execdir test -f preconfigure.ac ';' ')' \ -print) and then by running 'make dist-prepare' to regenerate files built from the altered files, and then executing the following to cleanup: chmod a+x sysdeps/unix/sysv/linux/riscv/configure # Omit irrelevant whitespace and comment-only changes, # perhaps from a slightly-different Autoconf version. git checkout -f \ sysdeps/csky/configure \ sysdeps/hppa/configure \ sysdeps/riscv/configure \ sysdeps/unix/sysv/linux/csky/configure # Omit changes that caused a pre-commit check to fail like this: # remote: *** error: sysdeps/powerpc/powerpc64/ppc-mcount.S: trailing lines git checkout -f \ sysdeps/powerpc/powerpc64/ppc-mcount.S \ sysdeps/unix/sysv/linux/s390/s390-64/syscall.S # Omit change that caused a pre-commit check to fail like this: # remote: *** error: sysdeps/sparc/sparc64/multiarch/memcpy-ultra3.S: last line does not end in newline git checkout -f sysdeps/sparc/sparc64/multiarch/memcpy-ultra3.S
* powerpc: Use generic wcsrchr optimizationAdhemerval Zanella2019-04-046-110/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch removes the power6 wcsrchr optimization and uses generic implementation instead. Currently, both power6 and power7 IFUNC variant resulting binary are essentially the same and the generic implementation with unrolling loop set to 8 also results in similar performance. Checked on powerpc64-linux-gnu. * sysdeps/powerpc/Makefile [$(subdir) == wcsmbs] (CFLAGS-wcsrchr.c): New rule. * sysdeps/powerpc/power6/wcsrchr.c: Remove file. * sysdeps/powerpc/powerpc32/power4/multiarch/wcsrchr-power6.c: Likewise. * sysdeps/powerpc/powerpc32/power4/multiarch/wcsrchr-power7.c: Likewise. * sysdeps/powerpc/powerpc32/power4/multiarch/wcsrchr-ppc32.c: Likewise. * sysdeps/powerpc/powerpc32/power4/multiarch/wcsrchr.c: Likewise. * sysdeps/powerpc/powerpc64/multiarch/wcsrchr-power6.c: Likewise. * sysdeps/powerpc/powerpc64/multiarch/wcsrchr-power7.c: Likewise. * sysdeps/powerpc/powerpc64/multiarch/wcsrchr-ppc64.c: Likewise. * sysdeps/powerpc/powerpc64/multiarch/wcsrchr.c: Likewise. * sysdeps/powerpc/powerpc64/power6/wcsrchr.c: Likewise. * sysdeps/powerpc/powerpc32/power4/multiarch/Makefile [$(subdir) == wcsmbs] (sysdeps_routines): Remove wcsrchr-power6 and wcsrchr-power7. (CFLAGS-wcsrchr-power7.c, CFLAGS-wcsrchr-power6.c): Remove rule. * sysdeps/powerpc/powerpc64/multiarch/Makefile: Likewise. * sysdeps/powerpc/powerpc32/power4/multiarch/ifunc-impl-list.c: Remove wcsrchr optimizations. * sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c: Likewise.
* powerpc: Use generic wcschr optimizationAdhemerval Zanella2019-04-046-114/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch removes the power6 wcschr optimization and uses generic implementation instead. Currently, both power6 and power7 IFUNC variant resulting binary are essentially the same and the generic implementation with unrolling loop set to 8 also results in similar performance. Checked on powerpc64-linux-gnu. * sysdeps/powerpc/Makefile [$(subdir) == wcsmbs] (CFLAGS-wcschr.c): New rule. * sysdeps/powerpc/power6/wcschr.c: Remove file. * sysdeps/powerpc/powerpc32/power4/multiarch/wcschr-power6.c: Likewise. * sysdeps/powerpc/powerpc32/power4/multiarch/wcschr-power7.c: Likewise. * sysdeps/powerpc/powerpc32/power4/multiarch/wcschr-ppc32.c: Likewise. * sysdeps/powerpc/powerpc32/power4/multiarch/wcschr.c: Likewise. * sysdeps/powerpc/powerpc64/multiarch/wcschr-power6.c: Likewise. * sysdeps/powerpc/powerpc64/multiarch/wcschr-power7.c: Likewise. * sysdeps/powerpc/powerpc64/multiarch/wcschr-ppc64.c: Likewise. * sysdeps/powerpc/powerpc64/multiarch/wcschr.c: Likewise. * sysdeps/powerpc/powerpc64/power6/wcschr.c: Likewise. * sysdeps/powerpc/powerpc32/power4/multiarch/Makefile [$(subdir) == wcsmbs] (sysdeps_routines): Remove wcschr-power6 and wcschr-power7. (CFLAGS-wcschr-power7.c, CFLAGS-wcschr-power6.c): Remove rule. * sysdeps/powerpc/powerpc64/multiarch/Makefile: Likewise. * sysdeps/powerpc/powerpc32/power4/multiarch/ifunc-impl-list.c: Remove wcschr optimizations. * sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c: Likewise.
* powerpc: Use generic wcscpy optimizationAdhemerval Zanella2019-04-046-106/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch removes the power6 wcscpy optimization and uses generic implementation instead. Currently, both power6 and power7 IFUNC variant resulting binary are essentially the same and the generic implementation with unrolling loop set to 8 also results in similar performance. Checked on powerpc64-linux-gnu. * sysdeps/powerpc/Makefile [$(subdir) == wcsmbs] (CFLAGS-wcscpy.c): New rule. * sysdeps/powerpc/power6/wcscpy.c: Remove file. * sysdeps/powerpc/powerpc32/power4/multiarch/wcscpy-power6.c: Likewise. * sysdeps/powerpc/powerpc32/power4/multiarch/wcscpy-power7.c: Likewise. * sysdeps/powerpc/powerpc32/power4/multiarch/wcscpy-ppc32.c: Likewise. * sysdeps/powerpc/powerpc32/power4/multiarch/wcscpy.c: Likewise. * sysdeps/powerpc/powerpc64/multiarch/wcscpy-power6.c: Likewise. * sysdeps/powerpc/powerpc64/multiarch/wcscpy-power7.c: Likewise. * sysdeps/powerpc/powerpc64/multiarch/wcscpy-ppc64.c: Likewise. * sysdeps/powerpc/powerpc64/multiarch/wcscpy.c: Likewise. * sysdeps/powerpc/powerpc64/power6/wcscpy.c: Likewise. * sysdeps/powerpc/powerpc32/power4/multiarch/Makefile [$(subdir) == wcsmbs] (sysdeps_routines): Remove wcscpy-power6 and wcscpy-power7. (CFLAGS-wcscpy-power7.c, CFLAGS-wcscpy-power6.c): Remove rule. * sysdeps/powerpc/powerpc64/multiarch/Makefile: Likewise. * sysdeps/powerpc/powerpc32/power4/multiarch/ifunc-impl-list.c: Remove wcscpy optimizations. * sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c: Likewise.
* wcsmbs: optimize wcscatAdhemerval Zanella2019-02-271-13/+12
| | | | | | | | | | | | | | | | | | | | | | | | This patch rewrites wcscat using wcslen and wcscpy. This is similar to the optimization done on strcat by 6e46de42fe. The strcpy changes are mainly to add the internal alias to avoid PLT calls. Checked on x86_64-linux-gnu and a build against the affected architectures. * include/wchar.h (__wcscpy): New prototype. * sysdeps/powerpc/powerpc32/power4/multiarch/wcscpy-ppc32.c (__wcscpy): Route internal symbol to generic implementation. * sysdeps/powerpc/powerpc32/power4/multiarch/wcscpy.c (wcscpy): Add internal __wcscpy alias. * sysdeps/powerpc/powerpc64/multiarch/wcscpy.c (wcscpy): Likewise. * sysdeps/s390/wcscpy.c (wcscpy): Likewise. * sysdeps/x86_64/multiarch/wcscpy.c (wcscpy): Likewise. * wcsmbs/wcscpy.c (wcscpy): Add * sysdeps/x86_64/multiarch/wcscpy-c.c (WCSCPY): Adjust macro to use generic implementation. * wcsmbs/wcscat.c (wcscat): Rewrite using wcslen and wcscpy.
* powerpc: Fix tiny bug in strncmp.cPaul Clarke2019-01-161-1/+1
| | | | | | | | | | | A single underscore was omitted in sysdeps/powerpc/powerpc64/multiarch/strncmp.c, resulting in use of power8 version of strncmp instead of power9 version, with significant performance degradation. * sysdeps/powerpc/powerpc64/multiarch/strncmp.c: Fix #ifdef. Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
* Update copyright dates with scripts/update-copyrights.Joseph Myers2019-01-01135-135/+135
| | | | | | | * All files with FSF copyright notices: Update copyright dates using scripts/update-copyrights. * locale/programs/charmap-kw.h: Regenerated. * locale/programs/locfile-kw.h: Likewise.
* powerpc: Rearrange little endian specific filesRajalakshmi Srinivasaraghavan2018-08-166-6/+21
| | | | | | This patch moves little endian specific POWER9 optimization files to sysdeps/powerpc/powerpc64/le and creates POWER9 ifunc functions only for little endian.
* Update copyright dates with scripts/update-copyrights.Joseph Myers2018-01-01135-135/+135
| | | | | | | * All files with FSF copyright notices: Update copyright dates using scripts/update-copyrights. * locale/programs/charmap-kw.h: Regenerated. * locale/programs/locfile-kw.h: Likewise.
* powerpc: POWER8 memcpy optimization for cached memoryAdhemerval Zanella2017-12-114-12/+193
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On POWER8, unaligned memory accesses to cached memory has little impact on performance as opposed to its ancestors. It is disabled by default and will only be available when the tunable glibc.tune.cached_memopt is set to 1. __memcpy_power8_cached __memcpy_power7 ============================================================ max-size=4096: 33325.70 ( 12.65%) 38153.00 max-size=8192: 32878.20 ( 11.17%) 37012.30 max-size=16384: 33782.20 ( 11.61%) 38219.20 max-size=32768: 33296.20 ( 11.30%) 37538.30 max-size=65536: 33765.60 ( 10.53%) 37738.40 * manual/tunables.texi (Hardware Capability Tunables): Document glibc.tune.cached_memopt. * sysdeps/powerpc/cpu-features.c: New file. * sysdeps/powerpc/cpu-features.h: New file. * sysdeps/powerpc/dl-procinfo.c [!IS_IN(ldconfig)]: Add _dl_powerpc_cpu_features. * sysdeps/powerpc/dl-tunables.list: New file. * sysdeps/powerpc/ldsodefs.h: Include cpu-features.h. * sysdeps/powerpc/powerpc32/power4/multiarch/init-arch.h (INIT_ARCH): Initialize use_aligned_memopt. * sysdeps/powerpc/powerpc64/dl-machine.h [defined(SHARED && IS_IN(rtld))]: Restrict dl_platform_init availability and initialize CPU features used by tunables. * sysdeps/powerpc/powerpc64/multiarch/Makefile (sysdep_routines): Add memcpy-power8-cached. * sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c: Add __memcpy_power8_cached. * sysdeps/powerpc/powerpc64/multiarch/memcpy.c: Likewise. * sysdeps/powerpc/powerpc64/multiarch/memcpy-power8-cached.S: New file. Reviewed-by: Rajalakshmi Srinivasaraghavan <raji@linux.vnet.ibm.com>
* powerpc: Use latest optimization for internal function callsRajalakshmi Srinivasaraghavan2017-11-071-1/+1
| | | | | | | Update strcasestr-power8 to use power8 version of strnlen for calculating length. Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.vnet.ibm.com>
* [PowerPC64] sysdep.h doesn't need to be included in multiarch filesAlan Modra2017-10-3155-110/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the .c/.S file neither uses nor modifies macros defined in sysdep.h there is no point to #include it. The same goes for math_ldbl_opt.h except that it includes shlib-compat.h, and if compat_symbol is redefined we need to include shlib-compat.h first. * sysdeps/powerpc/powerpc64/fpu/multiarch/e_expf-power8.S: Don't include sysdep.h. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_ceilf-power5+.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_ceilf-ppc64.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_cosf-power8.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_cosf-ppc64.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_finite-power7.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_finite-power8.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_floor-power5+.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_floor-ppc64.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_roundf-power5+.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_roundf-ppc64.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_sinf-power8.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_sinf-ppc64.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_truncf-power5+.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_truncf-ppc64.S: Likewise. * sysdeps/powerpc/powerpc64/multiarch/memchr-power7.S: Likewise. * sysdeps/powerpc/powerpc64/multiarch/memchr-power8.S: Likewise. * sysdeps/powerpc/powerpc64/multiarch/memcmp-power4.S: Likewise. * sysdeps/powerpc/powerpc64/multiarch/memcmp-power7.S: Likewise. * sysdeps/powerpc/powerpc64/multiarch/memcmp-power8.S: Likewise. * sysdeps/powerpc/powerpc64/multiarch/memcpy-a2.S: Likewise. * sysdeps/powerpc/powerpc64/multiarch/memcpy-cell.S: Likewise. * sysdeps/powerpc/powerpc64/multiarch/memcpy-power4.S: Likewise. * sysdeps/powerpc/powerpc64/multiarch/memcpy-power6.S: Likewise. * sysdeps/powerpc/powerpc64/multiarch/memcpy-power7.S: Likewise. * sysdeps/powerpc/powerpc64/multiarch/memcpy-ppc64.S: Likewise. * sysdeps/powerpc/powerpc64/multiarch/memmove-power7.S: Likewise. * sysdeps/powerpc/powerpc64/multiarch/mempcpy-power7.S: Likewise. * sysdeps/powerpc/powerpc64/multiarch/memrchr-power7.S: Likewise. * sysdeps/powerpc/powerpc64/multiarch/memrchr-power8.S: Likewise. * sysdeps/powerpc/powerpc64/multiarch/memset-power4.S: Likewise. * sysdeps/powerpc/powerpc64/multiarch/memset-power6.S: Likewise. * sysdeps/powerpc/powerpc64/multiarch/memset-power7.S: Likewise. * sysdeps/powerpc/powerpc64/multiarch/memset-power8.S: Likewise. * sysdeps/powerpc/powerpc64/multiarch/rawmemchr-power7.S: Likewise. * sysdeps/powerpc/powerpc64/multiarch/stpcpy-power8.S: Likewise. * sysdeps/powerpc/powerpc64/multiarch/stpncpy-power7.S: Likewise. * sysdeps/powerpc/powerpc64/multiarch/stpncpy-power8.S: Likewise. * sysdeps/powerpc/powerpc64/multiarch/strcasecmp-power7.S: Likewise. * sysdeps/powerpc/powerpc64/multiarch/strcasecmp-power8.S: Likewise. * sysdeps/powerpc/powerpc64/multiarch/strcasecmp_l-power7.S: Likewise. * sysdeps/powerpc/powerpc64/multiarch/strcasestr-power8.S: Likewise. * sysdeps/powerpc/powerpc64/multiarch/strchr-power7.S: Likewise. * sysdeps/powerpc/powerpc64/multiarch/strchr-power8.S: Likewise. * sysdeps/powerpc/powerpc64/multiarch/strchr-ppc64.S: Likewise. * sysdeps/powerpc/powerpc64/multiarch/strchrnul-power7.S: Likewise. * sysdeps/powerpc/powerpc64/multiarch/strchrnul-power8.S: Likewise. * sysdeps/powerpc/powerpc64/multiarch/strcmp-power7.S: Likewise. * sysdeps/powerpc/powerpc64/multiarch/strcmp-power8.S: Likewise. * sysdeps/powerpc/powerpc64/multiarch/strcmp-power9.S: Likewise. * sysdeps/powerpc/powerpc64/multiarch/strcmp-ppc64.S: Likewise. * sysdeps/powerpc/powerpc64/multiarch/strcpy-power8.S: Likewise. * sysdeps/powerpc/powerpc64/multiarch/strcspn-power8.S: Likewise. * sysdeps/powerpc/powerpc64/multiarch/strlen-power7.S: Likewise. * sysdeps/powerpc/powerpc64/multiarch/strlen-power8.S: Likewise. * sysdeps/powerpc/powerpc64/multiarch/strlen-ppc64.S: Likewise. * sysdeps/powerpc/powerpc64/multiarch/strncase-power8.S: Likewise. * sysdeps/powerpc/powerpc64/multiarch/strncmp-power4.S: Likewise. * sysdeps/powerpc/powerpc64/multiarch/strncmp-power7.S: Likewise. * sysdeps/powerpc/powerpc64/multiarch/strncmp-power8.S: Likewise. * sysdeps/powerpc/powerpc64/multiarch/strncmp-power9.S: Likewise. * sysdeps/powerpc/powerpc64/multiarch/strncmp-ppc64.S: Likewise. * sysdeps/powerpc/powerpc64/multiarch/strncpy-power7.S: Likewise. * sysdeps/powerpc/powerpc64/multiarch/strncpy-power8.S: Likewise. * sysdeps/powerpc/powerpc64/multiarch/strnlen-power7.S: Likewise. * sysdeps/powerpc/powerpc64/multiarch/strnlen-power8.S: Likewise. * sysdeps/powerpc/powerpc64/multiarch/strrchr-power7.S: Likewise. * sysdeps/powerpc/powerpc64/multiarch/strrchr-power8.S: Likewise. * sysdeps/powerpc/powerpc64/multiarch/strspn-power8.S: Likewise. * sysdeps/powerpc/powerpc64/multiarch/strstr-power7.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_floorf-ppc64.S: Don't include sysdep.h and math_ldbl_opt.h. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_ceil-power5+.S: Don't include sysdep.h and math_ldbl_opt.h. Include shlib-compat.h. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_ceil-ppc64.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_copysign-power6.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_copysign-ppc64.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_floorf-power5+.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_isinf-power7.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_isinf-power8.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_isnan-power5.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_isnan-power6.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_isnan-power6x.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_isnan-power7.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_isnan-power8.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_isnan-ppc64.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_llrint-power6x.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_llrint-power8.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_llrint-ppc64.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_llround-power5+.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_llround-power6x.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_llround-power8.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_llround-ppc64.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_llroundf-ppc64.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_round-power5+.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_round-ppc64.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_trunc-power5+.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_trunc-ppc64.S: Likewise.
* [PowerPC64] strncase_l-power7.c should use strncase_l.cAlan Modra2017-10-311-2/+4
| | | | | | | | | | This is another one where we'll be wanting the base symbols for powerpc64le rather than just a power7 variant. * sysdeps/powerpc/powerpc64/multiarch/strncase_l-power7.c: Include string/strncase_l.c, not string/strncase.c. (USE_IN_EXTENDED_LOCALE_MODEL): Don't define. (libc_hidden_def): Redefine.
* [PowerPC64] Tidy strcasecmp_l-power7.S symbolsAlan Modra2017-10-311-1/+3
| | | | | | | | | | | The routine being assembled here is strcasecmp_l, so ask for that via __STRCMP and STRCMP defines. That change means tweaking the power7 override. Needed for later powerpc64le changes where we want the base symbols, not just a power7 variant. * sysdeps/powerpc/powerpc64/multiarch/strcasecmp_l-power7.S: (__STRCMP, STRCMP, __strcasecmp_l): Define. (__strcasecmp): Don't define.
* [PowerPC64] Wrap str{,n}cmp-power{8,9}.S in IS_IN(libc)Alan Modra2017-10-314-0/+8
| | | | | | | | | | | These functions aren't used in ld.so at the moment since we don't have strcmp or strncmp ifuncs for them there. Remove the ld.so bloat. * sysdeps/powerpc/powerpc64/multiarch/strcmp-power8.S: Wrap in IS_IN (libc). * sysdeps/powerpc/powerpc64/multiarch/strcmp-power9.S: Likewise. * sysdeps/powerpc/powerpc64/multiarch/strncmp-power8.S: Likewise. * sysdeps/powerpc/powerpc64/multiarch/strncmp-power9.S: Likewise.
* [PowerPC64] Remove duplicate define in stpncpy-power8.SAlan Modra2017-10-311-2/+0
| | | | | | | | USE_AS_STPNCPY is defined by sysdeps/powerpc/powerpc64/power8/stpncpy.S, included by this file. * sysdeps/powerpc/powerpc64/multiarch/stpncpy-power8.S: Don't define USE_AS_STPNCPY.
* powerpc: Fix IFUNC for memrchrRajalakshmi Srinivasaraghavan2017-10-062-17/+7
| | | | | | | | | | | | | | | Recent commit 59ba2d2b5421 missed to add __memrchr_power8 in ifunc list. Also handled discarding unwanted bytes for unaligned inputs in power8 optimization. 2017-10-05 Rajalakshmi Srinivasaraghavan <raji@linux.vnet.ibm.com> * sysdeps/powerpc/powerpc64/multiarch/memrchr-ppc64.c: Revert back to powerpc32 file. * sysdeps/powerpc/powerpc64/multiarch/memrchr.c (memrchr): Add __memrchr_power8 to ifunc list. * sysdeps/powerpc/powerpc64/power8/memrchr.S: Mask extra bytes for unaligned inputs.
* powerpc: Optimize memrchr for power8Rajalakshmi Srinivasaraghavan2017-10-024-3/+47
| | | | | | | Vectorized loops are used for sizes greater than 32B to improve performance over power7 optimization. This shows as an average of 25% improvement depending on the position of search character. The performance is same for shorter strings.
* powerpc: refactor strrchr IFUNCRajalakshmi Srinivasaraghavan2017-06-231-17/+1
| | | | | | As done in commit 6d15a5c2e9450a1e926d5b4991759e1cfa50fccf clean up IFUNC implementation for power8 in order to remove unneeded macro definitions.
* powerpc: Optimize memchr for power8Rajalakshmi Srinivasaraghavan2017-06-214-1/+36
| | | | | Vectorized loops are used for sizes greater than 32B to improve performance over power7 optimiztion.
* Remove bits/string.h.Zack Weinberg2017-06-202-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | These machine-dependent inline string functions have never been on by default, and even if they were a good idea at the time they were introduced, they haven't really been touched in ten to fifteen years and probably aren't a good idea on current-gen processors. Current thinking is that this class of optimization is best left to the compiler. * bits/string.h, string/bits/string.h * sysdeps/aarch64/bits/string.h * sysdeps/m68k/m680x0/m68020/bits/string.h * sysdeps/s390/bits/string.h, sysdeps/sparc/bits/string.h * sysdeps/x86/bits/string.h: Delete file. * string/string.h: Don't include bits/string.h. * string/bits/string3.h: Rename to bits/string_fortified.h. No need to undef various symbols that the removed headers might have defined as macros. * string/Makefile (headers): Remove bits/string.h, change bits/string3.h to bits/string_fortified.h. * string/string-inlines.c: Update commentary. Remove definitions of various macros that nothing looks at anymore. Don't directly include bits/string.h. Set _STRING_INLINE_unaligned here, based on compiler-predefined macros. * string/strncat.c: If STRNCAT is not defined, or STRNCAT_PRIMARY _is_ defined, provide internal hidden alias __strncat. * include/string.h: Declare internal hidden alias __strncat. Only forward __stpcpy to __builtin_stpcpy if __NO_STRING_INLINES is not defined. * include/bits/string3.h: Rename to bits/string_fortified.h, update to match above. * sysdeps/i386/string-inlines.c: Define compat symbols for everything formerly defined by sysdeps/x86/bits/string.h. Make existing definitions into compat symbols as well. Remove some no-longer-necessary messing around with macros. * sysdeps/powerpc/powerpc32/power4/multiarch/mempcpy.c * sysdeps/powerpc/powerpc64/multiarch/mempcpy.c * sysdeps/powerpc/powerpc64/multiarch/stpcpy.c * sysdeps/s390/multiarch/mempcpy.c No need to define _HAVE_STRING_ARCH_mempcpy. Do define __NO_STRING_INLINES and NO_MEMPCPY_STPCPY_REDIRECT. * sysdeps/i386/i686/multiarch/strncat-c.c * sysdeps/s390/multiarch/strncat-c.c * sysdeps/x86_64/multiarch/strncat-c.c Define STRNCAT_PRIMARY. Don't change definition of libc_hidden_def.
* PowerPC64 ENTRY_TOCLESSAlan Modra2017-06-141-8/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A number of functions in the sysdeps/powerpc/powerpc64/ tree don't use or change r2, yet declare a global entry that sets up r2. This patch fixes that problem, and consolidates the ENTRY and EALIGN macros. * sysdeps/powerpc/powerpc64/sysdep.h: Formatting. (NOPS, ENTRY_3): New macros. (ENTRY): Rewrite. (ENTRY_TOCLESS): Define. (EALIGN, EALIGN_W_0, EALIGN_W_1, EALIGN_W_2, EALIGN_W_4, EALIGN_W_5, EALIGN_W_6, EALIGN_W_7, EALIGN_W_8): Delete. * sysdeps/powerpc/powerpc64/a2/memcpy.S: Replace EALIGN with ENTRY. * sysdeps/powerpc/powerpc64/dl-trampoline.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_ceil.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_ceilf.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_floor.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_floorf.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_nearbyint.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_nearbyintf.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_rint.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_rintf.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_round.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_roundf.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_trunc.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_truncf.S: Likewise. * sysdeps/powerpc/powerpc64/memset.S: Likewise. * sysdeps/powerpc/powerpc64/power7/fpu/s_finite.S: Likewise. * sysdeps/powerpc/powerpc64/power7/fpu/s_isinf.S: Likewise. * sysdeps/powerpc/powerpc64/power7/fpu/s_isnan.S: Likewise. * sysdeps/powerpc/powerpc64/power7/strstr.S: Likewise. * sysdeps/powerpc/powerpc64/power8/fpu/e_expf.S: Likewise. * sysdeps/powerpc/powerpc64/power8/fpu/s_cosf.S: Likewise. * sysdeps/powerpc/powerpc64/power8/fpu/s_sinf.S: Likewise. * sysdeps/powerpc/powerpc64/power8/strcasestr.S: Likewise. * sysdeps/powerpc/powerpc64/addmul_1.S: Use ENTRY_TOCLESS. * sysdeps/powerpc/powerpc64/cell/memcpy.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_copysign.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_copysignl.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_fabsl.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_isnan.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_llrint.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_llrintf.S: Likewise. * sysdeps/powerpc/powerpc64/lshift.S: Likewise. * sysdeps/powerpc/powerpc64/memcpy.S: Likewise. * sysdeps/powerpc/powerpc64/mul_1.S: Likewise. * sysdeps/powerpc/powerpc64/power4/memcmp.S: Likewise. * sysdeps/powerpc/powerpc64/power4/memcpy.S: Likewise. * sysdeps/powerpc/powerpc64/power4/memset.S: Likewise. * sysdeps/powerpc/powerpc64/power4/strncmp.S: Likewise. * sysdeps/powerpc/powerpc64/power5+/fpu/s_ceil.S: Likewise. * sysdeps/powerpc/powerpc64/power5+/fpu/s_ceilf.S: Likewise. * sysdeps/powerpc/powerpc64/power5+/fpu/s_floor.S: Likewise. * sysdeps/powerpc/powerpc64/power5+/fpu/s_floorf.S: Likewise. * sysdeps/powerpc/powerpc64/power5+/fpu/s_llround.S: Likewise. * sysdeps/powerpc/powerpc64/power5+/fpu/s_round.S: Likewise. * sysdeps/powerpc/powerpc64/power5+/fpu/s_roundf.S: Likewise. * sysdeps/powerpc/powerpc64/power5+/fpu/s_trunc.S: Likewise. * sysdeps/powerpc/powerpc64/power5+/fpu/s_truncf.S: Likewise. * sysdeps/powerpc/powerpc64/power5/fpu/s_isnan.S: Likewise. * sysdeps/powerpc/powerpc64/power6/fpu/s_copysign.S: Likewise. * sysdeps/powerpc/powerpc64/power6/fpu/s_isnan.S: Likewise. * sysdeps/powerpc/powerpc64/power6/memcpy.S: Likewise. * sysdeps/powerpc/powerpc64/power6/memset.S: Likewise. * sysdeps/powerpc/powerpc64/power6x/fpu/s_isnan.S: Likewise. * sysdeps/powerpc/powerpc64/power6x/fpu/s_llrint.S: Likewise. * sysdeps/powerpc/powerpc64/power6x/fpu/s_llround.S: Likewise. * sysdeps/powerpc/powerpc64/power7/add_n.S: Likewise. * sysdeps/powerpc/powerpc64/power7/memchr.S: Likewise. * sysdeps/powerpc/powerpc64/power7/memcmp.S: Likewise. * sysdeps/powerpc/powerpc64/power7/memcpy.S: Likewise. * sysdeps/powerpc/powerpc64/power7/memmove.S: Likewise. * sysdeps/powerpc/powerpc64/power7/mempcpy.S: Likewise. * sysdeps/powerpc/powerpc64/power7/memrchr.S: Likewise. * sysdeps/powerpc/powerpc64/power7/memset.S: Likewise. * sysdeps/powerpc/powerpc64/power7/rawmemchr.S: Likewise. * sysdeps/powerpc/powerpc64/power7/strcasecmp.S (strcasecmp_l): Likewise. * sysdeps/powerpc/powerpc64/power7/strchr.S: Likewise. * sysdeps/powerpc/powerpc64/power7/strchrnul.S: Likewise. * sysdeps/powerpc/powerpc64/power7/strcmp.S: Likewise. * sysdeps/powerpc/powerpc64/power7/strlen.S: Likewise. * sysdeps/powerpc/powerpc64/power7/strncmp.S: Likewise. * sysdeps/powerpc/powerpc64/power7/strncpy.S: Likewise. * sysdeps/powerpc/powerpc64/power7/strnlen.S: Likewise. * sysdeps/powerpc/powerpc64/power7/strrchr.S: Likewise. * sysdeps/powerpc/powerpc64/power8/fpu/s_finite.S: Likewise. * sysdeps/powerpc/powerpc64/power8/fpu/s_isinf.S: Likewise. * sysdeps/powerpc/powerpc64/power8/fpu/s_isnan.S: Likewise. * sysdeps/powerpc/powerpc64/power8/fpu/s_llrint.S: Likewise. * sysdeps/powerpc/powerpc64/power8/fpu/s_llround.S: Likewise. * sysdeps/powerpc/powerpc64/power8/memcmp.S: Likewise. * sysdeps/powerpc/powerpc64/power8/memset.S: Likewise. * sysdeps/powerpc/powerpc64/power8/strchr.S: Likewise. * sysdeps/powerpc/powerpc64/power8/strcmp.S: Likewise. * sysdeps/powerpc/powerpc64/power8/strcpy.S: Likewise. * sysdeps/powerpc/powerpc64/power8/strlen.S: Likewise. * sysdeps/powerpc/powerpc64/power8/strncmp.S: Likewise. * sysdeps/powerpc/powerpc64/power8/strncpy.S: Likewise. * sysdeps/powerpc/powerpc64/power8/strnlen.S: Likewise. * sysdeps/powerpc/powerpc64/power8/strrchr.S: Likewise. * sysdeps/powerpc/powerpc64/power8/strspn.S: Likewise. * sysdeps/powerpc/powerpc64/power9/strcmp.S: Likewise. * sysdeps/powerpc/powerpc64/power9/strncmp.S: Likewise. * sysdeps/powerpc/powerpc64/strchr.S: Likewise. * sysdeps/powerpc/powerpc64/strcmp.S: Likewise. * sysdeps/powerpc/powerpc64/strlen.S: Likewise. * sysdeps/powerpc/powerpc64/strncmp.S: Likewise. * sysdeps/powerpc/powerpc64/ppc-mcount.S: Store LR earlier. Don't add nop when SHARED. * sysdeps/powerpc/powerpc64/start.S: Fix comment. * sysdeps/powerpc/powerpc64/multiarch/strrchr-power8.S (ENTRY): Don't define. (ENTRY_TOCLESS): Define. * sysdeps/powerpc/powerpc32/sysdep.h (ENTRY_TOCLESS): Define. * sysdeps/powerpc/fpu/s_fma.S: Use ENTRY_TOCLESS. * sysdeps/powerpc/fpu/s_fmaf.S: Likewise.
* PowerPC64 strncpy, stpncpy and strstr fixesAlan Modra2017-06-145-0/+19
| | | | | | | | | | | | | | | | | | | | | Makes __stpncpy_power8 call __memset_power8 directly rather than via an IFUNC. Fixes a missing _mcount, and removes some redundant NOPS. The *_is_local defines are also used in a followup patch. * sysdeps/powerpc/powerpc64/multiarch/strncpy-power7.S: Define MEMSET_is_local. * sysdeps/powerpc/powerpc64/multiarch/strncpy-power8.S: Likewise. * sysdeps/powerpc/powerpc64/multiarch/stpncpy-power7.S: Likewise. * sysdeps/powerpc/powerpc64/multiarch/stpncpy-power8.S: Likewise. Define MEMSET. * sysdeps/powerpc/powerpc64/multiarch/strstr-power7.S: Define STRLEN_is_local, STRNLEN_is_local, and STRCHR_is_local. * sysdeps/powerpc/powerpc64/power7/strstr.S: Likewise. Don't add nop after local calls. * sysdeps/powerpc/powerpc64/power7/strncpy.S: Define MEMSET_is_local. Don't add nop after local call. * sysdeps/powerpc/powerpc64/power8/strncpy.S: Likewise. Add missing CALL_MCOUNT.
* PowerPC64 sysdep.h tidyAlan Modra2017-06-141-2/+2
| | | | | | | | | | | | | | | | | | | | | | .align on some targets takes a byte alignment, on others like powerpc, log2 of the byte alignment. It's a good idea to avoid .align, particularly since x86 and powerpc are different. This patch fixes the occurrences of .align in powerpc64/sysdep.h, renames DOT_LABEL since the macro doesn't have anything to do with adding dots, removes extraneous semicolons, and fixes some formatting. * sysdeps/powerpc/powerpc64/sysdep.h: Formatting. (FUNC_LABEL): Rename from DOT_LABEL. (ENTRY_1): Use FUNC_LABEL and remove leading space from label. Use .p2align rather than .align. (TRACEBACK, TRACEBACK_MASK): Use .p2align rather than .align. (ABORT_TRANSACTION): Likewise. (ENTRY_1, ENTRY_2, END_2, LOCALENTRY): Remove unnecessary semicolons, particularly at end. Add semicolon at invocation as necessary. (TRACEBACK, TRACEBACK_MASK, PSEUDO, PSEUDO_NOERRNO): Likewise. (PSEUDO_ERRVAL, PPC64_LOAD_FUNCPTR, OPD_ENT): Likewise. * sysdeps/powerpc/powerpc64/multiarch/strrchr-power8.S (ENTRY, END): Adjust to suit.