path: root/sysdeps/powerpc/powerpc64/multiarch/Makefile
Commit message (Author, Age, Files, Lines)
* powerpc64: strchr/strchrnul optimization for power8 (Rajalakshmi Srinivasaraghavan, 2016-12-28, 1 file, -2/+2)
| | | | | | | The P7 code is used for strings of <= 32B, and vectorized loops are used for strings > 32B. This shows an average 25% improvement, depending on the position of the search character. The performance is the same for shorter strings. Tested on ppc64 and ppc64le.
* powerpc: strncmp optimization for power9 (Rajalakshmi Srinivasaraghavan, 2016-12-13, 1 file, -1/+2)
| | | | | | | Unlike the power8 optimization, vectorized loops are used for strings > 32B. Tested on a power9 ppc64le simulator.
* powerpc: strcmp optimization for power9 (Rajalakshmi Srinivasaraghavan, 2016-12-01, 1 file, -1/+1)
| | | | | | | Unlike the power8 optimization, vectorized loops are used for strings > 32B. Tested on a power9 ppc64le simulator.
* powerpc: strcasecmp/strncasecmp optimization for power8 (raji, 2016-06-14, 1 file, -1/+3)
| | | | | | | This implementation uses vectors to improve performance compared to the current byte-by-byte implementation for POWER7. The performance improvement is up to 4x. This patch is tested on powerpc64 and powerpc64le.
* powerpc: Add optimized strcspn for P8 (Paul E. Murphy, 2016-04-25, 1 file, -2/+2)
| | | | | A few minor adjustments to the P8 strspn give us an almost equally optimized P8 strcspn.
* powerpc: strcasestr optimization for power8 (Rajalakshmi Srinivasaraghavan, 2016-04-22, 1 file, -1/+2)
| | | | | | This patch optimizes the strcasestr function for power >= 8 systems. It compares 16 bytes at a time using vector instructions, and the average improvement is ~40%. This patch is tested on powerpc64 and powerpc64le.
* powerpc: Optimization for strlen for POWER8. (Carlos Eduardo Seo, 2016-04-15, 1 file, -1/+1)
| | | | | This implementation takes advantage of vectorization to improve performance of the loop over the current strlen implementation for POWER7.
* powerpc: Add optimized P8 strspn (Paul E. Murphy, 2016-04-07, 1 file, -1/+2)
| | | | | | | This uses vectors and bitmasks. For a small needle and a large haystack, the performance improvement is up to 8x. For short strings (0-4B), the cost of computing the bitmask dominates, and the function is slightly slower.
* powerpc: strstr optimization (Rajalakshmi Srinivasaraghavan, 2015-07-16, 1 file, -1/+1)
| This patch optimizes strstr function for power >= 7 systems. Performance gain is obtained using aligned memory access and usage of cmpb instruction for quicker comparison. The average improvement of this optimization is ~40%. Tested on ppc64 and ppc64le.
|
| 2015-07-16 Rajalakshmi Srinivasaraghavan <raji@linux.vnet.ibm.com>
|
| * sysdeps/powerpc/powerpc64/multiarch/Makefile: Add strstr().
| * sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c: Likewise.
| * sysdeps/powerpc/powerpc64/power7/strstr.S: New File.
| * sysdeps/powerpc/powerpc64/multiarch/strstr-power7.S: New File.
| * sysdeps/powerpc/powerpc64/multiarch/strstr-ppc64.c: New File.
| * sysdeps/powerpc/powerpc64/multiarch/strstr.c: New File.
* powerpc: wordcopy/memmove cleanup for ppc64 (Adhemerval Zanella, 2015-02-09, 1 file, -4/+3)
| | | | | | | | | | This patch cleans up some multiarch code related to the memmove optimization. Initial IFUNC support added specialized wordcopy symbols, which turned into local IFUNC calls used by the default memmove implementation. This change removes them and uses the optimized memmove instead for supported chips.
* powerpc: Remove POWER7 wordcopy ifunc (Adhemerval Zanella, 2015-02-09, 1 file, -2/+1)
| | | | | | This patch removes the POWER7 ifunc wordcopy function (_wordcopy_*_power7), since GLIBC now provides an optimized memmove/bcopy for POWER7.
* powerpc: multiarch Makefile cleanup for powerpc64 (Adhemerval Zanella, 2015-02-09, 1 file, -5/+10)
| | | | | This patch cleans up the multiarch Makefile by moving the wide-character implementations to the correct wcsmbs rule.
* powerpc: Optimized strncmp for POWER8/PPC64 (Adhemerval Zanella, 2015-01-13, 1 file, -2/+3)
| | | | | | | | | | This patch adds an optimized POWER8 strncmp. The implementation focuses on speeding up unaligned cases, following the ideas of the power8 strcmp. The algorithm first checks the initial 16 bytes, then aligns the first source and uses unaligned loads on the second argument only. Additional checks for page boundaries are done for unaligned cases (where the source alignments differ).
* powerpc: Optimized strcmp for POWER8/PPC64 (Adhemerval Zanella, 2015-01-13, 1 file, -1/+1)
| | | | | | | This patch adds an optimized POWER8 strcmp using unaligned accesses. The algorithm first checks the initial 16 bytes, then aligns the first source and uses unaligned loads on the second argument only. Additional checks for page boundaries are done for unaligned cases.
* powerpc: Optimized st{r,p}ncpy for POWER8/PPC64 (Adhemerval Zanella, 2015-01-13, 1 file, -2/+3)
| | | | | | | | | | | This patch adds an optimized POWER8 st{r,p}ncpy using unaligned accesses. It shows a 10%-80% improvement over the optimized POWER7 one that uses only aligned accesses, especially on unaligned inputs. The algorithm first reads and checks 16 bytes (if the inputs do not cross a 4K page boundary). Then it realigns the source to 16 bytes and issues a 16-byte read-and-compare loop to speed up null byte checks for large strings. Also, unlike the POWER7 optimization, the null padding is done inline in the implementation using possibly unaligned accesses, instead of relying on a memset call. A special case is added for page-crossing reads.
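The page-boundary condition mentioned above can be illustrated in portable C. This is only a sketch under stated assumptions, not the glibc assembly: the helper name `read_stays_in_page` and the macro `PAGE_SIZE_ASSUMED` are hypothetical, and the 4K page size is taken from the commit message.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed page size: 4K, as stated in the commit message. */
#define PAGE_SIZE_ASSUMED 4096u

/* Hypothetical helper: returns true when reading n bytes starting at p
   stays inside a single page, so an unaligned n-byte load cannot touch
   an unmapped neighbouring page. Assumes 0 < n <= PAGE_SIZE_ASSUMED. */
bool read_stays_in_page(const void *p, size_t n)
{
    /* Offset of p inside its page. */
    uintptr_t offset = (uintptr_t)p & (PAGE_SIZE_ASSUMED - 1);
    return offset + n <= PAGE_SIZE_ASSUMED;
}
```

A routine following this idea would take the fast 16-byte path only when `read_stays_in_page(src, 16)` holds, and fall back to a slower byte-wise path otherwise.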
* powerpc: Optimized strcat for POWER8/PPC64 (Adhemerval Zanella, 2015-01-13, 1 file, -2/+2)
| | | | | With the new optimized strcpy for POWER8 in place, this patch adds an optimized strcat that uses it together with the default implementation at strings/.
* powerpc: Optimized st{r,p}cpy for POWER8/PPC64 (Adhemerval Zanella, 2015-01-13, 1 file, -1/+2)
| | | | | | | | | | This patch adds an optimized POWER8 strcpy using unaligned accesses. For strings up to 16 bytes, the implementation first calculates the string size, like strlen, and issues a memcpy. For larger strings, the source is first aligned to 16 bytes and then tested in a loop that reads 16 bytes and combines the cmpb results for speedup. A special case is added for page-crossing reads. It shows a 30%-60% improvement over the optimized POWER7 one that uses only aligned accesses.
* powerpc: Add powerpc64 strpbrk optimization (Adhemerval Zanella, 2014-12-02, 1 file, -1/+1)
| | | | | | This patch makes the POWER7 optimized strpbrk generic by using default doubleword stores to zero the hash, instead of VSX instructions. Performance on POWER7/POWER8 does not change.
* powerpc: Add powerpc64 strcspn optimization (Adhemerval Zanella, 2014-12-02, 1 file, -1/+0)
| | | | | | This patch makes the POWER7 optimized strcspn generic by using default doubleword stores to zero the hash, instead of VSX instructions. Performance on POWER7/POWER8 does not change.
* powerpc: Add powerpc64 strspn optimization (Adhemerval Zanella, 2014-12-02, 1 file, -1/+1)
| | | | | | This patch makes the POWER7 optimized strspn generic by using default doubleword stores to zero the hash, instead of VSX instructions. Performance on POWER7/POWER8 machines does not change.
* PowerPC: memset optimization for POWER8/PPC64 (Adhemerval Zanella, 2014-09-10, 1 file, -1/+1)
| This patch adds an optimized memset implementation for POWER8. For sizes from 0 to 255 bytes, a word/doubleword algorithm similar to the POWER7 optimized one is used. For sizes higher than 255, two strategies are used:
|
| 1. If the constant is different from 0, the memory is written with altivec vector instructions;
|
| 2. If the constant is 0, dcbz instructions are used. The loop is unrolled to clear 512 bytes at a time.
|
| Using vector instructions increases throughput considerably, doubling performance for sizes larger than 1024. The unrolled dcbz loop also shows a performance improvement, doubling throughput for sizes larger than 8192 bytes.
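The size and value thresholds described above amount to a small decision function. The sketch below only models that dispatch in C; the enum and the function name `pick_memset_strategy` are hypothetical, and the real implementation is assembly that performs the stores rather than returning a label.

```c
#include <stddef.h>

/* Hypothetical labels for the three strategies named in the commit message. */
enum memset_strategy {
    MEMSET_WORD,    /* word/doubleword stores, sizes 0..255 */
    MEMSET_VECTOR,  /* altivec vector stores, size > 255 and value != 0 */
    MEMSET_DCBZ     /* dcbz cache-block zeroing, size > 255 and value == 0 */
};

/* Hypothetical helper: picks a strategy from the fill value and the size. */
enum memset_strategy pick_memset_strategy(int c, size_t n)
{
    if (n <= 255)
        return MEMSET_WORD;
    /* memset converts its value argument to unsigned char before storing. */
    return ((unsigned char)c != 0) ? MEMSET_VECTOR : MEMSET_DCBZ;
}
```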
* PowerPC: multiarch bzero cleanup for PPC64 (Adhemerval Zanella, 2014-09-10, 1 file, -1/+1)
| | | | | | | | This patch cleans up the multiarch bzero for powerpc64 by removing the multiarch objects and using instead the memset embedded implementation present in each multiarch optimization. The generated code is essentially the same, except for the TB_TOCLESS (which is not essential).
* PowerPC: optimized memmove for POWER7/PPC64 (Adhemerval Zanella, 2014-07-07, 1 file, -1/+2)
| | | | | | | | | This patch adds an optimized memmove for POWER7/powerpc64. The basic idea is to use the memcpy for POWER7 on non-overlapping memory regions and an optimized backward memcpy for memory regions that overlap (similar to the idea of string/memmove.c). The backward memcpy algorithm used is similar to the one used for memcpy for POWER7, with adjustments done for alignment. The difference is that memory is always aligned to 16 bytes before using VSX/altivec instructions.
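The forward/backward split can be sketched in portable C. This is a byte-wise illustration of the direction choice only, not the POWER7 code; the name `sketch_memmove` is hypothetical, and the single unsigned comparison is an assumption about how the overlap test might be written.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical byte-wise memmove showing only the direction choice. */
void *sketch_memmove(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Unsigned distance from src to dst. It is >= n when dst lies below
       src (the subtraction wraps around) or at least n bytes above it.
       In both cases a forward copy never overwrites unread source bytes. */
    if ((uintptr_t)d - (uintptr_t)s >= n) {
        for (size_t i = 0; i < n; i++)
            d[i] = s[i];
    } else {
        /* dst starts inside [src, src + n): copy from the end backward. */
        for (size_t i = n; i > 0; i--)
            d[i - 1] = s[i - 1];
    }
    return dst;
}
```

An optimized version would replace the first loop with a call to the fast memcpy and the second with a backward copy that uses wider loads and stores.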
* PowerPC: strcat optimization for PPC64/POWER7 (Vidya Ranganathan, 2014-07-02, 1 file, -1/+2)
| | | | | | This patch adds an ifunc power7 strcat symbol that uses the logic in sysdeps/powerpc/strcat.c but calls the power7 strlen/strcpy symbols instead of the default ones.
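Building strcat from strlen and strcpy can be sketched as follows. The function name `sketch_strcat` is hypothetical, and the standard `strlen`/`strcpy` stand in for the power7 symbols that the multiarch build would call instead.

```c
#include <string.h>

/* Hypothetical strcat built from strlen and strcpy: find the end of dst,
   then copy src (including its terminating null byte) there. */
char *sketch_strcat(char *dst, const char *src)
{
    strcpy(dst + strlen(dst), src);
    return dst;
}
```

With this shape, any speedup in strlen or strcpy carries over to strcat without a separate assembly implementation.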
* PowerPC: Optimized strcmp for PPC64/POWER7 (Vidya Ranganathan, 2014-06-11, 1 file, -1/+1)
| | | | | | Optimization is achieved on 8-byte aligned strings with doubleword comparison using the cmpb instruction. On unaligned strings, loop unrolling is applied for a Power7 gain.
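For readers unfamiliar with cmpb: it compares two registers byte by byte and sets each result byte to 0xFF where the bytes are equal and to 0x00 where they differ. The sketch below emulates that in portable C and shows how a doubleword-at-a-time string comparison could use it. The names `cmpb_emul` and `dword_needs_slow_path` are hypothetical, and the loop is a slow emulation for illustration rather than the single instruction.

```c
#include <stdint.h>

/* Portable emulation of the per-byte comparison performed by cmpb:
   each result byte is 0xFF where a and b hold the same byte, else 0x00. */
uint64_t cmpb_emul(uint64_t a, uint64_t b)
{
    uint64_t result = 0;
    for (int i = 0; i < 8; i++) {
        uint64_t mask = (uint64_t)0xFF << (8 * i);
        if ((a & mask) == (b & mask))
            result |= mask;
    }
    return result;
}

/* Hypothetical check for one step of a doubleword strcmp loop: returns 1
   when the loop must stop because w1 contains a null byte or because the
   two doublewords differ in at least one byte, and 0 otherwise. */
int dword_needs_slow_path(uint64_t w1, uint64_t w2)
{
    uint64_t has_null = cmpb_emul(w1, 0);  /* 0xFF at each null byte of w1 */
    uint64_t equal = cmpb_emul(w1, w2);    /* 0xFF at each matching byte */
    return has_null != 0 || equal != UINT64_MAX;
}
```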
* PowerPC: strncpy/stpncpy optimization for PPC64/POWER7 (Vidya Ranganathan, 2014-05-06, 1 file, -1/+2)
| The optimization is achieved by the following techniques:
| > data alignment [gain from aligned memory access on read/write]
| > POWER7 gains performance with loop unrolling/unwinding [gain by reduction of branch penalty].
| > zero padding done by calling optimized memset
* PowerPC: optimized strpbrk for POWER7 (Adhemerval Zanella, 2014-03-20, 1 file, -1/+2)
| | | | | | | | | This patch adds an optimized strpbrk for POWER7 by using a different algorithm than the default implementation: it constructs a table based on the 'accept' argument and uses this table to check for any occurrence in the input string. The idea is similar to the one x86_64 uses. For PowerPC some tunings were added, such as unrolled loops and memory clearing using VSX instructions.
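The table-based approach can be sketched in portable C. This is a minimal illustration of the algorithm described above, not the POWER7 assembly; the name `table_strpbrk` is hypothetical, and the loop unrolling and VSX table clearing mentioned in the commit message are left out.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical portable table-based strpbrk. */
char *table_strpbrk(const char *s, const char *accept)
{
    unsigned char table[256];

    /* Clear the lookup table (the step the POWER7 code does with VSX). */
    memset(table, 0, sizeof table);

    /* Mark every byte value that appears in accept. */
    for (const unsigned char *a = (const unsigned char *)accept; *a != '\0'; a++)
        table[*a] = 1;

    /* Scan s with one table lookup per byte instead of rescanning accept. */
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; p++)
        if (table[*p])
            return (char *)p;

    return NULL;
}
```

The table costs a fixed 256-byte setup, so the benefit grows with the length of `s` and of `accept`; for very short inputs the setup can dominate.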
* PowerPC: optimized strcspn for PPC64/POWER7 (Adhemerval Zanella, 2014-03-20, 1 file, -1/+1)
| | | | | | | | | | This patch adds an optimized strcspn for POWER7 by using a different algorithm than the default implementation: it constructs a table based on the 'accept' argument and uses this table to check for any occurrence in the input string. The idea is similar to the one x86_64 uses. For PowerPC some tunings were added, such as unrolled loops and aligning the table's stack memory to 16 bytes (so the VSX clear can run without alignment issues).
* PowerPC: strspn optimization for PPC64/POWER7 (Vidya Ranganathan, 2014-03-11, 1 file, -1/+2)
| The optimization is achieved by the following techniques:
| > hashing of needle.
| > hashing avoids scanning of duplicate entries in needle across the string.
| > initializing the hash table with Vector instructions (VSX) by quadword access.
| > unrolling when scanning for character in string across hash table.
* PowerPC: strncat optimization for PPC64 (Adhemerval Zanella, 2014-03-10, 1 file, -1/+1)
| The optimization is achieved by the following techniques:
| 1. Doubleword aligned memory access and compares using cmpb instruction.
| 2. Loop unrolling for byte load/store.
| 3. CPU pre-fetch to avoid cache miss.
* PowerPC: strrchr optimization for POWER7/PPC64 (Rajalakshmi Srinivasaraghavan, 2014-03-03, 1 file, -1/+2)
| | | | | | This patch optimizes strrchr() for ppc64. It uses aligned memory access along with cmpb instruction and CPU prefetch to avoid cache misses for speed improvement.
* PowerPC: multiarch stpcpy for PowerPC64 (Adhemerval Zanella, 2013-12-13, 1 file, -1/+1)
|
* PowerPC: multiarch strcpy for PowerPC64 (Adhemerval Zanella, 2013-12-13, 1 file, -1/+2)
|
* PowerPC: multiarch wordcopy for PowerPC64 (Adhemerval Zanella, 2013-12-13, 1 file, -1/+4)
|
* PowerPC: multiarch wcscpy for PowerPC64. (Adhemerval Zanella, 2013-12-13, 1 file, -1/+3)
|
* PowerPC: multiarch wcsrchr for PowerPC64 (Adhemerval Zanella, 2013-12-13, 1 file, -1/+4)
|
* PowerPC: multiarch wcschr for PowerPC64 (Adhemerval Zanella, 2013-12-13, 1 file, -1/+4)
|
* PowerPC: multiarch strchrnul for PowerPC64 (Adhemerval Zanella, 2013-12-13, 1 file, -1/+2)
|
* PowerPC: multiarch strchr for PowerPC64 (Adhemerval Zanella, 2013-12-13, 1 file, -1/+1)
|
* PowerPC: multiarch strncmp for PowerPC64 (Adhemerval Zanella, 2013-12-13, 1 file, -1/+2)
|
* PowerPC: multiarch strncasecmp for PowerPC64 (Adhemerval Zanella, 2013-12-13, 1 file, -1/+5)
|
* PowerPC: multiarch strcasecmp for PowerPC64 (Adhemerval Zanella, 2013-12-13, 1 file, -1/+1)
|
* PowerPC: multiarch strnlen for PowerPC64 (Adhemerval Zanella, 2013-12-13, 1 file, -1/+2)
|
* PowerPC: multiarch strlen for PowerPC64 (Adhemerval Zanella, 2013-12-13, 1 file, -1/+1)
|
* PowerPC: multiarch rawmemchr for PowerPC64 (Adhemerval Zanella, 2013-12-13, 1 file, -1/+2)
|
* PowerPC: multiarch memrchr for PowerPC64 (Adhemerval Zanella, 2013-12-13, 1 file, -1/+2)
|
* PowerPC: multiarch memchr for PowerPC64 (Adhemerval Zanella, 2013-12-13, 1 file, -1/+1)
|
* PowerPC: multiarch mempcpy for PowerPC64 (Adhemerval Zanella, 2013-12-13, 1 file, -1/+2)
|
* PowerPC: multiarch memset/bzero for PowerPC64 (Adhemerval Zanella, 2013-12-13, 1 file, -1/+2)
|
* PowerPC: multiarch memcmp for PowerPC64 (Adhemerval Zanella, 2013-12-13, 1 file, -1/+2)
|