author Pedro Franco de Carvalho <pedromfc@linux.ibm.com> 2021-06-30 12:36:07 -0300
committer Matheus Castanho <msc@linux.ibm.com> 2021-07-01 17:58:53 -0300
commit 813c6ec808556553be9d39e900a3fc97ceb32330
tree d7eebeb92d99e632a82cf41f7e1f8f7c18cc3772
parent 8241409e29a347ff6613d28d13cb1c7cdf1ec888
powerpc: optimize strcpy/stpcpy for POWER9/10
This patch modifies the current POWER9 implementation of strcpy and
stpcpy to optimize it for POWER9/10.

Since no new POWER10 instructions are used, the original POWER9 strcpy is
modified instead of creating a new implementation for POWER10.  This
implementation is based on both the original POWER9 implementation of
strcpy and the preamble of the new POWER10 implementation of strlen.

The changes also affect stpcpy, which uses the same implementation with
some additional code before returning.
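
The relationship between the two functions can be illustrated with a
minimal C sketch. This is not the POWER9 assembly from the patch; the
names copy_core, sketch_strcpy and sketch_stpcpy are hypothetical and
the byte-by-byte loop only stands in for the real vectorized copy. It
assumes the shared code can produce a pointer to the copied terminator,
so that the two entry points differ only in what they return.

```c
#include <stddef.h>

/* Hypothetical shared core: copies src, including its terminating NUL,
   into dst and returns a pointer to the NUL written in dst.  */
static char *copy_core(char *dst, const char *src)
{
    while ((*dst = *src) != '\0') {
        dst++;
        src++;
    }
    return dst;
}

/* strcpy-style entry point: returns the start of the destination.  */
char *sketch_strcpy(char *dst, const char *src)
{
    copy_core(dst, src);
    return dst;
}

/* stpcpy-style entry point: returns the end of the copied string.  */
char *sketch_stpcpy(char *dst, const char *src)
{
    return copy_core(dst, src);
}
```

With a sufficiently large buffer buf, sketch_strcpy(buf, "abc") returns
buf, while sketch_stpcpy(buf, "abc") returns buf + 3.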

On POWER9, averaged across the benchmark inputs (length, source
alignment, destination alignment) over five runs of the benchmark,
bench-strcpy showed an improvement of 5.23%, and bench-stpcpy showed
an improvement of 6.59%.

On POWER10, bench-strcpy showed an improvement of 13.16%, and
bench-stpcpy showed an improvement of 13.59%.

The changes are:

1. Removed the null string optimization.

   Although this results in a few extra cycles for the null string, in
   combination with the second change it resulted in improvements for
   the other cases.

2. Adapted the preamble from strlen for POWER10.

   This is the part of the function that handles up to the first 16 bytes
   of the string.

3. Increased number of unrolled iterations in the main loop to 6.
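
The overall shape described by these changes (a preamble covering at
most the first 16 bytes, followed by a loop over 16-byte blocks) can be
sketched in portable C. This is only an illustration under stated
assumptions, not the assembly in the patch: the names BLOCK, copy_block
and sketch_blockwise_copy are hypothetical, the preamble is assumed to
stop at the next 16-byte boundary of the source, memchr and memcpy stand
in for the vector instructions, and the loop is left rolled rather than
unrolled six times.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { BLOCK = 16 };  /* assumed block size in bytes */

/* Copy up to n bytes from src to dst.  If a NUL occurs within those n
   bytes, copy through it and return a pointer to the NUL in dst;
   otherwise copy all n bytes and return NULL.  */
static char *copy_block(char *dst, const char *src, size_t n)
{
    const char *nul = memchr(src, '\0', n);
    if (nul != NULL) {
        size_t len = (size_t)(nul - src);
        memcpy(dst, src, len + 1);
        return dst + len;
    }
    memcpy(dst, src, n);
    return NULL;
}

/* Returns a pointer to the terminating NUL written in dst.  */
static char *sketch_blockwise_copy(char *dst, const char *src)
{
    /* Preamble: 1 to 16 bytes, up to the next 16-byte boundary of src.  */
    size_t head = BLOCK - (size_t)((uintptr_t)src % BLOCK);
    char *end = copy_block(dst, src, head);
    if (end != NULL)
        return end;
    dst += head;
    src += head;

    /* Main loop: one source-aligned 16-byte block per iteration.  */
    for (;;) {
        end = copy_block(dst, src, BLOCK);
        if (end != NULL)
            return end;
        dst += BLOCK;
        src += BLOCK;
    }
}
```

In this sketch the empty string takes the same preamble path as every
other input, with no dedicated early exit for it.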

Reviewed-by: Matheus Castanho <msc@linux.ibm.com>
Tested-by: Matheus Castanho <msc@linux.ibm.com>