author | Pedro Franco de Carvalho <pedromfc@linux.ibm.com> | 2021-06-30 12:36:07 -0300 |
---|---|---|
committer | Matheus Castanho <msc@linux.ibm.com> | 2021-07-01 17:58:53 -0300 |
commit | 813c6ec808556553be9d39e900a3fc97ceb32330 | |
tree | d7eebeb92d99e632a82cf41f7e1f8f7c18cc3772 | |
parent | 8241409e29a347ff6613d28d13cb1c7cdf1ec888 | |
powerpc: optimize strcpy/stpcpy for POWER9/10
This patch modifies the current POWER9 implementation of strcpy and stpcpy to optimize it for POWER9/10.

Since no new POWER10 instructions are used, the original POWER9 strcpy is modified instead of creating a new implementation for POWER10. This implementation is based on both the original POWER9 implementation of strcpy and the preamble of the new POWER10 implementation of strlen.

The changes also affect stpcpy, which uses the same implementation with some additional code before returning.

On POWER9, averaging improvements across the benchmark inputs (length/source alignment/destination alignment), for an experiment that ran the benchmark five times, bench-strcpy showed an improvement of 5.23%, and bench-stpcpy showed an improvement of 6.59%. On POWER10, bench-strcpy showed an improvement of 13.16%, and bench-stpcpy showed an improvement of 13.59%.

The changes are:

1. Removed the null string optimization. Although this costs a few extra cycles for the null string, in combination with the second change it improved performance for other cases.

2. Adapted the preamble from strlen for POWER10. This is the part of the function that handles up to the first 16 bytes of the string.

3. Increased the number of unrolled iterations in the main loop to 6.

Reviewed-by: Matheus Castanho <msc@linux.ibm.com>
Tested-by: Matheus Castanho <msc@linux.ibm.com>
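The control flow described in the commit message can be illustrated with a minimal portable C sketch. It is not the patch itself, which is PowerPC assembly using vector instructions. All names here (`copy_block`, `copy_core`, `sketch_strcpy`, `sketch_stpcpy`) are hypothetical, and the byte-wise block copy is a stand-in assumed only to keep the example well-defined in standard C. The sketch shows a preamble that handles up to the first 16 bytes with no separate null-string fast path, a main loop unrolled six times, and strcpy and stpcpy sharing one core routine and differing only in what they return. In this sketch the core returns the end pointer and the strcpy wrapper discards it; that arrangement is a choice of the example, not necessarily how the assembly is organised.

```c
#include <stddef.h>

#define BLOCK 16  /* block size mirroring the 16-byte preamble in the commit message */

/* Hypothetical helper: copy up to BLOCK bytes from *s to *d.
   Returns 1 if the terminator was copied, leaving *d pointing at it.
   Otherwise advances both pointers by BLOCK and returns 0. */
static int copy_block(char **d, const char **s)
{
    for (size_t i = 0; i < BLOCK; i++) {
        char c = (*s)[i];
        (*d)[i] = c;
        if (c == '\0') {
            *d += i;
            return 1;
        }
    }
    *d += BLOCK;
    *s += BLOCK;
    return 0;
}

/* Hypothetical shared core: copies src (including the terminator) to dst
   and returns a pointer to the terminator written in dst. */
static char *copy_core(char *dst, const char *src)
{
    /* Preamble: handles up to the first 16 bytes. The empty string takes
       this same path; there is no dedicated null-string check. */
    if (copy_block(&dst, &src))
        return dst;

    /* Main loop, manually unrolled to six blocks per iteration. */
    for (;;) {
        if (copy_block(&dst, &src)) return dst;
        if (copy_block(&dst, &src)) return dst;
        if (copy_block(&dst, &src)) return dst;
        if (copy_block(&dst, &src)) return dst;
        if (copy_block(&dst, &src)) return dst;
        if (copy_block(&dst, &src)) return dst;
    }
}

/* strcpy semantics: return the start of the destination. */
char *sketch_strcpy(char *dst, const char *src)
{
    copy_core(dst, src);
    return dst;
}

/* stpcpy semantics: return a pointer to the terminator in the destination. */
char *sketch_stpcpy(char *dst, const char *src)
{
    return copy_core(dst, src);
}
```

Calling `sketch_stpcpy(buf, "hello")` returns `buf + 5`, while `sketch_strcpy(buf, "hello")` returns `buf`; both leave the same bytes in `buf`.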