author | Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com> | 2021-04-30 18:12:08 -0300 |
---|---|---|
committer | Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com> | 2021-04-30 18:12:08 -0300 |
commit | e941e0ae80626b7661c1db8953a673cafd3b8b19 (patch) | |
tree | 42b3dcccfce69af0f7ffb0fa4ed2ed75734b82a2 /mathvec | |
parent | dd59655e9371af86043b97e38953f43bd9496699 (diff) | |
powerpc64le: Optimize memcpy for POWER10
This implementation is based on __memcpy_power8_cached and integrates suggestions from Anton Blanchard. It benefits from loads and stores with length for short lengths and for tail code, simplifying the code.

All unaligned memory accesses use instructions that do not generate alignment interrupts on POWER10, making it safe to use on caching-inhibited memory.

The main loop has also been modified in order to increase instruction throughput by reducing the dependency on updates from previous iterations.

On average, this implementation provides around 30% improvement when compared to __memcpy_power7 and 10% improvement in comparison to __memcpy_power8_cached.
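
For readers unfamiliar with the structure described above, the following is a minimal C sketch of the two ideas: a main loop whose chunk copies are addressed at constant offsets from one base pointer (so they do not depend on each other's pointer updates), and a tail handled by a "copy with length" primitive instead of a ladder of size-specific branches. It is an illustration only, not the assembly added by this commit. The names `sketch_memcpy` and `copy_upto16` are hypothetical, the 64-byte loop width is an assumption, and `copy_upto16` merely stands in for the hardware loads and stores with length (assumed here to be instructions in the style of `lxvl`/`stxvl`).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a load/store-with-length pair: moves at most
   16 bytes, with the byte count passed as an operand.  */
static inline void
copy_upto16 (unsigned char *dst, const unsigned char *src, size_t n)
{
  if (n > 16)
    n = 16;
  memcpy (dst, src, n);
}

/* Illustrative sketch only; not the glibc implementation.  */
void *
sketch_memcpy (void *restrict dstv, const void *restrict srcv, size_t len)
{
  unsigned char *dst = dstv;
  const unsigned char *src = srcv;

  /* Main loop (assumed width: 64 bytes).  Each 16-byte chunk uses a
     constant offset from the same base pointers, so the four copies are
     independent and the pointers are updated once per iteration.  */
  while (len >= 64)
    {
      memcpy (dst, src, 16);
      memcpy (dst + 16, src + 16, 16);
      memcpy (dst + 32, src + 32, 16);
      memcpy (dst + 48, src + 48, 16);
      src += 64;
      dst += 64;
      len -= 64;
    }

  /* Tail and short lengths: the same length-taking primitive covers every
     remaining size from 1 to 63 bytes.  */
  while (len > 0)
    {
      size_t n = len < 16 ? len : 16;
      copy_upto16 (dst, src, n);
      src += n;
      dst += n;
      len -= n;
    }

  return dstv;
}
```

In this sketch the tail code needs no special cases for 1, 2, 4 or 8 remaining bytes, which is the simplification the commit message attributes to loads and stores with length.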
Diffstat (limited to 'mathvec')
0 files changed, 0 insertions, 0 deletions