author | Wilco Dijkstra <wdijkstr@arm.com> | 2021-12-02 18:33:26 +0000 |
---|---|---|
committer | Wilco Dijkstra <wdijkstr@arm.com> | 2021-12-02 18:36:03 +0000 |
commit | b31bd11454fade731e5158b1aea40b133ae19926 | |
tree | d6d25ad11615c9f2c91a607f7d7a7cdc958bb5d7 /iconvdata/ibm1133.h | |
parent | b51eb35c572b015641f03e3682c303f7631279b7 | |
AArch64: Improve A64FX memcpy
v2 is a complete rewrite of the A64FX memcpy. Performance is improved by streamlining the code, aligning all large copies and using a single unrolled loop for all sizes. The code size for memcpy and memmove goes down from 1796 bytes to 868 bytes. Performance is better in all cases: bench-memcpy-random is 2.3% faster overall, bench-memcpy-large is ~33% faster for large sizes, bench-memcpy-walk is 25% faster for small sizes and 20% for the largest sizes. The geomean of all tests in bench-memcpy is 5.1% faster, and total time is reduced by 4%.

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
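
The "align large copies, then run one unrolled loop" idea from the commit message can be illustrated with a minimal C sketch. This is not the A64FX code, which is written in AArch64 assembly. The name `sketch_memcpy`, the helper `copy_block`, the 64-byte `BLOCK` size, the 4x unroll factor, and the choice to align the destination rather than the source are all assumptions made for illustration. For simplicity the sketch only enters the loop for large sizes and handles non-overlapping buffers only.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical block size for illustration; not taken from the A64FX code. */
#define BLOCK 64

/* Illustrative helper: copy one BLOCK-sized chunk. */
static void copy_block(unsigned char *d, const unsigned char *s)
{
    memcpy(d, s, BLOCK);
}

/* Sketch only: align the destination, then use a single unrolled loop.
   Buffers must not overlap. */
void *sketch_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    if (n >= 4 * BLOCK) {
        /* Copy a short head so that d becomes BLOCK-aligned
           (assumption: destination alignment is what matters). */
        size_t head = (BLOCK - ((uintptr_t)d % BLOCK)) % BLOCK;
        memcpy(d, s, head);
        d += head;
        s += head;
        n -= head;

        /* One loop, unrolled four times, for the bulk of the copy. */
        while (n >= 4 * BLOCK) {
            copy_block(d, s);
            copy_block(d + BLOCK, s + BLOCK);
            copy_block(d + 2 * BLOCK, s + 2 * BLOCK);
            copy_block(d + 3 * BLOCK, s + 3 * BLOCK);
            d += 4 * BLOCK;
            s += 4 * BLOCK;
            n -= 4 * BLOCK;
        }
    }

    /* Remaining tail, or the whole copy for small sizes. */
    memcpy(d, s, n);
    return dst;
}
```

Aligning before the loop means every iteration stores to block-aligned addresses, and keeping a single loop is one plausible reason a rewrite like this can shrink code size.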
Diffstat (limited to 'iconvdata/ibm1133.h')
0 files changed, 0 insertions, 0 deletions