path: root/sysdeps/aarch64/multiarch/memcpy_advsimd.S
* aarch64: Use memcpy_simd as the default memcpy
  Author: Wilco Dijkstra    Date: 2024-04-11    Files: 1    Lines: -248/+0

    Since __memcpy_simd is the fastest memcpy on almost all cores, replace
    the generic memcpy with it.

    (cherry picked from commit 531717afbc49c5cf1d994ddf827891abc906056a)
* AArch64: Improve backwards memmove performance
  Author: Wilco Dijkstra    Date: 2020-10-14    Files: 1    Lines: -3/+4

    On some microarchitectures performance of the backwards memmove improves
    if the stores use STR with decreasing addresses.  So change the memmove
    loop in memcpy_advsimd.S to use 2x STR rather than STP.

    Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
    (cherry picked from commit bd394d131c10c9ec22c6424197b79410042eed99)
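    As a hedged illustration of that change (not the glibc code; the label,
    register choices, and calling convention below are made up for the
    example), the paired store is split into two single stores issued at
    decreasing addresses:

        /* store_tail_32: illustrative sketch only.
           x0 = dstend (one past the last byte to write); q0/q1 hold 32 bytes
           of data, with q0 destined for the lower 16 bytes.  */
        .text
        .global store_tail_32
        .type   store_tail_32, %function
    store_tail_32:
        /* Before: a single paired store,  stp q0, q1, [x0, -32]
           After: two STRs at strictly decreasing addresses, which the commit
           says some microarchitectures handle better in a backwards copy.  */
        str     q1, [x0, -16]
        str     q0, [x0, -32]
        ret
        .size   store_tail_32, . - store_tail_32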
* AArch64: Add optimized Q-register memcpy
  Author: Wilco Dijkstra    Date: 2020-10-14    Files: 1    Lines: -0/+247

    Add a new memcpy using 128-bit Q registers - this is faster on modern
    cores and reduces codesize.  Similar to the generic memcpy, small cases
    include copies up to 32 bytes.  64-128 byte copies are split into two
    cases to improve performance of 64-96 byte copies.  Large copies align
    the source rather than the destination.

    bench-memcpy-random is ~9% faster than memcpy_falkor on Neoverse N1, so
    make this memcpy the default on N1 (on Centriq it is 15% faster than
    memcpy_falkor).

    Passes GLIBC regression tests.

    Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
    (cherry picked from commit 4a733bf375238a6a595033b5785cea7f27d61307)
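    As a hedged sketch of the small-copy idea mentioned above (not the glibc
    routine; the label and register usage are made up for the example), a copy
    of 17-32 bytes can be done branch-free with two possibly overlapping
    16-byte Q-register loads and stores, one anchored at the start and one at
    the end of the buffer:

        /* copy_17_32: illustrative sketch only; assumes 17 <= n <= 32.
           x0 = dst, x1 = src, x2 = n.  */
        .text
        .global copy_17_32
        .type   copy_17_32, %function
    copy_17_32:
        sub     x3, x2, 16            /* offset of the last 16 bytes      */
        ldr     q0, [x1]              /* first 16 bytes of the source     */
        ldr     q1, [x1, x3]          /* last 16 bytes (may overlap q0)   */
        str     q0, [x0]
        str     q1, [x0, x3]
        ret
        .size   copy_17_32, . - copy_17_32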