path: root/sysdeps/aarch64/multiarch/memcpy_advsimd.S
Commit message | Author | Age | Files | Lines
* AArch64: Improve backwards memmove performance | Wilco Dijkstra | 2020-08-28 | 1 | -3/+4
  On some microarchitectures performance of the backwards memmove improves if the stores use STR with decreasing addresses. So change the memmove loop in memcpy_advsimd.S to use 2x STR rather than STP.

  Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
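
The store ordering described in this commit can be illustrated in portable C. This is only a sketch under stated assumptions, not the glibc assembly: `block16`, `load16`, `store16` and `copy_backwards_sketch` are hypothetical names, a 16-byte struct stands in for a Q register, and the block size of 32 bytes per iteration is chosen for brevity. A C compiler may merge or reorder the two stores, so the STR versus STP choice itself cannot be expressed at this level; the sketch only shows a backwards copy that issues two separate 16-byte stores, highest address first.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one 128-bit Q register. */
typedef struct { unsigned char b[16]; } block16;

static block16 load16(const unsigned char *p)
{
    block16 v;
    memcpy(v.b, p, 16);
    return v;
}

static void store16(unsigned char *p, block16 v)
{
    memcpy(p, v.b, 16);
}

/* Copy n bytes from src to dst, working from the end towards the start.
   Intended for the overlapping case where dst lies above src. */
static void copy_backwards_sketch(unsigned char *dst,
                                  const unsigned char *src, size_t n)
{
    while (n >= 32) {
        /* Load both blocks first, then store. */
        block16 hi = load16(src + n - 16);
        block16 lo = load16(src + n - 32);
        /* Two separate 16-byte stores, in decreasing address order. */
        store16(dst + n - 16, hi);
        store16(dst + n - 32, lo);
        n -= 32;
    }
    /* Remaining 0..31 bytes, still from high to low addresses. */
    while (n > 0) {
        n--;
        dst[n] = src[n];
    }
}
```

For a destination that overlaps the source at a higher address, the sketch should produce the same result as `memmove`.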
* AArch64: Add optimized Q-register memcpy | Wilco Dijkstra | 2020-07-15 | 1 | -0/+247
  Add a new memcpy using 128-bit Q registers - this is faster on modern cores and reduces codesize. Similar to the generic memcpy, small cases include copies up to 32 bytes. 64-128 byte copies are split into two cases to improve performance of 64-96 byte copies. Large copies align the source rather than the destination.

  bench-memcpy-random is ~9% faster than memcpy_falkor on Neoverse N1, so make this memcpy the default on N1 (on Centriq it is 15% faster than memcpy_falkor).

  Passes GLIBC regression tests.

  Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
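
The size-class structure described in this commit message can be sketched in portable C. This is an illustration under stated assumptions, not the code in memcpy_advsimd.S: `copy_blocks` and `memcpy_sketch` are hypothetical names, each 16-byte `memcpy` stands in for one Q-register load and store, and the exact boundaries (32, 64, 96, 128) and the 64-byte loop step are assumptions chosen to match the ranges the message mentions. The sketch covers each range with a copy from the start and a possibly overlapping copy from the end, and for large copies it advances from a 16-byte-aligned source address.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy `blocks` consecutive 16-byte blocks.
   Each 16-byte memcpy stands in for one Q-register load/store. */
static void copy_blocks(unsigned char *d, const unsigned char *s, size_t blocks)
{
    for (size_t i = 0; i < blocks; i++)
        memcpy(d + 16 * i, s + 16 * i, 16);
}

/* Sketch of a size-class dispatch for non-overlapping buffers. */
static void *memcpy_sketch(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    if (n <= 32) {                                   /* small copies */
        if (n >= 16) {
            copy_blocks(d, s, 1);                    /* first 16 bytes */
            copy_blocks(d + n - 16, s + n - 16, 1);  /* last 16, may overlap */
        } else {
            for (size_t i = 0; i < n; i++)           /* simplified byte loop */
                d[i] = s[i];
        }
    } else if (n <= 64) {                            /* first 32 + last 32 */
        copy_blocks(d, s, 2);
        copy_blocks(d + n - 32, s + n - 32, 2);
    } else if (n <= 96) {                            /* first 64 + last 32 */
        copy_blocks(d, s, 4);
        copy_blocks(d + n - 32, s + n - 32, 2);
    } else if (n <= 128) {                           /* first 64 + last 64 */
        copy_blocks(d, s, 4);
        copy_blocks(d + n - 64, s + n - 64, 4);
    } else {                                         /* large copies */
        copy_blocks(d, s, 1);                        /* unaligned head */
        /* off is in 1..16 and s + off is 16-byte aligned:
           the source is aligned, not the destination. */
        size_t off = 16 - ((uintptr_t)s & 15);
        while (off + 64 < n) {
            copy_blocks(d + off, s + off, 4);
            off += 64;
        }
        copy_blocks(d + n - 64, s + n - 64, 4);      /* tail, may overlap */
    }
    return dst;
}
```

Writing some bytes twice is harmless here because the source and destination do not overlap, so the overlapping head and tail copies store identical values.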