author | Anton Youdkevitch <anton.youdkevitch@bell-sw.com> | 2018-10-16 11:00:27 -0700 |
---|---|---|
committer | Steve Ellcey <sellcey@caviumnetworks.com> | 2018-10-16 11:00:27 -0700 |
commit | 75c1aee500ac95bde2b800b3d787c0dd805a8a82 (patch) | |
tree | 654659bd639a9d9e6cd3cb9313f7ee8cc03672dc /ChangeLog | |
parent | bcdb1bfa0c700db25e0f355d912ec2309f9544a2 (diff) | |
aarch64: optimized memcpy implementation for thunderx2
Because aligned loads and stores give a large performance advantage, the implementation always tries to perform aligned accesses. Besides the cases where the src and dst addresses are both aligned, or are misaligned by the same amount, there are cases where src and dst are misaligned by different amounts. For such cases (if the length is large enough), the ext instruction is used to merge and shift two chunks loaded from two adjacent aligned locations, and the resulting chunk is then stored to an aligned address.

Performance gain against the current T2 (ThunderX2) implementation:
memcpy-large: 65K-32M: +40% to +10%
memcpy-walk: 128-32M: +20% to +2%
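The merge-and-shift step described in the commit message can be illustrated with a small portable C model. This is only a sketch of the idea, not the code from memcpy_thunderx2.S: the names chunk_t, ext_bytes and copy_merge_shift are hypothetical, the 16-byte chunk size is an assumption based on the width of an AArch64 Q register, and memcpy stands in for the aligned vector loads and stores that the assembly would issue.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNK 16 /* assumed chunk size: one 128-bit vector register */

/* Hypothetical model of a 16-byte vector register. */
typedef struct { uint8_t b[CHUNK]; } chunk_t;

/* Byte-wise model of what EXT computes: CHUNK consecutive bytes taken
   from the concatenation lo:hi, starting at byte `shift` of lo. */
static chunk_t ext_bytes(chunk_t lo, chunk_t hi, unsigned shift)
{
    chunk_t out;
    assert(shift < CHUNK);
    for (unsigned i = 0; i < CHUNK; i++) {
        unsigned j = shift + i;
        out.b[i] = (j < CHUNK) ? lo.b[j] : hi.b[j - CHUNK];
    }
    return out;
}

/* Copy nchunks * CHUNK bytes that start `shift` bytes into src_base.
   src_base plays the role of the source address rounded down to a chunk
   boundary; dst plays the role of the aligned destination.
   The caller must guarantee that (nchunks + 1) * CHUNK bytes are
   readable at src_base, because the last output chunk needs the
   source chunk that follows it. */
static void copy_merge_shift(uint8_t *dst, const uint8_t *src_base,
                             unsigned shift, size_t nchunks)
{
    chunk_t lo, hi;
    memcpy(&lo, src_base, CHUNK);                       /* first "aligned load" */
    for (size_t i = 0; i < nchunks; i++) {
        memcpy(&hi, src_base + (i + 1) * CHUNK, CHUNK); /* next "aligned load" */
        chunk_t merged = ext_bytes(lo, hi, shift);      /* merge and shift */
        memcpy(dst + i * CHUNK, &merged, CHUNK);        /* "aligned store" */
        lo = hi;                                        /* each chunk is loaded once */
    }
}
```

Calling copy_merge_shift(dst, src_base, shift, n) should leave dst holding the same n * CHUNK bytes that a plain byte copy from src_base + shift would produce. Reusing hi as the next lo means every source chunk is loaded only once, which is the point of combining two adjacent aligned loads instead of issuing unaligned ones.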
Diffstat (limited to 'ChangeLog')
-rw-r--r-- | ChangeLog | 6 |
1 files changed, 6 insertions, 0 deletions
diff --git a/ChangeLog b/ChangeLog
index 8a1ca3e1e1..f3c543aa41 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,9 @@
+2018-10-16  Anton Youdkevitch  <anton.youdkevitch@bell-sw.com>
+
+	* sysdeps/aarch64/multiarch/memcpy_thunderx.S: Remove thunderx2 code.
+	* sysdeps/aarch64/multiarch/memcpy_thunderx2.S: New implementation
+	for thunderX2.
+
 2018-10-15  Joseph Myers  <joseph@codesourcery.com>
 
 	* sysdeps/unix/sysv/linux/Makefile (sysdep_headers): Add