aarch64: optimized memcpy implementation for thunderx2 - mirror/glibc - mirror of git://sourceware.org/git/glibc.git

author	Anton Youdkevitch <anton.youdkevitch@bell-sw.com>	2018-10-16 11:00:27 -0700
committer	Steve Ellcey <sellcey@caviumnetworks.com>	2018-10-16 11:00:27 -0700
commit	75c1aee500ac95bde2b800b3d787c0dd805a8a82 (patch)
tree	654659bd639a9d9e6cd3cb9313f7ee8cc03672dc /libof-iterator.mk
parent	bcdb1bfa0c700db25e0f355d912ec2309f9544a2 (diff)
download	glibc-75c1aee500ac95bde2b800b3d787c0dd805a8a82.tar.gz glibc-75c1aee500ac95bde2b800b3d787c0dd805a8a82.tar.xz glibc-75c1aee500ac95bde2b800b3d787c0dd805a8a82.zip

aarch64: optimized memcpy implementation for thunderx2

Since aligned loads and stores are a huge performance
advantage, the implementation always tries to do aligned
accesses. Besides the cases where the src and dst addresses
are both aligned, or are misaligned by the same offset,
there are cases where src and dst are misaligned by
different offsets. For such cases (if the length is large
enough), the ext instruction is used to merge and shift
two chunks loaded from two adjacent aligned locations,
and the resulting chunk is then stored to an aligned
address.
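
The merge-and-shift step described above can be sketched in
portable C. This is only an illustration under stated
assumptions, not the code from this commit: the actual
implementation is AArch64 assembly, and the names CHUNK,
ext_chunk and copy_shifted are hypothetical. ext_chunk
emulates what "ext vd.16b, vn.16b, vm.16b, #shift" computes
(16 bytes taken from the concatenation of two registers,
starting at byte #shift of the first), and copy_shifted
reads the source only in whole 16-byte chunks at chunk
boundaries.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNK 16 /* bytes in one 128-bit vector register (assumed) */

/* Emulate the effect of AArch64 EXT on two 16-byte registers:
   out = bytes shift .. shift+15 of the 32-byte sequence lo:hi.
   Requires 0 <= shift < CHUNK. */
static void ext_chunk(uint8_t out[CHUNK], const uint8_t lo[CHUNK],
                      const uint8_t hi[CHUNK], unsigned shift)
{
    uint8_t pair[2 * CHUNK];

    memcpy(pair, lo, CHUNK);
    memcpy(pair + CHUNK, hi, CHUNK);
    memcpy(out, pair + shift, CHUNK);
}

/* Hypothetical helper: copy nchunks * CHUNK bytes from
   src_base + shift to dst. The source is read only in whole
   chunks at offsets that are multiples of CHUNK from src_base,
   which stands in for the aligned loads.
   Precondition: (nchunks + 1) * CHUNK bytes are readable at
   src_base, and 0 <= shift < CHUNK. */
static void copy_shifted(uint8_t *dst, const uint8_t *src_base,
                         unsigned shift, size_t nchunks)
{
    uint8_t lo[CHUNK], hi[CHUNK], merged[CHUNK];

    memcpy(lo, src_base, CHUNK);            /* first chunk-sized load */
    for (size_t i = 0; i < nchunks; i++) {
        memcpy(hi, src_base + (i + 1) * CHUNK, CHUNK); /* next chunk */
        ext_chunk(merged, lo, hi, shift);   /* merge and shift */
        memcpy(dst + i * CHUNK, merged, CHUNK); /* whole-chunk store */
        memcpy(lo, hi, CHUNK);              /* reuse hi as the next lo */
    }
}
```

Each source chunk is loaded once and used twice, as the high
half of one merge and the low half of the next. The head and
tail bytes that do not fill a whole chunk are left out of
this sketch.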

Performance gain against the current T2 implementation:
     memcpy-large: 65K-32M: +40% to +10%
     memcpy-walk:  128-32M: +20% to +2%

Diffstat (limited to 'libof-iterator.mk')

0 files changed, 0 insertions, 0 deletions

