author | MayShao-oc <MayShao-oc@zhaoxin.com> | 2024-06-29 11:58:27 +0800 |
---|---|---|
committer | H.J. Lu <hjl.tools@gmail.com> | 2024-06-30 06:26:43 -0700 |
commit | c19457aec67da28a3f78badef53556cd55640a6e (patch) | |
tree | 41d5357bb9426f975e15ad88e30196c35e0cfc49 /iconvdata | |
parent | 44d757eb9f4484dbc3aa32042ab64cdf9374e093 (diff) | |
x86_64: Optimize large size copy in memmove-ssse3
This patch optimizes large copies by using normal (temporal) stores when src > dst and the source and destination regions overlap. This makes memmove-ssse3 follow the same logic as memmove-vec-unaligned-erms.S.

memmove-ssse3 currently uses '__x86_shared_cache_size_half' as its non-temporal threshold. This patch changes it to '__x86_shared_non_temporal_threshold'. The value of __x86_shared_non_temporal_threshold is CPU-specific: different CPUs get different values based on the corresponding nt-benchmark results. Using '__x86_shared_cache_size_half' in memmove-ssse3 ignores that per-CPU tuning, which is unreasonable.

Performance does not change drastically. The results show a small overall improvement, with no major regressions or gains.

Results, geometric_mean(N=20), New / Original (values below 1.000 mean the new code is faster):

| Benchmark | Zhaoxin KX-7000 | Intel Core i5-6600K |
|---|---|---|
| bench-memcpy | 0.999 | 1.001 |
| bench-memcpy-random | 0.999 | 0.999 |
| bench-memcpy-large | 0.978 | 1.001 |
| bench-memmove | 1.000 | 0.995 |
| bench-memmove-large | 0.962 | 0.936 |

Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
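The store-selection rule described in the commit message can be modeled in C. This is a hypothetical sketch, not glibc's code. The real memmove-ssse3 is hand-written assembly, and the names `choose_strategy`, `copy_strategy`, and the `nt_threshold` parameter are invented for illustration. `nt_threshold` stands in for the value the patch reads from `__x86_shared_non_temporal_threshold`. The sketch only picks a strategy. It does not perform the copy.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical strategy labels for this sketch; not glibc identifiers. */
enum copy_strategy {
  COPY_FORWARD,    /* ordinary (temporal) stores, low to high addresses */
  COPY_BACKWARD,   /* ordinary stores, high to low addresses */
  COPY_FORWARD_NT  /* non-temporal stores (e.g. movntdq), low to high */
};

/* Choose how memmove (dst, src, len) would copy.
   nt_threshold models __x86_shared_non_temporal_threshold (assumption:
   passed in here instead of read from a global).  */
enum copy_strategy
choose_strategy (const void *dst, const void *src, size_t len,
                 size_t nt_threshold)
{
  uintptr_t d = (uintptr_t) dst;
  uintptr_t s = (uintptr_t) src;

  /* dst starts inside [src, src + len): a forward copy would overwrite
     source bytes before reading them, so copy backward.  */
  if (d > s && d - s < len)
    return COPY_BACKWARD;

  /* Below the threshold, ordinary forward stores are used.  */
  if (len < nt_threshold)
    return COPY_FORWARD;

  /* src >= dst and the regions overlap: keep ordinary stores instead of
     switching to non-temporal stores (the case this patch changes).  */
  if (s >= d && s - d < len)
    return COPY_FORWARD;

  /* Large copy with no overlap: bypass the cache.  */
  return COPY_FORWARD_NT;
}
```

Each subtraction runs only after the comparison that precedes it, so the unsigned difference cannot wrap. For example, with a 512-byte threshold, a 1024-byte copy where src is 64 bytes past dst returns `COPY_FORWARD`. The same copy with src 2048 bytes past dst returns `COPY_FORWARD_NT`.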