about summary refs log tree commit diff
path: root/sysdeps/x86_64/multiarch/memmove-ssse3.S
Commit message (Collapse)AuthorAgeFilesLines
* x86: Add support for building {w}memmove{_chk} with explicit ISA levelNoah Goldstein2022-07-051-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 1. Refactor files so that all implementations are in the multiarch directory - Moved the implementation portion of memmove sse2 from memmove.S to multiarch/memmove-sse2.S - The non-multiarch file now only includes one of the implementations in the multiarch directory based on the compiled ISA level (only used for non-multiarch builds. Otherwise we go through the ifunc selector). 2. Add ISA level build guards to different implementations. - I.e memmove-avx2-unaligned-erms.S which is ISA level 3 will only build if compiled ISA level <= 3. Otherwise there is no reason to include it as we will always use one of the ISA level 4 implementations (memmove-evex-unaligned-erms.S). 3. Add new multiarch/rtld-memmove.S that just include the non-multiarch memmove.S which will in turn select the best implementation based on the compiled ISA level. 4. Refactor the ifunc selector and ifunc implementation list to use the ISA level aware wrapper macros that allow functions below the compiled ISA level (with a guranteed replacement) to be skipped. Tested with and without multiarch on x86_64 for ISA levels: {generic, x86-64-v2, x86-64-v3, x86-64-v4} And m32 with and without multiarch. isa raising memmove
* x86: Add missing IS_IN (libc) check to memmove-ssse3.SNoah Goldstein2022-06-291-16/+44
| | | | | | | | | | | | | | | | | | | | | | | Was missing to for the multiarch build rtld-memmove-ssse3.os was being built and exporting symbols: >$ nm string/rtld-memmove-ssse3.os U __GI___chk_fail 0000000000000020 T __memcpy_chk_ssse3 0000000000000040 T __memcpy_ssse3 0000000000000020 T __memmove_chk_ssse3 0000000000000040 T __memmove_ssse3 0000000000000000 T __mempcpy_chk_ssse3 0000000000000010 T __mempcpy_ssse3 U __x86_shared_cache_size_half Introduced after 2.35 in: commit 26b2478322db94edc9e0e8f577b2f71d291e5acb Author: Noah Goldstein <goldstein.w.n@gmail.com> Date: Thu Apr 14 11:47:40 2022 -0500 x86: Reduce code size of mem{move|pcpy|cpy}-ssse3
* x86-64: Fix SSE2 memcmp and SSSE3 memmove for x32H.J. Lu2022-04-221-0/+4
| | | | | | | | | | | | | | | | Clear the upper 32 bits in RDX (memory size) for x32 to fix FAIL: string/tst-size_t-memcmp FAIL: string/tst-size_t-memcmp-2 FAIL: string/tst-size_t-memcpy FAIL: wcsmbs/tst-size_t-wmemcmp on x32 introduced by 8804157ad9 x86: Optimize memcmp SSE2 in memcmp.S 26b2478322 x86: Reduce code size of mem{move|pcpy|cpy}-ssse3 Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
* x86: Reduce code size of mem{move|pcpy|cpy}-ssse3Noah Goldstein2022-04-141-4/+380
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | The goal is to remove most SSSE3 function as SSE4, AVX2, and EVEX are generally preferable. memcpy/memmove is one exception where avoiding unaligned loads with `palignr` is important for some targets. This commit replaces memmove-ssse3 with a better optimized are lower code footprint verion. As well it aliases memcpy to memmove. Aside from this function all other SSSE3 functions should be safe to remove. The performance is not changed drastically although shows overall improvements without any major regressions or gains. bench-memcpy geometric_mean(N=50) New / Original: 0.957 bench-memcpy-random geometric_mean(N=50) New / Original: 0.912 bench-memcpy-large geometric_mean(N=50) New / Original: 0.892 Benchmarks where run on Zhaoxin KX-6840@2000MHz See attached numbers for all results. More important this saves 7246 bytes of code size in memmove an additional 10741 bytes by reusing memmove code for memcpy (total 17987 bytes saves). As well an additional 896 bytes of rodata for the jump table entries.
* Improve 64bit memcpy/memmove for Atom, Core 2 and Core i7H.J. Lu2010-06-301-0/+4
This patch includes optimized 64bit memcpy/memmove for Atom, Core 2 and Core i7. It improves memcpy by up to 3X on Atom, up to 4X on Core 2 and up to 1X on Core i7. It also improves memmove by up to 3X on Atom, up to 4X on Core 2 and up to 2X on Core i7.