path: root/sysdeps/x86_64/multiarch/memcmp-evex-movbe.S
Commit message   [Author, Date, Files, Lines]
* Fix misspellings in sysdeps/x86_64 -- BZ 25337.   [Paul Pluzhnikov, 2023-05-23, 1, -4/+4]
    Applying this commit results in a bit-identical rebuild of libc.so.6,
    math/libm.so.6, elf/ld-linux-x86-64.so.2 and mathvec/libmvec.so.1.

    Reviewed-by: Florian Weimer <fweimer@redhat.com>
* Update copyright dates with scripts/update-copyrights   [Joseph Myers, 2023-01-06, 1, -1/+1]
* x86: Use VMM API in memcmp-evex-movbe.S and minor changes   [Noah Goldstein, 2022-11-08, 1, -133/+175]
    The only change to the existing generated code is `tzcnt` -> `bsf` to save
    a byte of code size here and there.

    Rewriting with the VMM API allows memcmp-evex-movbe to be used with
    evex512 by including "x86-evex512-vecs.h" at the top.

    Complete check passes on x86-64.
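    Background on the `tzcnt` -> `bsf` swap (an editor's illustration, not part
    of the commit): both instructions return the index of the lowest set bit
    for a nonzero source; they differ only in flag behavior and for a zero
    source, which cannot occur once a mismatch mask has been found. `bsf`
    simply lacks the F3 prefix of `tzcnt`, so it encodes one byte shorter. A
    minimal GAS fragment, assuming the nonzero mask is in %rax:

        tzcnt   %rax, %rcx      /* F3-prefixed; well defined (= 64) even when
                                   %rax == 0 */
        bsf     %rax, %rcx      /* one byte shorter; same index whenever
                                   %rax != 0 */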
* x86: Add support for building {w}memcmp{eq} with explicit ISA level   [Noah Goldstein, 2022-07-05, 1, -1/+4]
    1. Refactor files so that all implementations are in the multiarch
       directory.
         - Moved the implementation portion of memcmp sse2 from memcmp.S to
           multiarch/memcmp-sse2.S.
         - The non-multiarch file now only includes one of the implementations
           in the multiarch directory based on the compiled ISA level (only
           used for non-multiarch builds; otherwise we go through the ifunc
           selector).

    2. Add ISA level build guards to the different implementations (see the
       sketch after this list). I.e. memcmp-avx2-movbe.S, which is ISA level 3,
       will only build if the compiled ISA level is <= 3. Otherwise there is no
       reason to include it, as we will always use one of the ISA level 4
       implementations (memcmp-evex-movbe.S).

    3. Add new multiarch/rtld-{w}memcmp{eq}.S that just include the
       non-multiarch {w}memcmp{eq}.S, which will in turn select the best
       implementation based on the compiled ISA level.

    4. Refactor the ifunc selector and ifunc implementation list to use the
       ISA level aware wrapper macros that allow functions below the compiled
       ISA level (with a guaranteed replacement) to be skipped.

    Tested with and without multiarch on x86_64 for ISA levels:
    {generic, x86-64-v2, x86-64-v3, x86-64-v4}

    And m32 with and without multiarch.
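    A hedged sketch of the build guard meant in item 2. The macro names follow
    the style of glibc's <isa-level.h>, but treat the exact spellings and file
    contents here as illustrative rather than a quote of the patch:

        /* multiarch/memcmp-evex-movbe.S (illustrative excerpt): wrap the whole
           ISA level 4 implementation so it is only assembled when a level 4
           version is actually wanted for this build.  */
        #include <isa-level.h>

        #if ISA_SHOULD_BUILD (4)
        	.text
        	/* EVEX memcmp implementation would go here.  */
        #endif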
* Update copyright dates with scripts/update-copyrights   [Paul Eggert, 2022-01-01, 1, -1/+1]
    I used these shell commands:

        ../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright
        (cd ../glibc && git commit -am"[this commit message]")

    and then ignored the output, which consisted of lines saying "FOO: warning:
    copyright statement not found" for each of 7061 files FOO.

    I then removed trailing white space from math/tgmath.h,
    support/tst-support-open-dev-null-range.c, and
    sysdeps/x86_64/multiarch/strlen-vec.S, to work around the following obscure
    pre-commit check failure diagnostics from Savannah. I don't know why I run
    into these diagnostics whereas others evidently do not.

        remote: *** 912-#endif
        remote: *** 913:
        remote: *** 914-
        remote: *** error: lines with trailing whitespace found
        ...
        remote: *** error: sysdeps/unix/sysv/linux/statx_cp.c: trailing lines
* x86: Optimize L(less_vec) case in memcmp-evex-movbe.S   [Noah Goldstein, 2021-12-27, 1, -193/+56]
    No bug. Optimizations are twofold.

    1) Replace page cross and 0/1 checks with masked load instructions in
       L(less_vec) (a masked-load sketch follows below). In applications this
       reduces branch misses in the hot [0, 32] case.
    2) Change control flow so that the L(less_vec) case gets the fall through.

    Change 2) helps compares in the [0, 32] size range but comes at the cost of
    compares in the [33, 64] size range. From profiles of GCC and Python3, 94%+
    and 99%+ of calls are in the [0, 32] range, so this appears to be the right
    tradeoff.

    Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
    Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
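    A hedged sketch of the masked-load idea for the short-length case (the
    register assignments and the size < 32 assumption are illustrative, not the
    patch itself): with an EVEX write mask, bytes outside the mask are neither
    loaded nor faulted on, so a partial vector never needs a page-cross check
    or separate 0/1-byte paths.

        /* Assumes: %rdi/%rsi point at the buffers, %edx = size, size < 32.  */
            movl      $-1, %ecx
            bzhi      %edx, %ecx, %ecx          /* mask with the low size bits set */
            kmovd     %ecx, %k1
            vmovdqu8  (%rdi), %ymm16{%k1}{z}    /* masked-out bytes load as zero
                                                   and cannot fault */
            vmovdqu8  (%rsi), %ymm17{%k1}{z}
            vpcmpub   $4, %ymm16, %ymm17, %k0   /* 4 = not-equal */
            kmovd     %k0, %eax                 /* nonzero iff some byte differs */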
* x86: Replace sse2 instructions with avx in memcmp-evex-movbe.S   [Noah Goldstein, 2021-10-23, 1, -2/+2]
    This commit replaces two usages of SSE2 'movups' with AVX 'vmovdqu'.

    It could potentially be dangerous to use SSE2 if this function is ever
    called without using 'vzeroupper' beforehand. While compilers appear to use
    'vzeroupper' before function calls if AVX2 has been used, using SSE2 here
    is more brittle. Since it is not absolutely necessary, it should be
    avoided.

    It costs 2 extra bytes, but the extra bytes should only eat into alignment
    padding.

    Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
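    For background (an editor's illustration, not from the commit): a
    legacy-SSE instruction executed while the upper halves of the YMM registers
    are dirty can incur an AVX/SSE transition penalty or false dependencies,
    because the legacy encoding preserves bits 128 and up of the destination,
    whereas the VEX-encoded form zeroes them. Roughly:

        movups   (%rdi), %xmm0     /* legacy SSE: bits 128+ of %ymm0 are left
                                      untouched, so mixing with dirty upper
                                      state is costly */
        vmovdqu  (%rdi), %xmm0     /* VEX encoding of the same 16-byte load:
                                      zeroes bits 128+, safe regardless of
                                      prior AVX use, slightly longer to encode */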
* x86: Optimize memcmp-evex-movbe.S for frontend behavior and size   [Noah Goldstein, 2021-10-12, 1, -192/+242]
    No bug. The frontend optimizations are to:

    1. Reorganize logically connected basic blocks so they are either in the
       same cache line or adjacent cache lines.
    2. Avoid cases where basic blocks unnecessarily cross cache lines.
    3. Try to 32-byte align any basic blocks possible without sacrificing code
       size; smaller / less hot basic blocks are used for this (an alignment
       sketch follows below).

    Overall code size shrunk by 168 bytes. This should make up for any extra
    costs due to aligning to 64 bytes.

    In general, performance before deviated a great deal depending on whether
    entry alignment % 64 was 0, 16, 32, or 48. These changes essentially make
    it so that the current implementation is at least equal to the best
    alignment of the original for any arguments.

    The only additional optimization is in the page cross case: the branch on
    the equals case was removed from the size == [4, 7] case. As well, the
    [4, 7] and [2, 3] cases were swapped, as [4, 7] is likely a more hot
    argument size.

    test-memcmp and test-wmemcmp are both passing.
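    As an aside (an illustration under assumed labels, not taken from the
    patch), this kind of block placement is usually expressed in GNU as with
    .p2align, whose optional third argument caps how many padding bytes may be
    spent on a given block:

            .p2align 6              /* hot block: always align to 64 bytes */
        L(hot_loop):
            subl    $1, %eax
            jnz     L(hot_loop)
            ret

            .p2align 5,, 10         /* cooler block: align to 32 bytes only if
                                       it costs at most 10 padding bytes */
        L(cold_path):
            ret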
* x86: Optimize memcmp-evex-movbe.S   [Noah Goldstein, 2021-05-18, 1, -302/+408]
    No bug. This commit optimizes memcmp-evex-movbe.S. The optimizations
    include adding a new vec compare path for small sizes, reorganizing the
    entry control flow, removing some unnecessary ALU instructions from the
    main loop, and, most importantly, replacing the heavy use of vpcmp + kand
    logic with vpxor + vptern (sketched below). test-memcmp and test-wmemcmp
    are both passing.

    Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
    Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
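    A hedged sketch of the vpxor + vpternlogd idea (registers, offsets and the
    label are assumptions, not the patch): instead of one vpcmp per vector pair
    plus kand instructions to combine the mask registers, xor each pair so
    mismatching bytes become nonzero, fold the xor results together with
    vpternlogd (truth table 0xfe computes A | B | C in one instruction), and
    perform a single mask test at the end.

        vmovdqu64  (%rsi), %ymm16
        vpxorq     (%rdi), %ymm16, %ymm16          /* pair 0: nonzero where bytes differ */
        vmovdqu64  32(%rsi), %ymm17
        vpxorq     32(%rdi), %ymm17, %ymm17        /* pair 1 */
        vmovdqu64  64(%rsi), %ymm18
        vpxorq     64(%rdi), %ymm18, %ymm18        /* pair 2 */
        vpternlogd $0xfe, %ymm16, %ymm17, %ymm18   /* %ymm18 |= %ymm17 | %ymm16 */
        vptestmd   %ymm18, %ymm18, %k1             /* one mask compare for all pairs */
        kortestd   %k1, %k1
        jnz        L(return_vec_cmp)               /* hypothetical slow path that
                                                      recomputes the exact position */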
* x86-64: Add memcmp family functions with 256-bit EVEX   [H.J. Lu, 2021-03-29, 1, -0/+440]
    Update ifunc-memcmp.h to select the function optimized with 256-bit EVEX
    instructions using YMM16-YMM31 registers when AVX512VL, AVX512BW and MOVBE
    are usable. Restricting the code to YMM16-YMM31 avoids RTM aborts, since
    VZEROUPPER isn't needed at function exit.
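    For background (an editor's illustration, not part of the commit): a
    routine that writes %ymm0-%ymm15 has to execute VZEROUPPER before returning
    so the caller's legacy SSE code avoids AVX/SSE transition penalties, and
    VZEROUPPER can abort an in-flight RTM transaction on the affected CPUs.
    %ymm16-%ymm31 are only reachable through EVEX encodings and leave no dirty
    upper state visible to legacy SSE, so the epilogue can be a plain ret:

        vmovdqu    (%rdi), %ymm0       /* AVX2-style: %ymm0-%ymm15 dirtied, so
                                          the function must end with ...  */
        vzeroupper                     /* ... which may abort an RTM region */
        ret

        vmovdqu64  (%rdi), %ymm16      /* 256-bit EVEX style: only %ymm16-%ymm31
                                          touched, no VZEROUPPER required */
        ret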