about summary refs log tree commit diff
path: root/sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S
Commit message (Collapse)AuthorAgeFilesLines
* x86: Optimize memcmp-avx2-movbe.SNoah Goldstein2021-05-181-281/+395
| | | | | | | | | | No bug. This commit optimizes memcmp-avx2.S. The optimizations include adding a new vec compare path for small sizes, reorganizing the entry control flow, and removing some unnecissary ALU instructions from the main loop. test-memcmp and test-wmemcmp are both passing. Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
* x86-64: Add AVX optimized string/memory functions for RTMH.J. Lu2021-03-291-15/+13
| | | | | | | | | | | | | | | | | Since VZEROUPPER triggers RTM abort while VZEROALL won't, select AVX optimized string/memory functions with xtest jz 1f vzeroall ret 1: vzeroupper ret at function exit on processors with usable RTM, but without 256-bit EVEX instructions to avoid VZEROUPPER inside a transactionally executing RTM region.
* Update copyright dates with scripts/update-copyrightsPaul Eggert2021-01-021-1/+1
| | | | | | | | | | | | | | | | I used these shell commands: ../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright (cd ../glibc && git commit -am"[this commit message]") and then ignored the output, which consisted lines saying "FOO: warning: copyright statement not found" for each of 6694 files FOO. I then removed trailing white space from benchtests/bench-pthread-locks.c and iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c, to work around this diagnostic from Savannah: remote: *** pre-commit check failed ... remote: *** error: lines with trailing whitespace found remote: error: hook declined to update refs/heads/master
* Update copyright dates with scripts/update-copyrights.Joseph Myers2020-01-011-1/+1
|
* Prefer https to http for gnu.org and fsf.org URLsPaul Eggert2019-09-071-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Also, change sources.redhat.com to sourceware.org. This patch was automatically generated by running the following shell script, which uses GNU sed, and which avoids modifying files imported from upstream: sed -ri ' s,(http|ftp)(://(.*\.)?(gnu|fsf|sourceware)\.org($|[^.]|\.[^a-z])),https\2,g s,(http|ftp)(://(.*\.)?)sources\.redhat\.com($|[^.]|\.[^a-z]),https\2sourceware.org\4,g ' \ $(find $(git ls-files) -prune -type f \ ! -name '*.po' \ ! -name 'ChangeLog*' \ ! -path COPYING ! -path COPYING.LIB \ ! -path manual/fdl-1.3.texi ! -path manual/lgpl-2.1.texi \ ! -path manual/texinfo.tex ! -path scripts/config.guess \ ! -path scripts/config.sub ! -path scripts/install-sh \ ! -path scripts/mkinstalldirs ! -path scripts/move-if-change \ ! -path INSTALL ! -path locale/programs/charmap-kw.h \ ! -path po/libc.pot ! -path sysdeps/gnu/errlist.c \ ! '(' -name configure \ -execdir test -f configure.ac -o -f configure.in ';' ')' \ ! '(' -name preconfigure \ -execdir test -f preconfigure.ac ';' ')' \ -print) and then by running 'make dist-prepare' to regenerate files built from the altered files, and then executing the following to cleanup: chmod a+x sysdeps/unix/sysv/linux/riscv/configure # Omit irrelevant whitespace and comment-only changes, # perhaps from a slightly-different Autoconf version. git checkout -f \ sysdeps/csky/configure \ sysdeps/hppa/configure \ sysdeps/riscv/configure \ sysdeps/unix/sysv/linux/csky/configure # Omit changes that caused a pre-commit check to fail like this: # remote: *** error: sysdeps/powerpc/powerpc64/ppc-mcount.S: trailing lines git checkout -f \ sysdeps/powerpc/powerpc64/ppc-mcount.S \ sysdeps/unix/sysv/linux/s390/s390-64/syscall.S # Omit change that caused a pre-commit check to fail like this: # remote: *** error: sysdeps/sparc/sparc64/multiarch/memcpy-ultra3.S: last line does not end in newline git checkout -f sysdeps/sparc/sparc64/multiarch/memcpy-ultra3.S
* x86-64 memcmp/wmemcmp: Properly handle the length parameter [BZ# 24097]H.J. Lu2019-01-211-2/+5
| | | | | | | | | | | | | | | | | | | | | | On x32, the size_t parameter may be passed in the lower 32 bits of a 64-bit register with the non-zero upper 32 bits. The string/memory functions written in assembly can only use the lower 32 bits of a 64-bit register as length or must clear the upper 32 bits before using the full 64-bit register for length. This pach fixes memcmp/wmemcmp for x32. Tested on x86-64 and x32. On x86-64, libc.so is the same with and withou the fix. [BZ# 24097] CVE-2019-6488 * sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S: Use RDX_LP for length. Clear the upper 32 bits of RDX register. * sysdeps/x86_64/multiarch/memcmp-sse4.S: Likewise. * sysdeps/x86_64/multiarch/memcmp-ssse3.S: Likewise. * sysdeps/x86_64/x32/Makefile (tests): Add tst-size_t-memcmp and tst-size_t-wmemcmp. * sysdeps/x86_64/x32/tst-size_t-memcmp.c: New file. * sysdeps/x86_64/x32/tst-size_t-wmemcmp.c: Likewise.
* Update copyright dates with scripts/update-copyrights.Joseph Myers2019-01-011-1/+1
| | | | | | | * All files with FSF copyright notices: Update copyright dates using scripts/update-copyrights. * locale/programs/charmap-kw.h: Regenerated. * locale/programs/locfile-kw.h: Likewise.
* Update copyright dates with scripts/update-copyrights.Joseph Myers2018-01-011-1/+1
| | | | | | | * All files with FSF copyright notices: Update copyright dates using scripts/update-copyrights. * locale/programs/charmap-kw.h: Regenerated. * locale/programs/locfile-kw.h: Likewise.
* x86-64: Optimize memcmp-avx2-movbe.S for short differenceH.J. Lu2017-06-271-56/+62
| | | | | | | | | | | | | Check the first 32 bytes before checking size when size >= 32 bytes to avoid unnecessary branch if the difference is in the first 32 bytes. Replace vpmovmskb/subl/jnz with vptest/jnc. On Haswell, the new version is as fast as the previous one. On Skylake, the new version is a little bit faster. * sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S (MEMCMP): Check the first 32 bytes before checking size when size >= 32 bytes. Replace vpmovmskb/subl/jnz with vptest/jnc.
* x86-64: Optimize L(between_2_3) in memcmp-avx2-movbe.SH.J. Lu2017-06-231-4/+2
| | | | | | | | | | | | | | | | | Turn movzbl -1(%rdi, %rdx), %edi movzbl -1(%rsi, %rdx), %esi orl %edi, %eax orl %esi, %ecx into movb -1(%rdi, %rdx), %al movb -1(%rsi, %rdx), %cl * sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S (between_2_3): Replace movzbl and orl with movb.
* x86-64: Fix comment typo in memcmp-avx2-movbe.SFlorian Weimer2017-06-231-1/+1
|
* x86-64: memcmp-avx2-movbe.S needs saturating subtraction [BZ #21662]Florian Weimer2017-06-231-10/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | This code: L(between_2_3): /* Load as big endian with overlapping loads and bswap to avoid branches. */ movzwl -2(%rdi, %rdx), %eax movzwl -2(%rsi, %rdx), %ecx shll $16, %eax shll $16, %ecx movzwl (%rdi), %edi movzwl (%rsi), %esi orl %edi, %eax orl %esi, %ecx bswap %eax bswap %ecx subl %ecx, %eax ret needs a saturating subtract because the full register is used. With this commit, only the lower 24 bits of the register are used, so a regular subtraction suffices. The test case change adds coverage for these kinds of bugs.
* x86-64: Optimize memcmp/wmemcmp with AVX2 and MOVBEH.J. Lu2017-06-051-0/+425
Optimize x86-64 memcmp/wmemcmp with AVX2. It uses vector compare as much as possible. It is as fast as SSE4 memcmp for size <= 16 bytes and up to 2X faster for size > 16 bytes on Haswell and Skylake. Select AVX2 memcmp/wmemcmp on AVX2 machines where vzeroupper is preferred and AVX unaligned load is fast. NB: It uses TZCNT instead of BSF since TZCNT produces the same result as BSF for non-zero input. TZCNT is faster than BSF and is executed as BSF if machine doesn't support TZCNT. Key features: 1. For size from 2 to 7 bytes, load as big endian with movbe and bswap to avoid branches. 2. Use overlapping compare to avoid branch. 3. Use vector compare when size >= 4 bytes for memcmp or size >= 8 bytes for wmemcmp. 4. If size is 8 * VEC_SIZE or less, unroll the loop. 5. Compare 4 * VEC_SIZE at a time with the aligned first memory area. 6. Use 2 vector compares when size is 2 * VEC_SIZE or less. 7. Use 4 vector compares when size is 4 * VEC_SIZE or less. 8. Use 8 vector compares when size is 8 * VEC_SIZE or less. * sysdeps/x86/cpu-features.h (index_cpu_MOVBE): New. * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add memcmp-avx2 and wmemcmp-avx2. * sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Test __memcmp_avx2 and __wmemcmp_avx2. * sysdeps/x86_64/multiarch/memcmp-avx2.S: New file. * sysdeps/x86_64/multiarch/wmemcmp-avx2.S: Likewise. * sysdeps/x86_64/multiarch/memcmp.S: Use __memcmp_avx2 on AVX 2 machines if AVX unaligned load is fast and vzeroupper is preferred. * sysdeps/x86_64/multiarch/wmemcmp.S: Use __wmemcmp_avx2 on AVX 2 machines if AVX unaligned load is fast and vzeroupper is preferred.