about summary refs log tree commit diff
path: root/sysdeps/x86_64/multiarch/ifunc-avx2.h
Commit message (Collapse)AuthorAgeFilesLines
* x86-64: Require BMI2 for strchr-avx2.SH.J. Lu2021-04-191-2/+2
| | | | | | | | | | | | | | | | | Since strchr-avx2.S updated by commit 1f745ecc2109890886b161d4791e1406fdfc29b8 Author: noah <goldstein.w.n@gmail.com> Date: Wed Feb 3 00:38:59 2021 -0500 x86-64: Refactor and improve performance of strchr-avx2.S uses sarx: c4 e2 72 f7 c0 sarx %ecx,%eax,%eax for strchr-avx2 family functions, require BMI2 in ifunc-impl-list.c and ifunc-avx2.h.
* x86-64: Add AVX optimized string/memory functions for RTMH.J. Lu2021-03-291-0/+4
| | | | | | | | | | | | | | | | | Since VZEROUPPER triggers RTM abort while VZEROALL won't, select AVX optimized string/memory functions with xtest jz 1f vzeroall ret 1: vzeroupper ret at function exit on processors with usable RTM, but without 256-bit EVEX instructions to avoid VZEROUPPER inside a transactionally executing RTM region.
* x86-64: Add ifunc-avx2.h functions with 256-bit EVEXH.J. Lu2021-03-291-3/+11
| | | | | | | | | | Update ifunc-avx2.h, strchr.c, strcmp.c, strncmp.c and wcsnlen.c to select the function optimized with 256-bit EVEX instructions using YMM16-YMM31 registers to avoid RTM abort with usable AVX512VL, AVX512BW and BMI2 since VZEROUPPER isn't needed at function exit. For strcmp/strncmp, prefer AVX2 strcmp/strncmp if Prefer_AVX2_STRCMP is set.
* Update copyright dates with scripts/update-copyrightsPaul Eggert2021-01-021-1/+1
| | | | | | | | | | | | | | | | I used these shell commands: ../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright (cd ../glibc && git commit -am"[this commit message]") and then ignored the output, which consisted lines saying "FOO: warning: copyright statement not found" for each of 6694 files FOO. I then removed trailing white space from benchtests/bench-pthread-locks.c and iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c, to work around this diagnostic from Savannah: remote: *** pre-commit check failed ... remote: *** error: lines with trailing whitespace found remote: error: hook declined to update refs/heads/master
* x86: Support usable check for all CPU featuresH.J. Lu2020-07-131-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Support usable check for all CPU features with the following changes: 1. Change struct cpu_features to struct cpuid_features { struct cpuid_registers cpuid; struct cpuid_registers usable; }; struct cpu_features { struct cpu_features_basic basic; struct cpuid_features features[COMMON_CPUID_INDEX_MAX]; unsigned int preferred[PREFERRED_FEATURE_INDEX_MAX]; ... }; so that there is a usable bit for each cpuid bit. 2. After the cpuid bits have been initialized, copy the known bits to the usable bits. EAX/EBX from INDEX_1 and EAX from INDEX_7 aren't used for CPU feature detection. 3. Clear the usable bits which require OS support. 4. If the feature is supported by OS, copy its cpuid bit to its usable bit. 5. Replace HAS_CPU_FEATURE and CPU_FEATURES_CPU_P with CPU_FEATURE_USABLE and CPU_FEATURE_USABLE_P to check if a feature is usable. 6. Add DEPR_FPU_CS_DS for INDEX_7_EBX_13. 7. Unset MPX feature since it has been deprecated. The results are 1. If the feature is known and doesn't requre OS support, its usable bit is copied from the cpuid bit. 2. Otherwise, its usable bit is copied from the cpuid bit only if the feature is known to supported by OS. 3. CPU_FEATURE_USABLE/CPU_FEATURE_USABLE_P are used to check if the feature can be used. 4. HAS_CPU_FEATURE/CPU_FEATURE_CPU_P are used to check if CPU supports the feature.
* Update copyright dates with scripts/update-copyrights.Joseph Myers2020-01-011-1/+1
|
* Prefer https to http for gnu.org and fsf.org URLsPaul Eggert2019-09-071-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Also, change sources.redhat.com to sourceware.org. This patch was automatically generated by running the following shell script, which uses GNU sed, and which avoids modifying files imported from upstream: sed -ri ' s,(http|ftp)(://(.*\.)?(gnu|fsf|sourceware)\.org($|[^.]|\.[^a-z])),https\2,g s,(http|ftp)(://(.*\.)?)sources\.redhat\.com($|[^.]|\.[^a-z]),https\2sourceware.org\4,g ' \ $(find $(git ls-files) -prune -type f \ ! -name '*.po' \ ! -name 'ChangeLog*' \ ! -path COPYING ! -path COPYING.LIB \ ! -path manual/fdl-1.3.texi ! -path manual/lgpl-2.1.texi \ ! -path manual/texinfo.tex ! -path scripts/config.guess \ ! -path scripts/config.sub ! -path scripts/install-sh \ ! -path scripts/mkinstalldirs ! -path scripts/move-if-change \ ! -path INSTALL ! -path locale/programs/charmap-kw.h \ ! -path po/libc.pot ! -path sysdeps/gnu/errlist.c \ ! '(' -name configure \ -execdir test -f configure.ac -o -f configure.in ';' ')' \ ! '(' -name preconfigure \ -execdir test -f preconfigure.ac ';' ')' \ -print) and then by running 'make dist-prepare' to regenerate files built from the altered files, and then executing the following to cleanup: chmod a+x sysdeps/unix/sysv/linux/riscv/configure # Omit irrelevant whitespace and comment-only changes, # perhaps from a slightly-different Autoconf version. git checkout -f \ sysdeps/csky/configure \ sysdeps/hppa/configure \ sysdeps/riscv/configure \ sysdeps/unix/sysv/linux/csky/configure # Omit changes that caused a pre-commit check to fail like this: # remote: *** error: sysdeps/powerpc/powerpc64/ppc-mcount.S: trailing lines git checkout -f \ sysdeps/powerpc/powerpc64/ppc-mcount.S \ sysdeps/unix/sysv/linux/s390/s390-64/syscall.S # Omit change that caused a pre-commit check to fail like this: # remote: *** error: sysdeps/sparc/sparc64/multiarch/memcpy-ultra3.S: last line does not end in newline git checkout -f sysdeps/sparc/sparc64/multiarch/memcpy-ultra3.S
* Update copyright dates with scripts/update-copyrights.Joseph Myers2019-01-011-1/+1
| | | | | | | * All files with FSF copyright notices: Update copyright dates using scripts/update-copyrights. * locale/programs/charmap-kw.h: Regenerated. * locale/programs/locfile-kw.h: Likewise.
* Update copyright dates with scripts/update-copyrights.Joseph Myers2018-01-011-1/+1
| | | | | | | * All files with FSF copyright notices: Update copyright dates using scripts/update-copyrights. * locale/programs/charmap-kw.h: Regenerated. * locale/programs/locfile-kw.h: Likewise.
* x86-64: Optimize memchr/rawmemchr/wmemchr with SSE2/AVX2H.J. Lu2017-06-091-0/+36
SSE2 memchr is extended to support wmemchr. AVX2 memchr/rawmemchr/wmemchr are added to search 32 bytes with a single vector compare instruction. AVX2 memchr/rawmemchr/wmemchr are as fast as SSE2 memchr/rawmemchr/wmemchr for small sizes and up to 1.5X faster for larger sizes on Haswell and Skylake. Select AVX2 memchr/rawmemchr/wmemchr on AVX2 machines where vzeroupper is preferred and AVX unaligned load is fast. NB: It uses TZCNT instead of BSF since TZCNT produces the same result as BSF for non-zero input. TZCNT is faster than BSF and is executed as BSF if machine doesn't support TZCNT. * sysdeps/x86_64/memchr.S (MEMCHR): New. Depending on if USE_AS_WMEMCHR is defined. (PCMPEQ): Likewise. (memchr): Renamed to ... (MEMCHR): This. Support wmemchr if USE_AS_WMEMCHR is defined. Replace pcmpeqb with PCMPEQ. * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add memchr-sse2, rawmemchr-sse2, memchr-avx2, rawmemchr-avx2, wmemchr-sse4_1, wmemchr-avx2 and wmemchr-c. * sysdeps/x86_64/multiarch/ifunc-avx2.h: New file. * sysdeps/x86_64/multiarch/memchr-avx2.S: Likewise. * sysdeps/x86_64/multiarch/memchr-sse2.S: Likewise. * sysdeps/x86_64/multiarch/memchr.c: Likewise. * sysdeps/x86_64/multiarch/rawmemchr-avx2.S: Likewise. * sysdeps/x86_64/multiarch/rawmemchr-sse2.S: Likewise. * sysdeps/x86_64/multiarch/rawmemchr.c: Likewise. * sysdeps/x86_64/multiarch/wmemchr-avx2.S: Likewise. * sysdeps/x86_64/multiarch/wmemchr-sse2.S: Likewise. * sysdeps/x86_64/multiarch/wmemchr.c: Likewise. * sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Test __memchr_avx2, __memchr_sse2, __rawmemchr_avx2, __rawmemchr_sse2, __wmemchr_avx2 and __wmemchr_sse2.