| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit refactors `strrchr-evex` and `strrchr-evex512` to use a
common implementation: `strrchr-evex-base.S`.
The motivation is `strrchr-evex` needed to be refactored to not use
64-bit masked registers in preperation for AVX10.
Once vec-width masked register combining was removed, the EVEX and
EVEX512 implementations can easily be implemented in the same file
without any major overhead.
The net result is performance improvements (measured on TGL) for both
`strrchr-evex` and `strrchr-evex512`. Although, note there are some
regressions in the test suite and it may be many of the cases that
make the total-geomean of improvement/regression across bench-strrchr
are cold. The point of the performance measurement is to show there
are no major regressions, but the primary motivation is preperation
for AVX10.
Benchmarks where taken on TGL:
https://www.intel.com/content/www/us/en/products/sku/213799/intel-core-i711850h-processor-24m-cache-up-to-4-80-ghz/specifications.html
EVEX geometric_mean(N=5) of all benchmarks New / Original : 0.74
EVEX512 geometric_mean(N=5) of all benchmarks New / Original: 0.87
Full check passes on x86.
|
|
|
|
|
|
|
| |
Applying this commit results in bit-identical rebuild of libc.so.6
math/libm.so.6 elf/ld-linux-x86-64.so.2 mathvec/libmvec.so.1
Reviewed-by: Florian Weimer <fweimer@redhat.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit b412213eee0afa3b51dfe92b736dfc7c981309f5
Author: Noah Goldstein <goldstein.w.n@gmail.com>
Date: Tue Oct 18 17:44:07 2022 -0700
x86: Optimize strrchr-evex.S and implement with VMM headers
Added `vpcompress{b|d}` to the page-cross logic with is an
AVX512-VBMI2 instruction. This is not supported on SKX. Since the
page-cross logic is relatively cold and the benefit is minimal
revert the page-cross case back to the old logic which is supported
on SKX.
Tested on x86-64.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Optimization is:
1. Cache latest result in "fast path" loop with `vmovdqu` instead of
`kunpckdq`. This helps if there are more than one matches.
Code Size Changes:
strrchr-evex.S : +30 bytes (Same number of cache lines)
Net perf changes:
Reported as geometric mean of all improvements / regressions from N=10
runs of the benchtests. Value as New Time / Old Time so < 1.0 is
improvement and 1.0 is regression.
strrchr-evex.S : 0.932 (From cases with higher match frequency)
Full results attached in email.
Full check passes on x86-64.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
1. Add default ISA level selection in non-multiarch/rtld
implementations.
2. Add ISA level build guards to different implementations.
- I.e strcmp-avx2.S which is ISA level 3 will only build if
compiled ISA level <= 3. Otherwise there is no reason to
include it as we will always use one of the ISA level 4
implementations (strcmp-evex.S).
3. Refactor the ifunc selector and ifunc implementation list to use
the ISA level aware wrapper macros that allow functions below the
compiled ISA level (with a guranteed replacement) to be skipped.
Tested with and without multiarch on x86_64 for ISA levels:
{generic, x86-64-v2, x86-64-v3, x86-64-v4}
And m32 with and without multiarch.
|
|
|
|
|
|
|
|
|
|
|
| |
The new code unrolls the main loop slightly without adding too much
overhead and minimizes the comparisons for the search CHAR.
Geometric Mean of all benchmarks New / Old: 0.755
See email for all results.
Full xcheck passes on x86_64 with and without multiarch enabled.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I used these shell commands:
../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright
(cd ../glibc && git commit -am"[this commit message]")
and then ignored the output, which consisted lines saying "FOO: warning:
copyright statement not found" for each of 7061 files FOO.
I then removed trailing white space from math/tgmath.h,
support/tst-support-open-dev-null-range.c, and
sysdeps/x86_64/multiarch/strlen-vec.S, to work around the following
obscure pre-commit check failure diagnostics from Savannah. I don't
know why I run into these diagnostics whereas others evidently do not.
remote: *** 912-#endif
remote: *** 913:
remote: *** 914-
remote: *** error: lines with trailing whitespace found
...
remote: *** error: sysdeps/unix/sysv/linux/statx_cp.c: trailing lines
|
|
Update ifunc-avx2.h, strchr.c, strcmp.c, strncmp.c and wcsnlen.c to
select the function optimized with 256-bit EVEX instructions using
YMM16-YMM31 registers to avoid RTM abort with usable AVX512VL, AVX512BW
and BMI2 since VZEROUPPER isn't needed at function exit.
For strcmp/strncmp, prefer AVX2 strcmp/strncmp if Prefer_AVX2_STRCMP
is set.
|