author | Noah Goldstein <goldstein.w.n@gmail.com> | 2022-10-18 17:44:04 -0700 |
---|---|---|
committer | Noah Goldstein <goldstein.w.n@gmail.com> | 2022-10-19 17:31:03 -0700 |
commit | 69717709ec5c2769322678e96a7672d1e270de3a (patch) | |
tree | c0c9b1aeb7dc4058a38f9b3ab4f9e419b440ea8b /locale/broken_cur_max.c | |
parent | 330881763efff626d6b1cdf8de9ffee4ed7a1ba1 (diff) | |
download | glibc-69717709ec5c2769322678e96a7672d1e270de3a.tar.gz glibc-69717709ec5c2769322678e96a7672d1e270de3a.tar.xz glibc-69717709ec5c2769322678e96a7672d1e270de3a.zip |
x86: Shrink / minorly optimize strchr-evex and implement with VMM headers
Size Optimizations:
1. Condense hot path for better cache-locality.
    - This has the most impact for strchrnul, where the logic for strings
      with len <= VEC_SIZE or with a match in the first VEC now fits
      entirely in the first cache line.
2. Reuse common targets in first 4x VEC and after the loop.
3. Don't align targets so aggressively if it doesn't change the number
   of fetch blocks they require, and take more care to avoid the case
   where targets unnecessarily split cache lines.
4. Align the loop better for DSB/LSD.
5. Use more code-size efficient instructions.
    - tzcnt ...     -> bsf ...
    - vpcmpb $0 ... -> vpcmpeq ...
6. Align labels less aggressively, especially if it doesn't save fetch
   blocks / causes the basic-block to span extra cache-lines.

Code Size Changes:
strchr-evex.S   : -63 bytes
strchrnul-evex.S: -48 bytes

Net perf changes:
Reported as geometric mean of all improvements / regressions from N=10
runs of the benchtests. Value is New Time / Old Time, so < 1.0 is an
improvement and > 1.0 is a regression.

strchr-evex.S (Fixed) : 0.971
strchr-evex.S (Rand)  : 0.932
strchrnul-evex.S      : 0.965

Full results attached in email.

Full check passes on x86-64.
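For readers unfamiliar with the metric, here is a minimal sketch of how a geometric mean of New Time / Old Time ratios can be computed. It is not the script used to produce the numbers in this commit. The function name `geomean_ratio` and the sample timings are hypothetical, and the sketch assumes one new and one old timing per benchmark configuration.

```python
import math


def geomean_ratio(new_times, old_times):
    """Geometric mean of per-benchmark new/old time ratios.

    A result below 1.0 means the new code is faster on balance;
    a result above 1.0 means it is slower.
    """
    if not new_times or len(new_times) != len(old_times):
        raise ValueError("need two non-empty sequences of equal length")
    # Sum logs instead of multiplying ratios so that a long list of
    # benchmarks cannot overflow or underflow the running product.
    log_sum = sum(math.log(n / o) for n, o in zip(new_times, old_times))
    return math.exp(log_sum / len(new_times))


# Hypothetical timings: one case halves, one case doubles.
# The ratios are 0.5 and 2.0, so the geometric mean is neutral.
print(geomean_ratio([2.0, 8.0], [4.0, 4.0]))
```

Unlike an arithmetic mean, the geometric mean treats a 2x speedup and a 2x slowdown symmetrically, which is why it is a common choice for summarizing time ratios.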