author    Noah Goldstein <goldstein.w.n@gmail.com>  2022-10-18 17:44:04 -0700
committer Noah Goldstein <goldstein.w.n@gmail.com>  2022-10-19 17:31:03 -0700
commit    69717709ec5c2769322678e96a7672d1e270de3a
tree      c0c9b1aeb7dc4058a38f9b3ab4f9e419b440ea8b
parent    330881763efff626d6b1cdf8de9ffee4ed7a1ba1
x86: Shrink / minorly optimize strchr-evex and implement with VMM headers
Size Optimizations:
1. Condense hot path for better cache-locality.
    - This is most impactful for strchrnul, where the logic for strings
      with len <= VEC_SIZE or with a match in the first VEC now fits
      entirely in the first cache line.
2. Reuse common targets in first 4x VEC and after the loop.
3. Don't align targets so aggressively if it doesn't change the number
   of fetch blocks it will require and put more care in avoiding the
   case where targets unnecessarily split cache lines.
4. Align the loop better for DSB/LSD
5. Use more code-size efficient instructions.
	- tzcnt ...     -> bsf ...
	- vpcmpb $0 ... -> vpcmpeq ...
6. Align labels less aggressively, especially if it doesn't save fetch
   blocks / causes the basic-block to span extra cache-lines.
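
The tzcnt -> bsf substitution in item 5 saves a byte per use because
tzcnt is encoded as bsf plus an F3 prefix. The two instructions return
the same index for any non-zero source and differ only for a zero
source, where tzcnt returns the operand width and bsf leaves its
destination undefined. The swap is therefore safe wherever the compare
mask is already known to be non-zero. A minimal C sketch of that
precondition follows; match_mask and first_match are hypothetical names,
the scalar loop stands in for a vector compare, and none of this is the
code from strchr-evex.S:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar stand-in for a 32-byte vector compare:
   bit i of the result is set when buf[i] == c.  */
static uint32_t
match_mask (const unsigned char *buf, unsigned char c)
{
  uint32_t mask = 0;
  for (size_t i = 0; i < 32; i++)
    if (buf[i] == c)
      mask |= (uint32_t) 1 << i;
  return mask;
}

/* Index of the first match in a 32-byte block, or -1 if none.
   __builtin_ctz is undefined for a zero argument, which mirrors the
   bsf precondition: it is only reached after the mask has been
   tested against zero.  */
static int
first_match (const unsigned char *buf, unsigned char c)
{
  uint32_t mask = match_mask (buf, c);
  if (mask == 0)
    return -1;
  return __builtin_ctz (mask);
}
```

In this sketch the zero test that guards the bit scan already exists as
the "no match" branch, so no extra check is needed to make the shorter
encoding safe.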

Code Size Changes:
strchr-evex.S	: -63 bytes
strchrnul-evex.S: -48 bytes

Net perf changes:
Reported as the geometric mean of all improvements / regressions from
N=10 runs of the benchtests. Values are New Time / Old Time, so < 1.0
is an improvement and > 1.0 is a regression.

strchr-evex.S (Fixed)   : 0.971
strchr-evex.S (Rand)    : 0.932
strchrnul-evex.S        : 0.965

Full results attached in email.

Full check passes on x86-64.