author    Noah Goldstein <goldstein.w.n@gmail.com>  2022-10-18 17:44:03 -0700
committer Noah Goldstein <goldstein.w.n@gmail.com>  2022-10-19 17:31:03 -0700
commit    330881763efff626d6b1cdf8de9ffee4ed7a1ba1
tree      be5ed6967393bbb1b87d8ac6c1ed3cc1bdef25cd /sysdeps/x86_64/multiarch/rawmemchr-evex-rtm.S
parent    451c6e58540e8571e31581c04c4829e5d2cfe8ac
x86: Optimize memchr-evex.S and implement with VMM headers

Optimizations are:

1. Use the fact that tzcnt(0) -> VEC_SIZE for memchr to save a branch
   in short string case.
2. Restructure code so that small strings are given the hot path.
	- This is a net-zero on the benchmark suite but in general makes
      sense as smaller sizes are far more common.
3. Use more code-size efficient instructions.
	- tzcnt ...     -> bsf ...
	- vpcmpb $0 ... -> vpcmpeq ...
4. Align labels less aggressively, especially if it doesn't save fetch
   blocks / causes the basic-block to span extra cache-lines.
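
Point 1 can be illustrated with a small C model. This is a sketch under stated assumptions, not the assembly from this patch: `tzcnt32_model`, `match_mask`, and `short_memchr_model` are hypothetical names, VEC_SIZE is assumed to be 32 with one mask bit per byte, and the buffer is assumed to have VEC_SIZE readable bytes. Because the count for an all-zero mask equals VEC_SIZE, one comparison against the length covers both "no match" and "match past the end".

```c
#include <stddef.h>
#include <stdint.h>

#define VEC_SIZE 32 /* assumption: 32-byte vector, one mask bit per byte */

/* Portable model of tzcnt on a 32-bit operand.
   Unlike bsf, tzcnt has a defined result for 0: the operand width.  */
static unsigned tzcnt32_model(uint32_t x)
{
    unsigned n = 0;
    if (x == 0)
        return 32;
    while ((x & 1u) == 0) {
        x >>= 1;
        n++;
    }
    return n;
}

/* Hypothetical stand-in for the vector compare: bit i is set when
   buf[i] == c.  Reads exactly VEC_SIZE bytes from buf.  */
static uint32_t match_mask(const unsigned char *buf, unsigned char c)
{
    uint32_t mask = 0;
    for (unsigned i = 0; i < VEC_SIZE; i++)
        if (buf[i] == c)
            mask |= (uint32_t)1 << i;
    return mask;
}

/* Short-string case, valid for len <= VEC_SIZE.  With no match the
   position is VEC_SIZE, which is >= len, so no separate "mask == 0"
   branch is needed.  */
static const void *short_memchr_model(const unsigned char *buf,
                                      unsigned char c, size_t len)
{
    unsigned pos = tzcnt32_model(match_mask(buf, c));
    return pos < len ? buf + pos : NULL;
}
```

The same reasoning does not carry over to bsf, whose result for a zero input is undefined, so bsf only fits paths where a match is already known to exist.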

The optimizations (especially for point 2) make the memchr and
rawmemchr code essentially incompatible so split rawmemchr-evex
to a new file.

Code Size Changes:
memchr-evex.S       : -107 bytes
rawmemchr-evex.S    :  -53 bytes

Net perf changes:

Reported as geometric mean of all improvements / regressions from N=10
runs of the benchtests. Value is New Time / Old Time, so < 1.0 is an
improvement and > 1.0 is a regression.

memchr-evex.S       : 0.928
rawmemchr-evex.S    : 0.986 (Fewer targets cross cache lines)

Full results attached in email.

Full check passes on x86-64.
Diffstat (limited to 'sysdeps/x86_64/multiarch/rawmemchr-evex-rtm.S')
-rw-r--r-- sysdeps/x86_64/multiarch/rawmemchr-evex-rtm.S 9
1 files changed, 6 insertions, 3 deletions
diff --git a/sysdeps/x86_64/multiarch/rawmemchr-evex-rtm.S b/sysdeps/x86_64/multiarch/rawmemchr-evex-rtm.S
index deda1ca395..2073eaa620 100644
--- a/sysdeps/x86_64/multiarch/rawmemchr-evex-rtm.S
+++ b/sysdeps/x86_64/multiarch/rawmemchr-evex-rtm.S
@@ -1,3 +1,6 @@
-#define MEMCHR __rawmemchr_evex_rtm
-#define USE_AS_RAWMEMCHR 1
-#include "memchr-evex-rtm.S"
+#define RAWMEMCHR	__rawmemchr_evex_rtm
+
+#define USE_IN_RTM	1
+#define SECTION(p)	p##.evex.rtm
+
+#include "rawmemchr-evex.S"
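
The new wrapper sets a few macros and then includes a shared implementation file. A minimal C sketch of that define-then-include pattern follows. It assumes the shared body supplies a default name only when the wrapper has not already chosen one, which is an assumption here because rawmemchr-evex.S itself is not part of this diff. `my_rawmemchr`, `my_rawmemchr_rtm`, and `symbol_name` are hypothetical, and the byte loop merely stands in for the real vectorized code.

```c
#include <stddef.h>

/* "Wrapper" part: pick the symbol name before the shared body.
   my_rawmemchr_rtm is a hypothetical name for illustration.  */
#define RAWMEMCHR my_rawmemchr_rtm

/* "Shared body" part, which would normally live in the included file.
   The default applies only if the wrapper did not define RAWMEMCHR.  */
#ifndef RAWMEMCHR
# define RAWMEMCHR my_rawmemchr
#endif

/* Two-level stringification so the macro is expanded first.  */
#define STR_(x) #x
#define STR(x) STR_(x)

/* Like rawmemchr, this assumes c is present in s; there is no length
   bound.  A plain byte loop, not the EVEX implementation.  */
const void *RAWMEMCHR(const void *s, int c)
{
    const unsigned char *p = s;
    while (*p != (unsigned char)c)
        p++;
    return p;
}

/* Records which name the function was actually compiled under.  */
static const char *const symbol_name = STR(RAWMEMCHR);
```

With this layout each variant file stays a few lines long, and the shared body does not need to know which variants exist.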