author: Noah Goldstein <goldstein.w.n@gmail.com> 2022-06-06 21:11:32 -0700
committer: Noah Goldstein <goldstein.w.n@gmail.com> 2022-06-07 13:10:27 -0700
commit: af5306a735eb0966fdc2f8ccdafa8888e2df0c87
tree: b1ea652e9609e495c9a702ff8ea07f86e4956cff /sysdeps/x86_64/multiarch/memrchr-avx2-rtm.S
parent: b4209615a06b01c974f47b4998b00e4c7b1aa5d9
x86: Optimize memrchr-avx2.S
The new code:
    1. prioritizes smaller user-arg lengths more.
    2. optimizes target placement more carefully
    3. reuses logic more
    4. fixes up various inefficiencies in the logic. The biggest
       case here is the `lzcnt` logic for checking returns which
       saves either a branch or multiple instructions.
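
As a rough illustration of how an `lzcnt`-based return check can save a
branch, here is a minimal C sketch. It is not the code from
memrchr-avx2.S. The helper names `lzcnt32` and `last_match_in_tail` are
hypothetical, and `__builtin_clz` (GCC/Clang) stands in for the
instruction. The sketch assumes a 32-bit match mask for the 32-byte block
ending at `end`, with bit 31 corresponding to the last byte. Because
`lzcnt` yields 32 for a zero mask, a single comparison against the
remaining length rejects both "no match" and "match before the start of
the buffer".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Portable stand-in for the x86 `lzcnt` instruction: the number of
   leading zero bits, defined to be 32 when the input is zero.  */
static unsigned lzcnt32(uint32_t x)
{
    return x ? (unsigned)__builtin_clz(x) : 32u;
}

/* Hypothetical helper.  Bit i of `mask` is set when byte i of the
   32-byte block [end - 32, end) matched the search char.  Only the
   last `len` bytes of that block (1 <= len <= 32) are in bounds.
   Returns the last in-bounds match, or NULL if there is none.  */
static const unsigned char *
last_match_in_tail(const unsigned char *end, uint32_t mask, size_t len)
{
    assert(len >= 1 && len <= 32);

    /* Distance of the last match from the final byte of the block.  */
    unsigned lz = lzcnt32(mask);

    /* One comparison covers two cases: an empty mask (lz == 32 >= len)
       and a match that lies before the in-bounds region.  */
    if (len <= lz)
        return NULL;

    return end - 1 - lz;
}
```

Without the defined zero-input result, the empty-mask case would need its
own test before the bounds check.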

The total code size saving is: 306 bytes
Geometric Mean of all benchmarks New / Old: 0.760

Regressions:
There are some regressions, particularly where the user-supplied
length is large but the matching char is near the beginning of the
string (in the first VEC). This case has roughly a 10-20% regression.

This is because the new logic gives the hot path for immediate matches
to shorter lengths (the more common input). The short-length case has
roughly a 15-45% speedup.

Full xcheck passes on x86_64.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Diffstat (limited to 'sysdeps/x86_64/multiarch/memrchr-avx2-rtm.S')
-rw-r--r-- sysdeps/x86_64/multiarch/memrchr-avx2-rtm.S 1
1 file changed, 1 insertion, 0 deletions
diff --git a/sysdeps/x86_64/multiarch/memrchr-avx2-rtm.S b/sysdeps/x86_64/multiarch/memrchr-avx2-rtm.S
index cea2d2a72d..5e9beeeef2 100644
--- a/sysdeps/x86_64/multiarch/memrchr-avx2-rtm.S
+++ b/sysdeps/x86_64/multiarch/memrchr-avx2-rtm.S
@@ -2,6 +2,7 @@
 # define MEMRCHR __memrchr_avx2_rtm
 #endif
 
+#define COND_VZEROUPPER	COND_VZEROUPPER_XTEST
 #define ZERO_UPPER_VEC_REGISTERS_RETURN \
   ZERO_UPPER_VEC_REGISTERS_RETURN_XTEST