about summary refs log tree commit diff
path: root/sysdeps/x86_64/multiarch/ifunc-impl-list.c
diff options
context:
space:
mode:
authorSunil K Pandey <skpgkp2@gmail.com>2022-08-09 07:57:29 -0700
committerSunil K Pandey <skpgkp2@gmail.com>2022-11-03 15:51:52 -0700
commitfaaf733f49211439475e50f06716b303ee2644bf (patch)
tree3dd19a3b1d40d31f72cb063a05ad86dc273555d8 /sysdeps/x86_64/multiarch/ifunc-impl-list.c
parent1f34a2328890aa192141f96449d25b77f666bf47 (diff)
downloadglibc-faaf733f49211439475e50f06716b303ee2644bf.tar.gz
glibc-faaf733f49211439475e50f06716b303ee2644bf.tar.xz
glibc-faaf733f49211439475e50f06716b303ee2644bf.zip
x86_64: Implement evex512 version of strrchr and wcsrchr
Changes from v1:
  Use vec api for register.
  Replace VPCMP with VPCMPEQ
  Restructure and remove 1 unconditional jump.
  Change page cross logic to use sall.

This patch implements following evex512 version of string functions.
evex512 version takes up to 30% less cycle as compared to evex,
depending on length and alignment.

- strrchr function using 512 bit vectors.
- wcsrchr function using 512 bit vectors.

Code size data:

strrchr-evex.o		879 byte
strrchr-evex512.o	601 byte (-32%)

wcsrchr-evex.o		882 byte
wcsrchr-evex512.o	572 byte (-35%)

Placeholder function, not used by any processor at the moment.

Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
Diffstat (limited to 'sysdeps/x86_64/multiarch/ifunc-impl-list.c')
-rw-r--r--sysdeps/x86_64/multiarch/ifunc-impl-list.c10
1 files changed, 10 insertions, 0 deletions
diff --git a/sysdeps/x86_64/multiarch/ifunc-impl-list.c b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
index c3d75a09f4..7cebee7ec7 100644
--- a/sysdeps/x86_64/multiarch/ifunc-impl-list.c
+++ b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
@@ -600,6 +600,11 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
 				      && CPU_FEATURE_USABLE (BMI1)
 				      && CPU_FEATURE_USABLE (BMI2)),
 				     __strrchr_evex)
+	      X86_IFUNC_IMPL_ADD_V4 (array, i, strrchr,
+				     (CPU_FEATURE_USABLE (AVX512VL)
+				      && CPU_FEATURE_USABLE (AVX512BW)
+				      && CPU_FEATURE_USABLE (BMI2)),
+				     __strrchr_evex512)
 	      X86_IFUNC_IMPL_ADD_V3 (array, i, strrchr,
 				     (CPU_FEATURE_USABLE (AVX2)
 				      && CPU_FEATURE_USABLE (BMI1)
@@ -828,6 +833,11 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
 				      && CPU_FEATURE_USABLE (BMI1)
 				      && CPU_FEATURE_USABLE (BMI2)),
 				     __wcsrchr_evex)
+	      X86_IFUNC_IMPL_ADD_V4 (array, i, wcsrchr,
+				     (CPU_FEATURE_USABLE (AVX512VL)
+				      && CPU_FEATURE_USABLE (AVX512BW)
+				      && CPU_FEATURE_USABLE (BMI2)),
+				     __wcsrchr_evex512)
 	      X86_IFUNC_IMPL_ADD_V3 (array, i, wcsrchr,
 				     (CPU_FEATURE_USABLE (AVX2)
 				      && CPU_FEATURE_USABLE (BMI1)