about summary refs log tree commit diff
path: root/sysdeps/x86_64/multiarch/strstr-avx512.c
Commit message (Collapse)AuthorAgeFilesLines
* x86: Remove __mmask intrinsics in strstr-avx512.cNoah Goldstein2022-07-121-6/+10
| | | | | | | | | | | | The intrinsics are not available before GCC7 and using standard operators generates code of equivalent or better quality. Removed: _cvtmask64_u64 _kshiftri_mask64 _kand_mask64 Geometric Mean of 5 Runs of Full Benchmark Suite New / Old: 0.958
* x86_64: Add strstr function with 512-bit EVEXRaghuveer Devulapalli2022-06-061-0/+214
Adding a 512-bit EVEX version of strstr. The algorithm works as follows: (1) We spend a few cycles at the begining to peek into the needle. We locate an edge in the needle (first occurance of 2 consequent distinct characters) and also store the first 64-bytes into a zmm register. (2) We search for the edge in the haystack by looking into one cache line of the haystack at a time. This avoids having to read past a page boundary which can cause a seg fault. (3) If an edge is found in the haystack we first compare the first 64-bytes of the needle (already stored in a zmm register) before we proceed with a full string compare performed byte by byte. Benchmarking results: (old = strstr_sse2_unaligned, new = strstr_avx512) Geometric mean of all benchmarks: new / old = 0.66 Difficult skiptable(0) : new / old = 0.02 Difficult skiptable(1) : new / old = 0.01 Difficult 2-way : new / old = 0.25 Difficult testing first 2 : new / old = 1.26 Difficult skiptable(0) : new / old = 0.05 Difficult skiptable(1) : new / old = 0.06 Difficult 2-way : new / old = 0.26 Difficult testing first 2 : new / old = 1.05 Difficult skiptable(0) : new / old = 0.42 Difficult skiptable(1) : new / old = 0.24 Difficult 2-way : new / old = 0.21 Difficult testing first 2 : new / old = 1.04 Reviewed-by: H.J. Lu <hjl.tools@gmail.com>