about summary refs log tree commit diff
path: root/sysdeps/x86_64/multiarch/strrchr-evex-base.S
Commit message (Collapse)AuthorAgeFilesLines
* Update copyright dates with scripts/update-copyrightsPaul Eggert2024-01-011-1/+1
|
* x86: Fix unchecked AVX512-VBMI2 usage in strrchr-evex-base.SNoah Goldstein2023-11-151-24/+51
| | | | | | | | | | | | | strrchr-evex-base used `vpcompress{b|d}` in the page cross logic but was missing the CPU_FEATURE checks for VBMI2 in the ifunc/ifunc-impl-list. The fix is either to add those checks or change the logic to not use `vpcompress{b|d}`. Choosing the latter here so that the strrchr-evex implementation is usable on SKX. New implementation is a bit slower, but this is in a cold path so its probably okay.
* x86: Prepare `strrchr-evex` and `strrchr-evex512` for AVX10Noah Goldstein2023-10-061-180/+289
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit refactors `strrchr-evex` and `strrchr-evex512` to use a common implementation: `strrchr-evex-base.S`. The motivation is `strrchr-evex` needed to be refactored to not use 64-bit masked registers in preperation for AVX10. Once vec-width masked register combining was removed, the EVEX and EVEX512 implementations can easily be implemented in the same file without any major overhead. The net result is performance improvements (measured on TGL) for both `strrchr-evex` and `strrchr-evex512`. Although, note there are some regressions in the test suite and it may be many of the cases that make the total-geomean of improvement/regression across bench-strrchr are cold. The point of the performance measurement is to show there are no major regressions, but the primary motivation is preperation for AVX10. Benchmarks where taken on TGL: https://www.intel.com/content/www/us/en/products/sku/213799/intel-core-i711850h-processor-24m-cache-up-to-4-80-ghz/specifications.html EVEX geometric_mean(N=5) of all benchmarks New / Original : 0.74 EVEX512 geometric_mean(N=5) of all benchmarks New / Original: 0.87 Full check passes on x86.
* Fix misspellings in sysdeps/x86_64 -- BZ 25337.Paul Pluzhnikov2023-05-231-2/+2
| | | | | | | Applying this commit results in bit-identical rebuild of libc.so.6 math/libm.so.6 elf/ld-linux-x86-64.so.2 mathvec/libmvec.so.1 Reviewed-by: Florian Weimer <fweimer@redhat.com>
* Update copyright dates with scripts/update-copyrightsJoseph Myers2023-01-061-1/+1
|
* x86_64: Implement evex512 version of strrchr and wcsrchrSunil K Pandey2022-11-031-0/+264
Changes from v1: Use vec api for register. Replace VPCMP with VPCMPEQ Restructure and remove 1 unconditional jump. Change page cross logic to use sall. This patch implements following evex512 version of string functions. evex512 version takes up to 30% less cycle as compared to evex, depending on length and alignment. - strrchr function using 512 bit vectors. - wcsrchr function using 512 bit vectors. Code size data: strrchr-evex.o 879 byte strrchr-evex512.o 601 byte (-32%) wcsrchr-evex.o 882 byte wcsrchr-evex512.o 572 byte (-35%) Placeholder function, not used by any processor at the moment. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>