From 104c7b1967c3e78435c6f7eab5e225a7eddf9c6e Mon Sep 17 00:00:00 2001 From: Noah Goldstein Date: Tue, 4 May 2021 19:02:40 -0400 Subject: x86: Add EVEX optimized memchr family not safe for RTM No bug. This commit adds a new implementation for EVEX memchr that is not safe for RTM because it uses vzeroupper. The benefit is that by using ymm0-ymm15 it can use vpcmpeq and vpternlogd in the 4x loop which is faster than the RTM safe version which cannot use vpcmpeq because there is no EVEX encoding for the instruction. All parts of the implementation aside from the 4x loop are the same for the two versions and the optimization is only relevant for large sizes. Tigerlake: size , algn , Pos , Cur T , New T , Win , Dif 512 , 6 , 192 , 9.2 , 9.04 , no-RTM , 0.16 512 , 7 , 224 , 9.19 , 8.98 , no-RTM , 0.21 2048 , 0 , 256 , 10.74 , 10.54 , no-RTM , 0.2 2048 , 0 , 512 , 14.81 , 14.87 , RTM , 0.06 2048 , 0 , 1024 , 22.97 , 22.57 , no-RTM , 0.4 2048 , 0 , 2048 , 37.49 , 34.51 , no-RTM , 2.98 <-- Icelake: size , algn , Pos , Cur T , New T , Win , Dif 512 , 6 , 192 , 7.6 , 7.3 , no-RTM , 0.3 512 , 7 , 224 , 7.63 , 7.27 , no-RTM , 0.36 2048 , 0 , 256 , 8.48 , 8.38 , no-RTM , 0.1 2048 , 0 , 512 , 11.57 , 11.42 , no-RTM , 0.15 2048 , 0 , 1024 , 17.92 , 17.38 , no-RTM , 0.54 2048 , 0 , 2048 , 30.37 , 27.34 , no-RTM , 3.03 <-- test-memchr, test-wmemchr, and test-rawmemchr are all passing. Signed-off-by: Noah Goldstein Reviewed-by: H.J. Lu --- sysdeps/x86_64/multiarch/Makefile | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) (limited to 'sysdeps/x86_64/multiarch/Makefile') diff --git a/sysdeps/x86_64/multiarch/Makefile b/sysdeps/x86_64/multiarch/Makefile index 491c7698dc..2c2aad3a7e 100644 --- a/sysdeps/x86_64/multiarch/Makefile +++ b/sysdeps/x86_64/multiarch/Makefile @@ -77,7 +77,9 @@ sysdep_routines += strncat-c stpncpy-c strncpy-c \ strncmp-evex \ strncpy-evex \ strnlen-evex \ - strrchr-evex + strrchr-evex \ + memchr-evex-rtm \ + rawmemchr-evex-rtm CFLAGS-varshift.c += -msse4 CFLAGS-strcspn-c.c += -msse4 CFLAGS-strpbrk-c.c += -msse4 @@ -110,7 +112,8 @@ sysdep_routines += wmemcmp-sse4 wmemcmp-ssse3 wmemcmp-c \ wcsnlen-evex \ wcsrchr-evex \ wmemchr-evex \ - wmemcmp-evex-movbe + wmemcmp-evex-movbe \ + wmemchr-evex-rtm endif ifeq ($(subdir),debug) -- cgit 1.4.1