x86: Optimize SSE2 memchr overflow calculation

SSE2 memchr computes "edx + ecx - 16" where ecx is less than 16.  Use
"edx - (16 - ecx)", instead of saturated math, to avoid possible addition
overflow.  This replaces

	add	%ecx, %edx
	sbb	%eax, %eax
	or	%eax, %edx
	sub	$16, %edx

with

	neg	%ecx
	add	$16, %ecx
	sub	%ecx, %edx

The x86_64 change is the same, except that it uses rcx/rdx instead of
ecx/edx.

	* sysdeps/i386/i686/multiarch/memchr-sse2.S (MEMCHR): Use
	"edx - (16 - ecx)" instead of "edx + ecx - 16" to avoid possible
	addition overflow.
	* sysdeps/x86_64/memchr.S (memchr): Likewise.
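The arithmetic claim above can be checked with a small C model of the x86_64 variant, treating registers as unsigned 64-bit values and assuming rcx < 16 as the message states. The names `old_calc`, `new_calc` and `check_case` are hypothetical and not part of glibc; this is a sketch of the value-level behaviour, not of the actual memchr code. It shows that both sequences take the same `jbe` branch to L(return_null), and that the new one yields the exact "rdx + rcx - 16" where the old one saturated on overflow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the removed sequence:
       add %rcx, %rdx ; sbb %rax, %rax ; or %rax, %rdx ; sub $16, %rdx ; jbe
   On carry out of the add, rax becomes all ones and the OR saturates rdx
   to UINT64_MAX.  Returns true when jbe would branch to L(return_null)
   and stores the final rdx in *out.  */
static bool old_calc(uint64_t rdx, uint64_t rcx, uint64_t *out)
{
    uint64_t sum = rdx + rcx;      /* may wrap modulo 2^64 */
    if (sum < rcx)                 /* carry out of the add: saturate */
        sum = UINT64_MAX;
    *out = sum - 16;
    return sum <= 16;              /* CF (sum < 16) or ZF (sum == 16) */
}

/* Hypothetical model of the new sequence:
       neg %rcx ; add $16, %rcx ; sub %rcx, %rdx ; jbe
   With rcx < 16, head = 16 - rcx lies in [1, 16]; only a subtraction
   remains, so no addition can overflow.  */
static bool new_calc(uint64_t rdx, uint64_t rcx, uint64_t *out)
{
    uint64_t head = 16 - rcx;
    *out = rdx - head;
    return rdx <= head;            /* CF (rdx < head) or ZF (rdx == head) */
}

/* Check one (rdx, rcx) pair: both versions take the same branch, and when
   the search continues the new result is exactly rdx + rcx - 16, which the
   old code produced only when rdx + rcx did not overflow.  */
static void check_case(uint64_t rdx, uint64_t rcx)
{
    uint64_t o, n;
    bool old_null = old_calc(rdx, rcx, &o);
    bool new_null = new_calc(rdx, rcx, &n);

    assert(rcx < 16);
    assert(old_null == new_null);
    if (!new_null) {
        /* rdx > 16 - rcx here, so the subtraction does not wrap.  */
        assert(n == rdx - (16 - rcx));
        if (rdx <= UINT64_MAX - rcx)      /* rdx + rcx fits in 64 bits */
            assert(o == n);
        else
            assert(o == UINT64_MAX - 16); /* old code saturated */
    }
}
```

For example, with rdx = UINT64_MAX and rcx = 15, the old sequence saturates and leaves UINT64_MAX - 16, while the new one leaves the exact UINT64_MAX - 1; neither branches to L(return_null).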
author: H.J. Lu <hjl.tools@gmail.com> 2017-05-19 10:46:29 -0700
committer: H.J. Lu <hjl.tools@gmail.com> 2017-05-19 10:48:45 -0700
commit: 402bf0695218bbe290418b9486b1dd5fe284d903
tree: 0107d383f8a38c75076dae69996b15b46e13b04a /sysdeps/x86_64
parent: 1d71a6315396f6e1cc79a1d7ecca0a559929230a
download: glibc-402bf0695218bbe290418b9486b1dd5fe284d903.tar.gz
glibc-402bf0695218bbe290418b9486b1dd5fe284d903.tar.xz
glibc-402bf0695218bbe290418b9486b1dd5fe284d903.zip
1 file changed, 6 insertions, 8 deletions
diff --git a/sysdeps/x86_64/memchr.S b/sysdeps/x86_64/memchr.S
index a205a25998..f82e1c5bf7 100644
--- a/sysdeps/x86_64/memchr.S
+++ b/sysdeps/x86_64/memchr.S
@@ -76,14 +76,12 @@ L(crosscache):
 
 	.p2align 4
 L(unaligned_no_match):
-        /* Calculate the last acceptable address and check for possible
-           addition overflow by using satured math:
-           rdx = rcx + rdx
-           rdx |= -(rdx < rcx)  */
-	add	%rcx, %rdx
-	sbb	%rax, %rax
-	or	%rax, %rdx
-	sub	$16, %rdx
+        /* "rcx" is less than 16.  Calculate "rdx + rcx - 16" by using
+	   "rdx - (16 - rcx)" instead of "(rdx + rcx) - 16" to avoid
+	   possible addition overflow.  */
+	neg	%rcx
+	add	$16, %rcx
+	sub	%rcx, %rdx
 	jbe	L(return_null)
 	add	$16, %rdi
 	sub	$64, %rdx
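To illustrate why `jbe` is the right branch after the new `sub`, here is a hypothetical C emulation of the three added instructions and the two flags `jbe` reads (CF and ZF). It is a sketch under simplifying assumptions, not a CPU model: the `struct state` type and the `op_*` helper names are invented for illustration, and all other flags and registers are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical minimal machine state: the two registers touched by the
   new sequence and the two flags that jbe reads.  */
struct state {
    uint64_t rcx, rdx;
    bool cf, zf;
};

/* neg %rcx: rcx = 0 - rcx; CF is set unless the operand was zero.  */
static void op_neg_rcx(struct state *s)
{
    s->cf = s->rcx != 0;
    s->rcx = 0 - s->rcx;
    s->zf = s->rcx == 0;
}

/* add $16, %rcx: CF is the carry out of bit 63.  */
static void op_add16_rcx(struct state *s)
{
    uint64_t r = s->rcx + 16;
    s->cf = r < s->rcx;
    s->rcx = r;
    s->zf = r == 0;
}

/* sub %rcx, %rdx: rdx -= rcx; CF is the unsigned borrow.  */
static void op_sub_rcx_rdx(struct state *s)
{
    s->cf = s->rdx < s->rcx;
    s->rdx -= s->rcx;
    s->zf = s->rdx == 0;
}

/* jbe ("jump if below or equal", unsigned) is taken when CF or ZF is set.  */
static bool jbe_taken(const struct state *s)
{
    return s->cf || s->zf;
}

/* Run the three added instructions; return whether jbe would branch to
   L(return_null).  Only the flags left by the final sub matter.  */
static bool run_new_sequence(struct state *s)
{
    assert(s->rcx < 16);           /* precondition stated in the patch comment */
    op_neg_rcx(s);                 /* rcx = -rcx            */
    op_add16_rcx(s);               /* rcx = 16 - rcx        */
    op_sub_rcx_rdx(s);             /* rdx = rdx - (16 - rcx) */
    return jbe_taken(s);
}
```

In this model the branch is taken exactly when the original rdx is less than or equal to 16 - rcx (unsigned). Note that the new sequence clobbers rcx, which ends up holding 16 - rcx, whereas the removed sequence clobbered rax through `sbb`.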