From 922369032c604b4dcfd535e1bcddd4687e7126a5 Mon Sep 17 00:00:00 2001
From: Wilco Dijkstra
Date: Thu, 10 Aug 2017 17:00:38 +0100
Subject: [AArch64] Optimized memcmp.

This is an optimized memcmp for AArch64.  This is a complete rewrite
using a different algorithm.  The previous version split into cases
where both inputs were aligned, where the inputs were mutually aligned,
and where they were unaligned (handled with a byte loop).  The new
version combines all these cases, while small inputs of less than
8 bytes are handled separately.

This allows the main code to be sped up using unaligned loads since
there are now at least 8 bytes to be compared.  After the first 8 bytes,
align the first input.  This ensures each iteration does at most one
unaligned access and mutually aligned inputs behave as aligned.  After
the main loop, process the last 8 bytes using unaligned accesses.

This improves performance of (mutually) aligned cases by 25% and
unaligned by >500% (yes, >6 times faster) on large inputs.

	* sysdeps/aarch64/memcmp.S (memcmp):
	Rewrite of optimized memcmp.
---
 ChangeLog | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/ChangeLog b/ChangeLog
index dcf86261ca..e5e36a40e6 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,8 @@
+2017-08-10  Wilco Dijkstra
+
+	* sysdeps/aarch64/memcmp.S (memcmp):
+	Rewrite of optimized memcmp.
+
 2017-08-10  Florian Weimer
 
 	Introduce ld.so exceptions.
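
The diff above covers only the ChangeLog entry; the rewritten memcmp
itself lives in sysdeps/aarch64/memcmp.S as hand-written assembly.  As a
rough sketch of the strategy the commit message describes (not the
actual implementation), the following hypothetical C rendering may
help.  The names memcmp_sketch, load64 and cmp64 are invented for
illustration, and a little-endian target, as on AArch64 Linux, is
assumed:

#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Unaligned 64-bit load; the memcpy compiles to a single load on
   AArch64, which permits unaligned addresses.  */
static inline uint64_t
load64 (const unsigned char *p)
{
  uint64_t v;
  memcpy (&v, p, sizeof v);
  return v;
}

/* Given two unequal words loaded from little-endian memory, return a
   memcmp-style result for the first differing byte.  Byte-reversing
   (the REV instruction on AArch64) makes the lowest-addressed byte the
   most significant, so an unsigned compare gives lexicographic order.  */
static inline int
cmp64 (uint64_t a, uint64_t b)
{
  a = __builtin_bswap64 (a);
  b = __builtin_bswap64 (b);
  return a < b ? -1 : 1;
}

int
memcmp_sketch (const void *s1, const void *s2, size_t n)
{
  const unsigned char *p1 = s1;
  const unsigned char *p2 = s2;

  /* Small inputs of less than 8 bytes are handled separately.  */
  if (n < 8)
    {
      for (size_t i = 0; i < n; i++)
        if (p1[i] != p2[i])
          return p1[i] < p2[i] ? -1 : 1;
      return 0;
    }

  const unsigned char *end1 = p1 + n;
  const unsigned char *end2 = p2 + n;

  /* At least 8 bytes remain, so the first 8 can be compared with
     unaligned loads.  */
  uint64_t a = load64 (p1);
  uint64_t b = load64 (p2);
  if (a != b)
    return cmp64 (a, b);

  /* Align the first input.  Only the second input may now be
     unaligned, so each iteration does at most one unaligned access,
     and mutually aligned inputs behave as aligned.  */
  size_t skip = 8 - ((uintptr_t) p1 & 7);
  p1 += skip;
  p2 += skip;

  while (end1 - p1 >= 8)
    {
      a = load64 (p1);  /* Aligned.  */
      b = load64 (p2);  /* Possibly unaligned.  */
      if (a != b)
        return cmp64 (a, b);
      p1 += 8;
      p2 += 8;
    }

  /* Process the last 8 bytes with unaligned accesses.  They may
     overlap bytes already compared; those compared equal, so the
     overlap is harmless.  */
  a = load64 (end1 - 8);
  b = load64 (end2 - 8);
  if (a != b)
    return cmp64 (a, b);
  return 0;
}

Note the tail handling: rather than a byte loop, the final loads re-read
the last 8 bytes relative to the end of the buffers, which may overlap
bytes the main loop already compared.  Since any overlapped bytes were
equal, re-comparing them cannot change the result, and the branchy
remainder logic disappears.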