[AArch64] Optimized memcmp. - mirror/glibc - mirror of git://sourceware.org/git/glibc.git

author	Wilco Dijkstra <wdijkstr@arm.com>	2017-08-10 17:00:38 +0100
committer	Wilco Dijkstra <wdijkstr@arm.com>	2017-08-10 17:00:38 +0100
commit	922369032c604b4dcfd535e1bcddd4687e7126a5 (patch)
tree	82779a2afc66f4ef2f2c9006f90a412bffaad23e /localedata/tests-mbwc/tst_funcs.h
parent	2449ae7b2da24c9940962304a3e44bc80e389265 (diff)

[AArch64] Optimized memcmp.

This is an optimized memcmp for AArch64.  It is a complete rewrite
using a different algorithm.  The previous version was split into three
cases: both inputs aligned, inputs mutually aligned, and unaligned
inputs, which used a byte loop.  The new version combines all these
cases, while small inputs of less than 8 bytes are handled separately.

This allows the main code to be sped up using unaligned loads since
there are now at least 8 bytes to be compared.  After the first 8 bytes,
align the first input.  This ensures each iteration does at most one
unaligned access and mutually aligned inputs behave as aligned.
After the main loop, process the last 8 bytes using unaligned accesses.
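
As a rough illustration of the control flow described above, here is a
minimal C sketch.  It is not the assembly in sysdeps/aarch64/memcmp.S;
the names sketch_memcmp, load64 and first_diff are hypothetical, and the
byte-wise search for the differing byte is an assumption made for
readability rather than the technique the real code uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Load 8 bytes from a possibly unaligned address (hypothetical helper). */
static uint64_t load64(const unsigned char *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Return the sign of the first differing byte in [0, len), or 0 if none. */
static int first_diff(const unsigned char *a, const unsigned char *b,
                      size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (a[i] != b[i])
            return a[i] < b[i] ? -1 : 1;
    return 0;
}

int sketch_memcmp(const void *s1, const void *s2, size_t n)
{
    const unsigned char *a = s1, *b = s2;

    /* Small inputs of less than 8 bytes are handled separately. */
    if (n < 8)
        return first_diff(a, b, n);

    /* At least 8 bytes: compare the first 8 with unaligned loads. */
    if (load64(a) != load64(b))
        return first_diff(a, b, 8);

    /* Advance to the next 8-byte boundary of the first input.  The step
       is 1..8 bytes, so it may re-compare bytes that already matched. */
    size_t i = 8 - ((uintptr_t)a & 7);
    assert((((uintptr_t)a + i) & 7) == 0);

    /* Main loop: the load from a is aligned, so at most one access per
       iteration (the load from b) is unaligned. */
    while (i + 8 <= n) {
        if (load64(a + i) != load64(b + i))
            return first_diff(a + i, b + i, 8);
        i += 8;
    }

    /* Tail: compare the last 8 bytes with unaligned loads.  This window
       may overlap bytes that were already found equal. */
    if (i < n && load64(a + n - 8) != load64(b + n - 8))
        return first_diff(a + n - 8, b + n - 8, 8);

    return 0;
}
```

Because every byte before an overlapping window has already compared
equal, the first mismatch found inside the window is also the first
mismatch overall, so the overlap does not change the result.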

This improves performance of (mutually) aligned cases by 25% and
unaligned by >500% (yes >6 times faster) on large inputs.

	* sysdeps/aarch64/memcmp.S (memcmp):
	Rewrite of optimized memcmp.
