PowerPC: memset optimization for POWER8/PPC64

This patch adds an optimized memset implementation for POWER8. For sizes from 0 to 255 bytes, it uses a word/doubleword algorithm similar to the POWER7-optimized one. For sizes larger than 255 bytes, two strategies are used: 1. If the constant is nonzero, memory is written with Altivec vector instructions; 2. If the constant is 0, dcbz instructions are used, with the loop unrolled to clear 512 bytes at a time. Using vector instructions increases throughput considerably, doubling performance for sizes larger than 1024 bytes. Unrolling the dcbz loop also improves performance, doubling throughput for sizes larger than 8192 bytes.
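
The size- and value-based dispatch described above can be sketched in portable C. This is only an illustration, not the code in this patch: the real implementation is PowerPC assembly, and the names choose_strategy, fill_chunks and sketch_memset are hypothetical. The chunk widths (8, 16 and 512 bytes) stand in for doubleword stores, Altivec vector stores and the unrolled dcbz loop; alignment handling, which the real code must perform before using dcbz, is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names for illustration only.  */
enum memset_strategy
{
  STRATEGY_SCALAR,	/* 0..255 bytes: word/doubleword stores.  */
  STRATEGY_VECTOR,	/* > 255 bytes, nonzero fill: vector stores.  */
  STRATEGY_DCBZ		/* > 255 bytes, zero fill: dcbz block clears.  */
};

/* Pick a strategy using the thresholds from the commit message.  */
static enum memset_strategy
choose_strategy (int c, size_t n)
{
  if (n <= 255)
    return STRATEGY_SCALAR;
  return ((unsigned char) c != 0) ? STRATEGY_VECTOR : STRATEGY_DCBZ;
}

/* Fill N bytes in CHUNK-sized steps, then finish the tail.  The inner
   memset calls stand in for the store width of each strategy.  */
static void
fill_chunks (unsigned char *p, int c, size_t n, size_t chunk)
{
  while (n >= chunk)
    {
      memset (p, c, chunk);
      p += chunk;
      n -= chunk;
    }
  memset (p, c, n);
}

/* Portable stand-in for the dispatch; assumes nothing about alignment.  */
void *
sketch_memset (void *dst, int c, size_t n)
{
  assert (dst != NULL || n == 0);
  switch (choose_strategy (c, n))
    {
    case STRATEGY_SCALAR:
      fill_chunks (dst, c, n, 8);	/* One doubleword per step.  */
      break;
    case STRATEGY_VECTOR:
      fill_chunks (dst, c, n, 16);	/* One 16-byte vector per step.  */
      break;
    case STRATEGY_DCBZ:
      fill_chunks (dst, 0, n, 512);	/* 512 bytes per unrolled pass.  */
      break;
    }
  return dst;
}
```

With these thresholds, a 255-byte request takes the scalar path regardless of the fill value, while a 256-byte request takes the dcbz path for a zero fill and the vector path otherwise.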
author: Adhemerval Zanella <azanella@linux.vnet.ibm.com> 2014-07-15 12:19:09 -0400
committer: Adhemerval Zanella <azanella@linux.vnet.ibm.com> 2014-09-10 07:39:46 -0400
commit: 71ae86478edc7b21872464f43fb29ff650c1681a (patch)
tree: a75679fa464a1d19543020ef0c4f4f982d099d99 /benchtests
parent: 3b473fecdf4c52989cd915b649bb6d26c042d048 (diff)
download: glibc-71ae86478edc7b21872464f43fb29ff650c1681a.tar.gz
glibc-71ae86478edc7b21872464f43fb29ff650c1681a.tar.xz
glibc-71ae86478edc7b21872464f43fb29ff650c1681a.zip
1 files changed, 5 insertions, 0 deletions
diff --git a/benchtests/bench-memset.c b/benchtests/bench-memset.c
index 5304113e3d..20265936b9 100644
--- a/benchtests/bench-memset.c
+++ b/benchtests/bench-memset.c
@@ -150,6 +150,11 @@ test_main (void)
 	  if (i & (i - 1))
 	    do_test (0, c, i);
 	}
+      for (i = 32; i < 512; i+=32)
+	{
+	  do_test (0, c, i);
+	  do_test (i, c, i);
+	}
       do_test (1, c, 14);
       do_test (3, c, 1024);
       do_test (4, c, 64);