tilegx: provide optimized strnlen, strstr, and strcasestr

strnlen() is based on the existing tile strlen() with length checking added. It is up to 5x faster, and around 35% faster on average across the benchtest corpus. No regressions are seen.

strstr() does 8-byte aligned loads and compares using a 2-byte filter on the first two bytes of the needle, then tests the remaining bytes of the needle using memcmp(). It is about 5x faster in the best case (for "found" needles) and about 2x faster across benchtest as a whole, with slowdowns of as much as 45% on a few cases (including the "fail" case for a 128KB search).

strcasestr() is based on strstr() but uses a SIMD tolower routine to convert eight bytes to lower case in 5 instructions. It also uses a 2-byte filter, then strncasecmp() for the remaining bytes. strncasecmp() is not optimized for SIMD, so there is further room for improvement. Even so, strcasestr() is up to 16x faster for "found" needles and averages 2x faster across the whole benchtest corpus. Like strstr(), it slows down by up to 35% on a few cases.
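The commit message mentions a SIMD tolower routine that lowercases eight bytes at once. The tilegx routine itself is not shown here, so the following is only a portable SWAR (SIMD-within-a-register) sketch of the same idea, not the 5-instruction tile sequence. It assumes ASCII, treats every byte >= 0x80 as a non-letter (as the C locale does), and the names `swar_tolower8` and `tolower8_bytes` are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define ONES  0x0101010101010101ULL
#define HIGHS 0x8080808080808080ULL

/* Hypothetical helper (ASCII-only assumption): lowercase 'A'..'Z' in all
   eight bytes of W at once.  Per-byte sums stay below 0x100, so no carry
   crosses a byte boundary. */
static inline uint64_t swar_tolower8(uint64_t w)
{
    uint64_t low7  = w & ~HIGHS;                  /* each byte now 0..0x7F */
    uint64_t gt_Z  = low7 + (0x7F - 'Z') * ONES;  /* high bit set iff byte > 'Z' */
    uint64_t ge_A  = low7 + (0x80 - 'A') * ONES;  /* high bit set iff byte >= 'A' */
    uint64_t upper = (ge_A ^ gt_Z) & ~w & HIGHS;  /* 0x80 in each 'A'..'Z' byte */
    return w | (upper >> 2);                      /* 0x80 >> 2 == 0x20 sets bit 5 */
}

/* Hypothetical wrapper: lowercase 8 bytes in place with one word load/store. */
static void tolower8_bytes(unsigned char buf[8])
{
    uint64_t w;
    memcpy(&w, buf, 8);
    w = swar_tolower8(w);
    memcpy(buf, &w, 8);
}
```

Because every test is per byte and nothing carries between bytes, the result is the same on little- and big-endian machines, so this particular step needs none of the endian macros.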
author: Chris Metcalf <cmetcalf@tilera.com> 2014-09-15 20:10:18 -0400
committer: Chris Metcalf <cmetcalf@tilera.com> 2014-10-06 11:19:18 -0400
commit: c86f7b80f43d7336eab1119dae78b0f10b7244ec (patch)
tree: 951bc7a02304a850aaed2a361df614669f5271aa /sysdeps/tile/tilegx/string-endian.h
parent: 1c4c1a6f4d0e8ffab24419d136fbfe698a201d24 (diff)
1 files changed, 17 insertions, 5 deletions
diff --git a/sysdeps/tile/tilegx/string-endian.h b/sysdeps/tile/tilegx/string-endian.h
index 47333891e0..2dbc1e4a4f 100644
--- a/sysdeps/tile/tilegx/string-endian.h
+++ b/sysdeps/tile/tilegx/string-endian.h
@@ -16,24 +16,36 @@
    License along with the GNU C Library.  If not, see
    <http://www.gnu.org/licenses/>.  */
 
-/* Provide a mask based on the pointer alignment that
+#include <endian.h>
+#include <stdint.h>
+
+/* Provide a set of macros to help keep endianness #ifdefs out of
+   the string functions.
+
+   MASK: Provide a mask based on the pointer alignment that
    sets up non-zero bytes before the beginning of the string.
    The MASK expression works because shift counts are taken mod 64.
-   Also, specify how to count "first" and "last" bits
-   when the bits have been read as a word.  */
 
-#include <stdint.h>
+   NULMASK: Clear bytes beyond a given point in the string.
+
+   CFZ: Find the first zero bit in the 8 string bytes in a long.
+
+   REVCZ: Find the last zero bit in the 8 string bytes in a long.
+
+   STRSHIFT: Shift N bits towards the start of the string.  */
 
-#ifndef __BIG_ENDIAN__
+#if __BYTE_ORDER == __LITTLE_ENDIAN
 #define MASK(x) (__insn_shl(1ULL, (x << 3)) - 1)
 #define NULMASK(x) ((2ULL << x) - 1)
 #define CFZ(x) __insn_ctz(x)
 #define REVCZ(x) __insn_clz(x)
+#define STRSHIFT(x,n) ((x) >> n)
 #else
 #define MASK(x) (__insn_shl(-2LL, ((-x << 3) - 1)))
 #define NULMASK(x) (-2LL << (63 - x))
 #define CFZ(x) __insn_clz(x)
 #define REVCZ(x) __insn_ctz(x)
+#define STRSHIFT(x,n) ((x) << n)
 #endif
 
 /* Create eight copies of the byte in a uint64_t.  Byte Shuffle uses
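To make the little-endian branch of these macros concrete, here is a portable model of MASK, CFZ and STRSHIFT applied to one aligned 8-byte word. It is a sketch under stated assumptions, not the tilegx code: `__builtin_ctzll` (GCC/Clang) stands in for `__insn_ctz`, an explicit `& 63` reproduces the mod-64 shift count that makes MASK work, and the hypothetical `zero_bytes()` helper replaces whatever per-byte compare the real functions use. All function names are hypothetical. `first_nul()` shows MASK hiding the bytes before an unaligned string start; `first_pair()` shows a 2-byte filter of the kind the commit message describes for strstr(), with STRSHIFT lining up the flags of adjacent bytes.

```c
#include <stdint.h>

#define ONES  0x0101010101010101ULL
#define HIGHS 0x8080808080808080ULL

/* Model of MASK(x), little-endian: __insn_shl uses the count mod 64, but a
   C shift by >= 64 is undefined, so mask the count explicitly. */
static inline uint64_t mask_le(uint64_t x)
{
    return (1ULL << ((x << 3) & 63)) - 1;
}
#define CFZ_LE(x)         ((unsigned)__builtin_ctzll(x))  /* CFZ; x != 0 */
#define STRSHIFT_LE(x, n) ((x) >> (n))                    /* STRSHIFT */

/* Hypothetical helper: 0x80 in every byte of X that is zero, else 0x00.
   Exact per byte, because no carry or borrow crosses byte boundaries. */
static inline uint64_t zero_bytes(uint64_t x)
{
    return ~(((x & ~HIGHS) + ~HIGHS) | x) & HIGHS;
}

/* Model a little-endian 8-byte load: byte i lands in bits 8*i..8*i+7. */
static uint64_t load_le(const unsigned char b[8])
{
    uint64_t w = 0;
    for (int i = 0; i < 8; i++)
        w |= (uint64_t)b[i] << (8 * i);
    return w;
}

/* First NUL at or after byte OFFSET of an aligned word, or 8 if none.
   MASK makes the bytes before OFFSET non-zero so they are never matched. */
static unsigned first_nul(const unsigned char b[8], unsigned offset)
{
    uint64_t z = zero_bytes(load_le(b) | mask_le(offset));
    return z ? CFZ_LE(z) >> 3 : 8;
}

/* First i in 0..6 with b[i] == c0 && b[i+1] == c1, or 8 if none.
   STRSHIFT moves each "matches c1" flag one byte towards the start of
   the string so it lines up with the "matches c0" flag before it. */
static unsigned first_pair(const unsigned char b[8], unsigned char c0,
                           unsigned char c1)
{
    uint64_t w = load_le(b);
    uint64_t m0 = zero_bytes(w ^ (c0 * ONES));
    uint64_t m1 = zero_bytes(w ^ (c1 * ONES));
    uint64_t cand = m0 & STRSHIFT_LE(m1, 8);
    return cand ? CFZ_LE(cand) >> 3 : 8;
}
```

This sketch leaves out two things a real implementation has to handle: byte 7 has no partner inside the word, so its pair must be checked against the next word, and a length-bounded search such as strnlen() presumably needs something like NULMASK to ignore bytes past the limit.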