From 1efad39b227c0d3cd0641cae70c7e95c8ca290a6 Mon Sep 17 00:00:00 2001
From: Stefan Liebler
Date: Wed, 26 Aug 2015 10:26:26 +0200
Subject: S390: Optimize string, wcsmbs and memory functions.

This patch set introduces optimized string, wcsmbs and memory functions
for S390/S390x. The functions are accelerated by the new z13 vector
instructions. The Principles of Operation manual for IBM z13 is publicly
available:
http://publibfi.boulder.ibm.com/epubs/pdf/dz9zr010.pdf

Assembler support for these instructions was introduced by commits:
-"[Committed] S/390: Add support for IBM z13."
 (https://sourceware.org/ml/binutils/2015-01/msg00197.html)
-"[Committed] S/390: Add more IBM z13 instructions"
 (https://sourceware.org/ml/binutils/2015-03/msg00088.html)

The first patches prepare for the later optimization patches. The
floating point exception handling - fetestexcept(), ... - is fixed and
the platform and hwcap strings are extended. The current ifunc routines
memset, memcpy and memcmp are refactored and the ifunc test-framework is
now enabled.

An S390 specific configure check tests whether the binutils in use
support the new vector instructions. If they do, the optimized functions
are provided via ifunc. Otherwise a message is written to the configure
output and only the currently used common code functions are available.

The optimized functions share one implementation for s390-32 and
s390-64; the few differences are handled via #ifdef.

The ifunc-resolvers are defined in files sysdeps/s390/multiarch/.c,
which choose either the current implementation ___c() or the vector
implementation ___vx() depending on the HWCAP_S390_VX flag bit in the
AT_HWCAP field. If the bit is set, the hardware and the kernel support
vector registers and instructions. If the binutils in use lack vector
support, then the default implementation in the string or wcsmbs
directory is included here instead.
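The selection step that such a resolver performs can be sketched in
plain C. This is a minimal illustration, not the code from this patch
set: DEMO_HWCAP_VX and all demo_* names are hypothetical, the bit value
is an assumption, and the hwcap word is passed in as a parameter instead
of being read from AT_HWCAP so that the sketch builds on any platform.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the HWCAP_S390_VX bit.  The real constant
   comes from the platform headers; this value is an assumption.  */
#define DEMO_HWCAP_VX (1UL << 11)

typedef size_t (*demo_strlen_fn) (const char *);

/* Stand-in for the existing common code implementation.  */
static size_t
demo_strlen_c (const char *s)
{
  return strlen (s);
}

/* Stand-in for the vector implementation.  A real one would be written
   in assembler; a byte loop keeps this sketch portable.  */
static size_t
demo_strlen_vx (const char *s)
{
  size_t n = 0;
  while (s[n] != '\0')
    n++;
  return n;
}

/* Resolver-style selection: return the vector variant only if the
   capability bit is set in the hwcap word, else the common variant.  */
static demo_strlen_fn
demo_select_strlen (unsigned long hwcap)
{
  return (hwcap & DEMO_HWCAP_VX) ? demo_strlen_vx : demo_strlen_c;
}
```

With a real ifunc the selection runs when the symbol is resolved, so
callers do not repeat the capability check on every call.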
The file sysdeps/s390/multiarch/-c.c includes the current implementation
and defines the function name ___c. The assembler files
sysdeps/s390/multiarch/-vx.S with the vector instructions use the
directive '.machine "z13"' to allow building glibc without the option
'-march=z13'. Additionally the directive
'.machinemode "zarch_nohighgprs"' is needed for the 31bit glibc. This
mode does not set the highgprs flag in the ELF header; setting it would
lead to an unloadable libc on a 31bit kernel.

Most of the optimized string functions are structured in the same way:
The first 16 bytes of the string are loaded unaligned via vlbb - vector
load to block boundary (e.g. 4k). This instruction loads 16 bytes if
possible. If the load would cross a page, it only loads the remaining
bytes of the current page and so cannot cause a segmentation fault.
Afterwards this first part of the string is processed. If e.g. for
strlen the end of the string is reached within this first part, the
function returns. Otherwise the pointer is aligned to 16 bytes, so a
full vector register can be loaded with vl - vector load - without
checking for a page cross. The remaining string is processed in a four
times unrolled loop, because benchmarks showed improvements compared to
a non-unrolled loop.

The optimized wide string functions can only handle 4byte aligned string
pointers. Although a wchar_t pointer should always be 4byte aligned,
most of the current common code wide string functions can handle
non-aligned strings. Thus the optimized functions fall back to the
common code functions for a non-aligned wide string, to behave the same
as before this patch.

Some string tests can test the string and the wide string version of a
function.
The remaining ones are extended and new wide string tests are added.
The same applies to the benchtests.

ChangeLog:

	* NEWS: New item for IBM z13 string optimizations.
---
 ChangeLog | 4 ++++
 NEWS      | 9 ++++++---
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 30809696ae..873851670c 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,7 @@
+2015-08-26  Stefan Liebler
+
+	* NEWS: New item for IBM z13 string optimizations.
+
 2015-08-26  Stefan Liebler
 
 	* sysdeps/s390/multiarch/memrchr-c.c: New File.
diff --git a/NEWS b/NEWS
index ee0a9be47a..80bddc8a02 100644
--- a/NEWS
+++ b/NEWS
@@ -10,12 +10,15 @@ Version 2.23
 * The following bugs are resolved with this release:
 
   14341, 16517, 16519, 16520, 16734, 16973, 17787, 17905, 18084, 18086,
-  18240, 18265, 18370, 18421, 18480, 18525, 18618, 18647, 18661, 18674,
-  18681, 18778, 18781, 18787, 18789, 18790, 18795, 18796, 18820, 18823,
-  18824.
+  18240, 18265, 18370, 18421, 18480, 18525, 18610, 18618, 18647, 18661,
+  18674,18681, 18778, 18781, 18787, 18789, 18790, 18795, 18796, 18820,
+  18823, 18824.
 
 * The obsolete header has been removed.  Programs that require this
   header must be updated to use instead.
+
+* Optimized string, wcsmbs and memory functions for IBM z13.
+  Implemented by Stefan Liebler.
 
 
 Version 2.22
-- 
cgit 1.4.1