author Mahesh Bodapati <bmahi496@linux.ibm.com> 2024-08-23 16:48:32 -0500
committer Peter Bergner <bergner@linux.ibm.com> 2024-08-23 16:48:32 -0500
commit 82b5340ebdb8f00589d548e6e2dc8c998f07d0c5
tree 53b8aab414b136e0cc158be905fd8c9f29924d32 /sysdeps/x86_64/fpu
parent 89b53077d2a58f00e7debdfe58afabe953dac60d
powerpc64: Optimize strcpy and stpcpy for Power9/10
This patch modifies the current Power9 implementation of strcpy and
stpcpy to optimize it for Power9 and Power10.

No new Power10 instructions are used, so the original Power9 strcpy
is modified instead of creating a new implementation for Power10.

The changes also affect stpcpy, which uses the same implementation
with some additional code before returning.
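
As an illustration only, the relationship between the two functions can be
sketched in portable C. The real implementation is PowerPC assembly; the
names copy_core, sketch_strcpy and sketch_stpcpy below are hypothetical and
do not appear in the patch. The sketch assumes a shared routine that ends
with a pointer to the terminating NUL, which stpcpy returns and strcpy
discards.

```c
#include <stddef.h>

/* Hypothetical shared core (not from the patch): copies src, including
   its terminating NUL, into dst and returns a pointer to the NUL that
   was written in dst.  */
static char *copy_core(char *dst, const char *src)
{
    while ((*dst = *src) != '\0') {
        dst++;
        src++;
    }
    return dst;
}

/* strcpy returns the start of the destination.  */
char *sketch_strcpy(char *dst, const char *src)
{
    copy_core(dst, src);
    return dst;
}

/* stpcpy returns the end of the copied string (the address of the NUL),
   which is the extra work done before returning.  */
char *sketch_stpcpy(char *dst, const char *src)
{
    return copy_core(dst, src);
}
```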

Improvements compared to the old Power9 version:

Use simple comparisons for the first ~512 bytes:
  The main loop is good for long strings, but comparing 16B each time is
  better for shorter strings. After aligning the address to 16 bytes, we
  unroll the loop four times, checking 128 bytes each time. There may be
  some overlap with the main loop for unaligned strings, but it is better
  for shorter strings.

Loop with 64 bytes for longer strings:
  Use 4 consecutive lxv/stxv instructions.
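
The two-phase strategy described above can be modelled roughly in portable
C. This is a sketch under stated assumptions, not the patch's code: the
actual routine is vectorized assembly using lxv/stxv, and it first aligns
the source address to 16 bytes, a step omitted here. The names first_nul
and sketch_chunked_stpcpy and the constants are hypothetical. The helper
scans byte by byte and stops at the NUL, so that the C version never reads
past the end of the string.

```c
#include <stddef.h>
#include <string.h>

/* Sizes taken from the description: 16-byte checks for roughly the
   first 512 bytes, then 64 bytes per iteration.  */
enum { CHUNK = 16, SHORT_LIMIT = 512, BLOCK = 64 };

/* Return the index of the first NUL in p[0..n), or n if there is none.
   Stops at the NUL, so it never reads beyond the string.  */
static size_t first_nul(const char *p, size_t n)
{
    size_t i = 0;
    while (i < n && p[i] != '\0')
        i++;
    return i;
}

/* Hypothetical model of the copy; returns a pointer to the NUL written
   in dst (the stpcpy convention).  */
char *sketch_chunked_stpcpy(char *dst, const char *src)
{
    size_t done = 0;

    /* Phase 1: short strings exit early after a 16-byte check.  */
    while (done < SHORT_LIMIT) {
        size_t i = first_nul(src + done, CHUNK);
        if (i < CHUNK) {
            memcpy(dst + done, src + done, i + 1);  /* include the NUL */
            return dst + done + i;
        }
        memcpy(dst + done, src + done, CHUNK);
        done += CHUNK;
    }

    /* Phase 2: long strings move 64 bytes per iteration, standing in
       for the four consecutive 16-byte loads and stores.  */
    for (;;) {
        size_t i = first_nul(src + done, BLOCK);
        if (i < BLOCK) {
            memcpy(dst + done, src + done, i + 1);  /* include the NUL */
            return dst + done + i;
        }
        memcpy(dst + done, src + done, BLOCK);
        done += BLOCK;
    }
}
```

In this model a string shorter than 512 bytes never reaches the 64-byte
loop, which reflects the trade-off the message describes: less work per
check for short inputs, more bytes per iteration for long ones.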

Showed an average improvement of 13%.

Reviewed-by: Paul E. Murphy <murphyp@linux.ibm.com>
Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
Diffstat (limited to 'sysdeps/x86_64/fpu')
0 files changed, 0 insertions, 0 deletions