author Noah Goldstein <goldstein.w.n@gmail.com> 2022-11-08 17:38:39 -0800
committer Noah Goldstein <goldstein.w.n@gmail.com> 2022-11-08 19:22:33 -0800
commit 642933158e7cf072d873231b1a9bb03291f2b989
tree 352c3956cef706e683d0ac26ef85d165d1adcceb /sysdeps/x86_64/multiarch/stpcpy-avx2-rtm.S
parent f049f52dfeed8129c11ab1641a815705d09ff7e8
x86: Optimize and shrink st{r|p}{n}{cat|cpy}-avx2 functions
Optimizations are:
    1. Use more overlapping stores to avoid branches.
    2. Reduce how unrolled the aligning copies are (this is more of a
       code-size saving; it is a performance regression for some
       sizes).
    3. For st{r|p}n{cat|cpy} re-order the branches to minimize the
       number that are taken.
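
The overlapping-store idea in item 1 can be sketched in portable C. This is
only an illustration, not code from the patch: the helper name
copy_8_to_16 is hypothetical, and it uses 8-byte words where the AVX2
implementation would use vector registers. Any length from 8 to 16 bytes is
covered by one store at the start and one store ending at the last byte, so
no per-length branch is needed; the two stores simply overlap when the
length is below 16.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy len bytes, 8 <= len <= 16, with exactly two
   8-byte loads and two 8-byte stores.  The second load/store pair is
   anchored at the end of the range, so it overlaps the first pair
   whenever len < 16.  */
static void
copy_8_to_16 (unsigned char *dst, const unsigned char *src, size_t len)
{
  uint64_t head, tail;

  assert (len >= 8 && len <= 16);

  /* Load both words before storing so overlapping src/dst words cannot
     clobber data that has not been read yet.  */
  memcpy (&head, src, 8);
  memcpy (&tail, src + len - 8, 8);

  memcpy (dst, &head, 8);
  memcpy (dst + len - 8, &tail, 8);
}
```

Calling copy_8_to_16 (dst, src, 11) writes bytes 0..7 and then bytes 3..10;
bytes 3..7 are written twice with the same values, which is the price paid
for avoiding a branch on the exact length.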

Performance Changes:

    Times are from N = 10 runs of the benchmark suite and are
    reported as geometric mean of all ratios of
    New Implementation / Old Implementation.

    strcat-avx2      -> 0.998
    strcpy-avx2      -> 0.937
    stpcpy-avx2      -> 0.971

    strncpy-avx2     -> 0.793
    stpncpy-avx2     -> 0.775

    strncat-avx2     -> 0.962
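
For readers unfamiliar with the metric, the following is a minimal sketch
of how a geometric mean of New / Old timing ratios could be computed. It is
not the glibc benchmark tooling: the function names geomean_ratio and
nth_root are hypothetical, and the n-th root is taken with a plain Newton
iteration only so that the sketch builds without linking libm (real code
would more likely use exp and log from <math.h>). A result below 1.0 means
the new implementation is faster on average.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: n-th root of a positive value by Newton's method.
   The start point is always at or above the root, so the iterates
   decrease until they stop changing.  */
static double
nth_root (double value, size_t n)
{
  double x = value > 1.0 ? value : 1.0;

  for (int iter = 0; iter < 200; iter++)
    {
      double pow_nm1 = 1.0;
      for (size_t k = 1; k < n; k++)
        pow_nm1 *= x;                       /* x^(n-1) */

      double next = ((double) (n - 1) * x + value / pow_nm1) / (double) n;
      if (next >= x)                        /* no further progress */
        break;
      x = next;
    }
  return x;
}

/* Geometric mean of new_time[i] / old_time[i] over n benchmarks.
   Assumes all times are positive and n is small enough that the running
   product neither overflows nor underflows.  */
static double
geomean_ratio (const double *new_time, const double *old_time, size_t n)
{
  double product = 1.0;

  assert (n > 0);
  for (size_t i = 0; i < n; i++)
    product *= new_time[i] / old_time[i];

  return nth_root (product, n);
}
```

The geometric mean is used rather than the arithmetic mean because it
treats a 2x speedup and a 2x slowdown as cancelling out: ratios of 0.5 and
2.0 average to 1.0.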

Code Size Changes:
    function         -> Bytes New / Bytes Old -> Ratio

    strcat-avx2      ->  685 / 1639 -> 0.418
    strcpy-avx2      ->  560 /  903 -> 0.620
    stpcpy-avx2      ->  592 /  939 -> 0.630

    strncpy-avx2     -> 1176 / 2390 -> 0.492
    stpncpy-avx2     -> 1268 / 2438 -> 0.520

    strncat-avx2     -> 1042 / 2563 -> 0.407

Notes:
    1. Because of the significant difference between the
       implementations they are split into three files.

           strcpy-avx2.S    -> strcpy, stpcpy, strcat
           strncpy-avx2.S   -> strncpy
           strncat-avx2.S   -> strncat

       I couldn't find a way to merge them without making the
       ifdefs incredibly difficult to follow.

Full check passes on x86-64 and build succeeds for all ISA levels w/
and w/o multiarch.
Diffstat (limited to 'sysdeps/x86_64/multiarch/stpcpy-avx2-rtm.S')
-rw-r--r-- sysdeps/x86_64/multiarch/stpcpy-avx2-rtm.S | 6
1 files changed, 3 insertions, 3 deletions
diff --git a/sysdeps/x86_64/multiarch/stpcpy-avx2-rtm.S b/sysdeps/x86_64/multiarch/stpcpy-avx2-rtm.S
index 2b9c07a59f..90e532dbe8 100644
--- a/sysdeps/x86_64/multiarch/stpcpy-avx2-rtm.S
+++ b/sysdeps/x86_64/multiarch/stpcpy-avx2-rtm.S
@@ -1,3 +1,3 @@
-#define USE_AS_STPCPY
-#define STRCPY __stpcpy_avx2_rtm
-#include "strcpy-avx2-rtm.S"
+#define STPCPY	__stpcpy_avx2_rtm
+#include "x86-avx-rtm-vecs.h"
+#include "stpcpy-avx2.S"