| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch provides optimized version of wmemset with the z13 vector
instructions.
ChangeLog:
* sysdeps/s390/multiarch/wmemset-c.c: New File.
* sysdeps/s390/multiarch/wmemset-vx.S: Likewise.
* sysdeps/s390/multiarch/wmemset.c: Likewise.
* sysdeps/s390/multiarch/Makefile
(sysdep_routines): Add wmemset functions.
* sysdeps/s390/multiarch/ifunc-impl-list-common.c
(__libc_ifunc_impl_list_common): Add ifunc test for wmemset.
* wcsmbs/wmemset.c: Use WMEMSET if defined.
* string/test-memset.c: Add wmemset support.
* wcsmbs/test-wmemset.c: New File.
* wcsmbs/Makefile (strop-tests): Add wmemset.
* benchtests/bench-memset.c: Add wmemset support.
* benchtests/bench-wmemset.c: New File.
* benchtests/Makefile (wcsmbs-bench): Add wmemset.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds an optimized memset implementation for POWER8. For
sizes from 0 to 255 bytes, a word/doubleword algorithm similar to
POWER7 optimized one is used.
For size higher than 255 two strategies are used:
1. If the constant is different than 0, the memory is written with
altivec vector instruction;
2. If constant is 0, dbcz instructions are used. The loop is unrolled
to clear 512 byte at time.
Using vector instructions increases throughput considerable, with a
double performance for sizes larger than 1024. The dcbz loops unrolls
also shows performance improvement, by doubling throughput for sizes
larger than 8192 bytes.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Switch the string benchmarks to using bench-timing.h instead
of hp-timing.h directly. This allows the string benchmarks to
be run usefully on architectures such as ARM that do not have
support for hp-timing.h.
In order to do this the tests have been changed from timing each
individual call and picking the lowest execution time recorded to
timing a number of calls and taking the mean execution time.
ChangeLog:
2013-09-04 Will Newton <will.newton@linaro.org>
* benchtests/bench-timing.h (TIMING_PRINT_MEAN): New macro.
* benchtests/bench-string.h: Include bench-timing.h instead
of including hp-timing.h directly. (INNER_LOOP_ITERS): New
define. (HP_TIMING_BEST): Delete macro. (test_init): Remove
call to HP_TIMING_DIFF_INIT.
* benchtests/bench-memccpy.c: Use bench-timing.h macros
instead of hp-timing.h macros.
* benchtests/bench-memchr.c: Likewise.
* benchtests/bench-memcmp.c: Likewise.
* benchtests/bench-memcpy.c: Likewise.
* benchtests/bench-memmem.c: Likewise.
* benchtests/bench-memmove.c: Likewise.
* benchtests/bench-memset.c: Likewise.
* benchtests/bench-rawmemchr.c: Likewise.
* benchtests/bench-strcasecmp.c: Likewise.
* benchtests/bench-strcasestr.c: Likewise.
* benchtests/bench-strcat.c: Likewise.
* benchtests/bench-strchr.c: Likewise.
* benchtests/bench-strcmp.c: Likewise.
* benchtests/bench-strcpy.c: Likewise.
* benchtests/bench-strcpy_chk.c: Likewise.
* benchtests/bench-strlen.c: Likewise.
* benchtests/bench-strncasecmp.c: Likewise.
* benchtests/bench-strncat.c: Likewise.
* benchtests/bench-strncmp.c: Likewise.
* benchtests/bench-strncpy.c: Likewise.
* benchtests/bench-strnlen.c: Likewise.
* benchtests/bench-strpbrk.c: Likewise.
* benchtests/bench-strrchr.c: Likewise.
* benchtests/bench-strspn.c: Likewise.
* benchtests/bench-strstr.c: Likewise.
|
|
|
|
|
|
| |
Check wheter the compiler has the option -fno-tree-loop-distribute-patterns
to inhibit loop transformation to library calls and uses it on memset
and memmove default implementation to avoid recursive calls.
|
|
Copy over already existing string performance tests into benchtests.
Bits not related to performance measurements have been omitted.
|