* sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: Force
32-bit displacement to avoid a long NOP between instructions.
* sysdeps/x86_64/multiarch/memset-sse2-unaligned-erms.S: Add
a comment on VMOVU and VMOVA.
Since memmove and memset in ld.so don't use IFUNC, don't put SSE2, AVX
and AVX512 memmove and memset in ld.so.
* sysdeps/x86_64/multiarch/memmove-avx-unaligned-erms.S: Skip
if not in libc.
* sysdeps/x86_64/multiarch/memmove-avx512-unaligned-erms.S:
Likewise.
* sysdeps/x86_64/multiarch/memset-avx2-unaligned-erms.S:
Likewise.
* sysdeps/x86_64/multiarch/memset-avx512-unaligned-erms.S:
Likewise.
__mempcpy_erms and __memmove_erms can't be placed between __memmove_chk
and __memmove; doing so breaks __memmove_chk.
Don't check source == destination first since it is less common.
* sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:
(__mempcpy_erms, __memmove_erms): Moved before __mempcpy_chk
with unaligned_erms.
(__memmove_erms): Skip if source == destination.
(__memmove_unaligned_erms): Don't check source == destination
first.
Implement x86-64 memset with unaligned store and rep stosb. Support
16-byte, 32-byte and 64-byte vector register sizes. A single file
provides 2 implementations of memset, one with rep stosb and the other
without rep stosb. They share the same code when size is between 2
times the vector register size and REP_STOSB_THRESHOLD, which defaults
to 2KB.
Key features:
1. Use overlapping stores to avoid branches (sketched in C after this
entry).
2. For size <= 4 times the vector register size, fully unroll the loop.
3. For size > 4 times the vector register size, store 4 times the
vector register size at a time.
[BZ #19881]
* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add
memset-sse2-unaligned-erms, memset-avx2-unaligned-erms and
memset-avx512-unaligned-erms.
* sysdeps/x86_64/multiarch/ifunc-impl-list.c
(__libc_ifunc_impl_list): Test __memset_chk_sse2_unaligned,
__memset_chk_sse2_unaligned_erms, __memset_chk_avx2_unaligned,
__memset_chk_avx2_unaligned_erms, __memset_chk_avx512_unaligned,
__memset_chk_avx512_unaligned_erms, __memset_sse2_unaligned,
__memset_sse2_unaligned_erms, __memset_erms,
__memset_avx2_unaligned, __memset_avx2_unaligned_erms,
__memset_avx512_unaligned_erms and __memset_avx512_unaligned.
* sysdeps/x86_64/multiarch/memset-avx2-unaligned-erms.S: New
file.
* sysdeps/x86_64/multiarch/memset-avx512-unaligned-erms.S:
Likewise.
* sysdeps/x86_64/multiarch/memset-sse2-unaligned-erms.S:
Likewise.
* sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:
Likewise.
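A minimal C sketch of the overlapping-store trick described in this
entry, assuming a 16-byte vector size (the function name is
hypothetical; the real implementation is assembly and also has fully
unrolled and rep-stosb paths):

  #include <emmintrin.h>
  #include <stddef.h>

  static void *
  memset_overlap_sketch (void *dst, int c, size_t n)
  {
    __m128i v = _mm_set1_epi8 ((char) c);
    char *p = dst;

    if (n >= 16 && n <= 32)
      {
        /* Two stores that may overlap in the middle cover every length
           in [16, 32] without branching on the exact size.  */
        _mm_storeu_si128 ((__m128i *) p, v);
        _mm_storeu_si128 ((__m128i *) (p + n - 16), v);
        return dst;
      }
    /* Smaller and larger sizes take other paths in the real code:
       unrolled stores up to 4 vectors, a 4-vector loop, or rep stosb.
       A byte loop stands in for them here.  */
    for (size_t i = 0; i < n; i++)
      p[i] = (char) c;
    return dst;
  }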
Implement x86-64 memmove with unaligned load/store and rep movsb.
Support 16-byte, 32-byte and 64-byte vector register sizes. When
size <= 8 times the vector register size, there is no check for
address overlap between source and destination. Since the overhead of
the overlap check is small when size > 8 times the vector register
size, memcpy is an alias of memmove.
A single file provides 2 implementations of memmove, one with rep movsb
and the other without rep movsb. They share the same code when size is
between 2 times the vector register size and REP_MOVSB_THRESHOLD, which
is 2KB for the 16-byte vector register size and scales up with larger
vector register sizes.
Key features:
1. Use overlapping load and store to avoid branches (sketched in C
after this entry).
2. For size <= 8 times the vector register size, load all sources into
registers and store them together.
3. If there is no address overlap between source and destination, copy
from both ends with 4 times the vector register size at a time.
4. If address of destination > address of source, backward copy 8 times
the vector register size at a time.
5. Otherwise, forward copy 8 times the vector register size at a time.
6. Use rep movsb only for forward copy. Avoid slow backward rep movsb
by falling back to backward copy of 8 times the vector register size
at a time.
7. Skip the copy when address of destination == address of source.
[BZ #19776]
* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add
memmove-sse2-unaligned-erms, memmove-avx-unaligned-erms and
memmove-avx512-unaligned-erms.
* sysdeps/x86_64/multiarch/ifunc-impl-list.c
(__libc_ifunc_impl_list): Test
__memmove_chk_avx512_unaligned_2,
__memmove_chk_avx512_unaligned_erms,
__memmove_chk_avx_unaligned_2, __memmove_chk_avx_unaligned_erms,
__memmove_chk_sse2_unaligned_2,
__memmove_chk_sse2_unaligned_erms, __memmove_avx_unaligned_2,
__memmove_avx_unaligned_erms, __memmove_avx512_unaligned_2,
__memmove_avx512_unaligned_erms, __memmove_erms,
__memmove_sse2_unaligned_2, __memmove_sse2_unaligned_erms,
__memcpy_chk_avx512_unaligned_2,
__memcpy_chk_avx512_unaligned_erms,
__memcpy_chk_avx_unaligned_2, __memcpy_chk_avx_unaligned_erms,
__memcpy_chk_sse2_unaligned_2, __memcpy_chk_sse2_unaligned_erms,
__memcpy_avx_unaligned_2, __memcpy_avx_unaligned_erms,
__memcpy_avx512_unaligned_2, __memcpy_avx512_unaligned_erms,
__memcpy_sse2_unaligned_2, __memcpy_sse2_unaligned_erms,
__memcpy_erms, __mempcpy_chk_avx512_unaligned_2,
__mempcpy_chk_avx512_unaligned_erms,
__mempcpy_chk_avx_unaligned_2, __mempcpy_chk_avx_unaligned_erms,
__mempcpy_chk_sse2_unaligned_2, __mempcpy_chk_sse2_unaligned_erms,
__mempcpy_avx512_unaligned_2, __mempcpy_avx512_unaligned_erms,
__mempcpy_avx_unaligned_2, __mempcpy_avx_unaligned_erms,
__mempcpy_sse2_unaligned_2, __mempcpy_sse2_unaligned_erms and
__mempcpy_erms.
* sysdeps/x86_64/multiarch/memmove-avx-unaligned-erms.S: New
file.
* sysdeps/x86_64/multiarch/memmove-avx512-unaligned-erms.S:
Likewise.
* sysdeps/x86_64/multiarch/memmove-sse2-unaligned-erms.S:
Likewise.
* sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:
Likewise.
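A minimal C sketch of the load-everything-then-store trick that lets
the small-size paths skip the overlap check, again assuming 16-byte
vectors (the function name is hypothetical; the real code is assembly
and applies the same idea with up to 8 vectors):

  #include <emmintrin.h>
  #include <stddef.h>

  static void *
  memmove_overlap_sketch (void *dst, const void *src, size_t n)
  {
    char *d = dst;
    const char *s = src;

    if (n >= 16 && n <= 32)
      {
        /* Both loads happen before either store, so the copy is correct
           even when source and destination overlap.  */
        __m128i head = _mm_loadu_si128 ((const __m128i *) s);
        __m128i tail = _mm_loadu_si128 ((const __m128i *) (s + n - 16));
        _mm_storeu_si128 ((__m128i *) d, head);
        _mm_storeu_si128 ((__m128i *) (d + n - 16), tail);
        return dst;
      }
    /* Other sizes take the forward/backward vector loops or rep movsb
       in the real implementation; a byte loop stands in for them here.  */
    if (d < s)
      for (size_t i = 0; i < n; i++) d[i] = s[i];
    else
      for (size_t i = n; i-- > 0; ) d[i] = s[i];
    return dst;
  }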
Since x86-64 memcpy-avx512-no-vzeroupper.S implements memmove, make
__memcpy_avx512_no_vzeroupper an alias of __memmove_avx512_no_vzeroupper
to reduce code size of libc.so.
* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Remove
memcpy-avx512-no-vzeroupper.
* sysdeps/x86_64/multiarch/memcpy-avx512-no-vzeroupper.S: Renamed
to ...
* sysdeps/x86_64/multiarch/memmove-avx512-no-vzeroupper.S: This.
(MEMCPY): Don't define.
(MEMCPY_CHK): Likewise.
(MEMPCPY): Likewise.
(MEMPCPY_CHK): Likewise.
(MEMPCPY_CHK): Renamed to ...
(__mempcpy_chk_avx512_no_vzeroupper): This.
(MEMPCPY): Renamed to ...
(__mempcpy_avx512_no_vzeroupper): This.
(MEMCPY_CHK): Renamed to ...
(__memmove_chk_avx512_no_vzeroupper): This.
(MEMCPY): Renamed to ...
(__memmove_avx512_no_vzeroupper): This.
(__memcpy_avx512_no_vzeroupper): New alias.
(__memcpy_chk_avx512_no_vzeroupper): Likewise.
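A conceptual C illustration of the aliasing described above; the actual
change is made in the assembly source (typically with the strong_alias
macro), and the stand-in memmove body below exists only so the example
compiles:

  #include <stddef.h>
  #include <string.h>

  /* Stand-in body; in glibc this is the AVX-512 assembly implementation.  */
  void *
  __memmove_avx512_no_vzeroupper (void *dst, const void *src, size_t n)
  {
    return memmove (dst, src, n);
  }

  /* The memcpy entry point becomes another name for the memmove code,
     so libc.so carries one copy of the function body instead of two.  */
  void *__memcpy_avx512_no_vzeroupper (void *, const void *, size_t)
    __attribute__ ((alias ("__memmove_avx512_no_vzeroupper")));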
Implement x86-64 multiarch mempcpy in memcpy to share most of the code.
This reduces the code size of libc.so.
[BZ #18858]
* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Remove
mempcpy-ssse3, mempcpy-ssse3-back, mempcpy-avx-unaligned
and mempcpy-avx512-no-vzeroupper.
* sysdeps/x86_64/multiarch/memcpy-avx-unaligned.S (MEMPCPY_CHK):
New.
(MEMPCPY): Likewise.
* sysdeps/x86_64/multiarch/memcpy-avx512-no-vzeroupper.S
(MEMPCPY_CHK): New.
(MEMPCPY): Likewise.
* sysdeps/x86_64/multiarch/memcpy-ssse3-back.S (MEMPCPY_CHK): New.
(MEMPCPY): Likewise.
* sysdeps/x86_64/multiarch/memcpy-ssse3.S (MEMPCPY_CHK): New.
(MEMPCPY): Likewise.
* sysdeps/x86_64/multiarch/mempcpy-avx-unaligned.S: Removed.
* sysdeps/x86_64/multiarch/mempcpy-avx512-no-vzeroupper.S:
Likewise.
* sysdeps/x86_64/multiarch/mempcpy-ssse3-back.S: Likewise.
* sysdeps/x86_64/multiarch/mempcpy-ssse3.S: Likewise.
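A small illustration of why mempcpy can live inside the memcpy
implementation: the two differ only in the return value. The function
name below is hypothetical; the real change adds MEMPCPY/MEMPCPY_CHK
entry points to the existing assembly files:

  #include <string.h>

  void *
  my_mempcpy (void *dst, const void *src, size_t n)
  {
    /* Same copy as memcpy, but return the end of the destination.  */
    return (char *) memcpy (dst, src, n) + n;
  }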
On AMD processors, memcpy optimized with unaligned SSE loads is
slower than memcpy optimized with aligned SSSE3, while other string
functions are faster with unaligned SSE loads. A feature bit,
Fast_Unaligned_Copy, is added to select the memcpy optimized with
unaligned SSE loads.
[BZ #19583]
* sysdeps/x86/cpu-features.c (init_cpu_features): Set
Fast_Unaligned_Copy with Fast_Unaligned_Load for Intel
processors. Set Fast_Copy_Backward for AMD Excavator
processors.
* sysdeps/x86/cpu-features.h (bit_arch_Fast_Unaligned_Copy):
New.
(index_arch_Fast_Unaligned_Copy): Likewise.
* sysdeps/x86_64/multiarch/memcpy.S (__new_memcpy): Check
Fast_Unaligned_Copy instead of Fast_Unaligned_Load.
* sysdeps/x86_64/multiarch/memcpy-avx-unaligned.S (MEMCPY):
Don't set %rcx twice before "rep movsb".
* sysdeps/x86_64/multiarch/memcpy-avx512-no-vzeroupper.S:
Replace .text with .text.avx512.
* sysdeps/x86_64/multiarch/memset-avx512-no-vzeroupper.S:
Likewise.
Check Fast_Unaligned_Load, instead of Slow_BSF, and also check for
Fast_Copy_Backward to enable __memcpy_ssse3_back. The existing selection
order is updated to the following (paraphrased in C after this entry):
1. __memcpy_avx_unaligned if the AVX_Fast_Unaligned_Load bit is set.
2. __memcpy_sse2_unaligned if the Fast_Unaligned_Load bit is set.
3. __memcpy_sse2 if SSSE3 isn't available.
4. __memcpy_ssse3_back if the Fast_Copy_Backward bit is set.
5. __memcpy_ssse3 otherwise.
[BZ #18880]
* sysdeps/x86_64/multiarch/memcpy.S: Check Fast_Unaligned_Load,
instead of Slow_BSF, and also check for Fast_Copy_Backward to
enable __memcpy_ssse3_back.
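A C paraphrase of the selection order listed in this entry; the real
selector is assembly in memcpy.S, and the boolean parameters below
merely stand in for the cpu-features bits (they are not the glibc
macros):

  #include <stddef.h>

  void *__memcpy_avx_unaligned (void *, const void *, size_t);
  void *__memcpy_sse2_unaligned (void *, const void *, size_t);
  void *__memcpy_sse2 (void *, const void *, size_t);
  void *__memcpy_ssse3_back (void *, const void *, size_t);
  void *__memcpy_ssse3 (void *, const void *, size_t);

  typedef void *(*memcpy_fn) (void *, const void *, size_t);

  static memcpy_fn
  select_memcpy (int avx_fast_unaligned_load, int fast_unaligned_load,
                 int has_ssse3, int fast_copy_backward)
  {
    if (avx_fast_unaligned_load)
      return __memcpy_avx_unaligned;     /* 1 */
    if (fast_unaligned_load)
      return __memcpy_sse2_unaligned;    /* 2 */
    if (!has_ssse3)
      return __memcpy_sse2;              /* 3 */
    if (fast_copy_backward)
      return __memcpy_ssse3_back;        /* 4 */
    return __memcpy_ssse3;               /* 5 */
  }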
* sysdeps/x86_64/multiarch/ifunc-impl-list.c: Fixed build with
assembler not supporting AVX-512.
* sysdeps/x86_64/multiarch/memcpy_chk.S: Fixed typos.
Added AVX512 implementations of memcpy, mempcpy, memmove, memcpy_chk,
mempcpy_chk, memmove_chk.
They show an average improvement of more than 30% over the AVX versions
on KNL hardware (performance results in the thread
<https://sourceware.org/ml/libc-alpha/2016-01/msg00258.html>).
* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Added new files.
* sysdeps/x86_64/multiarch/ifunc-impl-list.c: Added new tests.
* sysdeps/x86_64/multiarch/memcpy-avx512-no-vzeroupper.S: New file.
* sysdeps/x86_64/multiarch/mempcpy-avx512-no-vzeroupper.S: Likewise.
* sysdeps/x86_64/multiarch/memmove-avx512-no-vzeroupper.S: Likewise.
* sysdeps/x86_64/multiarch/memcpy.S: Added new IFUNC branch.
* sysdeps/x86_64/multiarch/memcpy_chk.S: Likewise.
* sysdeps/x86_64/multiarch/memmove.c: Likewise.
* sysdeps/x86_64/multiarch/memmove_chk.c: Likewise.
* sysdeps/x86_64/multiarch/mempcpy.S: Likewise.
* sysdeps/x86_64/multiarch/mempcpy_chk.S: Likewise.
It shows an improvement of up to 28% over the AVX2 memset (performance results
attached at <https://sourceware.org/ml/libc-alpha/2015-12/msg00052.html>).
* sysdeps/x86_64/multiarch/memset-avx512-no-vzeroupper.S: New file.
* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Added new file.
* sysdeps/x86_64/multiarch/ifunc-impl-list.c: Added new tests.
* sysdeps/x86_64/multiarch/memset.S: Added new IFUNC branch.
* sysdeps/x86_64/multiarch/memset_chk.S: Likewise.
* sysdeps/x86/cpu-features.h (bit_Prefer_No_VZEROUPPER,
index_Prefer_No_VZEROUPPER): New.
* sysdeps/x86/cpu-features.c (init_cpu_features): Set the
Prefer_No_VZEROUPPER for Knights Landing.
There are configure tests for the -mavx2 compiler option. AVX2
support was added in GCC 4.7, so these tests are now obsolete; this
patch removes them.
Tested for x86_64 and x86 (testsuite, and that installed stripped
shared libraries are unchanged by the patch).
* sysdeps/i386/configure.ac (libc_cv_cc_avx2): Remove configure
test.
* sysdeps/i386/configure: Regenerated.
* sysdeps/x86_64/configure.ac (libc_cv_cc_avx2): Remove configure
test.
* sysdeps/x86_64/configure: Regenerated.
* config.h.in (HAVE_AVX2_SUPPORT): Remove #undef.
* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add
memset-avx2 unconditionally instead of conditionally on
[$(config-cflags-avx2) = yes].
* sysdeps/x86_64/multiarch/ifunc-impl-list.c
(__libc_ifunc_impl_list) [HAVE_AVX2_SUPPORT]: Make code
unconditional.
* sysdeps/x86_64/multiarch/memset.S [HAVE_AVX2_SUPPORT]: Likewise.
* sysdeps/x86_64/multiarch/memset_chk.S
[IS_IN (libc) && SHARED && HAVE_AVX2_SUPPORT]: Change conditional
to [IS_IN (libc) && SHARED].
GCC added support for -mavx and -msse2avx in version 4.4. Thus the
configure tests for this support are obsolete, and this patch removes
them.
Tested for x86_64 and x86 (testsuite, and that installed stripped
shared libraries are unchanged by this patch).
* sysdeps/i386/configure.ac (libc_cv_cc_avx): Remove configure
test.
(libc_cv_cc_sse2avx): Likewise.
* sysdeps/i386/configure: Regenerated.
* sysdeps/i386/i686/multiarch/Makefile
[$(subdir)$(config-cflags-avx) = mathyes]: Change conditional to
[$(subdir) = math].
* sysdeps/i386/i686/multiarch/s_fma-fma.c [HAVE_AVX_SUPPORT]: Make
code unconditional.
* sysdeps/i386/i686/multiarch/s_fma.c [HAVE_AVX_SUPPORT]:
Likewise.
* sysdeps/i386/i686/multiarch/s_fmaf-fma.c [HAVE_AVX_SUPPORT]:
Likewise.
* sysdeps/i386/i686/multiarch/s_fmaf.c [HAVE_AVX_SUPPORT]:
Likewise.
* sysdeps/x86_64/configure.ac (libc_cv_cc_avx): Remove configure
test.
(libc_cv_cc_sse2avx): Likewise.
* sysdeps/x86_64/configure: Regenerated.
* sysdeps/x86_64/Makefile [$(config-cflags-avx) = yes]: Make code
unconditional.
* sysdeps/x86_64/dl-trampoline.h (_dl_runtime_profile)
[HAVE_AVX_SUPPORT || HAVE_AVX512_ASM_SUPPORT]: Make code
unconditional.
(_dl_runtime_profile)
[!(HAVE_AVX_SUPPORT || HAVE_AVX512_ASM_SUPPORT)]: Remove
conditional code.
* sysdeps/x86_64/fpu/multiarch/Makefile
[$(config-cflags-sse2avx) = yes]: Make code unconditional.
* sysdeps/x86_64/fpu/multiarch/e_atan2.c
[HAVE_FMA4_SUPPORT || HAVE_AVX_SUPPORT]: Likewise.
* sysdeps/x86_64/fpu/multiarch/e_exp.c
[HAVE_FMA4_SUPPORT || HAVE_AVX_SUPPORT]: Likewise.
* sysdeps/x86_64/fpu/multiarch/e_log.c
[HAVE_FMA4_SUPPORT || HAVE_AVX_SUPPORT]: Likewise.
* sysdeps/x86_64/fpu/multiarch/s_atan.c
[HAVE_FMA4_SUPPORT || HAVE_AVX_SUPPORT]: Likewise.
* sysdeps/x86_64/fpu/multiarch/s_fma.c [HAVE_AVX_SUPPORT]:
Likewise.
* sysdeps/x86_64/fpu/multiarch/s_fmaf.c [HAVE_AVX_SUPPORT]:
Likewise.
* sysdeps/x86_64/fpu/multiarch/s_sin.c
[HAVE_FMA4_SUPPORT || HAVE_AVX_SUPPORT]: Likewise.
* sysdeps/x86_64/fpu/multiarch/s_tan.c
[HAVE_FMA4_SUPPORT || HAVE_AVX_SUPPORT]: Likewise.
* sysdeps/x86_64/multiarch/strcmp.S [HAVE_AVX_SUPPORT]: Likewise.
* config.h.in (HAVE_AVX_SUPPORT): Remove #undef.
(HAVE_SSE2AVX_SUPPORT): Likewise.
GCC added support for -msse4 in version 4.3. Thus the configure tests
for it are obsolete, and this patch removes them.
Tested for x86_64 and x86 (testsuite, and that installed stripped
shared libraries are unchanged by this patch).
* sysdeps/i386/configure.ac (libc_cv_cc_sse4): Remove configure
test.
* sysdeps/i386/configure: Regenerated.
* sysdeps/i386/i686/multiarch/Makefile
[$(config-cflags-sse4) = yes]: Make code unconditional.
* sysdeps/i386/i686/multiarch/strcspn.S [HAVE_SSE4_SUPPORT]:
Likewise.
* sysdeps/i386/i686/multiarch/strspn.S [HAVE_SSE4_SUPPORT]:
Likewise.
* sysdeps/x86_64/configure.ac (libc_cv_cc_sse4): Remove configure
test.
* sysdeps/x86_64/configure: Regenerated.
* sysdeps/x86_64/multiarch/Makefile [$(config-cflags-sse4) = yes]:
Make code unconditional.
* sysdeps/x86_64/multiarch/strcspn.S [HAVE_SSE4_SUPPORT]:
Likewise.
* sysdeps/x86_64/multiarch/strspn.S [HAVE_SSE4_SUPPORT]: Likewise.
* config.h.in (HAVE_SSE4_SUPPORT): Remove #undef.
Since ld.so preserves vector registers now, we can use the regular,
non-ifunc string and memory functions in ld.so.
* sysdeps/x86_64/rtld-memcmp.c: Removed.
* sysdeps/x86_64/rtld-memset.S: Likewise.
* sysdeps/x86_64/rtld-strchr.S: Likewise.
* sysdeps/x86_64/rtld-strlen.S: Likewise.
* sysdeps/x86_64/multiarch/rtld-memcmp.c: Likewise.
* sysdeps/x86_64/multiarch/rtld-memset.S: Likewise.
sysdeps/i386/i686/multiarch/strcasestr-c.c became unused after
commit 1818483b15d22016b0eae41d37ee91cc87b37510
Author: Andreas Schwab <schwab@suse.de>
Date: Wed Dec 18 11:53:27 2013 +1000
Remove use of SSE4.2 functions for strstr on i686
which contains
-sysdep_routines += strcspn-c strpbrk-c strspn-c strstr-c strcasestr-c
+sysdep_routines += strcspn-c strpbrk-c strspn-c
sysdeps/x86_64/multiarch/strcasestr.c became useless after
commit 584b18eb4df61ccd447db2dfe8c8a7901f8c8598
Author: Ondřej Bílka <neleai@seznam.cz>
Date: Sat Dec 14 19:33:56 2013 +0100
Add strstr with unaligned loads. Fixes bug 12100.
which changes sysdeps/x86_64/multiarch/strcasestr.c to
libc_ifunc (__strcasestr, __strcasestr_sse2);
This patch removes these files.
* i386/i686/multiarch/strcasestr-c.c: Removed.
* x86_64/multiarch/strcasestr.c: Likewise.
* x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list):
Remove strcasestr.
Move sysdeps/x86_64/multiarch/init-arch.h to sysdeps/x86/init-arch.h
which can be used for both i386 and x86_64.
* sysdeps/i386/i686/multiarch/init-arch.h: Removed.
* sysdeps/unix/sysv/linux/x86/init-arch.h: Likewise.
* sysdeps/x86_64/cacheinfo.c: Include <init-arch.h> instead
of "multiarch/init-arch.h".
* sysdeps/x86_64/multiarch/init-arch.h: Renamed to ...
* sysdeps/x86/init-arch.h: This.
This patch updates x86_64 multiarch functions to use the newly defined
HAS_CPU_FEATURE, HAS_ARCH_FEATURE and LOAD_RTLD_GLOBAL_RO_RDX from
<cpu-features.h>.
* sysdeps/x86_64/fpu/multiarch/e_asin.c: Replace HAS_XXX with
HAS_CPU_FEATURE/HAS_ARCH_FEATURE (XXX).
* sysdeps/x86_64/fpu/multiarch/e_atan2.c: Likewise.
* sysdeps/x86_64/fpu/multiarch/e_exp.c: Likewise.
* sysdeps/x86_64/fpu/multiarch/e_log.c: Likewise.
* sysdeps/x86_64/fpu/multiarch/e_pow.c: Likewise.
* sysdeps/x86_64/fpu/multiarch/s_atan.c: Likewise.
* sysdeps/x86_64/fpu/multiarch/s_fma.c: Likewise.
* sysdeps/x86_64/fpu/multiarch/s_fmaf.c: Likewise.
* sysdeps/x86_64/fpu/multiarch/s_sin.c: Likewise.
* sysdeps/x86_64/fpu/multiarch/s_tan.c: Likewise.
* sysdeps/x86_64/fpu/multiarch/s_ceil.S: Use
LOAD_RTLD_GLOBAL_RO_RDX and HAS_CPU_FEATURE (SSE4_1).
* sysdeps/x86_64/fpu/multiarch/s_ceilf.S: Likewise.
* sysdeps/x86_64/fpu/multiarch/s_floor.S: Likewise.
* sysdeps/x86_64/fpu/multiarch/s_floorf.S: Likewise.
* sysdeps/x86_64/fpu/multiarch/s_nearbyint.S: Likewise.
* sysdeps/x86_64/fpu/multiarch/s_nearbyintf.S: Likewise.
* sysdeps/x86_64/fpu/multiarch/s_rint.S: Likewise.
* sysdeps/x86_64/fpu/multiarch/s_rintf.S: Likewise.
* sysdeps/x86_64/multiarch/ifunc-impl-list.c: Likewise.
* sysdeps/x86_64/multiarch/sched_cpucount.c: Likewise.
* sysdeps/x86_64/multiarch/strstr.c: Likewise.
* sysdeps/x86_64/multiarch/memmove.c: Likewise.
* sysdeps/x86_64/multiarch/memmove_chk.c: Likewise.
* sysdeps/x86_64/multiarch/test-multiarch.c: Likewise.
* sysdeps/x86_64/multiarch/memcmp.S: Remove __init_cpu_features
call. Add LOAD_RTLD_GLOBAL_RO_RDX. Replace HAS_XXX with
HAS_CPU_FEATURE/HAS_ARCH_FEATURE (XXX).
* sysdeps/x86_64/multiarch/memcpy.S: Likewise.
* sysdeps/x86_64/multiarch/memcpy_chk.S: Likewise.
* sysdeps/x86_64/multiarch/mempcpy.S: Likewise.
* sysdeps/x86_64/multiarch/mempcpy_chk.S: Likewise.
* sysdeps/x86_64/multiarch/memset.S: Likewise.
* sysdeps/x86_64/multiarch/memset_chk.S: Likewise.
* sysdeps/x86_64/multiarch/strcat.S: Likewise.
* sysdeps/x86_64/multiarch/strchr.S: Likewise.
* sysdeps/x86_64/multiarch/strcmp.S: Likewise.
* sysdeps/x86_64/multiarch/strcpy.S: Likewise.
* sysdeps/x86_64/multiarch/strcspn.S: Likewise.
* sysdeps/x86_64/multiarch/strspn.S: Likewise.
* sysdeps/x86_64/multiarch/wcscpy.S: Likewise.
* sysdeps/x86_64/multiarch/wmemcmp.S: Likewise.
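A self-contained toy model of the new macro style (these definitions
are made up for illustration; glibc's real ones live in
<cpu-features.h>): a single HAS_CPU_FEATURE (name) macro reads feature
data that was initialized once, early, so selectors no longer call
__init_cpu_features or rely on one ad-hoc HAS_XXX macro per feature:

  #include <stdio.h>

  enum { FEATURE_SSE4_1 = 1u << 0, FEATURE_AVX2 = 1u << 1 };

  struct cpu_features { unsigned int cpuid_flags; };

  /* Initialized once, early; in glibc this happens in ld.so.  */
  static const struct cpu_features cpu_features = { FEATURE_SSE4_1 };

  #define HAS_CPU_FEATURE(name) \
    ((cpu_features.cpuid_flags & FEATURE_##name) != 0)

  int
  main (void)
  {
    /* A selector now writes HAS_CPU_FEATURE (SSE4_1) instead of a
       dedicated HAS_SSE4_1 macro.  */
    puts (HAS_CPU_FEATURE (SSE4_1) ? "SSE4.1 variant" : "generic variant");
    return 0;
  }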
This patch adds _dl_x86_cpu_features to rtld_global in x86 ld.so
and initializes it early before __libc_start_main is called so that
cpu_features is always available when it is used and we can avoid
calling __init_cpu_features in IFUNC selectors.
* sysdeps/i386/dl-machine.h: Include <cpu-features.c>.
(dl_platform_init): Call init_cpu_features.
* sysdeps/i386/dl-procinfo.c (_dl_x86_cpu_features): New.
* sysdeps/i386/i686/cacheinfo.c
(DISABLE_PREFERRED_MEMORY_INSTRUCTION): Removed.
* sysdeps/i386/i686/multiarch/Makefile (aux): Remove init-arch.
* sysdeps/i386/i686/multiarch/Versions: Removed.
* sysdeps/i386/i686/multiarch/ifunc-defines.sym (KIND_OFFSET):
Removed.
* sysdeps/i386/ldsodefs.h: Include <cpu-features.h>.
* sysdeps/unix/sysv/linux/x86/Makefile
(libpthread-sysdep_routines): Remove init-arch.
* sysdeps/unix/sysv/linux/x86_64/dl-procinfo.c: Include
<sysdeps/x86_64/dl-procinfo.c> instead of
<sysdeps/generic/dl-procinfo.c>.
* sysdeps/x86/Makefile [$(subdir) == csu] (gen-as-const-headers):
Add cpu-features-offsets.sym and rtld-global-offsets.sym.
[$(subdir) == elf] (sysdep-dl-routines): Add dl-get-cpu-features.
[$(subdir) == elf] (tests): Add tst-get-cpu-features.
[$(subdir) == elf] (tests-static): Add
tst-get-cpu-features-static.
* sysdeps/x86/Versions: New file.
* sysdeps/x86/cpu-features-offsets.sym: Likewise.
* sysdeps/x86/cpu-features.c: Likewise.
* sysdeps/x86/cpu-features.h: Likewise.
* sysdeps/x86/dl-get-cpu-features.c: Likewise.
* sysdeps/x86/libc-start.c: Likewise.
* sysdeps/x86/rtld-global-offsets.sym: Likewise.
* sysdeps/x86/tst-get-cpu-features-static.c: Likewise.
* sysdeps/x86/tst-get-cpu-features.c: Likewise.
* sysdeps/x86_64/dl-procinfo.c: Likewise.
* sysdeps/x86_64/cacheinfo.c (__cpuid_count): Removed.
Assume USE_MULTIARCH is defined and don't check it.
(is_intel): Replace __cpu_features with GLRO(dl_x86_cpu_features).
(is_amd): Likewise.
(max_cpuid): Likewise.
(intel_check_word): Likewise.
(__cache_sysconf): Don't call __init_cpu_features.
(__x86_preferred_memory_instruction): Removed.
(init_cacheinfo): Don't call __init_cpu_features. Replace
__cpu_features with GLRO(dl_x86_cpu_features).
* sysdeps/x86_64/dl-machine.h: Include <cpu-features.c>.
(dl_platform_init): Call init_cpu_features.
* sysdeps/x86_64/ldsodefs.h: Include <cpu-features.h>.
* sysdeps/x86_64/multiarch/Makefile (aux): Remove init-arch.
* sysdeps/x86_64/multiarch/Versions: Removed.
* sysdeps/x86_64/multiarch/cacheinfo.c: Likewise.
* sysdeps/x86_64/multiarch/init-arch.c: Likewise.
* sysdeps/x86_64/multiarch/ifunc-defines.sym (KIND_OFFSET):
Removed.
* sysdeps/x86_64/multiarch/init-arch.h: Rewrite.
{memcpy,strcmp}-sse2-unaligned.S aren't needed in ld.so.
* sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S: Compile
only for libc.
* sysdeps/x86_64/multiarch/strcmp-sse2-unaligned.S: Likewise.
* sysdeps/x86_64/multiarch/init-arch.h (bit_AVX512F_Usable,
bit_AVX512DQ_Usable, bit_Opmask_state, bit_ZMM0_15_state,
bit_ZMM16_31_state): New macro.
* sysdeps/x86_64/multiarch/init-arch.c (__init_cpu_features):
Check and set bit_AVX512F_Usable, bit_AVX512DQ_Usable.
To make strtok faster and improve performance in general, we need one
additional change.
The comment:
/* It doesn't make sense to send libc-internal strcspn calls through a PLT.
The speedup we get from using SSE4.2 instruction is likely eaten away
by the indirect call in the PLT. */
does not make sense at all, because nobody bothered to check it. The
gap between these implementations is quite big: when the haystack is
empty, the SSE2 version is around 40 cycles slower because it needs to
populate a lookup table (sketched in C after this entry), and the
difference only increases with size. That is much bigger than the PLT
slowdown, which is a few cycles.
Even the benchtests show a gap (which may partly be reversed by branch
misprediction), and my internal benchmark showed:
simple_strspn stupid_strspn __strspn_sse42 __strspn_sse2
Length 0, alignment 0, acc len 6: 18.6562 35.2344 17.0469 61.6719
Length 6, alignment 0, acc len 6: 59.5469 72.5781 16.4219 73.625
This patch also handles strpbrk, which is implemented by including the
x86_64/multiarch/strcspn.S file.
* sysdeps/x86_64/multiarch/strspn.S: Remove plt indirection.
* sysdeps/x86_64/multiarch/strcspn.S: Likewise.
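A scalar sketch of the table-based approach the message refers to (this
is not the glibc SSE2 code): the accept-set table must be populated
before the first haystack byte is examined, which is why even an empty
haystack costs tens of cycles and dwarfs a PLT indirection:

  #include <stddef.h>

  static size_t
  strspn_table_sketch (const char *s, const char *accept)
  {
    unsigned char table[256] = { 0 };

    /* Fixed setup cost, paid regardless of the haystack length.  */
    for (; *accept != '\0'; ++accept)
      table[(unsigned char) *accept] = 1;

    size_t n = 0;
    while (table[(unsigned char) s[n]])   /* '\0' is never in the table */
      ++n;
    return n;
  }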
memcpy with unaligned 256-bit AVX register loads/stores is slow on older
processors like Sandy Bridge. This patch adds bit_AVX_Fast_Unaligned_Load
and sets it only when AVX2 is available.
[BZ #17801]
* sysdeps/x86_64/multiarch/init-arch.c (__init_cpu_features):
Set the bit_AVX_Fast_Unaligned_Load bit for AVX2.
* sysdeps/x86_64/multiarch/init-arch.h (bit_AVX_Fast_Unaligned_Load):
New.
(index_AVX_Fast_Unaligned_Load): Likewise.
(HAS_AVX_FAST_UNALIGNED_LOAD): Likewise.
* sysdeps/x86_64/multiarch/memcpy.S (__new_memcpy): Check the
bit_AVX_Fast_Unaligned_Load bit instead of the bit_AVX_Usable bit.
* sysdeps/x86_64/multiarch/memcpy_chk.S (__memcpy_chk): Likewise.
* sysdeps/x86_64/multiarch/mempcpy.S (__mempcpy): Likewise.
* sysdeps/x86_64/multiarch/mempcpy_chk.S (__mempcpy_chk): Likewise.
* sysdeps/x86_64/multiarch/memmove.c (__libc_memmove): Replace
HAS_AVX with HAS_AVX_FAST_UNALIGNED_LOAD.
* sysdeps/x86_64/multiarch/memmove_chk.c (__memmove_chk): Likewise.
* sysdeps/x86_64/multiarch/init-arch.c (__init_cpu_features):
Treat model numbers 0x4a/0x4d as Intel Silvermont architecture.
Replace NOT_IN_libc with !IS_IN (libc). This completes the transition from
the IS_IN/NOT_IN macros to the IN_MODULE macro set.
The generated code is unchanged on x86_64.
* stdlib/isomac.c (fmt): Replace NOT_IN_libc with IN_MODULE.
(get_null_defines): Adjust.
* sunrpc/Makefile: Adjust comment.
* Makerules (CPPFLAGS-nonlib): Remove NOT_IN_libc.
* elf/Makefile (CPPFLAGS-sotruss-lib): Likewise.
(CFLAGS-interp.c): Likewise.
(CFLAGS-ldconfig.c): Likewise.
(CPPFLAGS-.os): Likewise.
* elf/rtld-Rules (rtld-CPPFLAGS): Likewise.
* extra-lib.mk (CPPFLAGS-$(lib)): Likewise.
* extra-modules.mk (extra-modules.mk): Likewise.
* iconv/Makefile (CPPFLAGS-iconvprogs): Likewise.
* locale/Makefile (CPPFLAGS-locale_programs): Likewise.
* malloc/Makefile (CPPFLAGS-memusagestat): Likewise.
* nscd/Makefile (CPPFLAGS-nscd): Likewise.
* nss/Makefile (CPPFLAGS-nss_test1): Likewise.
* stdlib/Makefile (CFLAGS-tst-putenvmod.c): Likewise.
* sysdeps/gnu/Makefile ($(objpfx)errlist-compat.c): Likewise.
* sysdeps/unix/sysv/linux/Makefile (CPPFLAGS-lddlibc4): Likewise.
* iconvdata/Makefile (CPPFLAGS): Likewise.
(cpp-srcs-left): Add libof for all iconvdata routines.
* bits/stdio-lock.h: Replace NOT_IN_libc with IS_IN.
* include/assert.h: Likewise.
* include/ctype.h: Likewise.
* include/errno.h: Likewise.
* include/libc-symbols.h: Likewise.
* include/math.h: Likewise.
* include/netdb.h: Likewise.
* include/resolv.h: Likewise.
* include/stdio.h: Likewise.
* include/stdlib.h: Likewise.
* include/string.h: Likewise.
* include/sys/stat.h: Likewise.
* include/wctype.h: Likewise.
* intl/l10nflist.c: Likewise.
* libidn/idn-stub.c: Likewise.
* libio/libioP.h: Likewise.
* nptl/libc_multiple_threads.c: Likewise.
* nptl/pthreadP.h: Likewise.
* posix/regex_internal.h: Likewise.
* resolv/res_hconf.c: Likewise.
* sysdeps/arm/armv7/multiarch/memcpy.S: Likewise.
* sysdeps/arm/memmove.S: Likewise.
* sysdeps/arm/sysdep.h: Likewise.
* sysdeps/generic/_itoa.h: Likewise.
* sysdeps/generic/symbol-hacks.h: Likewise.
* sysdeps/gnu/errlist.awk: Likewise.
* sysdeps/gnu/errlist.c: Likewise.
* sysdeps/i386/i586/memcpy.S: Likewise.
* sysdeps/i386/i586/memset.S: Likewise.
* sysdeps/i386/i686/memcpy.S: Likewise.
* sysdeps/i386/i686/memmove.S: Likewise.
* sysdeps/i386/i686/mempcpy.S: Likewise.
* sysdeps/i386/i686/memset.S: Likewise.
* sysdeps/i386/i686/multiarch/bcopy.S: Likewise.
* sysdeps/i386/i686/multiarch/bzero.S: Likewise.
* sysdeps/i386/i686/multiarch/memchr-sse2-bsf.S: Likewise.
* sysdeps/i386/i686/multiarch/memchr-sse2.S: Likewise.
* sysdeps/i386/i686/multiarch/memchr.S: Likewise.
* sysdeps/i386/i686/multiarch/memcmp-sse4.S: Likewise.
* sysdeps/i386/i686/multiarch/memcmp-ssse3.S: Likewise.
* sysdeps/i386/i686/multiarch/memcmp.S: Likewise.
* sysdeps/i386/i686/multiarch/memcpy-ssse3-rep.S: Likewise.
* sysdeps/i386/i686/multiarch/memcpy-ssse3.S: Likewise.
* sysdeps/i386/i686/multiarch/memcpy.S: Likewise.
* sysdeps/i386/i686/multiarch/memcpy_chk.S: Likewise.
* sysdeps/i386/i686/multiarch/memmove.S: Likewise.
* sysdeps/i386/i686/multiarch/memmove_chk.S: Likewise.
* sysdeps/i386/i686/multiarch/mempcpy.S: Likewise.
* sysdeps/i386/i686/multiarch/mempcpy_chk.S: Likewise.
* sysdeps/i386/i686/multiarch/memrchr-c.c: Likewise.
* sysdeps/i386/i686/multiarch/memrchr-sse2-bsf.S: Likewise.
* sysdeps/i386/i686/multiarch/memrchr-sse2.S: Likewise.
* sysdeps/i386/i686/multiarch/memrchr.S: Likewise.
* sysdeps/i386/i686/multiarch/memset-sse2-rep.S: Likewise.
* sysdeps/i386/i686/multiarch/memset-sse2.S: Likewise.
* sysdeps/i386/i686/multiarch/memset.S: Likewise.
* sysdeps/i386/i686/multiarch/memset_chk.S: Likewise.
* sysdeps/i386/i686/multiarch/rawmemchr.S: Likewise.
* sysdeps/i386/i686/multiarch/strcat-sse2.S: Likewise.
* sysdeps/i386/i686/multiarch/strcat-ssse3.S: Likewise.
* sysdeps/i386/i686/multiarch/strcat.S: Likewise.
* sysdeps/i386/i686/multiarch/strchr-sse2-bsf.S: Likewise.
* sysdeps/i386/i686/multiarch/strchr-sse2.S: Likewise.
* sysdeps/i386/i686/multiarch/strchr.S: Likewise.
* sysdeps/i386/i686/multiarch/strcmp-sse4.S: Likewise.
* sysdeps/i386/i686/multiarch/strcmp-ssse3.S: Likewise.
* sysdeps/i386/i686/multiarch/strcmp.S: Likewise.
* sysdeps/i386/i686/multiarch/strcpy-sse2.S: Likewise.
* sysdeps/i386/i686/multiarch/strcpy-ssse3.S: Likewise.
* sysdeps/i386/i686/multiarch/strcpy.S: Likewise.
* sysdeps/i386/i686/multiarch/strcspn.S: Likewise.
* sysdeps/i386/i686/multiarch/strlen-sse2-bsf.S: Likewise.
* sysdeps/i386/i686/multiarch/strlen-sse2.S: Likewise.
* sysdeps/i386/i686/multiarch/strlen.S: Likewise.
* sysdeps/i386/i686/multiarch/strnlen.S: Likewise.
* sysdeps/i386/i686/multiarch/strrchr-sse2-bsf.S: Likewise.
* sysdeps/i386/i686/multiarch/strrchr-sse2.S: Likewise.
* sysdeps/i386/i686/multiarch/strrchr.S: Likewise.
* sysdeps/i386/i686/multiarch/strspn.S: Likewise.
* sysdeps/i386/i686/multiarch/wcschr-c.c: Likewise.
* sysdeps/i386/i686/multiarch/wcschr-sse2.S: Likewise.
* sysdeps/i386/i686/multiarch/wcschr.S: Likewise.
* sysdeps/i386/i686/multiarch/wcscmp-sse2.S: Likewise.
* sysdeps/i386/i686/multiarch/wcscmp.S: Likewise.
* sysdeps/i386/i686/multiarch/wcscpy-c.c: Likewise.
* sysdeps/i386/i686/multiarch/wcscpy-ssse3.S: Likewise.
* sysdeps/i386/i686/multiarch/wcscpy.S: Likewise.
* sysdeps/i386/i686/multiarch/wcslen-c.c: Likewise.
* sysdeps/i386/i686/multiarch/wcslen-sse2.S: Likewise.
* sysdeps/i386/i686/multiarch/wcslen.S: Likewise.
* sysdeps/i386/i686/multiarch/wcsrchr-c.c: Likewise.
* sysdeps/i386/i686/multiarch/wcsrchr-sse2.S: Likewise.
* sysdeps/i386/i686/multiarch/wcsrchr.S: Likewise.
* sysdeps/i386/i686/multiarch/wmemcmp-c.c: Likewise.
* sysdeps/i386/i686/multiarch/wmemcmp.S: Likewise.
* sysdeps/ia64/fpu/libm-symbols.h: Likewise.
* sysdeps/nptl/bits/libc-lock.h: Likewise.
* sysdeps/nptl/bits/libc-lockP.h: Likewise.
* sysdeps/nptl/bits/stdio-lock.h: Likewise.
* sysdeps/posix/closedir.c: Likewise.
* sysdeps/posix/opendir.c: Likewise.
* sysdeps/posix/readdir.c: Likewise.
* sysdeps/posix/rewinddir.c: Likewise.
* sysdeps/powerpc/novmx-sigjmp.c: Likewise.
* sysdeps/powerpc/powerpc32/__longjmp.S: Likewise.
* sysdeps/powerpc/powerpc32/bsd-_setjmp.S: Likewise.
* sysdeps/powerpc/powerpc32/fpu/__longjmp.S: Likewise.
* sysdeps/powerpc/powerpc32/fpu/setjmp.S: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/bzero.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/memchr.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/memcmp-ppc32.S: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/memcmp.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/memcpy-ppc32.S: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/memcpy.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/memmove.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/mempcpy.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/memrchr-ppc32.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/memrchr.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/memset-ppc32.S: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/memset.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/rawmemchr.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/strcasecmp.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/strcasecmp_l.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/strchr.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/strchrnul.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/strlen-ppc32.S: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/strlen.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/strncase.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/strncase_l.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/strncmp-ppc32.S: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/strncmp.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/strnlen.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/wcschr-ppc32.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/wcschr.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/wcscpy-ppc32.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/wcscpy.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/wcsrchr-ppc32.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/wcsrchr.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/wordcopy.c: Likewise.
* sysdeps/powerpc/powerpc32/power6/memset.S: Likewise.
* sysdeps/powerpc/powerpc32/setjmp.S: Likewise.
* sysdeps/powerpc/powerpc64/__longjmp.S: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/bzero.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/memchr.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/memcmp-ppc64.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/memcmp.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/memcpy-ppc64.S: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/memcpy.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/memmove-ppc64.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/memmove.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/mempcpy.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/memrchr.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/memset-ppc64.S: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/memset.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/rawmemchr.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/stpcpy-ppc64.S: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/stpcpy.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/stpncpy.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strcasecmp.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strcasecmp_l.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strcat.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strchr.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strchrnul.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strcmp-ppc64.S: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strcmp.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strcpy-ppc64.S: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strcpy.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strcspn.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strlen-ppc64.S: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strlen.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strncase.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strncase_l.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strncat.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strncmp-ppc64.S: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strncmp.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strncpy-ppc64.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strncpy.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strnlen.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strpbrk.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strrchr-ppc64.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strrchr.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strspn-ppc64.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strspn.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/wcschr.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/wcscpy.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/wcsrchr.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/wordcopy.c: Likewise.
* sysdeps/powerpc/powerpc64/setjmp.S: Likewise.
* sysdeps/s390/s390-32/multiarch/ifunc-resolve.c: Likewise.
* sysdeps/s390/s390-32/multiarch/memcmp.S: Likewise.
* sysdeps/s390/s390-32/multiarch/memcpy.S: Likewise.
* sysdeps/s390/s390-32/multiarch/memset.S: Likewise.
* sysdeps/s390/s390-64/multiarch/ifunc-resolve.c: Likewise.
* sysdeps/s390/s390-64/multiarch/memcmp.S: Likewise.
* sysdeps/s390/s390-64/multiarch/memcpy.S: Likewise.
* sysdeps/s390/s390-64/multiarch/memset.S: Likewise.
* sysdeps/sparc/sparc64/multiarch/memcpy-niagara1.S: Likewise.
* sysdeps/sparc/sparc64/multiarch/memcpy-niagara2.S: Likewise.
* sysdeps/sparc/sparc64/multiarch/memcpy-niagara4.S: Likewise.
* sysdeps/sparc/sparc64/multiarch/memcpy-ultra3.S: Likewise.
* sysdeps/sparc/sparc64/multiarch/memcpy.S: Likewise.
* sysdeps/sparc/sparc64/multiarch/memset-niagara1.S: Likewise.
* sysdeps/sparc/sparc64/multiarch/memset-niagara4.S: Likewise.
* sysdeps/sparc/sparc64/multiarch/memset.S: Likewise.
* sysdeps/unix/alpha/sysdep.S: Likewise.
* sysdeps/unix/alpha/sysdep.h: Likewise.
* sysdeps/unix/make-syscalls.sh: Likewise.
* sysdeps/unix/sysv/linux/aarch64/sysdep-cancel.h: Likewise.
* sysdeps/unix/sysv/linux/aarch64/sysdep.h: Likewise.
* sysdeps/unix/sysv/linux/alpha/sysdep-cancel.h: Likewise.
* sysdeps/unix/sysv/linux/alpha/vfork.S: Likewise.
* sysdeps/unix/sysv/linux/arm/sysdep-cancel.h: Likewise.
* sysdeps/unix/sysv/linux/arm/sysdep.h: Likewise.
* sysdeps/unix/sysv/linux/getpid.c: Likewise.
* sysdeps/unix/sysv/linux/hppa/nptl/lowlevellock.h: Likewise.
* sysdeps/unix/sysv/linux/hppa/nptl/sysdep-cancel.h: Likewise.
* sysdeps/unix/sysv/linux/i386/i486/lowlevellock.S: Likewise.
* sysdeps/unix/sysv/linux/i386/lowlevellock.h: Likewise.
* sysdeps/unix/sysv/linux/i386/sysdep-cancel.h: Likewise.
* sysdeps/unix/sysv/linux/i386/sysdep.h: Likewise.
* sysdeps/unix/sysv/linux/ia64/lowlevellock.h: Likewise.
* sysdeps/unix/sysv/linux/ia64/sysdep-cancel.h: Likewise.
* sysdeps/unix/sysv/linux/ia64/sysdep.S: Likewise.
* sysdeps/unix/sysv/linux/ia64/sysdep.h: Likewise.
* sysdeps/unix/sysv/linux/lowlevellock-futex.h: Likewise.
* sysdeps/unix/sysv/linux/m68k/bits/m68k-vdso.h: Likewise.
* sysdeps/unix/sysv/linux/m68k/sysdep-cancel.h: Likewise.
* sysdeps/unix/sysv/linux/m68k/sysdep.h: Likewise.
* sysdeps/unix/sysv/linux/microblaze/lowlevellock.h: Likewise.
* sysdeps/unix/sysv/linux/microblaze/sysdep-cancel.h: Likewise.
* sysdeps/unix/sysv/linux/mips/mips64/sysdep-cancel.h: Likewise.
* sysdeps/unix/sysv/linux/mips/sysdep-cancel.h: Likewise.
* sysdeps/unix/sysv/linux/not-cancel.h: Likewise.
* sysdeps/unix/sysv/linux/powerpc/lowlevellock.h: Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/sysdep-cancel.h: Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/sysdep-cancel.h: Likewise.
* sysdeps/unix/sysv/linux/s390/longjmp_chk.c: Likewise.
* sysdeps/unix/sysv/linux/s390/lowlevellock.h: Likewise.
* sysdeps/unix/sysv/linux/s390/s390-32/sysdep-cancel.h: Likewise.
* sysdeps/unix/sysv/linux/s390/s390-32/sysdep.S: Likewise.
* sysdeps/unix/sysv/linux/s390/s390-32/sysdep.h: Likewise.
* sysdeps/unix/sysv/linux/s390/s390-32/vfork.S: Likewise.
* sysdeps/unix/sysv/linux/s390/s390-64/sysdep-cancel.h: Likewise.
* sysdeps/unix/sysv/linux/s390/s390-64/sysdep.S: Likewise.
* sysdeps/unix/sysv/linux/s390/s390-64/sysdep.h: Likewise.
* sysdeps/unix/sysv/linux/s390/s390-64/vfork.S: Likewise.
* sysdeps/unix/sysv/linux/sh/lowlevellock.S: Likewise.
* sysdeps/unix/sysv/linux/sh/lowlevellock.h: Likewise.
* sysdeps/unix/sysv/linux/sh/sysdep-cancel.h: Likewise.
* sysdeps/unix/sysv/linux/sh/sysdep.h: Likewise.
* sysdeps/unix/sysv/linux/sh/vfork.S: Likewise.
* sysdeps/unix/sysv/linux/sparc/lowlevellock.h: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc32/sysdep-cancel.h: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc32/sysdep.h: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc64/brk.S: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc64/sysdep-cancel.h: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc64/sysdep.h: Likewise.
* sysdeps/unix/sysv/linux/tile/lowlevellock.h: Likewise.
* sysdeps/unix/sysv/linux/tile/sysdep-cancel.h: Likewise.
* sysdeps/unix/sysv/linux/tile/sysdep.h: Likewise.
* sysdeps/unix/sysv/linux/tile/waitpid.S: Likewise.
* sysdeps/unix/sysv/linux/x86_64/lowlevellock.S: Likewise.
* sysdeps/unix/sysv/linux/x86_64/lowlevellock.h: Likewise.
* sysdeps/unix/sysv/linux/x86_64/sysdep-cancel.h: Likewise.
* sysdeps/unix/sysv/linux/x86_64/sysdep.h: Likewise.
* sysdeps/wordsize-32/symbol-hacks.h: Likewise.
* sysdeps/x86_64/memcpy.S: Likewise.
* sysdeps/x86_64/memmove.c: Likewise.
* sysdeps/x86_64/memset.S: Likewise.
* sysdeps/x86_64/multiarch/init-arch.h: Likewise.
* sysdeps/x86_64/multiarch/memcmp-sse4.S: Likewise.
* sysdeps/x86_64/multiarch/memcmp-ssse3.S: Likewise.
* sysdeps/x86_64/multiarch/memcmp.S: Likewise.
* sysdeps/x86_64/multiarch/memcpy-avx-unaligned.S: Likewise.
* sysdeps/x86_64/multiarch/memcpy-ssse3-back.S: Likewise.
* sysdeps/x86_64/multiarch/memcpy-ssse3.S: Likewise.
* sysdeps/x86_64/multiarch/memcpy.S: Likewise.
* sysdeps/x86_64/multiarch/memcpy_chk.S: Likewise.
* sysdeps/x86_64/multiarch/memmove.c: Likewise.
* sysdeps/x86_64/multiarch/mempcpy.S: Likewise.
* sysdeps/x86_64/multiarch/mempcpy_chk.S: Likewise.
* sysdeps/x86_64/multiarch/memset-avx2.S: Likewise.
* sysdeps/x86_64/multiarch/memset.S: Likewise.
* sysdeps/x86_64/multiarch/memset_chk.S: Likewise.
* sysdeps/x86_64/multiarch/strcat-sse2-unaligned.S: Likewise.
* sysdeps/x86_64/multiarch/strcat-ssse3.S: Likewise.
* sysdeps/x86_64/multiarch/strcat.S: Likewise.
* sysdeps/x86_64/multiarch/strchr-sse2-no-bsf.S: Likewise.
* sysdeps/x86_64/multiarch/strchr.S: Likewise.
* sysdeps/x86_64/multiarch/strcmp-ssse3.S: Likewise.
* sysdeps/x86_64/multiarch/strcmp.S: Likewise.
* sysdeps/x86_64/multiarch/strcpy-sse2-unaligned.S: Likewise.
* sysdeps/x86_64/multiarch/strcpy-ssse3.S: Likewise.
* sysdeps/x86_64/multiarch/strcpy.S: Likewise.
* sysdeps/x86_64/multiarch/strcspn.S: Likewise.
* sysdeps/x86_64/multiarch/strspn.S: Likewise.
* sysdeps/x86_64/multiarch/wcscpy-c.c: Likewise.
* sysdeps/x86_64/multiarch/wcscpy-ssse3.S: Likewise.
* sysdeps/x86_64/multiarch/wcscpy.S: Likewise.
* sysdeps/x86_64/multiarch/wmemcmp-c.c: Likewise.
* sysdeps/x86_64/multiarch/wmemcmp.S: Likewise.
* sysdeps/x86_64/strcmp.S: Likewise.
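A self-contained toy model of the conversion (the module macros below
are made up; glibc's real definitions live in its internal headers),
showing the shape of the change from NOT_IN_libc to IS_IN (libc):

  #include <stdio.h>

  /* The build system says which module this file is compiled into.  */
  #define MODULE_libc 1
  #define MODULE_rtld 2
  #ifndef IN_MODULE
  # define IN_MODULE MODULE_libc
  #endif
  #define IS_IN(module) (IN_MODULE == MODULE_##module)

  int
  main (void)
  {
    /* The old spelling was "#ifndef NOT_IN_libc".  */
  #if IS_IN (libc)
    puts ("compiled as part of libc");
  #else
    puts ("compiled as part of another module");
  #endif
    return 0;
  }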
In this patch we take advantage of HSW memory bandwidth, reduce branch
mispredictions by avoiding branch instructions, and force the
destination to be aligned using AVX instructions.
The CPU2006 403.gcc benchmark indicates this patch improves performance
by 2% to 10%.
* config.h.in (HAVE_AVX2_SUPPORT): New #undef.
* sysdeps/i386/configure.ac: Set HAVE_AVX2_SUPPORT and
config-cflags-avx2.
* sysdeps/x86_64/configure.ac: Likewise.
* sysdeps/i386/configure: Regenerated.
* sysdeps/x86_64/configure: Likewise.
* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add
memset-avx2 only if config-cflags-avx2 is yes.
* sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list):
Tests for memset_chk and memset only if HAVE_AVX2_SUPPORT is
defined.
* sysdeps/x86_64/multiarch/memset.S: Define multiple versions
only if HAVE_AVX2_SUPPORT is defined.
* sysdeps/x86_64/multiarch/memset_chk.S: Likewise.
This patch adds ifunc tests for x86_64 memset_chk and memset. It also
defines HAS_AVX2 with AVX2_Usable since AVX2 may not be usable even if
the processor has AVX2.
* sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list):
Add tests for memset_chk and memset.
* sysdeps/x86_64/multiarch/init-arch.h (HAS_AVX2): Defined
with AVX2_Usable.
Since there is no sysdeps/x86_64/multiarch/strlen.S,
sysdeps/x86_64/rtld-strlen.S will be used.
* sysdeps/x86_64/multiarch/rtld-strlen.S: Removed.
In this patch we take advantage of HSW memory bandwidth, reduce branch
mispredictions by avoiding branch instructions, and force the
destination to be aligned using AVX and AVX2 instructions.
The CPU2006 403.gcc benchmark indicates this patch improves performance
by 26% to 59%.
* sysdeps/x86_64/multiarch/Makefile: Add memset-avx2.
* sysdeps/x86_64/multiarch/memset-avx2.S: New file.
* sysdeps/x86_64/multiarch/memset.S: Likewise.
* sysdeps/x86_64/multiarch/memset_chk.S: Likewise.
* sysdeps/x86_64/multiarch/rtld-memset.S: Likewise.
Define FEATURE_INDEX_1 and FEATURE_INDEX_MAX as macros
for use by both assembly and C code. This fixes the
-Wundef error for cases where FEATURE_INDEX_1 was not
defined but its uses relied on the value 0 that the
preprocessor assigns to an undefined macro.
This patch checks and sets bit_AVX2_Usable in __cpu_features.feature.
* sysdeps/x86_64/multiarch/ifunc-defines.sym (COMMON_CPUID_INDEX_7):
New.
* sysdeps/x86_64/multiarch/init-arch.c (__init_cpu_features):
Check and set bit_AVX2_Usable.
* sysdeps/x86_64/multiarch/init-arch.h (bit_AVX2_Usable): New
macro.
(bit_AVX2): Likewise.
(index_AVX2_Usable): Likewise.
(CPUID_AVX2): Likewise.
(HAS_AVX2): Likewise.
|
File name update missed in commit 584b18eb.
The SSE4.2 version of strstr used the pcmpistr instruction, which is
quite inefficient. A faster way is to look for pairs of characters;
this uses SSE2, is faster than pcmpistr, and for real strings the pairs
we look for are relatively rare (a scalar sketch of the idea follows
this entry).
For linear time complexity we use a buy-or-rent technique which
switches to the two-way algorithm when superlinear behaviour is
detected.
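A scalar sketch of the pair-of-characters idea (the real implementation
vectorizes this with SSE2 and switches to the two-way algorithm when
superlinear behaviour is detected; the function name is hypothetical):

  #include <stddef.h>
  #include <string.h>

  static char *
  strstr_pair_sketch (const char *hay, const char *needle)
  {
    size_t nlen = strlen (needle);
    if (nlen == 0)
      return (char *) hay;
    if (nlen == 1)
      return strchr (hay, needle[0]);

    /* Scan for positions where the first two needle bytes both match;
       such pairs are rare in real strings, so the full comparison runs
       seldom.  */
    for (const char *p = hay; (p = strchr (p, needle[0])) != NULL; ++p)
      if (p[1] == needle[1] && strncmp (p, needle, nlen) == 0)
        return (char *) p;
    return NULL;
  }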