| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since strcmp_sse2_unaligned performs better on current Intel and AMD
processors, this patch makes it the default.
* sysdeps/x86_64/strcmp.S: Moved to ...
* sysdeps/x86_64/multiarch/strcmp-sse2.S: Here. Remove
"#if !IS_IN (libc)". Remove libc_hidden_builtin_def (STRCMP).
(STRCMP): Defined to __strcmp_sse2 if not defined.
* sysdeps/x86_64/multiarch/strcmp-sse2-unaligned.S: Moved to ...
* sysdeps/x86_64/strcmp.S: Here. Remove "#if IS_IN (libc)".
Add .text. Add libc_hidden_builtin_def (strcmp).
(__strcmp_sse2_unaligned): Renamed to ...
(strcmp): This.
* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add
strcmp-sse2.
* sysdeps/x86_64/multiarch/strcasecmp_l-ssse3.S: Include
strcmp-sse2.S instead of ../strcmp.S.
* sysdeps/x86_64/multiarch/strcmp-ssse3.S: Likewise.
* sysdeps/x86_64/multiarch/strncase_l-ssse3.S: Likewise.
* sysdeps/x86_64/multiarch/strncmp-ssse3.S: Likewise.
* sysdeps/x86_64/multiarch/strcmp.S
[USE_AS_STRCMP] (STRCMP_SSE2): Set to __strcmp_sse2_unaligned.
[USE_AS_STRCMP] (STRCMP): Load __strcmp_sse2 instead of
STRCMP_SSE2.
[USE_AS_STRCMP] (strcmp): Defined __strcmp_sse2_unaligned if
in libc.
[!USE_AS_STRCMP]: Include strcmp-sse2S instead of ../strcmp.S.
* sysdeps/x86_64/strcasecmp_l.S: Include multiarch/strcmp-sse2.S
instead of strcmp.S. Add libc_hidden_builtin_def (STRCMP).
* sysdeps/x86_64/strncase_l.S: Likewise.
* sysdeps/x86_64/strncmp.S: Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For x86-64 memcpy/mempcpy, we choose the best implementation by the
order:
1. __memcpy_avx_unaligned if AVX_Fast_Unaligned_Load bit is set.
2. __memcpy_sse2_unaligned if Fast_Unaligned_Load bit is set.
3. __memcpy_sse2 if SSSE3 isn't available.
4. __memcpy_ssse3_back if Fast_Copy_Backward bit it set.
5. __memcpy_ssse3
In libc.a and ld.so, we choose __memcpy_sse2_unaligned which is optimized
for current Intel and AMD x86-64 processors.
[BZ #18880]
* sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S: Moved to ...
* sysdeps/x86_64/memcpy.S: Here. Remove "#if !IS_IN (libc)".
Add libc_hidden_builtin_def and versioned_symbol.
(__memcpy_chk): New.
(__memcpy_sse2_unaligned): Renamed to ...
(memcpy): This. Support USE_AS_MEMPCPY.
* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add
mempcpy-sse2.
* sysdeps/x86_64/memcpy.S: Moved to ...
sysdeps/x86_64/multiarch/memcpy-sse2.S: Here.
(__memcpy_chk): Renamed to ...
(__memcpy_chk_sse2): This.
(memcpy): Renamed to ...
(__memcpy_sse2): This.
* sysdeps/x86_64/multiarch/memcpy.S (__new_memcpy): Properly
select the best implementation.
(ENTRY): Replace __memcpy_sse2 with __memcpy_sse2_unaligned.
(END): Likewise.
(libc_hidden_builtin_def): Likewise.
(ENTRY_CHK): Replace __memcpy_chk_sse2 with
__memcpy_chk_sse2_unaligned.
(END_CHK): Likewise.
* sysdeps/x86_64/multiarch/memcpy_chk.S (__memcpy_chk): Properly
select the best implementation.
* sysdeps/x86_64/multiarch/mempcpy.S (__mempcpy): Properly
select the best implementation.
(ENTRY): Replace __mempcpy_sse2 with __mempcpy_sse2_unaligned.
(END): Likewise.
(libc_hidden_def): Likewise.
(libc_hidden_builtin_def): Likewise.
(ENTRY_CHK): Replace __mempcpy_chk_sse2 with
__mempcpy_chk_sse2_unaligned.
(END_CHK): Likewise.
* sysdeps/x86_64/multiarch/mempcpy_chk.S (__mempcpy_chk): Properly
select the best implementation.
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since ld.so preserves vector registers now, we can use the regular,
non-ifunc string and memory functions in ld.so.
* sysdeps/x86_64/rtld-memcmp.c: Removed.
* sysdeps/x86_64/rtld-memset.S: Likewise.
* sysdeps/x86_64/rtld-strchr.S: Likewise.
* sysdeps/x86_64/rtld-strlen.S: Likewise.
* sysdeps/x86_64/multiarch/rtld-memcmp.c: Likewise.
* sysdeps/x86_64/multiarch/rtld-memset.S: Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
sysdeps/i386/i686/multiarch/strcasestr-c.c became unused after
commit 1818483b15d22016b0eae41d37ee91cc87b37510
Author: Andreas Schwab <schwab@suse.de>
Date: Wed Dec 18 11:53:27 2013 +1000
Remove use of SSE4.2 functions for strstr on i686
which contains
-sysdep_routines += strcspn-c strpbrk-c strspn-c strstr-c strcasestr-c
+sysdep_routines += strcspn-c strpbrk-c strspn-c
sysdeps/x86_64/multiarch/strcasestr.c became useless after
t 584b18eb4df61ccd447db2dfe8c8a7901f8c8598
Author: Ondřej Bílka <neleai@seznam.cz>
Date: Sat Dec 14 19:33:56 2013 +0100
Add strstr with unaligned loads. Fixes bug 12100.
which changes sysdeps/x86_64/multiarch/strcasestr.c to
libc_ifunc (__strcasestr, __strcasestr_sse2);
This patch removes these file.
* i386/i686/multiarch/strcasestr-c.c: Removed.
* x86_64/multiarch/strcasestr.c: Likewise.
* x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list):
Remove strcasestr.
|
|
|
|
|
|
|
|
|
|
|
|
| |
Move sysdeps/x86_64/multiarch/init-arch.h to sysdeps/x86/init-arch.h
which can be used for both i386 and x86_64.
* sysdeps/i386/i686/multiarch/init-arch.h: Removed.
* sysdeps/unix/sysv/linux/x86/init-arch.h: Likewise.
* sysdeps/x86_64/cacheinfo.c: Include <init-arch.h> instead
of "multiarch/init-arch.h".
* sysdeps/x86_64/multiarch/init-arch.h: Renamed to ...
* sysdeps/x86/init-arch.h: This.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch updates x86_64 multiarch functions to use the newly defined
HAS_CPU_FEATURE, HAS_ARCH_FEATURE and LOAD_RTLD_GLOBAL_RO_RDX from
<cpu-features.h>.
* sysdeps/x86_64/fpu/multiarch/e_asin.c: Replace HAS_XXX with
HAS_CPU_FEATURE/HAS_ARCH_FEATURE (XXX).
* sysdeps/x86_64/fpu/multiarch/e_atan2.c: Likewise.
* sysdeps/x86_64/fpu/multiarch/e_exp.c: Likewise.
* sysdeps/x86_64/fpu/multiarch/e_log.c: Likewise.
* sysdeps/x86_64/fpu/multiarch/e_pow.c: Likewise.
* sysdeps/x86_64/fpu/multiarch/s_atan.c: Likewise.
* sysdeps/x86_64/fpu/multiarch/s_fma.c: Likewise.
* sysdeps/x86_64/fpu/multiarch/s_fmaf.c: Likewise.
* sysdeps/x86_64/fpu/multiarch/s_sin.c: Likewise.
* sysdeps/x86_64/fpu/multiarch/s_tan.c: Likewise.
* sysdeps/x86_64/fpu/multiarch/s_ceil.S: Use
LOAD_RTLD_GLOBAL_RO_RDX and HAS_CPU_FEATURE (SSE4_1).
* sysdeps/x86_64/fpu/multiarch/s_ceilf.S: Likewise.
* sysdeps/x86_64/fpu/multiarch/s_floor.S: Likewise.
* sysdeps/x86_64/fpu/multiarch/s_floorf.S: Likewise.
* sysdeps/x86_64/fpu/multiarch/s_nearbyint.S : Likewise.
* sysdeps/x86_64/fpu/multiarch/s_nearbyintf.S: Likewise.
* sysdeps/x86_64/fpu/multiarch/s_rintf.S: Likewise.
* sysdeps/x86_64/fpu/multiarch/s_rintf.S : Likewise.
* sysdeps/x86_64/multiarch/ifunc-impl-list.c: Likewise.
* sysdeps/x86_64/multiarch/sched_cpucount.c: Likewise.
* sysdeps/x86_64/multiarch/strstr.c: Likewise.
* sysdeps/x86_64/multiarch/memmove.c: Likewise.
* sysdeps/x86_64/multiarch/memmove_chk.c: Likewise.
* sysdeps/x86_64/multiarch/test-multiarch.c: Likewise.
* sysdeps/x86_64/multiarch/memcmp.S: Remove __init_cpu_features
call. Add LOAD_RTLD_GLOBAL_RO_RDX. Replace HAS_XXX with
HAS_CPU_FEATURE/HAS_ARCH_FEATURE (XXX).
* sysdeps/x86_64/multiarch/memcpy.S: Likewise.
* sysdeps/x86_64/multiarch/memcpy_chk.S: Likewise.
* sysdeps/x86_64/multiarch/mempcpy.S: Likewise.
* sysdeps/x86_64/multiarch/mempcpy_chk.S: Likewise.
* sysdeps/x86_64/multiarch/memset.S: Likewise.
* sysdeps/x86_64/multiarch/memset_chk.S: Likewise.
* sysdeps/x86_64/multiarch/strcat.S: Likewise.
* sysdeps/x86_64/multiarch/strchr.S: Likewise.
* sysdeps/x86_64/multiarch/strcmp.S: Likewise.
* sysdeps/x86_64/multiarch/strcpy.S: Likewise.
* sysdeps/x86_64/multiarch/strcspn.S: Likewise.
* sysdeps/x86_64/multiarch/strspn.S: Likewise.
* sysdeps/x86_64/multiarch/wcscpy.S: Likewise.
* sysdeps/x86_64/multiarch/wmemcmp.S: Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds _dl_x86_cpu_features to rtld_global in x86 ld.so
and initializes it early before __libc_start_main is called so that
cpu_features is always available when it is used and we can avoid
calling __init_cpu_features in IFUNC selectors.
* sysdeps/i386/dl-machine.h: Include <cpu-features.c>.
(dl_platform_init): Call init_cpu_features.
* sysdeps/i386/dl-procinfo.c (_dl_x86_cpu_features): New.
* sysdeps/i386/i686/cacheinfo.c
(DISABLE_PREFERRED_MEMORY_INSTRUCTION): Removed.
* sysdeps/i386/i686/multiarch/Makefile (aux): Remove init-arch.
* sysdeps/i386/i686/multiarch/Versions: Removed.
* sysdeps/i386/i686/multiarch/ifunc-defines.sym (KIND_OFFSET):
Removed.
* sysdeps/i386/ldsodefs.h: Include <cpu-features.h>.
* sysdeps/unix/sysv/linux/x86/Makefile
(libpthread-sysdep_routines): Remove init-arch.
* sysdeps/unix/sysv/linux/x86_64/dl-procinfo.c: Include
<sysdeps/x86_64/dl-procinfo.c> instead of
sysdeps/generic/dl-procinfo.c>.
* sysdeps/x86/Makefile [$(subdir) == csu] (gen-as-const-headers):
Add cpu-features-offsets.sym and rtld-global-offsets.sym.
[$(subdir) == elf] (sysdep-dl-routines): Add dl-get-cpu-features.
[$(subdir) == elf] (tests): Add tst-get-cpu-features.
[$(subdir) == elf] (tests-static): Add
tst-get-cpu-features-static.
* sysdeps/x86/Versions: New file.
* sysdeps/x86/cpu-features-offsets.sym: Likewise.
* sysdeps/x86/cpu-features.c: Likewise.
* sysdeps/x86/cpu-features.h: Likewise.
* sysdeps/x86/dl-get-cpu-features.c: Likewise.
* sysdeps/x86/libc-start.c: Likewise.
* sysdeps/x86/rtld-global-offsets.sym: Likewise.
* sysdeps/x86/tst-get-cpu-features-static.c: Likewise.
* sysdeps/x86/tst-get-cpu-features.c: Likewise.
* sysdeps/x86_64/dl-procinfo.c: Likewise.
* sysdeps/x86_64/cacheinfo.c (__cpuid_count): Removed.
Assume USE_MULTIARCH is defined and don't check it.
(is_intel): Replace __cpu_features with GLRO(dl_x86_cpu_features).
(is_amd): Likewise.
(max_cpuid): Likewise.
(intel_check_word): Likewise.
(__cache_sysconf): Don't call __init_cpu_features.
(__x86_preferred_memory_instruction): Removed.
(init_cacheinfo): Don't call __init_cpu_features. Replace
__cpu_features with GLRO(dl_x86_cpu_features).
* sysdeps/x86_64/dl-machine.h: <cpu-features.c>.
(dl_platform_init): Call init_cpu_features.
* sysdeps/x86_64/ldsodefs.h: Include <cpu-features.h>.
* sysdeps/x86_64/multiarch/Makefile (aux): Remove init-arch.
* sysdeps/x86_64/multiarch/Versions: Removed.
* sysdeps/x86_64/multiarch/cacheinfo.c: Likewise.
* sysdeps/x86_64/multiarch/init-arch.c: Likewise.
* sysdeps/x86_64/multiarch/ifunc-defines.sym (KIND_OFFSET):
Removed.
* sysdeps/x86_64/multiarch/init-arch.h: Rewrite.
|
|
|
|
|
|
|
|
| |
{memcpy,strcmp}-sse2-unaligned.S aren't needed in ld.so.
* sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S: Compile
only for libc.
* sysdeps/x86_64/multiarch/strcmp-sse2-unaligned.S: Likewise.
|
|
|
|
|
|
|
|
| |
* sysdeps/x86_64/multiarch/init-arch.h (bit_AVX512F_Usable,
bit_AVX512DQ_Usable, bit_Opmask_state, bit_ZMM0_15_state,
bit_ZMM16_31_state): New macro.
* sysdeps/x86_64/multiarch/init-arch.c (__init_cpu_features):
Check and set bit_AVX512F_Usable, bit_AVX512DQ_Usable.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
To make a strtok faster and improve performance in general we need to do one
additional change.
A comment:
/* It doesn't make sense to send libc-internal strcspn calls through a PLT.
The speedup we get from using SSE4.2 instruction is likely eaten away
by the indirect call in the PLT. */
Does not make sense at all because nobody bothered to check it. Gap
between these implementations is quite big, when haystack is empty a
sse2 is around 40 cycles slower because it needs to populate a lookup
table and difference only increases with size. That is much bigger than
plt slowdown which is few cycles.
Even benchtest show a gap which also may be reverse by branch
misprediction but my internal benchmark shown.
simple_strspn stupid_strspn __strspn_sse42 __strspn_sse2
Length 0, alignment 0, acc len 6: 18.6562 35.2344 17.0469 61.6719
Length 6, alignment 0, acc len 6: 59.5469 72.5781 16.4219 73.625
This patch also handles strpbrk which is implemented by including a
x86_64/multiarch/strcspn.S file.
* sysdeps/x86_64/multiarch/strspn.S: Remove plt indirection.
* sysdeps/x86_64/multiarch/strcspn.S: Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
memcpy with unaligned 256-bit AVX register loads/stores are slow on older
processorsl like Sandy Bridge. This patch adds bit_AVX_Fast_Unaligned_Load
and sets it only when AVX2 is available.
[BZ #17801]
* sysdeps/x86_64/multiarch/init-arch.c (__init_cpu_features):
Set the bit_AVX_Fast_Unaligned_Load bit for AVX2.
* sysdeps/x86_64/multiarch/init-arch.h (bit_AVX_Fast_Unaligned_Load):
New.
(index_AVX_Fast_Unaligned_Load): Likewise.
(HAS_AVX_FAST_UNALIGNED_LOAD): Likewise.
* sysdeps/x86_64/multiarch/memcpy.S (__new_memcpy): Check the
bit_AVX_Fast_Unaligned_Load bit instead of the bit_AVX_Usable bit.
* sysdeps/x86_64/multiarch/memcpy_chk.S (__memcpy_chk): Likewise.
* sysdeps/x86_64/multiarch/mempcpy.S (__mempcpy): Likewise.
* sysdeps/x86_64/multiarch/mempcpy_chk.S (__mempcpy_chk): Likewise.
* sysdeps/x86_64/multiarch/memmove.c (__libc_memmove): Replace
HAS_AVX with HAS_AVX_FAST_UNALIGNED_LOAD.
* sysdeps/x86_64/multiarch/memmove_chk.c (__memmove_chk): Likewise.
|
| |
|
|
|
|
|
| |
* sysdeps/x86_64/multiarch/init-arch.c (__init_cpu_features):
Treat model numbers 0x4a/0x4d as Intel Silvermont architecture.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Replace with !IS_IN (libc). This completes the transition from
the IS_IN/NOT_IN macros to the IN_MODULE macro set.
The generated code is unchanged on x86_64.
* stdlib/isomac.c (fmt): Replace NOT_IN_libc with IN_MODULE.
(get_null_defines): Adjust.
* sunrpc/Makefile: Adjust comment.
* Makerules (CPPFLAGS-nonlib): Remove NOT_IN_libc.
* elf/Makefile (CPPFLAGS-sotruss-lib): Likewise.
(CFLAGS-interp.c): Likewise.
(CFLAGS-ldconfig.c): Likewise.
(CPPFLAGS-.os): Likewise.
* elf/rtld-Rules (rtld-CPPFLAGS): Likewise.
* extra-lib.mk (CPPFLAGS-$(lib)): Likewise.
* extra-modules.mk (extra-modules.mk): Likewise.
* iconv/Makefile (CPPFLAGS-iconvprogs): Likewise.
* locale/Makefile (CPPFLAGS-locale_programs): Likewise.
* malloc/Makefile (CPPFLAGS-memusagestat): Likewise.
* nscd/Makefile (CPPFLAGS-nscd): Likewise.
* nss/Makefile (CPPFLAGS-nss_test1): Likewise.
* stdlib/Makefile (CFLAGS-tst-putenvmod.c): Likewise.
* sysdeps/gnu/Makefile ($(objpfx)errlist-compat.c): Likewise.
* sysdeps/unix/sysv/linux/Makefile (CPPFLAGS-lddlibc4): Likewise.
* iconvdata/Makefile (CPPFLAGS): Likewise.
(cpp-srcs-left): Add libof for all iconvdata routines.
* bits/stdio-lock.h: Replace NOT_IN_libc with IS_IN.
* include/assert.h: Likewise.
* include/ctype.h: Likewise.
* include/errno.h: Likewise.
* include/libc-symbols.h: Likewise.
* include/math.h: Likewise.
* include/netdb.h: Likewise.
* include/resolv.h: Likewise.
* include/stdio.h: Likewise.
* include/stdlib.h: Likewise.
* include/string.h: Likewise.
* include/sys/stat.h: Likewise.
* include/wctype.h: Likewise.
* intl/l10nflist.c: Likewise.
* libidn/idn-stub.c: Likewise.
* libio/libioP.h: Likewise.
* nptl/libc_multiple_threads.c: Likewise.
* nptl/pthreadP.h: Likewise.
* posix/regex_internal.h: Likewise.
* resolv/res_hconf.c: Likewise.
* sysdeps/arm/armv7/multiarch/memcpy.S: Likewise.
* sysdeps/arm/memmove.S: Likewise.
* sysdeps/arm/sysdep.h: Likewise.
* sysdeps/generic/_itoa.h: Likewise.
* sysdeps/generic/symbol-hacks.h: Likewise.
* sysdeps/gnu/errlist.awk: Likewise.
* sysdeps/gnu/errlist.c: Likewise.
* sysdeps/i386/i586/memcpy.S: Likewise.
* sysdeps/i386/i586/memset.S: Likewise.
* sysdeps/i386/i686/memcpy.S: Likewise.
* sysdeps/i386/i686/memmove.S: Likewise.
* sysdeps/i386/i686/mempcpy.S: Likewise.
* sysdeps/i386/i686/memset.S: Likewise.
* sysdeps/i386/i686/multiarch/bcopy.S: Likewise.
* sysdeps/i386/i686/multiarch/bzero.S: Likewise.
* sysdeps/i386/i686/multiarch/memchr-sse2-bsf.S: Likewise.
* sysdeps/i386/i686/multiarch/memchr-sse2.S: Likewise.
* sysdeps/i386/i686/multiarch/memchr.S: Likewise.
* sysdeps/i386/i686/multiarch/memcmp-sse4.S: Likewise.
* sysdeps/i386/i686/multiarch/memcmp-ssse3.S: Likewise.
* sysdeps/i386/i686/multiarch/memcmp.S: Likewise.
* sysdeps/i386/i686/multiarch/memcpy-ssse3-rep.S: Likewise.
* sysdeps/i386/i686/multiarch/memcpy-ssse3.S: Likewise.
* sysdeps/i386/i686/multiarch/memcpy.S: Likewise.
* sysdeps/i386/i686/multiarch/memcpy_chk.S: Likewise.
* sysdeps/i386/i686/multiarch/memmove.S: Likewise.
* sysdeps/i386/i686/multiarch/memmove_chk.S: Likewise.
* sysdeps/i386/i686/multiarch/mempcpy.S: Likewise.
* sysdeps/i386/i686/multiarch/mempcpy_chk.S: Likewise.
* sysdeps/i386/i686/multiarch/memrchr-c.c: Likewise.
* sysdeps/i386/i686/multiarch/memrchr-sse2-bsf.S: Likewise.
* sysdeps/i386/i686/multiarch/memrchr-sse2.S: Likewise.
* sysdeps/i386/i686/multiarch/memrchr.S: Likewise.
* sysdeps/i386/i686/multiarch/memset-sse2-rep.S: Likewise.
* sysdeps/i386/i686/multiarch/memset-sse2.S: Likewise.
* sysdeps/i386/i686/multiarch/memset.S: Likewise.
* sysdeps/i386/i686/multiarch/memset_chk.S: Likewise.
* sysdeps/i386/i686/multiarch/rawmemchr.S: Likewise.
* sysdeps/i386/i686/multiarch/strcat-sse2.S: Likewise.
* sysdeps/i386/i686/multiarch/strcat-ssse3.S: Likewise.
* sysdeps/i386/i686/multiarch/strcat.S: Likewise.
* sysdeps/i386/i686/multiarch/strchr-sse2-bsf.S: Likewise.
* sysdeps/i386/i686/multiarch/strchr-sse2.S: Likewise.
* sysdeps/i386/i686/multiarch/strchr.S: Likewise.
* sysdeps/i386/i686/multiarch/strcmp-sse4.S: Likewise.
* sysdeps/i386/i686/multiarch/strcmp-ssse3.S: Likewise.
* sysdeps/i386/i686/multiarch/strcmp.S: Likewise.
* sysdeps/i386/i686/multiarch/strcpy-sse2.S: Likewise.
* sysdeps/i386/i686/multiarch/strcpy-ssse3.S: Likewise.
* sysdeps/i386/i686/multiarch/strcpy.S: Likewise.
* sysdeps/i386/i686/multiarch/strcspn.S: Likewise.
* sysdeps/i386/i686/multiarch/strlen-sse2-bsf.S: Likewise.
* sysdeps/i386/i686/multiarch/strlen-sse2.S: Likewise.
* sysdeps/i386/i686/multiarch/strlen.S: Likewise.
* sysdeps/i386/i686/multiarch/strnlen.S: Likewise.
* sysdeps/i386/i686/multiarch/strrchr-sse2-bsf.S: Likewise.
* sysdeps/i386/i686/multiarch/strrchr-sse2.S: Likewise.
* sysdeps/i386/i686/multiarch/strrchr.S: Likewise.
* sysdeps/i386/i686/multiarch/strspn.S: Likewise.
* sysdeps/i386/i686/multiarch/wcschr-c.c: Likewise.
* sysdeps/i386/i686/multiarch/wcschr-sse2.S: Likewise.
* sysdeps/i386/i686/multiarch/wcschr.S: Likewise.
* sysdeps/i386/i686/multiarch/wcscmp-sse2.S: Likewise.
* sysdeps/i386/i686/multiarch/wcscmp.S: Likewise.
* sysdeps/i386/i686/multiarch/wcscpy-c.c: Likewise.
* sysdeps/i386/i686/multiarch/wcscpy-ssse3.S: Likewise.
* sysdeps/i386/i686/multiarch/wcscpy.S: Likewise.
* sysdeps/i386/i686/multiarch/wcslen-c.c: Likewise.
* sysdeps/i386/i686/multiarch/wcslen-sse2.S: Likewise.
* sysdeps/i386/i686/multiarch/wcslen.S: Likewise.
* sysdeps/i386/i686/multiarch/wcsrchr-c.c: Likewise.
* sysdeps/i386/i686/multiarch/wcsrchr-sse2.S: Likewise.
* sysdeps/i386/i686/multiarch/wcsrchr.S: Likewise.
* sysdeps/i386/i686/multiarch/wmemcmp-c.c: Likewise.
* sysdeps/i386/i686/multiarch/wmemcmp.S: Likewise.
* sysdeps/ia64/fpu/libm-symbols.h: Likewise.
* sysdeps/nptl/bits/libc-lock.h: Likewise.
* sysdeps/nptl/bits/libc-lockP.h: Likewise.
* sysdeps/nptl/bits/stdio-lock.h: Likewise.
* sysdeps/posix/closedir.c: Likewise.
* sysdeps/posix/opendir.c: Likewise.
* sysdeps/posix/readdir.c: Likewise.
* sysdeps/posix/rewinddir.c: Likewise.
* sysdeps/powerpc/novmx-sigjmp.c: Likewise.
* sysdeps/powerpc/powerpc32/__longjmp.S: Likewise.
* sysdeps/powerpc/powerpc32/bsd-_setjmp.S: Likewise.
* sysdeps/powerpc/powerpc32/fpu/__longjmp.S: Likewise.
* sysdeps/powerpc/powerpc32/fpu/setjmp.S: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/bzero.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/memchr.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/memcmp-ppc32.S: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/memcmp.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/memcpy-ppc32.S: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/memcpy.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/memmove.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/mempcpy.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/memrchr-ppc32.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/memrchr.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/memset-ppc32.S: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/memset.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/rawmemchr.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/strcasecmp.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/strcasecmp_l.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/strchr.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/strchrnul.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/strlen-ppc32.S: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/strlen.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/strncase.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/strncase_l.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/strncmp-ppc32.S: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/strncmp.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/strnlen.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/wcschr-ppc32.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/wcschr.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/wcscpy-ppc32.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/wcscpy.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/wcsrchr-ppc32.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/wcsrchr.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/wordcopy.c: Likewise.
* sysdeps/powerpc/powerpc32/power6/memset.S: Likewise.
* sysdeps/powerpc/powerpc32/setjmp.S: Likewise.
* sysdeps/powerpc/powerpc64/__longjmp.S: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/bzero.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/memchr.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/memcmp-ppc64.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/memcmp.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/memcpy-ppc64.S: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/memcpy.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/memmove-ppc64.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/memmove.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/mempcpy.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/memrchr.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/memset-ppc64.S: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/memset.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/rawmemchr.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/stpcpy-ppc64.S: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/stpcpy.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/stpncpy.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strcasecmp.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strcasecmp_l.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strcat.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strchr.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strchrnul.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strcmp-ppc64.S: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strcmp.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strcpy-ppc64.S: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strcpy.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strcspn.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strlen-ppc64.S: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strlen.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strncase.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strncase_l.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strncat.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strncmp-ppc64.S: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strncmp.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strncpy-ppc64.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strncpy.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strnlen.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strpbrk.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strrchr-ppc64.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strrchr.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strspn-ppc64.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strspn.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/wcschr.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/wcscpy.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/wcsrchr.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/wordcopy.c: Likewise.
* sysdeps/powerpc/powerpc64/setjmp.S: Likewise.
* sysdeps/s390/s390-32/multiarch/ifunc-resolve.c: Likewise.
* sysdeps/s390/s390-32/multiarch/memcmp.S: Likewise.
* sysdeps/s390/s390-32/multiarch/memcpy.S: Likewise.
* sysdeps/s390/s390-32/multiarch/memset.S: Likewise.
* sysdeps/s390/s390-64/multiarch/ifunc-resolve.c: Likewise.
* sysdeps/s390/s390-64/multiarch/memcmp.S: Likewise.
* sysdeps/s390/s390-64/multiarch/memcpy.S: Likewise.
* sysdeps/s390/s390-64/multiarch/memset.S: Likewise.
* sysdeps/sparc/sparc64/multiarch/memcpy-niagara1.S: Likewise.
* sysdeps/sparc/sparc64/multiarch/memcpy-niagara2.S: Likewise.
* sysdeps/sparc/sparc64/multiarch/memcpy-niagara4.S: Likewise.
* sysdeps/sparc/sparc64/multiarch/memcpy-ultra3.S: Likewise.
* sysdeps/sparc/sparc64/multiarch/memcpy.S: Likewise.
* sysdeps/sparc/sparc64/multiarch/memset-niagara1.S: Likewise.
* sysdeps/sparc/sparc64/multiarch/memset-niagara4.S: Likewise.
* sysdeps/sparc/sparc64/multiarch/memset.S: Likewise.
* sysdeps/unix/alpha/sysdep.S: Likewise.
* sysdeps/unix/alpha/sysdep.h: Likewise.
* sysdeps/unix/make-syscalls.sh: Likewise.
* sysdeps/unix/sysv/linux/aarch64/sysdep-cancel.h: Likewise.
* sysdeps/unix/sysv/linux/aarch64/sysdep.h: Likewise.
* sysdeps/unix/sysv/linux/alpha/sysdep-cancel.h: Likewise.
* sysdeps/unix/sysv/linux/alpha/vfork.S: Likewise.
* sysdeps/unix/sysv/linux/arm/sysdep-cancel.h: Likewise.
* sysdeps/unix/sysv/linux/arm/sysdep.h: Likewise.
* sysdeps/unix/sysv/linux/getpid.c: Likewise.
* sysdeps/unix/sysv/linux/hppa/nptl/lowlevellock.h: Likewise.
* sysdeps/unix/sysv/linux/hppa/nptl/sysdep-cancel.h: Likewise.
* sysdeps/unix/sysv/linux/i386/i486/lowlevellock.S: Likewise.
* sysdeps/unix/sysv/linux/i386/lowlevellock.h: Likewise.
* sysdeps/unix/sysv/linux/i386/sysdep-cancel.h: Likewise.
* sysdeps/unix/sysv/linux/i386/sysdep.h: Likewise.
* sysdeps/unix/sysv/linux/ia64/lowlevellock.h: Likewise.
* sysdeps/unix/sysv/linux/ia64/sysdep-cancel.h: Likewise.
* sysdeps/unix/sysv/linux/ia64/sysdep.S: Likewise.
* sysdeps/unix/sysv/linux/ia64/sysdep.h: Likewise.
* sysdeps/unix/sysv/linux/lowlevellock-futex.h: Likewise.
* sysdeps/unix/sysv/linux/m68k/bits/m68k-vdso.h: Likewise.
* sysdeps/unix/sysv/linux/m68k/sysdep-cancel.h: Likewise.
* sysdeps/unix/sysv/linux/m68k/sysdep.h: Likewise.
* sysdeps/unix/sysv/linux/microblaze/lowlevellock.h: Likewise.
* sysdeps/unix/sysv/linux/microblaze/sysdep-cancel.h: Likewise.
* sysdeps/unix/sysv/linux/mips/mips64/sysdep-cancel.h: Likewise.
* sysdeps/unix/sysv/linux/mips/sysdep-cancel.h: Likewise.
* sysdeps/unix/sysv/linux/not-cancel.h: Likewise.
* sysdeps/unix/sysv/linux/powerpc/lowlevellock.h: Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/sysdep-cancel.h: Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/sysdep-cancel.h: Likewise.
* sysdeps/unix/sysv/linux/s390/longjmp_chk.c: Likewise.
* sysdeps/unix/sysv/linux/s390/lowlevellock.h: Likewise.
* sysdeps/unix/sysv/linux/s390/s390-32/sysdep-cancel.h: Likewise.
* sysdeps/unix/sysv/linux/s390/s390-32/sysdep.S: Likewise.
* sysdeps/unix/sysv/linux/s390/s390-32/sysdep.h: Likewise.
* sysdeps/unix/sysv/linux/s390/s390-32/vfork.S: Likewise.
* sysdeps/unix/sysv/linux/s390/s390-64/sysdep-cancel.h: Likewise.
* sysdeps/unix/sysv/linux/s390/s390-64/sysdep.S: Likewise.
* sysdeps/unix/sysv/linux/s390/s390-64/sysdep.h: Likewise.
* sysdeps/unix/sysv/linux/s390/s390-64/vfork.S: Likewise.
* sysdeps/unix/sysv/linux/sh/lowlevellock.S: Likewise.
* sysdeps/unix/sysv/linux/sh/lowlevellock.h: Likewise.
* sysdeps/unix/sysv/linux/sh/sysdep-cancel.h: Likewise.
* sysdeps/unix/sysv/linux/sh/sysdep.h: Likewise.
* sysdeps/unix/sysv/linux/sh/vfork.S: Likewise.
* sysdeps/unix/sysv/linux/sparc/lowlevellock.h: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc32/sysdep-cancel.h: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc32/sysdep.h: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc64/brk.S: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc64/sysdep-cancel.h: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc64/sysdep.h: Likewise.
* sysdeps/unix/sysv/linux/tile/lowlevellock.h: Likewise.
* sysdeps/unix/sysv/linux/tile/sysdep-cancel.h: Likewise.
* sysdeps/unix/sysv/linux/tile/sysdep.h: Likewise.
* sysdeps/unix/sysv/linux/tile/waitpid.S: Likewise.
* sysdeps/unix/sysv/linux/x86_64/lowlevellock.S: Likewise.
* sysdeps/unix/sysv/linux/x86_64/lowlevellock.h: Likewise.
* sysdeps/unix/sysv/linux/x86_64/sysdep-cancel.h: Likewise.
* sysdeps/unix/sysv/linux/x86_64/sysdep.h: Likewise.
* sysdeps/wordsize-32/symbol-hacks.h: Likewise.
* sysdeps/x86_64/memcpy.S: Likewise.
* sysdeps/x86_64/memmove.c: Likewise.
* sysdeps/x86_64/memset.S: Likewise.
* sysdeps/x86_64/multiarch/init-arch.h: Likewise.
* sysdeps/x86_64/multiarch/memcmp-sse4.S: Likewise.
* sysdeps/x86_64/multiarch/memcmp-ssse3.S: Likewise.
* sysdeps/x86_64/multiarch/memcmp.S: Likewise.
* sysdeps/x86_64/multiarch/memcpy-avx-unaligned.S: Likewise.
* sysdeps/x86_64/multiarch/memcpy-ssse3-back.S: Likewise.
* sysdeps/x86_64/multiarch/memcpy-ssse3.S: Likewise.
* sysdeps/x86_64/multiarch/memcpy.S: Likewise.
* sysdeps/x86_64/multiarch/memcpy_chk.S: Likewise.
* sysdeps/x86_64/multiarch/memmove.c: Likewise.
* sysdeps/x86_64/multiarch/mempcpy.S: Likewise.
* sysdeps/x86_64/multiarch/mempcpy_chk.S: Likewise.
* sysdeps/x86_64/multiarch/memset-avx2.S: Likewise.
* sysdeps/x86_64/multiarch/memset.S: Likewise.
* sysdeps/x86_64/multiarch/memset_chk.S: Likewise.
* sysdeps/x86_64/multiarch/strcat-sse2-unaligned.S: Likewise.
* sysdeps/x86_64/multiarch/strcat-ssse3.S: Likewise.
* sysdeps/x86_64/multiarch/strcat.S: Likewise.
* sysdeps/x86_64/multiarch/strchr-sse2-no-bsf.S: Likewise.
* sysdeps/x86_64/multiarch/strchr.S: Likewise.
* sysdeps/x86_64/multiarch/strcmp-ssse3.S: Likewise.
* sysdeps/x86_64/multiarch/strcmp.S: Likewise.
* sysdeps/x86_64/multiarch/strcpy-sse2-unaligned.S: Likewise.
* sysdeps/x86_64/multiarch/strcpy-ssse3.S: Likewise.
* sysdeps/x86_64/multiarch/strcpy.S: Likewise.
* sysdeps/x86_64/multiarch/strcspn.S: Likewise.
* sysdeps/x86_64/multiarch/strspn.S: Likewise.
* sysdeps/x86_64/multiarch/wcscpy-c.c: Likewise.
* sysdeps/x86_64/multiarch/wcscpy-ssse3.S: Likewise.
* sysdeps/x86_64/multiarch/wcscpy.S: Likewise.
* sysdeps/x86_64/multiarch/wmemcmp-c.c: Likewise.
* sysdeps/x86_64/multiarch/wmemcmp.S: Likewise.
* sysdeps/x86_64/strcmp.S: Likewise.
|
| |
|
|
|
|
|
|
|
|
|
| |
In this patch we take advantage of HSW memory bandwidth, manage to
reduce miss branch prediction by avoiding using branch instructions and
force destination to be aligned with avx instruction.
The CPU2006 403.gcc benchmark indicates this patch improves performance
from 2% to 10%.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* config.h.in (HAVE_AVX2_SUPPORT): New #undef.
* sysdeps/i386/configure.ac: Set HAVE_AVX2_SUPPORT and
config-cflags-avx2.
* sysdeps/x86_64/configure.ac: Likewise.
* sysdeps/i386/configure: Regenerated.
* sysdeps/x86_64/configure: Likewise.
* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add
memset-avx2 only if config-cflags-avx2 is yes.
* sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list):
Tests for memset_chk and memset only if HAVE_AVX2_SUPPORT is
defined.
* sysdeps/x86_64/multiarch/memset.S: Define multiple versions
only if HAVE_AVX2_SUPPORT is defined.
* sysdeps/x86_64/multiarch/memset_chk.S: Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds ifunc tests for x86_64 memset_chk and memset. It also
defines HAS_AVX2 with AVX2_Usable since AVX2 may not be usable even if
processor has AVX2.
* sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list):
Add tests for memset_chk and memset.
* sysdeps/x86_64/multiarch/init-arch.h (HAS_AVX2): Defined
with AVX2_Usable.
|
|
|
|
|
|
|
| |
Since there is no sysdeps/x86_64/multiarch/strlen.S,
sysdeps/x86_64/rtld-strlen.S will be used.
* sysdeps/x86_64/multiarch/rtld-strlen.S: Removed.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In this patch we take advantage of HSW memory bandwidth, manage to
reduce miss branch prediction by avoiding using branch instructions and
force destination to be aligned with avx & avx2 instruction.
The CPU2006 403.gcc benchmark indicates this patch improves performance
from 26% to 59%.
* sysdeps/x86_64/multiarch/Makefile: Add memset-avx2.
* sysdeps/x86_64/multiarch/memset-avx2.S: New file.
* sysdeps/x86_64/multiarch/memset.S: Likewise.
* sysdeps/x86_64/multiarch/memset_chk.S: Likewise.
* sysdeps/x86_64/multiarch/rtld-memset.S: Likewise.
|
|
|
|
|
|
|
|
| |
Define FEATURE_INDEX_1 and FEATURE_INDEX_MAX as macros
for use by both assembly and C code. This fixes the
-Wundef error for cases where FEATURE_INDEX_1 was not
defined but used the correct value of 0 for an undefined
macro.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch checks and sets bit_AVX2_Usable in __cpu_features.feature.
* sysdeps/x86_64/multiarch/ifunc-defines.sym (COMMON_CPUID_INDEX_7):
New.
* sysdeps/x86_64/multiarch/init-arch.c (__init_cpu_features):
Check and set bit_AVX2_Usable.
* sysdeps/x86_64/multiarch/init-arch.h (bit_AVX2_Usable): New
macro.
(bit_AVX2): Likewise.
(index_AVX2_Usable): Likewise.
(CPUID_AVX2): Likewise.
(HAS_AVX2): Likewise.
|
| |
|
|
|
|
| |
File name update missed in commit 584b18eb.
|
|
|
|
|
|
|
|
|
|
| |
A sse42 version of strstr used pcmpistr instruction which is quite
ineffective. A faster way is look for pairs of characters which is uses
sse2, is faster than pcmpistr and for real strings a pairs we look for
are relatively rare.
For linear time complexity we use buy or rent technique which switches
to two-way algorithm when superlinear behaviour is detected.
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
| |
SSE2/SSSE3 versions are faster than SSE4.2 versions on Intel Silvermont.
|
| |
|
|
|
|
|
|
|
|
|
| |
I have small patch for new Intel Silvermont machines.
http://newsroom.intel.com/community/intel_newsroom/blog/2013/05/06/intel-launches-low-power-high-performance-silvermont-microarchitecture
I checked this on my machine and see that strcpy, ... unaligned
versions are faster than ssse3 versions.
|
|
|
|
|
|
|
|
|
| |
We add new memcpy version that uses unaligned loads which are fast
on modern processors. This allows second improvement which is avoiding
computed jump which is relatively expensive operation.
Tests available here:
http://kam.mff.cuni.cz/~ondra/memcpy_profile_result27_04_13.tar.bz2
|
| |
|
| |
|
|
|
|
| |
This reverts commit b79188d71716b6286866e06add976fe84100595e.
|
|
|
|
|
| |
which is faster on all x86_64 architectures.
Tested on AMD, Intel Nehalem, SNB, IVB.
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
| |
Also add separate tests for bcopy and bzero.
|
| |
|