| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On Skylake server, AVX512 load/store instructions in memcpy/memset may
lead to lower CPU turbo frequency in certain situations. Use of AVX2
in memcpy/memset has been observed to have improved overall performance
in many workloads due to the higher frequency.
Since AVX512ER is unique to Xeon Phi, this patch sets Prefer_No_AVX512
if AVX512ER isn't available so that AVX2 versions of memcpy/memset are
used on Skylake server.
[BZ #21396]
* sysdeps/x86/cpu-features.c (init_cpu_features): Set
Prefer_No_AVX512 if AVX512ER isn't available.
* sysdeps/x86/cpu-features.h (bit_arch_Prefer_No_AVX512): New.
(index_arch_Prefer_No_AVX512): Likewise.
* sysdeps/x86_64/multiarch/memcpy.S (__new_memcpy): Don't use
AVX512 version if Prefer_No_AVX512 is set.
* sysdeps/x86_64/multiarch/memcpy_chk.S (__memcpy_chk):
Likewise.
* sysdeps/x86_64/multiarch/memmove.S (__libc_memmove): Likewise.
* sysdeps/x86_64/multiarch/memmove_chk.S (__memmove_chk):
Likewise.
* sysdeps/x86_64/multiarch/mempcpy.S (__mempcpy): Likewise.
* sysdeps/x86_64/multiarch/mempcpy_chk.S (__mempcpy_chk):
Likewise.
* sysdeps/x86_64/multiarch/memset.S (memset): Likewise.
* sysdeps/x86_64/multiarch/memset_chk.S (__memset_chk):
Likewise.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If assembler doesn't support AVX512DQ, _dl_runtime_resolve_avx is used
to save the first 8 vector registers, which only saves the lower 256
bits of vector register, for lazy binding. When it is called on AVX512
platform, the upper 256 bits of ZMM registers are clobbered. Parameters
passed in ZMM registers will be wrong when the function is called the
first time. This patch requires binutils 2.24, whose assembler can store
and load ZMM registers, to build x86-64 glibc. Since mathvec library
needs assembler support for AVX512DQ, we disable mathvec if assembler
doesn't support AVX512DQ.
[BZ #20139]
* config.h.in (HAVE_AVX512_ASM_SUPPORT): Renamed to ...
(HAVE_AVX512DQ_ASM_SUPPORT): This.
* sysdeps/x86_64/configure.ac: Require assembler from binutils
2.24 or above.
(HAVE_AVX512_ASM_SUPPORT): Removed.
(HAVE_AVX512DQ_ASM_SUPPORT): New.
* sysdeps/x86_64/configure: Regenerated.
* sysdeps/x86_64/dl-trampoline.S: Make HAVE_AVX512_ASM_SUPPORT
check unconditional.
* sysdeps/x86_64/multiarch/ifunc-impl-list.c: Likewise.
* sysdeps/x86_64/multiarch/memcpy.S: Likewise.
* sysdeps/x86_64/multiarch/memcpy_chk.S: Likewise.
* sysdeps/x86_64/multiarch/memmove-avx512-no-vzeroupper.S:
Likewise.
* sysdeps/x86_64/multiarch/memmove-avx512-unaligned-erms.S:
Likewise.
* sysdeps/x86_64/multiarch/memmove.S: Likewise.
* sysdeps/x86_64/multiarch/memmove_chk.S: Likewise.
* sysdeps/x86_64/multiarch/mempcpy.S: Likewise.
* sysdeps/x86_64/multiarch/mempcpy_chk.S: Likewise.
* sysdeps/x86_64/multiarch/memset-avx512-no-vzeroupper.S:
Likewise.
* sysdeps/x86_64/multiarch/memset-avx512-unaligned-erms.S:
Likewise.
* sysdeps/x86_64/multiarch/memset.S: Likewise.
* sysdeps/x86_64/multiarch/memset_chk.S: Likewise.
* sysdeps/x86_64/fpu/multiarch/svml_d_cos8_core_avx512.S: Check
HAVE_AVX512DQ_ASM_SUPPORT instead of HAVE_AVX512_ASM_SUPPORT.
* sysdeps/x86_64/fpu/multiarch/svml_d_exp8_core_avx512.S:
Likewise.
* sysdeps/x86_64/fpu/multiarch/svml_d_log8_core_avx512.S:
Likewise.
* sysdeps/x86_64/fpu/multiarch/svml_d_pow8_core_avx512.S:
Likewise.
* sysdeps/x86_64/fpu/multiarch/svml_d_sin8_core_avx512.S:
Likewise.
* sysdeps/x86_64/fpu/multiarch/svml_d_sincos8_core_avx512.:
Likewise.
* sysdeps/x86_64/fpu/multiarch/svml_s_cosf16_core_avx512.S:
Likewise.
* sysdeps/x86_64/fpu/multiarch/svml_s_expf16_core_avx512.S:
Likewise.
* sysdeps/x86_64/fpu/multiarch/svml_s_logf16_core_avx512.S:
Likewise.
* sysdeps/x86_64/fpu/multiarch/svml_s_powf16_core_avx512.S:
Likewise.
* sysdeps/x86_64/fpu/multiarch/svml_s_sincosf16_core_avx51:
Likewise.
* sysdeps/x86_64/fpu/multiarch/svml_s_sinf16_core_avx512.S:
Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Although the Enhanced REP MOVSB/STOSB (ERMS) implementations of memmove,
memcpy, mempcpy and memset aren't used by the current processors, this
patch adds Prefer_ERMS check in memmove, memcpy, mempcpy and memset so
that they can be used in the future.
* sysdeps/x86/cpu-features.h (bit_arch_Prefer_ERMS): New.
(index_arch_Prefer_ERMS): Likewise.
* sysdeps/x86_64/multiarch/memcpy.S (__new_memcpy): Return
__memcpy_erms for Prefer_ERMS.
* sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S
(__memmove_erms): Enabled for libc.a.
* ysdeps/x86_64/multiarch/memmove.S (__libc_memmove): Return
__memmove_erms or Prefer_ERMS.
* sysdeps/x86_64/multiarch/mempcpy.S (__mempcpy): Return
__mempcpy_erms for Prefer_ERMS.
* sysdeps/x86_64/multiarch/memset.S (memset): Return
__memset_erms for Prefer_ERMS.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since the new SSE2/AVX2 memcpy/memmove are faster than the previous ones,
we can remove the previous SSE2/AVX2 memcpy/memmove and replace them with
the new ones.
No change in IFUNC selection if SSE2 and AVX2 memcpy/memmove weren't used
before. If SSE2 or AVX2 memcpy/memmove were used, the new SSE2 or AVX2
memcpy/memmove optimized with Enhanced REP MOVSB will be used for
processors with ERMS. The new AVX512 memcpy/memmove will be used for
processors with AVX512 which prefer vzeroupper.
Since the new SSE2 memcpy/memmove are faster than the previous default
memcpy/memmove used in libc.a and ld.so, we also remove the previous
default memcpy/memmove and make them the default memcpy/memmove, except
that non-temporal store isn't used in ld.so.
Together, it reduces the size of libc.so by about 6 KB and the size of
ld.so by about 2 KB.
[BZ #19776]
* sysdeps/x86_64/memcpy.S: Make it dummy.
* sysdeps/x86_64/mempcpy.S: Likewise.
* sysdeps/x86_64/memmove.S: New file.
* sysdeps/x86_64/memmove_chk.S: Likewise.
* sysdeps/x86_64/multiarch/memmove.S: Likewise.
* sysdeps/x86_64/multiarch/memmove_chk.S: Likewise.
* sysdeps/x86_64/memmove.c: Removed.
* sysdeps/x86_64/multiarch/memcpy-avx-unaligned.S: Likewise.
* sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S: Likewise.
* sysdeps/x86_64/multiarch/memmove-avx-unaligned.S: Likewise.
* sysdeps/x86_64/multiarch/memmove-sse2-unaligned-erms.S:
Likewise.
* sysdeps/x86_64/multiarch/memmove.c: Likewise.
* sysdeps/x86_64/multiarch/memmove_chk.c: Likewise.
* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Remove
memcpy-sse2-unaligned, memmove-avx-unaligned,
memcpy-avx-unaligned and memmove-sse2-unaligned-erms.
* sysdeps/x86_64/multiarch/ifunc-impl-list.c
(__libc_ifunc_impl_list): Replace
__memmove_chk_avx512_unaligned_2 with
__memmove_chk_avx512_unaligned. Remove
__memmove_chk_avx_unaligned_2. Replace
__memmove_chk_sse2_unaligned_2 with
__memmove_chk_sse2_unaligned. Remove __memmove_chk_sse2 and
__memmove_avx_unaligned_2. Replace __memmove_avx512_unaligned_2
with __memmove_avx512_unaligned. Replace
__memmove_sse2_unaligned_2 with __memmove_sse2_unaligned.
Remove __memmove_sse2. Replace __memcpy_chk_avx512_unaligned_2
with __memcpy_chk_avx512_unaligned. Remove
__memcpy_chk_avx_unaligned_2. Replace
__memcpy_chk_sse2_unaligned_2 with __memcpy_chk_sse2_unaligned.
Remove __memcpy_chk_sse2. Remove __memcpy_avx_unaligned_2.
Replace __memcpy_avx512_unaligned_2 with
__memcpy_avx512_unaligned. Remove __memcpy_sse2_unaligned_2
and __memcpy_sse2. Replace __mempcpy_chk_avx512_unaligned_2
with __mempcpy_chk_avx512_unaligned. Remove
__mempcpy_chk_avx_unaligned_2. Replace
__mempcpy_chk_sse2_unaligned_2 with
__mempcpy_chk_sse2_unaligned. Remove __mempcpy_chk_sse2.
Replace __mempcpy_avx512_unaligned_2 with
__mempcpy_avx512_unaligned. Remove __mempcpy_avx_unaligned_2.
Replace __mempcpy_sse2_unaligned_2 with
__mempcpy_sse2_unaligned. Remove __mempcpy_sse2.
* sysdeps/x86_64/multiarch/memcpy.S (__new_memcpy): Support
__memcpy_avx512_unaligned_erms and __memcpy_avx512_unaligned.
Use __memcpy_avx_unaligned_erms and __memcpy_sse2_unaligned_erms
if processor has ERMS. Default to __memcpy_sse2_unaligned.
(ENTRY): Removed.
(END): Likewise.
(ENTRY_CHK): Likewise.
(libc_hidden_builtin_def): Likewise.
Don't include ../memcpy.S.
* sysdeps/x86_64/multiarch/memcpy_chk.S (__memcpy_chk): Support
__memcpy_chk_avx512_unaligned_erms and
__memcpy_chk_avx512_unaligned. Use
__memcpy_chk_avx_unaligned_erms and
__memcpy_chk_sse2_unaligned_erms if if processor has ERMS.
Default to __memcpy_chk_sse2_unaligned.
* sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S
Change function suffix from unaligned_2 to unaligned.
* sysdeps/x86_64/multiarch/mempcpy.S (__mempcpy): Support
__mempcpy_avx512_unaligned_erms and __mempcpy_avx512_unaligned.
Use __mempcpy_avx_unaligned_erms and __mempcpy_sse2_unaligned_erms
if processor has ERMS. Default to __mempcpy_sse2_unaligned.
(ENTRY): Removed.
(END): Likewise.
(ENTRY_CHK): Likewise.
(libc_hidden_builtin_def): Likewise.
Don't include ../mempcpy.S.
(mempcpy): New. Add a weak alias.
* sysdeps/x86_64/multiarch/mempcpy_chk.S (__mempcpy_chk): Support
__mempcpy_chk_avx512_unaligned_erms and
__mempcpy_chk_avx512_unaligned. Use
__mempcpy_chk_avx_unaligned_erms and
__mempcpy_chk_sse2_unaligned_erms if if processor has ERMS.
Default to __mempcpy_chk_sse2_unaligned.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Added AVX512 implementations of memcpy, mempcpy, memmove, memcpy_chk,
mempcpy_chk, memmove_chk.
It shows average improvement more than 30% over AVX versions on KNL
hardware (performance results in the thread
<https://sourceware.org/ml/libc-alpha/2016-01/msg00258.html>).
* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Added new files.
* sysdeps/x86_64/multiarch/ifunc-impl-list.c: Added new tests.
* sysdeps/x86_64/multiarch/memcpy-avx512-no-vzeroupper.S: New file.
* sysdeps/x86_64/multiarch/mempcpy-avx512-no-vzeroupper.S: Likewise.
* sysdeps/x86_64/multiarch/memmove-avx512-no-vzeroupper.S: Likewise.
* sysdeps/x86_64/multiarch/memcpy.S: Added new IFUNC branch.
* sysdeps/x86_64/multiarch/memcpy_chk.S: Likewise.
* sysdeps/x86_64/multiarch/memmove.c: Likewise.
* sysdeps/x86_64/multiarch/memmove_chk.c: Likewise.
* sysdeps/x86_64/multiarch/mempcpy.S: Likewise.
* sysdeps/x86_64/multiarch/mempcpy_chk.S: Likewise.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch updates x86_64 multiarch functions to use the newly defined
HAS_CPU_FEATURE, HAS_ARCH_FEATURE and LOAD_RTLD_GLOBAL_RO_RDX from
<cpu-features.h>.
* sysdeps/x86_64/fpu/multiarch/e_asin.c: Replace HAS_XXX with
HAS_CPU_FEATURE/HAS_ARCH_FEATURE (XXX).
* sysdeps/x86_64/fpu/multiarch/e_atan2.c: Likewise.
* sysdeps/x86_64/fpu/multiarch/e_exp.c: Likewise.
* sysdeps/x86_64/fpu/multiarch/e_log.c: Likewise.
* sysdeps/x86_64/fpu/multiarch/e_pow.c: Likewise.
* sysdeps/x86_64/fpu/multiarch/s_atan.c: Likewise.
* sysdeps/x86_64/fpu/multiarch/s_fma.c: Likewise.
* sysdeps/x86_64/fpu/multiarch/s_fmaf.c: Likewise.
* sysdeps/x86_64/fpu/multiarch/s_sin.c: Likewise.
* sysdeps/x86_64/fpu/multiarch/s_tan.c: Likewise.
* sysdeps/x86_64/fpu/multiarch/s_ceil.S: Use
LOAD_RTLD_GLOBAL_RO_RDX and HAS_CPU_FEATURE (SSE4_1).
* sysdeps/x86_64/fpu/multiarch/s_ceilf.S: Likewise.
* sysdeps/x86_64/fpu/multiarch/s_floor.S: Likewise.
* sysdeps/x86_64/fpu/multiarch/s_floorf.S: Likewise.
* sysdeps/x86_64/fpu/multiarch/s_nearbyint.S : Likewise.
* sysdeps/x86_64/fpu/multiarch/s_nearbyintf.S: Likewise.
* sysdeps/x86_64/fpu/multiarch/s_rintf.S: Likewise.
* sysdeps/x86_64/fpu/multiarch/s_rintf.S : Likewise.
* sysdeps/x86_64/multiarch/ifunc-impl-list.c: Likewise.
* sysdeps/x86_64/multiarch/sched_cpucount.c: Likewise.
* sysdeps/x86_64/multiarch/strstr.c: Likewise.
* sysdeps/x86_64/multiarch/memmove.c: Likewise.
* sysdeps/x86_64/multiarch/memmove_chk.c: Likewise.
* sysdeps/x86_64/multiarch/test-multiarch.c: Likewise.
* sysdeps/x86_64/multiarch/memcmp.S: Remove __init_cpu_features
call. Add LOAD_RTLD_GLOBAL_RO_RDX. Replace HAS_XXX with
HAS_CPU_FEATURE/HAS_ARCH_FEATURE (XXX).
* sysdeps/x86_64/multiarch/memcpy.S: Likewise.
* sysdeps/x86_64/multiarch/memcpy_chk.S: Likewise.
* sysdeps/x86_64/multiarch/mempcpy.S: Likewise.
* sysdeps/x86_64/multiarch/mempcpy_chk.S: Likewise.
* sysdeps/x86_64/multiarch/memset.S: Likewise.
* sysdeps/x86_64/multiarch/memset_chk.S: Likewise.
* sysdeps/x86_64/multiarch/strcat.S: Likewise.
* sysdeps/x86_64/multiarch/strchr.S: Likewise.
* sysdeps/x86_64/multiarch/strcmp.S: Likewise.
* sysdeps/x86_64/multiarch/strcpy.S: Likewise.
* sysdeps/x86_64/multiarch/strcspn.S: Likewise.
* sysdeps/x86_64/multiarch/strspn.S: Likewise.
* sysdeps/x86_64/multiarch/wcscpy.S: Likewise.
* sysdeps/x86_64/multiarch/wmemcmp.S: Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
memcpy with unaligned 256-bit AVX register loads/stores are slow on older
processorsl like Sandy Bridge. This patch adds bit_AVX_Fast_Unaligned_Load
and sets it only when AVX2 is available.
[BZ #17801]
* sysdeps/x86_64/multiarch/init-arch.c (__init_cpu_features):
Set the bit_AVX_Fast_Unaligned_Load bit for AVX2.
* sysdeps/x86_64/multiarch/init-arch.h (bit_AVX_Fast_Unaligned_Load):
New.
(index_AVX_Fast_Unaligned_Load): Likewise.
(HAS_AVX_FAST_UNALIGNED_LOAD): Likewise.
* sysdeps/x86_64/multiarch/memcpy.S (__new_memcpy): Check the
bit_AVX_Fast_Unaligned_Load bit instead of the bit_AVX_Usable bit.
* sysdeps/x86_64/multiarch/memcpy_chk.S (__memcpy_chk): Likewise.
* sysdeps/x86_64/multiarch/mempcpy.S (__mempcpy): Likewise.
* sysdeps/x86_64/multiarch/mempcpy_chk.S (__mempcpy_chk): Likewise.
* sysdeps/x86_64/multiarch/memmove.c (__libc_memmove): Replace
HAS_AVX with HAS_AVX_FAST_UNALIGNED_LOAD.
* sysdeps/x86_64/multiarch/memmove_chk.c (__memmove_chk): Likewise.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Replace with !IS_IN (libc). This completes the transition from
the IS_IN/NOT_IN macros to the IN_MODULE macro set.
The generated code is unchanged on x86_64.
* stdlib/isomac.c (fmt): Replace NOT_IN_libc with IN_MODULE.
(get_null_defines): Adjust.
* sunrpc/Makefile: Adjust comment.
* Makerules (CPPFLAGS-nonlib): Remove NOT_IN_libc.
* elf/Makefile (CPPFLAGS-sotruss-lib): Likewise.
(CFLAGS-interp.c): Likewise.
(CFLAGS-ldconfig.c): Likewise.
(CPPFLAGS-.os): Likewise.
* elf/rtld-Rules (rtld-CPPFLAGS): Likewise.
* extra-lib.mk (CPPFLAGS-$(lib)): Likewise.
* extra-modules.mk (extra-modules.mk): Likewise.
* iconv/Makefile (CPPFLAGS-iconvprogs): Likewise.
* locale/Makefile (CPPFLAGS-locale_programs): Likewise.
* malloc/Makefile (CPPFLAGS-memusagestat): Likewise.
* nscd/Makefile (CPPFLAGS-nscd): Likewise.
* nss/Makefile (CPPFLAGS-nss_test1): Likewise.
* stdlib/Makefile (CFLAGS-tst-putenvmod.c): Likewise.
* sysdeps/gnu/Makefile ($(objpfx)errlist-compat.c): Likewise.
* sysdeps/unix/sysv/linux/Makefile (CPPFLAGS-lddlibc4): Likewise.
* iconvdata/Makefile (CPPFLAGS): Likewise.
(cpp-srcs-left): Add libof for all iconvdata routines.
* bits/stdio-lock.h: Replace NOT_IN_libc with IS_IN.
* include/assert.h: Likewise.
* include/ctype.h: Likewise.
* include/errno.h: Likewise.
* include/libc-symbols.h: Likewise.
* include/math.h: Likewise.
* include/netdb.h: Likewise.
* include/resolv.h: Likewise.
* include/stdio.h: Likewise.
* include/stdlib.h: Likewise.
* include/string.h: Likewise.
* include/sys/stat.h: Likewise.
* include/wctype.h: Likewise.
* intl/l10nflist.c: Likewise.
* libidn/idn-stub.c: Likewise.
* libio/libioP.h: Likewise.
* nptl/libc_multiple_threads.c: Likewise.
* nptl/pthreadP.h: Likewise.
* posix/regex_internal.h: Likewise.
* resolv/res_hconf.c: Likewise.
* sysdeps/arm/armv7/multiarch/memcpy.S: Likewise.
* sysdeps/arm/memmove.S: Likewise.
* sysdeps/arm/sysdep.h: Likewise.
* sysdeps/generic/_itoa.h: Likewise.
* sysdeps/generic/symbol-hacks.h: Likewise.
* sysdeps/gnu/errlist.awk: Likewise.
* sysdeps/gnu/errlist.c: Likewise.
* sysdeps/i386/i586/memcpy.S: Likewise.
* sysdeps/i386/i586/memset.S: Likewise.
* sysdeps/i386/i686/memcpy.S: Likewise.
* sysdeps/i386/i686/memmove.S: Likewise.
* sysdeps/i386/i686/mempcpy.S: Likewise.
* sysdeps/i386/i686/memset.S: Likewise.
* sysdeps/i386/i686/multiarch/bcopy.S: Likewise.
* sysdeps/i386/i686/multiarch/bzero.S: Likewise.
* sysdeps/i386/i686/multiarch/memchr-sse2-bsf.S: Likewise.
* sysdeps/i386/i686/multiarch/memchr-sse2.S: Likewise.
* sysdeps/i386/i686/multiarch/memchr.S: Likewise.
* sysdeps/i386/i686/multiarch/memcmp-sse4.S: Likewise.
* sysdeps/i386/i686/multiarch/memcmp-ssse3.S: Likewise.
* sysdeps/i386/i686/multiarch/memcmp.S: Likewise.
* sysdeps/i386/i686/multiarch/memcpy-ssse3-rep.S: Likewise.
* sysdeps/i386/i686/multiarch/memcpy-ssse3.S: Likewise.
* sysdeps/i386/i686/multiarch/memcpy.S: Likewise.
* sysdeps/i386/i686/multiarch/memcpy_chk.S: Likewise.
* sysdeps/i386/i686/multiarch/memmove.S: Likewise.
* sysdeps/i386/i686/multiarch/memmove_chk.S: Likewise.
* sysdeps/i386/i686/multiarch/mempcpy.S: Likewise.
* sysdeps/i386/i686/multiarch/mempcpy_chk.S: Likewise.
* sysdeps/i386/i686/multiarch/memrchr-c.c: Likewise.
* sysdeps/i386/i686/multiarch/memrchr-sse2-bsf.S: Likewise.
* sysdeps/i386/i686/multiarch/memrchr-sse2.S: Likewise.
* sysdeps/i386/i686/multiarch/memrchr.S: Likewise.
* sysdeps/i386/i686/multiarch/memset-sse2-rep.S: Likewise.
* sysdeps/i386/i686/multiarch/memset-sse2.S: Likewise.
* sysdeps/i386/i686/multiarch/memset.S: Likewise.
* sysdeps/i386/i686/multiarch/memset_chk.S: Likewise.
* sysdeps/i386/i686/multiarch/rawmemchr.S: Likewise.
* sysdeps/i386/i686/multiarch/strcat-sse2.S: Likewise.
* sysdeps/i386/i686/multiarch/strcat-ssse3.S: Likewise.
* sysdeps/i386/i686/multiarch/strcat.S: Likewise.
* sysdeps/i386/i686/multiarch/strchr-sse2-bsf.S: Likewise.
* sysdeps/i386/i686/multiarch/strchr-sse2.S: Likewise.
* sysdeps/i386/i686/multiarch/strchr.S: Likewise.
* sysdeps/i386/i686/multiarch/strcmp-sse4.S: Likewise.
* sysdeps/i386/i686/multiarch/strcmp-ssse3.S: Likewise.
* sysdeps/i386/i686/multiarch/strcmp.S: Likewise.
* sysdeps/i386/i686/multiarch/strcpy-sse2.S: Likewise.
* sysdeps/i386/i686/multiarch/strcpy-ssse3.S: Likewise.
* sysdeps/i386/i686/multiarch/strcpy.S: Likewise.
* sysdeps/i386/i686/multiarch/strcspn.S: Likewise.
* sysdeps/i386/i686/multiarch/strlen-sse2-bsf.S: Likewise.
* sysdeps/i386/i686/multiarch/strlen-sse2.S: Likewise.
* sysdeps/i386/i686/multiarch/strlen.S: Likewise.
* sysdeps/i386/i686/multiarch/strnlen.S: Likewise.
* sysdeps/i386/i686/multiarch/strrchr-sse2-bsf.S: Likewise.
* sysdeps/i386/i686/multiarch/strrchr-sse2.S: Likewise.
* sysdeps/i386/i686/multiarch/strrchr.S: Likewise.
* sysdeps/i386/i686/multiarch/strspn.S: Likewise.
* sysdeps/i386/i686/multiarch/wcschr-c.c: Likewise.
* sysdeps/i386/i686/multiarch/wcschr-sse2.S: Likewise.
* sysdeps/i386/i686/multiarch/wcschr.S: Likewise.
* sysdeps/i386/i686/multiarch/wcscmp-sse2.S: Likewise.
* sysdeps/i386/i686/multiarch/wcscmp.S: Likewise.
* sysdeps/i386/i686/multiarch/wcscpy-c.c: Likewise.
* sysdeps/i386/i686/multiarch/wcscpy-ssse3.S: Likewise.
* sysdeps/i386/i686/multiarch/wcscpy.S: Likewise.
* sysdeps/i386/i686/multiarch/wcslen-c.c: Likewise.
* sysdeps/i386/i686/multiarch/wcslen-sse2.S: Likewise.
* sysdeps/i386/i686/multiarch/wcslen.S: Likewise.
* sysdeps/i386/i686/multiarch/wcsrchr-c.c: Likewise.
* sysdeps/i386/i686/multiarch/wcsrchr-sse2.S: Likewise.
* sysdeps/i386/i686/multiarch/wcsrchr.S: Likewise.
* sysdeps/i386/i686/multiarch/wmemcmp-c.c: Likewise.
* sysdeps/i386/i686/multiarch/wmemcmp.S: Likewise.
* sysdeps/ia64/fpu/libm-symbols.h: Likewise.
* sysdeps/nptl/bits/libc-lock.h: Likewise.
* sysdeps/nptl/bits/libc-lockP.h: Likewise.
* sysdeps/nptl/bits/stdio-lock.h: Likewise.
* sysdeps/posix/closedir.c: Likewise.
* sysdeps/posix/opendir.c: Likewise.
* sysdeps/posix/readdir.c: Likewise.
* sysdeps/posix/rewinddir.c: Likewise.
* sysdeps/powerpc/novmx-sigjmp.c: Likewise.
* sysdeps/powerpc/powerpc32/__longjmp.S: Likewise.
* sysdeps/powerpc/powerpc32/bsd-_setjmp.S: Likewise.
* sysdeps/powerpc/powerpc32/fpu/__longjmp.S: Likewise.
* sysdeps/powerpc/powerpc32/fpu/setjmp.S: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/bzero.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/memchr.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/memcmp-ppc32.S: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/memcmp.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/memcpy-ppc32.S: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/memcpy.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/memmove.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/mempcpy.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/memrchr-ppc32.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/memrchr.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/memset-ppc32.S: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/memset.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/rawmemchr.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/strcasecmp.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/strcasecmp_l.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/strchr.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/strchrnul.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/strlen-ppc32.S: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/strlen.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/strncase.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/strncase_l.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/strncmp-ppc32.S: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/strncmp.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/strnlen.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/wcschr-ppc32.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/wcschr.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/wcscpy-ppc32.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/wcscpy.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/wcsrchr-ppc32.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/wcsrchr.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/wordcopy.c: Likewise.
* sysdeps/powerpc/powerpc32/power6/memset.S: Likewise.
* sysdeps/powerpc/powerpc32/setjmp.S: Likewise.
* sysdeps/powerpc/powerpc64/__longjmp.S: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/bzero.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/memchr.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/memcmp-ppc64.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/memcmp.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/memcpy-ppc64.S: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/memcpy.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/memmove-ppc64.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/memmove.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/mempcpy.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/memrchr.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/memset-ppc64.S: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/memset.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/rawmemchr.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/stpcpy-ppc64.S: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/stpcpy.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/stpncpy.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strcasecmp.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strcasecmp_l.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strcat.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strchr.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strchrnul.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strcmp-ppc64.S: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strcmp.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strcpy-ppc64.S: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strcpy.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strcspn.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strlen-ppc64.S: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strlen.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strncase.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strncase_l.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strncat.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strncmp-ppc64.S: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strncmp.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strncpy-ppc64.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strncpy.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strnlen.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strpbrk.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strrchr-ppc64.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strrchr.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strspn-ppc64.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strspn.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/wcschr.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/wcscpy.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/wcsrchr.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/wordcopy.c: Likewise.
* sysdeps/powerpc/powerpc64/setjmp.S: Likewise.
* sysdeps/s390/s390-32/multiarch/ifunc-resolve.c: Likewise.
* sysdeps/s390/s390-32/multiarch/memcmp.S: Likewise.
* sysdeps/s390/s390-32/multiarch/memcpy.S: Likewise.
* sysdeps/s390/s390-32/multiarch/memset.S: Likewise.
* sysdeps/s390/s390-64/multiarch/ifunc-resolve.c: Likewise.
* sysdeps/s390/s390-64/multiarch/memcmp.S: Likewise.
* sysdeps/s390/s390-64/multiarch/memcpy.S: Likewise.
* sysdeps/s390/s390-64/multiarch/memset.S: Likewise.
* sysdeps/sparc/sparc64/multiarch/memcpy-niagara1.S: Likewise.
* sysdeps/sparc/sparc64/multiarch/memcpy-niagara2.S: Likewise.
* sysdeps/sparc/sparc64/multiarch/memcpy-niagara4.S: Likewise.
* sysdeps/sparc/sparc64/multiarch/memcpy-ultra3.S: Likewise.
* sysdeps/sparc/sparc64/multiarch/memcpy.S: Likewise.
* sysdeps/sparc/sparc64/multiarch/memset-niagara1.S: Likewise.
* sysdeps/sparc/sparc64/multiarch/memset-niagara4.S: Likewise.
* sysdeps/sparc/sparc64/multiarch/memset.S: Likewise.
* sysdeps/unix/alpha/sysdep.S: Likewise.
* sysdeps/unix/alpha/sysdep.h: Likewise.
* sysdeps/unix/make-syscalls.sh: Likewise.
* sysdeps/unix/sysv/linux/aarch64/sysdep-cancel.h: Likewise.
* sysdeps/unix/sysv/linux/aarch64/sysdep.h: Likewise.
* sysdeps/unix/sysv/linux/alpha/sysdep-cancel.h: Likewise.
* sysdeps/unix/sysv/linux/alpha/vfork.S: Likewise.
* sysdeps/unix/sysv/linux/arm/sysdep-cancel.h: Likewise.
* sysdeps/unix/sysv/linux/arm/sysdep.h: Likewise.
* sysdeps/unix/sysv/linux/getpid.c: Likewise.
* sysdeps/unix/sysv/linux/hppa/nptl/lowlevellock.h: Likewise.
* sysdeps/unix/sysv/linux/hppa/nptl/sysdep-cancel.h: Likewise.
* sysdeps/unix/sysv/linux/i386/i486/lowlevellock.S: Likewise.
* sysdeps/unix/sysv/linux/i386/lowlevellock.h: Likewise.
* sysdeps/unix/sysv/linux/i386/sysdep-cancel.h: Likewise.
* sysdeps/unix/sysv/linux/i386/sysdep.h: Likewise.
* sysdeps/unix/sysv/linux/ia64/lowlevellock.h: Likewise.
* sysdeps/unix/sysv/linux/ia64/sysdep-cancel.h: Likewise.
* sysdeps/unix/sysv/linux/ia64/sysdep.S: Likewise.
* sysdeps/unix/sysv/linux/ia64/sysdep.h: Likewise.
* sysdeps/unix/sysv/linux/lowlevellock-futex.h: Likewise.
* sysdeps/unix/sysv/linux/m68k/bits/m68k-vdso.h: Likewise.
* sysdeps/unix/sysv/linux/m68k/sysdep-cancel.h: Likewise.
* sysdeps/unix/sysv/linux/m68k/sysdep.h: Likewise.
* sysdeps/unix/sysv/linux/microblaze/lowlevellock.h: Likewise.
* sysdeps/unix/sysv/linux/microblaze/sysdep-cancel.h: Likewise.
* sysdeps/unix/sysv/linux/mips/mips64/sysdep-cancel.h: Likewise.
* sysdeps/unix/sysv/linux/mips/sysdep-cancel.h: Likewise.
* sysdeps/unix/sysv/linux/not-cancel.h: Likewise.
* sysdeps/unix/sysv/linux/powerpc/lowlevellock.h: Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/sysdep-cancel.h: Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/sysdep-cancel.h: Likewise.
* sysdeps/unix/sysv/linux/s390/longjmp_chk.c: Likewise.
* sysdeps/unix/sysv/linux/s390/lowlevellock.h: Likewise.
* sysdeps/unix/sysv/linux/s390/s390-32/sysdep-cancel.h: Likewise.
* sysdeps/unix/sysv/linux/s390/s390-32/sysdep.S: Likewise.
* sysdeps/unix/sysv/linux/s390/s390-32/sysdep.h: Likewise.
* sysdeps/unix/sysv/linux/s390/s390-32/vfork.S: Likewise.
* sysdeps/unix/sysv/linux/s390/s390-64/sysdep-cancel.h: Likewise.
* sysdeps/unix/sysv/linux/s390/s390-64/sysdep.S: Likewise.
* sysdeps/unix/sysv/linux/s390/s390-64/sysdep.h: Likewise.
* sysdeps/unix/sysv/linux/s390/s390-64/vfork.S: Likewise.
* sysdeps/unix/sysv/linux/sh/lowlevellock.S: Likewise.
* sysdeps/unix/sysv/linux/sh/lowlevellock.h: Likewise.
* sysdeps/unix/sysv/linux/sh/sysdep-cancel.h: Likewise.
* sysdeps/unix/sysv/linux/sh/sysdep.h: Likewise.
* sysdeps/unix/sysv/linux/sh/vfork.S: Likewise.
* sysdeps/unix/sysv/linux/sparc/lowlevellock.h: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc32/sysdep-cancel.h: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc32/sysdep.h: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc64/brk.S: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc64/sysdep-cancel.h: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc64/sysdep.h: Likewise.
* sysdeps/unix/sysv/linux/tile/lowlevellock.h: Likewise.
* sysdeps/unix/sysv/linux/tile/sysdep-cancel.h: Likewise.
* sysdeps/unix/sysv/linux/tile/sysdep.h: Likewise.
* sysdeps/unix/sysv/linux/tile/waitpid.S: Likewise.
* sysdeps/unix/sysv/linux/x86_64/lowlevellock.S: Likewise.
* sysdeps/unix/sysv/linux/x86_64/lowlevellock.h: Likewise.
* sysdeps/unix/sysv/linux/x86_64/sysdep-cancel.h: Likewise.
* sysdeps/unix/sysv/linux/x86_64/sysdep.h: Likewise.
* sysdeps/wordsize-32/symbol-hacks.h: Likewise.
* sysdeps/x86_64/memcpy.S: Likewise.
* sysdeps/x86_64/memmove.c: Likewise.
* sysdeps/x86_64/memset.S: Likewise.
* sysdeps/x86_64/multiarch/init-arch.h: Likewise.
* sysdeps/x86_64/multiarch/memcmp-sse4.S: Likewise.
* sysdeps/x86_64/multiarch/memcmp-ssse3.S: Likewise.
* sysdeps/x86_64/multiarch/memcmp.S: Likewise.
* sysdeps/x86_64/multiarch/memcpy-avx-unaligned.S: Likewise.
* sysdeps/x86_64/multiarch/memcpy-ssse3-back.S: Likewise.
* sysdeps/x86_64/multiarch/memcpy-ssse3.S: Likewise.
* sysdeps/x86_64/multiarch/memcpy.S: Likewise.
* sysdeps/x86_64/multiarch/memcpy_chk.S: Likewise.
* sysdeps/x86_64/multiarch/memmove.c: Likewise.
* sysdeps/x86_64/multiarch/mempcpy.S: Likewise.
* sysdeps/x86_64/multiarch/mempcpy_chk.S: Likewise.
* sysdeps/x86_64/multiarch/memset-avx2.S: Likewise.
* sysdeps/x86_64/multiarch/memset.S: Likewise.
* sysdeps/x86_64/multiarch/memset_chk.S: Likewise.
* sysdeps/x86_64/multiarch/strcat-sse2-unaligned.S: Likewise.
* sysdeps/x86_64/multiarch/strcat-ssse3.S: Likewise.
* sysdeps/x86_64/multiarch/strcat.S: Likewise.
* sysdeps/x86_64/multiarch/strchr-sse2-no-bsf.S: Likewise.
* sysdeps/x86_64/multiarch/strchr.S: Likewise.
* sysdeps/x86_64/multiarch/strcmp-ssse3.S: Likewise.
* sysdeps/x86_64/multiarch/strcmp.S: Likewise.
* sysdeps/x86_64/multiarch/strcpy-sse2-unaligned.S: Likewise.
* sysdeps/x86_64/multiarch/strcpy-ssse3.S: Likewise.
* sysdeps/x86_64/multiarch/strcpy.S: Likewise.
* sysdeps/x86_64/multiarch/strcspn.S: Likewise.
* sysdeps/x86_64/multiarch/strspn.S: Likewise.
* sysdeps/x86_64/multiarch/wcscpy-c.c: Likewise.
* sysdeps/x86_64/multiarch/wcscpy-ssse3.S: Likewise.
* sysdeps/x86_64/multiarch/wcscpy.S: Likewise.
* sysdeps/x86_64/multiarch/wmemcmp-c.c: Likewise.
* sysdeps/x86_64/multiarch/wmemcmp.S: Likewise.
* sysdeps/x86_64/strcmp.S: Likewise.
|
|
|
|
|
|
|
|
|
| |
In this patch we take advantage of HSW memory bandwidth, manage to
reduce miss branch prediction by avoiding using branch instructions and
force destination to be aligned with avx instruction.
The CPU2006 403.gcc benchmark indicates this patch improves performance
from 2% to 10%.
|
| |
|
| |
|
| |
|
| |
|
|
This patch includes optimized 64bit memcpy/memmove for Atom, Core 2 and
Core i7. It improves memcpy by up to 3X on Atom, up to 4X on Core 2 and
up to 1X on Core i7. It also improves memmove by up to 3X on Atom, up to
4X on Core 2 and up to 2X on Core i7.
|