| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* config.h.in (HAVE_AVX2_SUPPORT): New #undef.
* sysdeps/i386/configure.ac: Set HAVE_AVX2_SUPPORT and
config-cflags-avx2.
* sysdeps/x86_64/configure.ac: Likewise.
* sysdeps/i386/configure: Regenerated.
* sysdeps/x86_64/configure: Likewise.
* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add
memset-avx2 only if config-cflags-avx2 is yes.
* sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list):
Tests for memset_chk and memset only if HAVE_AVX2_SUPPORT is
defined.
* sysdeps/x86_64/multiarch/memset.S: Define multiple versions
only if HAVE_AVX2_SUPPORT is defined.
* sysdeps/x86_64/multiarch/memset_chk.S: Likewise.
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds ifunc tests for x86_64 memset_chk and memset. It also
defines HAS_AVX2 with AVX2_Usable since AVX2 may not be usable even if
processor has AVX2.
* sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list):
Add tests for memset_chk and memset.
* sysdeps/x86_64/multiarch/init-arch.h (HAS_AVX2): Defined
with AVX2_Usable.
|
|
|
|
|
|
|
| |
Since there is no sysdeps/x86_64/multiarch/strlen.S,
sysdeps/x86_64/rtld-strlen.S will be used.
* sysdeps/x86_64/multiarch/rtld-strlen.S: Removed.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In this patch we take advantage of HSW memory bandwidth, manage to
reduce miss branch prediction by avoiding using branch instructions and
force destination to be aligned with avx & avx2 instruction.
The CPU2006 403.gcc benchmark indicates this patch improves performance
from 26% to 59%.
* sysdeps/x86_64/multiarch/Makefile: Add memset-avx2.
* sysdeps/x86_64/multiarch/memset-avx2.S: New file.
* sysdeps/x86_64/multiarch/memset.S: Likewise.
* sysdeps/x86_64/multiarch/memset_chk.S: Likewise.
* sysdeps/x86_64/multiarch/rtld-memset.S: Likewise.
|
|
|
|
|
|
|
|
| |
Define FEATURE_INDEX_1 and FEATURE_INDEX_MAX as macros
for use by both assembly and C code. This fixes the
-Wundef error for cases where FEATURE_INDEX_1 was not
defined but used the correct value of 0 for an undefined
macro.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch checks and sets bit_AVX2_Usable in __cpu_features.feature.
* sysdeps/x86_64/multiarch/ifunc-defines.sym (COMMON_CPUID_INDEX_7):
New.
* sysdeps/x86_64/multiarch/init-arch.c (__init_cpu_features):
Check and set bit_AVX2_Usable.
* sysdeps/x86_64/multiarch/init-arch.h (bit_AVX2_Usable): New
macro.
(bit_AVX2): Likewise.
(index_AVX2_Usable): Likewise.
(CPUID_AVX2): Likewise.
(HAS_AVX2): Likewise.
|
| |
|
|
|
|
| |
File name update missed in commit 584b18eb.
|
|
|
|
|
|
|
|
|
|
| |
A sse42 version of strstr used pcmpistr instruction which is quite
ineffective. A faster way is look for pairs of characters which is uses
sse2, is faster than pcmpistr and for real strings a pairs we look for
are relatively rare.
For linear time complexity we use buy or rent technique which switches
to two-way algorithm when superlinear behaviour is detected.
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
| |
SSE2/SSSE3 versions are faster than SSE4.2 versions on Intel Silvermont.
|
| |
|
|
|
|
|
|
|
|
|
| |
I have small patch for new Intel Silvermont machines.
http://newsroom.intel.com/community/intel_newsroom/blog/2013/05/06/intel-launches-low-power-high-performance-silvermont-microarchitecture
I checked this on my machine and see that strcpy, ... unaligned
versions are faster than ssse3 versions.
|
|
|
|
|
|
|
|
|
| |
We add new memcpy version that uses unaligned loads which are fast
on modern processors. This allows second improvement which is avoiding
computed jump which is relatively expensive operation.
Tests available here:
http://kam.mff.cuni.cz/~ondra/memcpy_profile_result27_04_13.tar.bz2
|
| |
|
| |
|
|
|
|
| |
This reverts commit b79188d71716b6286866e06add976fe84100595e.
|
|
|
|
|
| |
which is faster on all x86_64 architectures.
Tested on AMD, Intel Nehalem, SNB, IVB.
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
| |
Also add separate tests for bcopy and bzero.
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
| |
Fix AVX and FMA4 detection by following the guidelines
set out by Intel and AMD for detecting these features.
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
|
| |
There is no problem with strcmp, it doesn't use the YMM registers.
The math routines might since gcc perhaps generates such code.
Introduce bit_YMM_USBALE and use it in the math routines.
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|