path: root/sysdeps/x86_64/multiarch
Each entry below: commit subject (Author, Date; files changed, lines -deleted/+added)
...
* Enable AVX2 optimized memset only if -mavx2 works (H.J. Lu, 2014-07-14; 4 files, -14/+21)

    * config.h.in (HAVE_AVX2_SUPPORT): New #undef.
    * sysdeps/i386/configure.ac: Set HAVE_AVX2_SUPPORT and config-cflags-avx2.
    * sysdeps/x86_64/configure.ac: Likewise.
    * sysdeps/i386/configure: Regenerated.
    * sysdeps/x86_64/configure: Likewise.
    * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add memset-avx2
      only if config-cflags-avx2 is yes.
    * sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list):
      Tests for memset_chk and memset only if HAVE_AVX2_SUPPORT is defined.
    * sysdeps/x86_64/multiarch/memset.S: Define multiple versions only if
      HAVE_AVX2_SUPPORT is defined.
    * sysdeps/x86_64/multiarch/memset_chk.S: Likewise.
* Add ifunc tests for x86_64 memset_chk and memset (H.J. Lu, 2014-06-20; 2 files, -1/+12)

    This patch adds ifunc tests for x86_64 memset_chk and memset.  It also
    defines HAS_AVX2 with AVX2_Usable, since AVX2 may not be usable even if
    the processor has AVX2.

    * sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list):
      Add tests for memset_chk and memset.
    * sysdeps/x86_64/multiarch/init-arch.h (HAS_AVX2): Defined with
      AVX2_Usable.
* Remove sysdeps/x86_64/multiarch/rtld-strlen.S (H.J. Lu, 2014-06-20; 1 file, -1/+0)

    Since there is no sysdeps/x86_64/multiarch/strlen.S,
    sysdeps/x86_64/rtld-strlen.S will be used.

    * sysdeps/x86_64/multiarch/rtld-strlen.S: Removed.
* Add x86_64 memset optimized for AVX2 (Ling Ma, 2014-06-19; 5 files, -1/+275)

    In this patch we take advantage of HSW memory bandwidth, manage to
    reduce branch mispredictions by avoiding branch instructions, and force
    the destination to be aligned using AVX and AVX2 instructions.  The
    CPU2006 403.gcc benchmark indicates this patch improves performance by
    26% to 59%.

    * sysdeps/x86_64/multiarch/Makefile: Add memset-avx2.
    * sysdeps/x86_64/multiarch/memset-avx2.S: New file.
    * sysdeps/x86_64/multiarch/memset.S: Likewise.
    * sysdeps/x86_64/multiarch/memset_chk.S: Likewise.
    * sysdeps/x86_64/multiarch/rtld-memset.S: Likewise.
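The alignment trick described above can be sketched in C with AVX2 intrinsics. This is a hypothetical illustration, not the actual memset-avx2.S code: the names `memset_avx2_sketch` and `fill` are invented here. The idea: broadcast the fill byte into a YMM register, cover the unaligned head and tail with unaligned 32-byte stores (which may overlap already-written bytes), and keep the main loop on aligned stores, so there are no byte-at-a-time branches.

```c
#include <immintrin.h>
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch of the memset-avx2 idea. */
__attribute__((target("avx2")))
static void memset_avx2_sketch(void *dst, int c, size_t n)
{
    unsigned char *p = dst;
    __m256i v = _mm256_set1_epi8((char) c);
    if (n < 32) {                              /* small sizes: plain bytes */
        while (n--)
            *p++ = (unsigned char) c;
        return;
    }
    _mm256_storeu_si256((__m256i *) p, v);     /* unaligned head, 32 bytes */
    size_t head = 32 - ((uintptr_t) p & 31);   /* bytes until alignment */
    p += head;
    n -= head;
    while (n >= 32) {                          /* aligned main loop */
        _mm256_store_si256((__m256i *) p, v);
        p += 32;
        n -= 32;
    }
    if (n)                                     /* unaligned tail, overlaps */
        _mm256_storeu_si256((__m256i *) (p + n - 32), v);
}

/* Ifunc-style dispatch: use the AVX2 path only when it is usable. */
static void fill(void *dst, int c, size_t n)
{
    if (__builtin_cpu_supports("avx2"))
        memset_avx2_sketch(dst, c, n);
    else
        memset(dst, c, n);
}
```

The runtime `__builtin_cpu_supports` check mirrors the ifunc selection: the AVX2 body is only entered when the CPU and OS support it, with plain memset as the fallback.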
* Fix -Wundef warning for FEATURE_INDEX_1. (Carlos O'Donell, 2014-05-03; 1 file, -7/+6)

    Define FEATURE_INDEX_1 and FEATURE_INDEX_MAX as macros for use by both
    assembly and C code.  This fixes the -Wundef error for cases where
    FEATURE_INDEX_1 was not defined but happened to rely on the value 0
    that the preprocessor substitutes for an undefined macro.
* Detect if AVX2 is usable (Sihai Yao, 2014-04-17; 3 files, -0/+12)

    This patch checks and sets bit_AVX2_Usable in __cpu_features.feature.

    * sysdeps/x86_64/multiarch/ifunc-defines.sym (COMMON_CPUID_INDEX_7): New.
    * sysdeps/x86_64/multiarch/init-arch.c (__init_cpu_features): Check
      and set bit_AVX2_Usable.
    * sysdeps/x86_64/multiarch/init-arch.h (bit_AVX2_Usable): New macro.
    (bit_AVX2): Likewise.
    (index_AVX2_Usable): Likewise.
    (CPUID_AVX2): Likewise.
    (HAS_AVX2): Likewise.
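The CPUID side of this check can be sketched in C. This is a hedged illustration: `cpu_has_avx2` and `BIT_AVX2` are names invented here, and glibc's real __init_cpu_features additionally verifies OS-saved YMM state (see the bit_YMM_Usable logic) before setting bit_AVX2_Usable. CPUID leaf 7, subleaf 0 reports AVX2 in EBX bit 5, which is what the new COMMON_CPUID_INDEX_7 entry caches.

```c
#include <assert.h>
#include <cpuid.h>   /* GCC/Clang: __get_cpuid_max, __cpuid_count */

#define BIT_AVX2 (1u << 5)   /* CPUID.(EAX=7,ECX=0):EBX bit 5 */

/* Return nonzero if the CPU advertises AVX2.  Note this is only the
   CPU feature bit; usability also requires OS YMM state support. */
static int cpu_has_avx2(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (__get_cpuid_max(0, 0) < 7)   /* leaf 7 not supported at all */
        return 0;
    __cpuid_count(7, 0, eax, ebx, ecx, edx);
    return (ebx & BIT_AVX2) != 0;
}
```

With modern GCC the same question (including OS support) can be asked via `__builtin_cpu_supports("avx2")`; the explicit CPUID form above shows what the feature-cache code has to do itself.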
* Update copyright notices with scripts/update-copyrights (Allan McRae, 2014-01-01; 39 files, -39/+39)
* Update file name in x86_64 ifunc list (Allan McRae, 2013-12-16; 1 file, -1/+1)

    File name update missed in commit 584b18eb.
* Add strstr with unaligned loads. Fixes bug 12100. (Ondřej Bílka, 2013-12-14; 8 files, -494/+415)

    The SSE4.2 version of strstr used the pcmpistri instruction, which is
    quite inefficient.  A faster way is to look for pairs of characters;
    this uses SSE2, is faster than pcmpistri, and for real strings the
    pairs we look for are relatively rare.  For linear time complexity we
    use the buy-or-rent technique, which switches to the two-way algorithm
    when superlinear behaviour is detected.
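A scalar C sketch of the pair-of-characters filter described above (hypothetical: `strstr_pair_sketch` is an invented name, and the real implementation tests 16 candidate positions at once with SSE2 and bounds worst-case work by switching to the two-way algorithm): only positions where the first two needle bytes both match pay for a full comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Scalar sketch of the pair-of-characters idea: cheap two-byte filter,
   then a full memcmp only on surviving candidate positions. */
static char *strstr_pair_sketch(const char *hs, const char *ne)
{
    size_t nlen = strlen(ne);
    if (nlen == 0)
        return (char *) hs;          /* empty needle matches at start */
    if (nlen == 1)
        return strchr(hs, ne[0]);
    size_t hlen = strlen(hs);
    if (hlen < nlen)
        return NULL;
    for (size_t i = 0; i + nlen <= hlen; i++)
        if (hs[i] == ne[0] && hs[i + 1] == ne[1]    /* cheap pair test */
            && memcmp(hs + i + 2, ne + 2, nlen - 2) == 0)
            return (char *) (hs + i);
    return NULL;
}
```

For typical text the pair filter rejects almost every position, so the expensive comparison rarely runs; the SSE2 version gets the same effect 16 positions at a time.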
* Use p2align instead of ALIGN (Ondřej Bílka, 2013-10-08; 6 files, -295/+274)
* Faster strrchr (Ondřej Bílka, 2013-09-26; 5 files, -899/+2)
* Faster strchr implementation (Ondřej Bílka, 2013-09-11; 2 files, -128/+0)
* Add unaligned strcmp (Ondřej Bílka, 2013-09-03; 4 files, -2/+222)
* Fix typos (Ondřej Bílka, 2013-08-30; 1 file, -1/+1)
* Fix rawmemchr regression on Bulldozer (Ondřej Bílka, 2013-08-30; 2 files, -109/+0)
* Fix typos (Ondřej Bílka, 2013-08-21; 1 file, -2/+2)
* Skip SSE4.2 versions on Intel Silvermont (Liubov Dmitrieva, 2013-06-28; 5 files, -15/+37)

    SSE2/SSSE3 versions are faster than SSE4.2 versions on Intel Silvermont.
* Fix buffer overrun in x86_64 memcmp-ssse3.S (Liubov Dmitrieva, 2013-06-26; 1 file, -4/+2)
* Set fast unaligned load flag for new Intel microarchitecture (Liubov Dmitrieva, 2013-06-14; 1 file, -0/+7)

    I have a small patch for the new Intel Silvermont machines.
    http://newsroom.intel.com/community/intel_newsroom/blog/2013/05/06/intel-launches-low-power-high-performance-silvermont-microarchitecture
    I checked this on my machine and see that the unaligned versions of
    strcpy, ... are faster than the SSSE3 versions.
* Faster memcpy on x64 (Ondrej Bilka, 2013-05-20; 4 files, -8/+185)

    We add a new memcpy version that uses unaligned loads, which are fast
    on modern processors.  This allows a second improvement: avoiding the
    computed jump, which is a relatively expensive operation.  Tests
    available here:
    http://kam.mff.cuni.cz/~ondra/memcpy_profile_result27_04_13.tar.bz2
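The "avoid the computed jump" point can be illustrated with a small C sketch (hypothetical: `copy8to16` is an invented name, not a routine from the patch). Any length from 8 to 16 bytes is covered by two possibly overlapping unaligned 8-byte loads and stores, straight-line code with no jump table; the `memcpy(&word, p, 8)` idiom is how C expresses an unaligned 8-byte access portably, and compiles to a single mov on x86-64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy n bytes, 8 <= n <= 16: head load/store covers the first 8 bytes,
   tail load/store covers the last 8, overlapping in the middle if n < 16.
   No per-size branch or computed jump into a table of fixed-size copies. */
static void copy8to16(void *dst, const void *src, size_t n)
{
    uint64_t head, tail;
    memcpy(&head, src, 8);                        /* unaligned load */
    memcpy(&tail, (const char *) src + n - 8, 8); /* overlaps head if n < 16 */
    memcpy(dst, &head, 8);
    memcpy((char *) dst + n - 8, &tail, 8);
}
```

Loading both halves before storing also makes the copy correct when reading and writing overlap within the 8..16-byte window.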
* Faster strlen on x64 (Ondrej Bilka, 2013-03-18; 10 files, -1179/+544)
* Remove Prefer_SSE_for_memop on x64 (Ondrej Bilka, 2013-03-11; 8 files, -197/+1)
* Revert "* sysdeps/x86_64/strlen.S: Replace with new SSE2 based implementation" (Ondrej Bilka, 2013-03-06; 10 files, -537/+1179)

    This reverts commit b79188d71716b6286866e06add976fe84100595e.
* * sysdeps/x86_64/strlen.S: Replace with new SSE2 based implementation (Ondrej Bilka, 2013-03-06; 10 files, -1179/+537)

    The new implementation is faster on all x86_64 architectures.  Tested
    on AMD, Intel Nehalem, SNB, IVB.
* Remove lots of inline keywords (Roland McGrath, 2013-02-07; 2 files, -4/+5)
* Change __x86_64 prefix in cache size to __x86 (H.J. Lu, 2013-01-05; 3 files, -13/+13)
* Add HAS_RTM (H.J. Lu, 2013-01-03; 2 files, -0/+16)
* Update copyright notices with scripts/update-copyrights (Joseph Myers, 2013-01-02; 50 files, -50/+50)
* test-multiarch: terminate printf output with newline (Pino Toscano, 2012-11-22; 1 file, -1/+1)
* Compile x86 rtld with -mno-sse -mno-mmx (H.J. Lu, 2012-11-02; 1 file, -1/+2)
* Add x86-64 __libc_ifunc_impl_list (H.J. Lu, 2012-10-11; 33 files, -24/+380)
* Use IFUNC memmove/memset in x86-64 bcopy/bzero (H.J. Lu, 2012-10-11; 3 files, -33/+11)

    Also add separate tests for bcopy and bzero.
* Define HAS_FMA with bit_FMA_Usable (H.J. Lu, 2012-10-02; 2 files, -2/+10)
* Don't define x86-64 __strncmp_ssse3 in libc.a (H.J. Lu, 2012-09-27; 1 file, -4/+6)
* Clean up x86_64/multiarch/strstr-c.c include order (Roland McGrath, 2012-08-15; 1 file, -6/+26)
* Clean up x86_64/multiarch/memmove.c include order (Roland McGrath, 2012-08-15; 1 file, -20/+18)
* Avoid DWARF definition DIE on ifunc symbols (H.J. Lu, 2012-08-09; 2 files, -10/+32)
* BZ#14059: Fix AVX and FMA4 detection (Carlos O'Donell, 2012-05-17; 5 files, -30/+148)

    Fix AVX and FMA4 detection by following the guidelines set out by
    Intel and AMD for detecting these features.
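The recommended detection sequence can be sketched in C (a hedged illustration; `avx_usable` is an invented name, not glibc's function): check CPUID.1:ECX.OSXSAVE before executing XGETBV (XGETBV faults if the OS has not enabled XSAVE), confirm that XCR0 says the OS saves both XMM and YMM state, and only then trust the AVX (or, on AMD, FMA4) feature bit.

```c
#include <assert.h>
#include <cpuid.h>   /* GCC/Clang: __get_cpuid */

/* Return nonzero if AVX is both present and usable, following the
   Intel/AMD-documented order: OSXSAVE, then XGETBV, then the AVX bit. */
static int avx_usable(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    if (!(ecx & (1u << 27)))          /* OSXSAVE: OS uses XSAVE/XRSTOR */
        return 0;
    if (!(ecx & (1u << 28)))          /* AVX feature bit */
        return 0;
    unsigned int xcr0_lo, xcr0_hi;
    /* XGETBV with ECX=0 reads XCR0; safe here because OSXSAVE is set. */
    __asm__ ("xgetbv" : "=a" (xcr0_lo), "=d" (xcr0_hi) : "c" (0));
    return (xcr0_lo & 6) == 6;        /* XMM (bit 1) and YMM (bit 2) saved */
}
```

The pre-fix bug class this guards against: a CPU that advertises AVX while running on an OS that does not save YMM state, where using VEX-encoded instructions would corrupt registers across context switches.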
* Load pointers into RAX_LP in strcmp-sse42.S (H.J. Lu, 2012-05-15; 1 file, -6/+6)
* Load cache sizes into R*_LP in memcpy-ssse3.S (H.J. Lu, 2012-05-15; 1 file, -12/+12)
* Load cache sizes into R*_LP in memcpy-ssse3-back.S (H.J. Lu, 2012-05-15; 1 file, -10/+10)
* Load cache size into R8_LP (H.J. Lu, 2012-05-15; 1 file, -4/+4)
* Replace FSF snail mail address with URLs (Paul Eggert, 2012-02-09; 47 files, -141/+94)
* Really fix AVX tests (Ulrich Drepper, 2012-01-26; 2 files, -20/+20)

    There is no problem with strcmp; it doesn't use the YMM registers.
    The math routines might, since gcc perhaps generates such code.
    Introduce bit_YMM_Usable and use it in the math routines.
* Reset bit_AVX in __cpu_features if OS support is missing (Ulrich Drepper, 2012-01-26; 2 files, -2/+15)
* Fix overrun in destination buffer (Liubov Dmitrieva, 2011-12-23; 2 files, -508/+323)
* WP fixes (Ulrich Drepper, 2011-12-17; 1 file, -1/+0)
* Optimized wcschr and wcscpy for x86-64 and x86-32 (Ulrich Drepper, 2011-12-17; 4 files, -1/+619)
* Fix more warnings (Ulrich Drepper, 2011-12-03; 1 file, -0/+4)
* Fix test of non-ASCII locales in x86-64 strcasecmp et al. (Ulrich Drepper, 2011-11-01; 1 file, -2/+2)