path: root/sysdeps/x86_64/multiarch/Makefile
Commit message | Author | Age | Files | Lines
* Remove configure tests for SSE4 support. | Joseph Myers | 2015-10-06 | 1 | -5/+2

  GCC added support for -msse4 in version 4.3. Thus the configure tests for it are obsolete, and this patch removes them.

  Tested for x86_64 and x86 (testsuite, and that installed stripped shared libraries are unchanged by this patch).

  * sysdeps/i386/configure.ac (libc_cv_cc_sse4): Remove configure test.
  * sysdeps/i386/configure: Regenerated.
  * sysdeps/i386/i686/multiarch/Makefile [$(config-cflags-sse4) = yes]: Make code unconditional.
  * sysdeps/i386/i686/multiarch/strcspn.S [HAVE_SSE4_SUPPORT]: Likewise.
  * sysdeps/i386/i686/multiarch/strspn.S [HAVE_SSE4_SUPPORT]: Likewise.
  * sysdeps/x86_64/configure.ac (libc_cv_cc_sse4): Remove configure test.
  * sysdeps/x86_64/configure: Regenerated.
  * sysdeps/x86_64/multiarch/Makefile [$(config-cflags-sse4) = yes]: Make code unconditional.
  * sysdeps/x86_64/multiarch/strcspn.S [HAVE_SSE4_SUPPORT]: Likewise.
  * sysdeps/x86_64/multiarch/strspn.S [HAVE_SSE4_SUPPORT]: Likewise.
  * config.h.in (HAVE_SSE4_SUPPORT): Remove #undef.
* Add _dl_x86_cpu_features to rtld_global | H.J. Lu | 2015-08-13 | 1 | -1/+0

  This patch adds _dl_x86_cpu_features to rtld_global in x86 ld.so and initializes it early before __libc_start_main is called so that cpu_features is always available when it is used and we can avoid calling __init_cpu_features in IFUNC selectors.

  * sysdeps/i386/dl-machine.h: Include <cpu-features.c>.
    (dl_platform_init): Call init_cpu_features.
  * sysdeps/i386/dl-procinfo.c (_dl_x86_cpu_features): New.
  * sysdeps/i386/i686/cacheinfo.c (DISABLE_PREFERRED_MEMORY_INSTRUCTION): Removed.
  * sysdeps/i386/i686/multiarch/Makefile (aux): Remove init-arch.
  * sysdeps/i386/i686/multiarch/Versions: Removed.
  * sysdeps/i386/i686/multiarch/ifunc-defines.sym (KIND_OFFSET): Removed.
  * sysdeps/i386/ldsodefs.h: Include <cpu-features.h>.
  * sysdeps/unix/sysv/linux/x86/Makefile (libpthread-sysdep_routines): Remove init-arch.
  * sysdeps/unix/sysv/linux/x86_64/dl-procinfo.c: Include <sysdeps/x86_64/dl-procinfo.c> instead of sysdeps/generic/dl-procinfo.c>.
  * sysdeps/x86/Makefile [$(subdir) == csu] (gen-as-const-headers): Add cpu-features-offsets.sym and rtld-global-offsets.sym.
    [$(subdir) == elf] (sysdep-dl-routines): Add dl-get-cpu-features.
    [$(subdir) == elf] (tests): Add tst-get-cpu-features.
    [$(subdir) == elf] (tests-static): Add tst-get-cpu-features-static.
  * sysdeps/x86/Versions: New file.
  * sysdeps/x86/cpu-features-offsets.sym: Likewise.
  * sysdeps/x86/cpu-features.c: Likewise.
  * sysdeps/x86/cpu-features.h: Likewise.
  * sysdeps/x86/dl-get-cpu-features.c: Likewise.
  * sysdeps/x86/libc-start.c: Likewise.
  * sysdeps/x86/rtld-global-offsets.sym: Likewise.
  * sysdeps/x86/tst-get-cpu-features-static.c: Likewise.
  * sysdeps/x86/tst-get-cpu-features.c: Likewise.
  * sysdeps/x86_64/dl-procinfo.c: Likewise.
  * sysdeps/x86_64/cacheinfo.c (__cpuid_count): Removed. Assume USE_MULTIARCH is defined and don't check it.
    (is_intel): Replace __cpu_features with GLRO(dl_x86_cpu_features).
    (is_amd): Likewise.
    (max_cpuid): Likewise.
    (intel_check_word): Likewise.
    (__cache_sysconf): Don't call __init_cpu_features.
    (__x86_preferred_memory_instruction): Removed.
    (init_cacheinfo): Don't call __init_cpu_features. Replace __cpu_features with GLRO(dl_x86_cpu_features).
  * sysdeps/x86_64/dl-machine.h: <cpu-features.c>.
    (dl_platform_init): Call init_cpu_features.
  * sysdeps/x86_64/ldsodefs.h: Include <cpu-features.h>.
  * sysdeps/x86_64/multiarch/Makefile (aux): Remove init-arch.
  * sysdeps/x86_64/multiarch/Versions: Removed.
  * sysdeps/x86_64/multiarch/cacheinfo.c: Likewise.
  * sysdeps/x86_64/multiarch/init-arch.c: Likewise.
  * sysdeps/x86_64/multiarch/ifunc-defines.sym (KIND_OFFSET): Removed.
  * sysdeps/x86_64/multiarch/init-arch.h: Rewrite.
* Improve 64bit memcpy performance for Haswell CPU with AVX instruction | Ling Ma | 2014-07-30 | 1 | -0/+1

  In this patch we take advantage of HSW memory bandwidth, manage to reduce miss branch prediction by avoiding using branch instructions and force destination to be aligned with avx instruction.

  The CPU2006 403.gcc benchmark indicates this patch improves performance from 2% to 10%.
* Enable AVX2 optimized memset only if -mavx2 works | H.J. Lu | 2014-07-14 | 1 | -2/+5

  * config.h.in (HAVE_AVX2_SUPPORT): New #undef.
  * sysdeps/i386/configure.ac: Set HAVE_AVX2_SUPPORT and config-cflags-avx2.
  * sysdeps/x86_64/configure.ac: Likewise.
  * sysdeps/i386/configure: Regenerated.
  * sysdeps/x86_64/configure: Likewise.
  * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add memset-avx2 only if config-cflags-avx2 is yes.
  * sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Tests for memset_chk and memset only if HAVE_AVX2_SUPPORT is defined.
  * sysdeps/x86_64/multiarch/memset.S: Define multiple versions only if HAVE_AVX2_SUPPORT is defined.
  * sysdeps/x86_64/multiarch/memset_chk.S: Likewise.
* Add x86_64 memset optimized for AVX2 | Ling Ma | 2014-06-19 | 1 | -1/+3

  In this patch we take advantage of HSW memory bandwidth, manage to reduce miss branch prediction by avoiding using branch instructions and force destination to be aligned with avx & avx2 instruction.

  The CPU2006 403.gcc benchmark indicates this patch improves performance from 26% to 59%.

  * sysdeps/x86_64/multiarch/Makefile: Add memset-avx2.
  * sysdeps/x86_64/multiarch/memset-avx2.S: New file.
  * sysdeps/x86_64/multiarch/memset.S: Likewise.
  * sysdeps/x86_64/multiarch/memset_chk.S: Likewise.
  * sysdeps/x86_64/multiarch/rtld-memset.S: Likewise.
* Add strstr with unaligned loads. Fixes bug 12100. | Ondřej Bílka | 2013-12-14 | 1 | -6/+3

  A sse42 version of strstr used pcmpistr instruction which is quite ineffective. A faster way is look for pairs of characters which is uses sse2, is faster than pcmpistr and for real strings a pairs we look for are relatively rare.

  For linear time complexity we use buy or rent technique which switches to two-way algorithm when superlinear behaviour is detected.
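The idea in the commit message above can be sketched in scalar C. This is only an illustration under stated assumptions, not the glibc code: `pair_strstr` is a hypothetical name, the pair filter is done byte by byte rather than with SSE2, the budget constants are arbitrary, and the C library's own `strstr` stands in for the two-way algorithm as the fallback.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not the glibc implementation): find NEEDLE in HAY
   by first filtering on the needle's first two characters, and bail out
   to another algorithm if verification work grows too fast.  */
static const char *
pair_strstr (const char *hay, const char *needle)
{
  size_t nlen = strlen (needle);
  if (nlen == 0)
    return hay;
  if (nlen == 1)
    return strchr (hay, needle[0]);

  size_t hlen = strlen (hay);
  if (hlen < nlen)
    return NULL;

  size_t work = 0;		/* Bytes compared in failed verifications.  */
  for (size_t i = 0; i + nlen <= hlen; i++)
    {
      /* Cheap filter: the character pair must match before we verify.  */
      if (hay[i] != needle[0] || hay[i + 1] != needle[1])
	continue;

      size_t j = 2;
      while (j < nlen && hay[i + j] == needle[j])
	j++;
      if (j == nlen)
	return hay + i;

      /* "Buy or rent": once the verification cost exceeds a budget that
	 is linear in the positions scanned, stop renting and switch.
	 The constants 8 and 64 are assumptions chosen for illustration;
	 strstr is a stand-in for the two-way algorithm.  */
      work += j;
      if (work > 8 * (i + 1) + 64)
	return strstr (hay + i + 1, needle);
    }
  return NULL;
}
```

For typical text the pair filter rejects almost every position, so the verification loop rarely runs; the budget check only matters for adversarial inputs such as long runs of a repeated character.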
* Faster strrchr. | Ondřej Bílka | 2013-09-26 | 1 | -2/+2
* Add unaligned strcmp. | Ondřej Bílka | 2013-09-03 | 1 | -2/+4
* Faster memcpy on x64. | Ondrej Bilka | 2013-05-20 | 1 | -1/+1

  We add new memcpy version that uses unaligned loads which are fast on modern processors. This allows second improvement which is avoiding computed jump which is relatively expensive operation.

  Tests available here: http://kam.mff.cuni.cz/~ondra/memcpy_profile_result27_04_13.tar.bz2
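One common way unaligned loads let a small copy avoid a computed jump is to copy an overlapping head and tail. The sketch below shows that general technique for sizes from 8 to 16 bytes; it is an assumption about the kind of approach meant, not the code from this commit, and `copy_8_to_16` is a hypothetical name.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical illustration: copy N bytes, 8 <= N <= 16, with two
   possibly overlapping 8-byte moves instead of a jump table indexed
   by N.  The fixed-size memcpy calls are the portable C spelling of
   unaligned 8-byte loads and stores.  */
static void
copy_8_to_16 (void *dst, const void *src, size_t n)
{
  uint64_t head, tail;

  /* Load both pieces before storing, so overlapping buffers are safe.  */
  memcpy (&head, src, 8);
  memcpy (&tail, (const char *) src + n - 8, 8);

  memcpy (dst, &head, 8);
  memcpy ((char *) dst + n - 8, &tail, 8);
}
```

Every size in the range takes the same straight-line path; the two moves simply overlap by `16 - n` bytes.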
* Faster strlen on x64. | Ondrej Bilka | 2013-03-18 | 1 | -4/+2
* Remove Prefer_SSE_for_memop on x64 | Ondrej Bilka | 2013-03-11 | 1 | -1/+1
* Revert " * sysdeps/x86_64/strlen.S: Replace with new SSE2 based implementation" | Ondrej Bilka | 2013-03-06 | 1 | -2/+4

  This reverts commit b79188d71716b6286866e06add976fe84100595e.
* * sysdeps/x86_64/strlen.S: Replace with new SSE2 based implementation | Ondrej Bilka | 2013-03-06 | 1 | -4/+2

  which is faster on all x86_64 architectures. Tested on AMD, Intel Nehalem, SNB, IVB.
* BZ#14059: Fix AVX and FMA4 detection. | Carlos O'Donell | 2012-05-17 | 1 | -0/+1

  Fix AVX and FMA4 detection by following the guidelines set out by Intel and AMD for detecting these features.
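The vendor guidelines referred to above require more than reading the AVX bit from CPUID: the OS must also have enabled saving of the YMM registers, which is reported through OSXSAVE and XCR0. The following is a minimal sketch of that check using GCC's `<cpuid.h>`; the function names are hypothetical and this is not the code glibc uses. On non-x86 targets the sketch simply reports that the features are unavailable.

```c
#include <stdint.h>

#if defined (__x86_64__) || defined (__i386__)
# include <cpuid.h>

/* Hypothetical helper: nonzero if the OS saves SSE and AVX state.  */
static int
os_saves_ymm (void)
{
  unsigned int eax, ebx, ecx, edx;

  if (!__get_cpuid (1, &eax, &ebx, &ecx, &edx))
    return 0;
  /* CPUID.1:ECX bit 27 (OSXSAVE) must be set before XGETBV is legal.  */
  if (!(ecx & (1u << 27)))
    return 0;

  uint32_t lo, hi;
  __asm__ __volatile__ ("xgetbv" : "=a" (lo), "=d" (hi) : "c" (0));
  /* XCR0 bit 1 = SSE state, bit 2 = AVX state; both are required.  */
  return (lo & 0x6) == 0x6;
}

/* Hypothetical helper: AVX is usable only if CPU and OS both agree.  */
static int
cpu_usable_avx (void)
{
  unsigned int eax, ebx, ecx, edx;

  if (!__get_cpuid (1, &eax, &ebx, &ecx, &edx))
    return 0;
  /* CPUID.1:ECX bit 28 is the AVX feature flag.  */
  return (ecx & (1u << 28)) != 0 && os_saves_ymm ();
}

/* Hypothetical helper: FMA4 additionally needs usable AVX.  */
static int
cpu_usable_fma4 (void)
{
  unsigned int eax, ebx, ecx, edx;

  if (!__get_cpuid (0x80000001, &eax, &ebx, &ecx, &edx))
    return 0;
  /* CPUID.80000001H:ECX bit 16 is the FMA4 feature flag.  */
  return (ecx & (1u << 16)) != 0 && cpu_usable_avx ();
}
#else
static int cpu_usable_avx (void) { return 0; }
static int cpu_usable_fma4 (void) { return 0; }
#endif
```

Checking only the CPUID feature bit would report AVX as available on a kernel that never enables YMM state saving, where executing an AVX instruction faults.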
* Optimized wcschr and wcscpy for x86-64 and x86-32 | Ulrich Drepper | 2011-12-17 | 1 | -1/+5
* Optimized strnlen and wcscmp for x86-64 | Liubov Dmitrieva | 2011-10-23 | 1 | -2/+2
* Optimized memcmp and wmemcmp for x86-64 and x86-32 | Liubov Dmitrieva | 2011-10-15 | 1 | -1/+2
* Add Atom-optimized strchr and strrchr for x86-64 | Liubov Dmitrieva | 2011-09-05 | 1 | -1/+2
* Improve 64 bit strcat functions with SSE2/SSSE3 | Liubov Dmitrieva | 2011-07-19 | 1 | -2/+4
* Improved st{r,p}{,n}cpy for SSE2 and SSSE3 on x86-64 | H.J. Lu | 2011-06-24 | 1 | -2/+5
* Use IFUNC on x86-64 memset | H.J. Lu | 2010-11-08 | 1 | -1/+2
* Unroll x86-64 strlen | H.J. Lu | 2010-08-26 | 1 | -1/+1
* Clean up warnings in new x86_64/multiarch code. | Roland McGrath | 2010-08-25 | 1 | -0/+1
* Clean up SSE variable shifts | Richard Henderson | 2010-08-24 | 1 | -1/+1
* Add optimized strncasecmp versions for x86-64. | Ulrich Drepper | 2010-08-14 | 1 | -1/+2
* Add support for SSSE3 and SSE4.2 versions of strcasecmp on x86-64. | Ulrich Drepper | 2010-07-31 | 1 | -1/+1
* Speed up SSE4.2 strcasestr by avoiding indirect function call. | Ulrich Drepper | 2010-07-16 | 1 | -1/+2
* Improve 64bit memcpy/memmove for Atom, Core 2 and Core i7 | H.J. Lu | 2010-06-30 | 1 | -1/+3

  This patch includes optimized 64bit memcpy/memmove for Atom, Core 2 and Core i7. It improves memcpy by up to 3X on Atom, up to 4X on Core 2 and up to 1X on Core i7. It also improves memmove by up to 3X on Atom, up to 4X on Core 2 and up to 2X on Core i7.
* x86-64 SSE4 optimized memcmp | H.J. Lu | 2010-04-14 | 1 | -1/+1

  This is 64bit SSE4 optimized memcmp. It improves memcmp by upto 3X on Intel Core i7.
* Implement SSE4.2 optimized strchr and strrchr. | H.J. Lu | 2009-10-22 | 1 | -1/+2
* Add SSSE3-optimized implementation of str{,n}cmp for x86-64. | Ulrich Drepper | 2009-08-07 | 1 | -1/+1
* Add SSE2 support to str{,n}cmp for x86-64. | H.J. Lu | 2009-07-26 | 1 | -1/+1
* SSE4.2 strstr/strcasestr for x86-64. | H.J. Lu | 2009-07-20 | 1 | -1/+3

  This patch implements SSE4.2 strstr/strcasestr, using Knuth-Morris-Pratt string searching algorithm.
* Add SSE4.2 support for strcspn, strpbrk, and strspn on x86-64. | H.J. Lu | 2009-07-03 | 1 | -0/+6
* SSSE3 strcpy/stpcpy for x86-64 | H.J. Lu | 2009-07-02 | 1 | -1/+1

  This patch adds SSSE3 strcpy/stpcpy. I got up to 4X speed up on Core 2 and Core i7. I disabled it on Atom since SSSE3 version is slower for shorter (<64byte) data.
* Add SSE4.2 support for strcmp and strncmp on x86-64. | H.J. Lu | 2009-06-22 | 1 | -0/+4
* Optimize x86-64 strlen for SSE4.2. | Ulrich Drepper | 2009-06-05 | 1 | -0/+1

  The SSE4.2 implementation is used in the DSO only. The patch also adds some infrastructure to be used in similar code later one.
* * config.h.in (USE_MULTIARCH): Define. | Ulrich Drepper | 2009-03-13 | 1 | -0/+3

  * configure.in: Handle --enable-multi-arch.
  * elf/dl-runtime.c (_dl_fixup): Handle STT_GNU_IFUNC.
    (_dl_fixup_profile): Likewise.
  * elf/do-lookup.c (dl_lookup_x): Likewise.
  * sysdeps/x86_64/dl-machine.h: Handle STT_GNU_IFUNC.
  * elf/elf.h (STT_GNU_IFUNC): Define.
  * include/libc-symbols.h (libc_ifunc): Define.
  * sysdeps/x86_64/cacheinfo.c: If USE_MULTIARCH is defined, use the framework in init-arch.h to get CPUID values.
  * sysdeps/x86_64/multiarch/Makefile: New file.
  * sysdeps/x86_64/multiarch/init-arch.c: New file.
  * sysdeps/x86_64/multiarch/init-arch.h: New file.
  * sysdeps/x86_64/multiarch/sched_cpucount.c: New file.
  * config.make.in (experimental-malloc): Define.
  * configure.in: Handle --enable-experimental-malloc.
  * malloc/Makefile: Handle experimental-malloc flag.
  * malloc/malloc.c: Implement PER_THREAD and ATOMIC_FASTBINS features.
  * malloc/arena.c: Likewise.
  * malloc/hooks.c: Likewise.
  * malloc/malloc.h: Define M_ARENA_TEST and M_ARENA_MAX.