about summary refs log tree commit diff
path: root/sysdeps/x86_64/multiarch/init-arch.c
Commit message (Collapse)AuthorAgeFilesLines
* Add _dl_x86_cpu_features to rtld_globalH.J. Lu2015-08-131-223/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds _dl_x86_cpu_features to rtld_global in x86 ld.so and initializes it early before __libc_start_main is called so that cpu_features is always available when it is used and we can avoid calling __init_cpu_features in IFUNC selectors. * sysdeps/i386/dl-machine.h: Include <cpu-features.c>. (dl_platform_init): Call init_cpu_features. * sysdeps/i386/dl-procinfo.c (_dl_x86_cpu_features): New. * sysdeps/i386/i686/cacheinfo.c (DISABLE_PREFERRED_MEMORY_INSTRUCTION): Removed. * sysdeps/i386/i686/multiarch/Makefile (aux): Remove init-arch. * sysdeps/i386/i686/multiarch/Versions: Removed. * sysdeps/i386/i686/multiarch/ifunc-defines.sym (KIND_OFFSET): Removed. * sysdeps/i386/ldsodefs.h: Include <cpu-features.h>. * sysdeps/unix/sysv/linux/x86/Makefile (libpthread-sysdep_routines): Remove init-arch. * sysdeps/unix/sysv/linux/x86_64/dl-procinfo.c: Include <sysdeps/x86_64/dl-procinfo.c> instead of sysdeps/generic/dl-procinfo.c>. * sysdeps/x86/Makefile [$(subdir) == csu] (gen-as-const-headers): Add cpu-features-offsets.sym and rtld-global-offsets.sym. [$(subdir) == elf] (sysdep-dl-routines): Add dl-get-cpu-features. [$(subdir) == elf] (tests): Add tst-get-cpu-features. [$(subdir) == elf] (tests-static): Add tst-get-cpu-features-static. * sysdeps/x86/Versions: New file. * sysdeps/x86/cpu-features-offsets.sym: Likewise. * sysdeps/x86/cpu-features.c: Likewise. * sysdeps/x86/cpu-features.h: Likewise. * sysdeps/x86/dl-get-cpu-features.c: Likewise. * sysdeps/x86/libc-start.c: Likewise. * sysdeps/x86/rtld-global-offsets.sym: Likewise. * sysdeps/x86/tst-get-cpu-features-static.c: Likewise. * sysdeps/x86/tst-get-cpu-features.c: Likewise. * sysdeps/x86_64/dl-procinfo.c: Likewise. * sysdeps/x86_64/cacheinfo.c (__cpuid_count): Removed. Assume USE_MULTIARCH is defined and don't check it. (is_intel): Replace __cpu_features with GLRO(dl_x86_cpu_features). (is_amd): Likewise. (max_cpuid): Likewise. (intel_check_word): Likewise. (__cache_sysconf): Don't call __init_cpu_features. (__x86_preferred_memory_instruction): Removed. (init_cacheinfo): Don't call __init_cpu_features. Replace __cpu_features with GLRO(dl_x86_cpu_features). * sysdeps/x86_64/dl-machine.h: <cpu-features.c>. (dl_platform_init): Call init_cpu_features. * sysdeps/x86_64/ldsodefs.h: Include <cpu-features.h>. * sysdeps/x86_64/multiarch/Makefile (aux): Remove init-arch. * sysdeps/x86_64/multiarch/Versions: Removed. * sysdeps/x86_64/multiarch/cacheinfo.c: Likewise. * sysdeps/x86_64/multiarch/init-arch.c: Likewise. * sysdeps/x86_64/multiarch/ifunc-defines.sym (KIND_OFFSET): Removed. * sysdeps/x86_64/multiarch/init-arch.h: Rewrite.
* This patch adds detection of availability for AVX512F and AVX512DQ ISAs.Andrew Senkevich2015-06-081-0/+17
| | | | | | | | * sysdeps/x86_64/multiarch/init-arch.h (bit_AVX512F_Usable, bit_AVX512DQ_Usable, bit_Opmask_state, bit_ZMM0_15_state, bit_ZMM16_31_state): New macro. * sysdeps/x86_64/multiarch/init-arch.c (__init_cpu_features): Check and set bit_AVX512F_Usable, bit_AVX512DQ_Usable.
* Use AVX unaligned memcpy only if AVX2 is availableH.J. Lu2015-01-301-2/+7
| | | | | | | | | | | | | | | | | | | | | | memcpy with unaligned 256-bit AVX register loads/stores are slow on older processorsl like Sandy Bridge. This patch adds bit_AVX_Fast_Unaligned_Load and sets it only when AVX2 is available. [BZ #17801] * sysdeps/x86_64/multiarch/init-arch.c (__init_cpu_features): Set the bit_AVX_Fast_Unaligned_Load bit for AVX2. * sysdeps/x86_64/multiarch/init-arch.h (bit_AVX_Fast_Unaligned_Load): New. (index_AVX_Fast_Unaligned_Load): Likewise. (HAS_AVX_FAST_UNALIGNED_LOAD): Likewise. * sysdeps/x86_64/multiarch/memcpy.S (__new_memcpy): Check the bit_AVX_Fast_Unaligned_Load bit instead of the bit_AVX_Usable bit. * sysdeps/x86_64/multiarch/memcpy_chk.S (__memcpy_chk): Likewise. * sysdeps/x86_64/multiarch/mempcpy.S (__mempcpy): Likewise. * sysdeps/x86_64/multiarch/mempcpy_chk.S (__mempcpy_chk): Likewise. * sysdeps/x86_64/multiarch/memmove.c (__libc_memmove): Replace HAS_AVX with HAS_AVX_FAST_UNALIGNED_LOAD. * sysdeps/x86_64/multiarch/memmove_chk.c (__memmove_chk): Likewise.
* Also treat model numbers 0x5a/0x5d as SilvermontH.J. Lu2015-01-231-0/+2
|
* Treat model numbers 0x4a/0x4d as SilvermontH.J. Lu2015-01-231-0/+2
| | | | | * sysdeps/x86_64/multiarch/init-arch.c (__init_cpu_features): Treat model numbers 0x4a/0x4d as Intel Silvermont architecture.
* Update copyright dates with scripts/update-copyrights.Joseph Myers2015-01-021-1/+1
|
* Detect if AVX2 is usableSihai Yao2014-04-171-0/+3
| | | | | | | | | | | | | | | This patch checks and sets bit_AVX2_Usable in __cpu_features.feature. * sysdeps/x86_64/multiarch/ifunc-defines.sym (COMMON_CPUID_INDEX_7): New. * sysdeps/x86_64/multiarch/init-arch.c (__init_cpu_features): Check and set bit_AVX2_Usable. * sysdeps/x86_64/multiarch/init-arch.h (bit_AVX2_Usable): New macro. (bit_AVX2): Likewise. (index_AVX2_Usable): Likewise. (CPUID_AVX2): Likewise. (HAS_AVX2): Likewise.
* Update copyright notices with scripts/update-copyrightsAllan McRae2014-01-011-1/+1
|
* Skip SSE4.2 versions on Intel SilvermontLiubov Dmitrieva2013-06-281-1/+9
| | | | SSE2/SSSE3 versions are faster than SSE4.2 versions on Intel Silvermont.
* Set fast unaligned load flag for new Intel microarchitectureLiubov Dmitrieva2013-06-141-0/+7
| | | | | | | | | I have small patch for new Intel Silvermont machines. http://newsroom.intel.com/community/intel_newsroom/blog/2013/05/06/intel-launches-low-power-high-performance-silvermont-microarchitecture I checked this on my machine and see that strcpy, ... unaligned versions are faster than ssse3 versions.
* Remove Prefer_SSE_for_memop on x64Ondrej Bilka2013-03-111-11/+0
|
* Add HAS_RTMH.J. Lu2013-01-031-0/+7
|
* Update copyright notices with scripts/update-copyrights.Joseph Myers2013-01-021-1/+1
|
* Define HAS_FMA with bit_FMA_UsableH.J. Lu2012-10-021-0/+3
|
* BZ#14059: Fix AVX and FMA4 detection.Carlos O'Donell2012-05-171-10/+17
| | | | | Fix AVX and FMA4 detection by following the guidelines set out by Intel and AMD for detecting these features.
* Replace FSF snail mail address with URLs.Paul Eggert2012-02-091-3/+2
|
* Really fix AVX testsUlrich Drepper2012-01-261-7/+7
| | | | | | There is no problem with strcmp, it doesn't use the YMM registers. The math routines might since gcc perhaps generates such code. Introduce bit_YMM_USBALE and use it in the math routines.
* Reset bit_AVX in __cpu_features is OS support is missingUlrich Drepper2012-01-261-1/+13
|
* Fix compilation problems in x86-64 init-archUlrich Drepper2011-10-211-1/+2
|
* Check for FMA4 support and generate appropriate fma functionsUlrich Drepper2011-10-201-1/+9
|
* Improve 64 bit strcat functions with SSE2/SSSE3Liubov Dmitrieva2011-07-191-3/+7
|
* Optimized st{r,p}{,n}cpy for SSE2/SSSE3 on x86-32H.J. Lu2011-06-241-3/+8
|
* Assume Intel Core i3/i5/i7 processor if AVX is availableH.J. Lu2011-06-031-0/+7
|
* Enable SSE2 memset for AMD'supcoming Orochi processor.Harsha Jagasia2011-03-041-2/+10
| | | | | | | | | This patch enables SSE2 memset for AMD's upcoming Orochi processor. This patch also fixes the following bug: For misaligned blocks larger than > 144 Bytes, memset branches into the integer code path depending on the value of misalignment even if the startup code chooses the SSE2 code path upfront, when multiarch is enabled.
* Support Intel processor model 6 and model 0x2.H.J. Lu2010-11-121-0/+1
|
* Use IFUNC on x86-64 memsetH.J. Lu2010-11-081-0/+5
|
* Unroll 32bit SSE strlen and handle slow bsfH.J. Lu2010-08-251-0/+6
|
* Improve 64bit memcpy/memmove for Atom, Core 2 and Core i7H.J. Lu2010-06-301-3/+6
| | | | | | | This patch includes optimized 64bit memcpy/memmove for Atom, Core 2 and Core i7. It improves memcpy by up to 3X on Atom, up to 4X on Core 2 and up to 1X on Core i7. It also improves memmove by up to 3X on Atom, up to 4X on Core 2 and up to 2X on Core i7.
* Incorrect x86 CPU family and model check.H.J. Lu2010-05-271-3/+3
|
* Fix concurrent handling of __cpu_features.Ulrich Drepper2010-04-041-12/+21
|
* Optimize 32bit memset/memcpy with SSE2/SSSE3.H.J. Lu2010-01-121-1/+17
|
* Clean up x86 multiarch HAS_FOO macros.Roland McGrath2009-10-061-0/+1
|
* Remove ENABLE_SSSE3_ON_ATOM.H.J. Lu2009-08-281-9/+1
| | | | | It turns that SSSE3 isn't slow on Atom. The problem is bsf. This patch removes ENABLE_SSSE3_ON_ATOM.
* Support multiarch for i686.H.J. Lu2009-07-311-10/+8
| | | | | | This patch adds multiarch support when configured for i686. I modified some x86-64 functions to support 32bit. I will contribute 32bit SSE string and memory functions later.
* Prepare use if IFUNC functions outside libc.so.Ulrich Drepper2009-07-291-0/+10
| | | | | | We use a callback function into libc.so to get access to the data structure with the information and have special versions of the test macros which automatically use this function.
* Perform test for Arom x86-64 in central place and handle it.Ulrich Drepper2009-07-231-1/+7
| | | | | | | There will be more than one function which, in multiarch mode, wants to use SSSE3. We should not test in each of them for Atoms with slow SSSE3. Instead, disable the SSSE3 bit in the startup code for such machines.
* Fix little checkin problem in last patch.Ulrich Drepper2009-06-301-2/+2
|
* Determine and store processor family and model on x86-64.H.J. Lu2009-06-301-8/+29
|
* Simplify CPUID value handling.Ulrich Drepper2009-05-311-11/+7
| | | | | | | SO far Intel and AMD use exactly the same bits meaning the same things in CPUID index 1. Simplify the code. Should an architecture come along which doesn't use the same semantics then it must use a different index value than COMMON_CPUID_INDEX_1.
* * config.h.in (USE_MULTIARCH): Define.Ulrich Drepper2009-03-131-0/+65
* configure.in: Handle --enable-multi-arch. * elf/dl-runtime.c (_dl_fixup): Handle STT_GNU_IFUNC. (_dl_fixup_profile): Likewise. * elf/do-lookup.c (dl_lookup_x): Likewise. * sysdeps/x86_64/dl-machine.h: Handle STT_GNU_IFUNC. * elf/elf.h (STT_GNU_IFUNC): Define. * include/libc-symbols.h (libc_ifunc): Define. * sysdeps/x86_64/cacheinfo.c: If USE_MULTIARCH is defined, use the framework in init-arch.h to get CPUID values. * sysdeps/x86_64/multiarch/Makefile: New file. * sysdeps/x86_64/multiarch/init-arch.c: New file. * sysdeps/x86_64/multiarch/init-arch.h: New file. * sysdeps/x86_64/multiarch/sched_cpucount.c: New file. * config.make.in (experimental-malloc): Define. * configure.in: Handle --enable-experimental-malloc. * malloc/Makefile: Handle experimental-malloc flag. * malloc/malloc.c: Implement PER_THREAD and ATOMIC_FASTBINS features. * malloc/arena.c: Likewise. * malloc/hooks.c: Likewise. * malloc/malloc.h: Define M_ARENA_TEST and M_ARENA_MAX.