about summary refs log tree commit diff
path: root/sysdeps/x86
Commit message (Collapse)AuthorAgeFilesLines
...
* <sys/platform/x86.h>: Add Intel LAM supportH.J. Lu2020-12-222-0/+4
| | | | | | | | Add Intel Linear Address Masking (LAM) support to <sys/platform/x86.h>. HAS_CPU_FEATURE (LAM) can be used to detect if LAM is enabled in CPU. LAM modifies the checking that is applied to 64-bit linear addresses, allowing software to use of the untranslated address bits for metadata.
* x86: Remove the default REP MOVSB threshold tunable value [BZ #27061]H.J. Lu2020-12-141-2/+4
| | | | | | | | | | | Since we can't tell if the tunable value is set by user or not: https://sourceware.org/bugzilla/show_bug.cgi?id=27069 remove the default REP MOVSB threshold tunable value so that the correct default value will be set correctly by init_cacheinfo (). Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* elf: Pass the fd to note processingSzabolcs Nagy2020-12-111-3/+3
| | | | | | | | | | | | | | To handle GNU property notes on aarch64 some segments need to be mmaped again, so the fd of the loaded ELF module is needed. When the fd is not available (kernel loaded modules), then -1 is passed. The fd is passed to both _dl_process_pt_gnu_property and _dl_process_pt_note for consistency. Target specific note processing functions are updated accordingly. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* x86: Adjust tst-cpu-features-supports.c for GCC 11H.J. Lu2020-12-041-5/+10
| | | | | | Check HAS_CPU_FEATURE instead of CPU_FEATURE_USABLE for FSGSBASE, IBT, LM, SHSTK and XSAVES since FSGSBASE requires kernel support, IBT/SHSTK/LM require OS support and XSAVES is supervisor-mode only.
* x86: Set RDRAND usable if CPU supports RDRANDH.J. Lu2020-12-041-0/+1
| | | | Set RDRAND usable if CPU supports RDRAND.
* x86: Remove UP macro. Define LOCK_PREFIX unconditionally.Florian Weimer2020-11-131-7/+1
| | | | | | | The UP macro is never defined. Also define LOCK_PREFIX unconditionally, to the same string. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* x86: Restore processing of cache size tunables in init_cacheinfoFlorian Weimer2020-10-281-8/+4
| | | | | Fixes and partially reverts commit 59803e81f96b479c17f583b31eac44b5 ("x86: Optimizing memcpy for AMD Zen architecture.").
* x86: Optimizing memcpy for AMD Zen architecture.Sajan Karumanchi2020-10-281-6/+26
| | | | | | | | | | | | | | Modifying the shareable cache '__x86_shared_cache_size', which is a factor in computing the non-temporal threshold parameter '__x86_shared_non_temporal_threshold' to optimize memcpy for AMD Zen architectures. In the existing implementation, the shareable cache is computed as 'L3 per thread, L2 per core'. Recomputing this shareable cache as 'L3 per CCX(Core-Complex)' has brought in performance gains. As per the large bench variant results, this patch also addresses the regression problem on AMD Zen architectures. Reviewed-by: Premachandra Mallappa <premachandra.mallappa@amd.com>
* x86: Initialize CPU info via IFUNC relocation [BZ 26203]H.J. Lu2020-10-167-857/+943
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | X86 CPU features in ld.so are initialized by init_cpu_features, which is invoked by DL_PLATFORM_INIT from _dl_sysdep_start. But when ld.so is loaded by static executable, DL_PLATFORM_INIT is never called. Also x86 cache info in libc.o and libc.a is initialized by a constructor which may be called too late. Since some fields in _rtld_global_ro in ld.so are initialized by dynamic relocation, we can also initialize x86 CPU features in _rtld_global_ro in ld.so and cache info in libc.so by initializing dummy function pointers in ld.so and libc.so via IFUNC relocation. Key points: 1. IFUNC is always supported, independent of --enable-multi-arch or --disable-multi-arch. Linker generates IFUNC relocations from input IFUNC objects and ld.so performs IFUNC relocations. 2. There are no IFUNC dependencies in ld.so before dynamic relocation have been performed, 3. The x86 CPU features in ld.so is initialized by DL_PLATFORM_INIT in dynamic executable and by IFUNC relocation in dlopen in static executable. 4. The x86 cache info in libc.o is initialized by IFUNC relocation. 5. In libc.a, both x86 CPU features and cache info are initialized from ARCH_INIT_CPU_FEATURES, not by IFUNC relocation, before __libc_early_init is called. Note: _dl_x86_init_cpu_features can be called more than once from DL_PLATFORM_INIT and during relocation in ld.so.
* <sys/platform/x86.h>: Add FSRCS/FSRS/FZLRM supportH.J. Lu2020-10-093-0/+18
| | | | | Add Fast Short REP CMP and SCA (FSRCS), Fast Short REP STO (FSRS) and Fast Zero-Length REP MOV (FZLRM) support to <sys/platform/x86.h>.
* <sys/platform/x86.h>: Add Intel HRESET supportH.J. Lu2020-10-092-0/+4
| | | | Add Intel HRESET support to <sys/platform/x86.h>.
* <sys/platform/x86.h>: Add AVX-VNNI supportH.J. Lu2020-10-093-0/+7
| | | | Add AVX-VNNI support to <sys/platform/x86.h>.
* <sys/platform/x86.h>: Add AVX512_FP16 supportH.J. Lu2020-10-093-3/+7
| | | | Add AVX512_FP16 support to <sys/platform/x86.h>.
* <sys/platform/x86.h>: Add Intel UINTR supportH.J. Lu2020-10-092-3/+4
| | | | Add Intel UINTR support to <sys/platform/x86.h>.
* Reversing calculation of __x86_shared_non_temporal_thresholdPatrick McGehearty2020-09-281-5/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The __x86_shared_non_temporal_threshold determines when memcpy on x86 uses non_temporal stores to avoid pushing other data out of the last level cache. This patch proposes to revert the calculation change made by H.J. Lu's patch of June 2, 2017. H.J. Lu's patch selected a threshold suitable for a single thread getting maximum performance. It was tuned using the single threaded large memcpy micro benchmark on an 8 core processor. The last change changes the threshold from using 3/4 of one thread's share of the cache to using 3/4 of the entire cache of a multi-threaded system before switching to non-temporal stores. Multi-threaded systems with more than a few threads are server-class and typically have many active threads. If one thread consumes 3/4 of the available cache for all threads, it will cause other active threads to have data removed from the cache. Two examples show the range of the effect. John McCalpin's widely parallel Stream benchmark, which runs in parallel and fetches data sequentially, saw a 20% slowdown with this patch on an internal system test of 128 threads. This regression was discovered when comparing OL8 performance to OL7. An example that compares normal stores to non-temporal stores may be found at https://vgatherps.github.io/2018-09-02-nontemporal/. A simple test shows performance loss of 400 to 500% due to a failure to use nontemporal stores. These performance losses are most likely to occur when the system load is heaviest and good performance is critical. The tunable x86_non_temporal_threshold can be used to override the default for the knowledgable user who really wants maximum cache allocation to a single thread in a multi-threaded system. The manual entry for the tunable has been expanded to provide more information about its purpose. modified: sysdeps/x86/cacheinfo.c modified: manual/tunables.texi
* x86: Harden printf against non-normal long double values (bug 26649)Florian Weimer2020-09-223-0/+64
| | | | | | | | | | | | | | | | | | | | | | | The behavior of isnan/__builtin_isnan on bit patterns that do not correspond to something that the CPU would produce from valid inputs is currently under-defined in the toolchain. (The GCC built-in and glibc disagree.) The isnan check in PRINTF_FP_FETCH in stdio-common/printf_fp.c assumes the GCC behavior that returns true for non-normal numbers which are not specified as NaN. (The glibc implementation returns false for such numbers.) At present, passing non-normal numbers to __mpn_extract_long_double causes this function to produce irregularly shaped multi-precision integers, triggering undefined behavior in __printf_fp_l. With GCC 10 and glibc 2.32, this behavior is not visible because __builtin_isnan is used, which avoids calling __mpn_extract_long_double in this case. This commit updates the implementation of __mpn_extract_long_double so that regularly shaped multi-precision integers are produced in this case, avoiding undefined behavior in __printf_fp_l.
* x86: Use one ldbl2mpn.c file for both i386 and x86_64Florian Weimer2020-09-221-0/+120
|
* x86: Use HAS_CPU_FEATURE with IBT and SHSTK [BZ #26625]H.J. Lu2020-09-173-6/+4
| | | | | | | | | | | | | | | | commit 04bba1e5d84b6fd8d3a3b006bc240cd5d241ee30 Author: H.J. Lu <hjl.tools@gmail.com> Date: Wed Aug 5 13:51:56 2020 -0700 x86: Set CPU usable feature bits conservatively [BZ #26552] Set CPU usable feature bits only for CPU features which are usable in user space and whose usability can be detected from user space, excluding features like FSGSBASE whose enable bit can only be checked in the kernel. no longer turns on the usable bits of IBT and SHSTK since we don't know if IBT and SHSTK are usable until much later. Use HAS_CPU_FEATURE to check if the processor supports IBT and SHSTK.
* <sys/platform/x86.h>: Add Intel Key Locker supportH.J. Lu2020-09-163-3/+42
| | | | | | | | | | | | | | | | | | | | | | | Add Intel Key Locker: https://software.intel.com/content/www/us/en/develop/download/intel-key-locker-specification.html support to <sys/platform/x86.h>. Intel Key Locker has 1. KL: AES Key Locker instructions. 2. WIDE_KL: AES wide Key Locker instructions. 3. AESKLE: AES Key Locker instructions are enabled by OS. Applications should use if (CPU_FEATURE_USABLE (KL)) and if (CPU_FEATURE_USABLE (WIDE_KL)) to check if AES Key Locker instructions and AES wide Key Locker instructions are usable.
* x86: Install <sys/platform/x86.h> [BZ #26124]H.J. Lu2020-09-118-141/+654
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Install <sys/platform/x86.h> so that programmers can do #if __has_include(<sys/platform/x86.h>) #include <sys/platform/x86.h> #endif ... if (CPU_FEATURE_USABLE (SSE2)) ... if (CPU_FEATURE_USABLE (AVX2)) ... <sys/platform/x86.h> exports only: enum { COMMON_CPUID_INDEX_1 = 0, COMMON_CPUID_INDEX_7, COMMON_CPUID_INDEX_80000001, COMMON_CPUID_INDEX_D_ECX_1, COMMON_CPUID_INDEX_80000007, COMMON_CPUID_INDEX_80000008, COMMON_CPUID_INDEX_7_ECX_1, /* Keep the following line at the end. */ COMMON_CPUID_INDEX_MAX }; struct cpuid_features { struct cpuid_registers cpuid; struct cpuid_registers usable; }; struct cpu_features { struct cpu_features_basic basic; struct cpuid_features features[COMMON_CPUID_INDEX_MAX]; }; /* Get a pointer to the CPU features structure. */ extern const struct cpu_features *__x86_get_cpu_features (unsigned int max) __attribute__ ((const)); Since all feature checks are done through macros, programs compiled with a newer <sys/platform/x86.h> are compatible with the older glibc binaries as long as the layout of struct cpu_features is identical. The features array can be expanded with backward binary compatibility for both .o and .so files. When COMMON_CPUID_INDEX_MAX is increased to support new processor features, __x86_get_cpu_features in the older glibc binaries returns NULL and HAS_CPU_FEATURE/CPU_FEATURE_USABLE return false on the new processor feature. No new symbol version is neeeded. Both CPU_FEATURE_USABLE and HAS_CPU_FEATURE are provided. HAS_CPU_FEATURE can be used to identify processor features. Note: Although GCC has __builtin_cpu_supports, it only supports a subset of <sys/platform/x86.h> and it is equivalent to CPU_FEATURE_USABLE. It doesn't support HAS_CPU_FEATURE.
* x86: Set CPU usable feature bits conservatively [BZ #26552]H.J. Lu2020-09-031-96/+47
| | | | | | Set CPU usable feature bits only for CPU features which are usable in user space and whose usability can be detected from user space, excluding features like FSGSBASE whose enable bit can only be checked in the kernel.
* x86: Rename Intel CPU feature namesH.J. Lu2020-08-052-15/+15
| | | | | | | | | | | | | | | Intel64 and IA-32 Architectures Software Developer’s Manual has changed the following CPU feature names: 1. The CPU feature of Enhanced Intel SpeedStep Technology is renamed from EST to EIST. 2. The CPU feature which supports Platform Quality of Service Monitoring (PQM) capability is changed to Intel Resource Director Technology (Intel RDT) Monitoring capability, i.e. PQM is renamed to RDT_M. 3. The CPU feature which supports Platform Quality of Service Enforcement (PQE) capability is changed to Intel Resource Director Technology (Intel RDT) Allocation capability, i.e. PQE is renamed to RDT_A.
* x86: Support usable check for all CPU featuresH.J. Lu2020-07-136-439/+561
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Support usable check for all CPU features with the following changes: 1. Change struct cpu_features to struct cpuid_features { struct cpuid_registers cpuid; struct cpuid_registers usable; }; struct cpu_features { struct cpu_features_basic basic; struct cpuid_features features[COMMON_CPUID_INDEX_MAX]; unsigned int preferred[PREFERRED_FEATURE_INDEX_MAX]; ... }; so that there is a usable bit for each cpuid bit. 2. After the cpuid bits have been initialized, copy the known bits to the usable bits. EAX/EBX from INDEX_1 and EAX from INDEX_7 aren't used for CPU feature detection. 3. Clear the usable bits which require OS support. 4. If the feature is supported by OS, copy its cpuid bit to its usable bit. 5. Replace HAS_CPU_FEATURE and CPU_FEATURES_CPU_P with CPU_FEATURE_USABLE and CPU_FEATURE_USABLE_P to check if a feature is usable. 6. Add DEPR_FPU_CS_DS for INDEX_7_EBX_13. 7. Unset MPX feature since it has been deprecated. The results are 1. If the feature is known and doesn't requre OS support, its usable bit is copied from the cpuid bit. 2. Otherwise, its usable bit is copied from the cpuid bit only if the feature is known to supported by OS. 3. CPU_FEATURE_USABLE/CPU_FEATURE_USABLE_P are used to check if the feature can be used. 4. HAS_CPU_FEATURE/CPU_FEATURE_CPU_P are used to check if CPU supports the feature.
* x86: Remove __ASSEMBLER__ check in init-arch.hH.J. Lu2020-07-111-5/+1
| | | | | | | | | | | | | Since commit 430388d5dc0e1861b869096f4f5d946d7d74232a Author: H.J. Lu <hjl.tools@gmail.com> Date: Fri Aug 3 08:04:49 2018 -0700 x86: Don't include <init-arch.h> in assembly codes removed all usages of <init-arch.h> from assembly codes, we can remove __ASSEMBLER__ check in init-arch.h.
* x86: Remove the unused __x86_prefetchwH.J. Lu2020-07-112-16/+4
| | | | | | | | | | | | | Since commit c867597bff2562180a18da4b8dba89d24e8b65c4 Author: H.J. Lu <hjl.tools@gmail.com> Date: Wed Jun 8 13:57:50 2016 -0700 X86-64: Remove previous default/SSE2/AVX2 memcpy/memmove removed the only usage of __x86_prefetchw, we can remove the unused __x86_prefetchw.
* rtld: Clean up PT_NOTE and add PT_GNU_PROPERTY handlingSzabolcs Nagy2020-07-081-40/+7
| | | | | | | | | | | | | | | | | | | | | | Add generic code to handle PT_GNU_PROPERTY notes. Invalid content is ignored, _dl_process_pt_gnu_property is always called after PT_LOAD segments are mapped and it has no failure modes. Currently only one NT_GNU_PROPERTY_TYPE_0 note is handled, which contains target specific properties: the _dl_process_gnu_property hook is called for each property. The old _dl_process_pt_note and _rtld_process_pt_note differ in how the program header is read. The old _dl_process_pt_note is called before PT_LOAD segments are mapped and _rtld_process_pt_note is called after PT_LOAD segments are mapped. The old _rtld_process_pt_note is removed and _dl_process_pt_note is always called after PT_LOAD segments are mapped and now it has no failure modes. The program headers are scanned backwards so that PT_NOTE can be skipped if PT_GNU_PROPERTY exists. Co-Authored-By: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* x86: Add thresholds for "rep movsb/stosb" to tunablesH.J. Lu2020-07-064-0/+68
| | | | | | | | | | Add x86_rep_movsb_threshold and x86_rep_stosb_threshold to tunables to update thresholds for "rep movsb" and "rep stosb" at run-time. Note that the user specified threshold for "rep movsb" smaller than the minimum threshold will be ignored. Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* x86: Detect Extended Feature Disable (XFD)H.J. Lu2020-07-062-0/+4
| | | | | | | An extension called extended feature disable (XFD) is an extension added for Intel AMX to the XSAVE feature set that allows an operating system to enable a feature while preventing specific user threads from using the feature.
* x86: Correct bit_cpu_CLFSH [BZ #26208]H.J. Lu2020-07-061-1/+1
| | | | bit_cpu_CLFSH should be (1u << 19), not (1u << 20).
* x86: Detect Intel Advanced Matrix ExtensionsH.J. Lu2020-06-263-0/+44
| | | | | | | | | | | | | | Intel Advanced Matrix Extensions (Intel AMX) is a new programming paradigm consisting of two components: a set of 2-dimensional registers (tiles) representing sub-arrays from a larger 2-dimensional memory image, and accelerators able to operate on tiles. Intel AMX is an extensible architecture. New accelerators can be added and the existing accelerator may be enhanced to provide higher performance. The initial features are AMX-BF16, AMX-TILE and AMX-INT8, which are usable only if the operating system supports both XTILECFG state and XTILEDATA state. Add AMX-BF16, AMX-TILE and AMX-INT8 support to HAS_CPU_FEATURE and CPU_FEATURE_USABLE.
* x86: Update CPU feature detection [BZ #26149]H.J. Lu2020-06-224-389/+267
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 1. Divide architecture features into the usable features and the preferred features. The usable features are for correctness and can be exported in a stable ABI. The preferred features are for performance and only for glibc internal use. 2. Change struct cpu_features to struct cpu_features { struct cpu_features_basic basic; unsigned int *usable_p; struct cpuid_registers cpuid[COMMON_CPUID_INDEX_MAX]; unsigned int usable[USABLE_FEATURE_INDEX_MAX]; unsigned int preferred[PREFERRED_FEATURE_INDEX_MAX]; ... }; and initialize usable_p to pointer to the usable arary so that struct cpu_features { struct cpu_features_basic basic; unsigned int *usable_p; struct cpuid_registers cpuid[COMMON_CPUID_INDEX_MAX]; }; can be exported via a stable ABI. The cpuid and usable arrays can be expanded with backward binary compatibility for both .o and .so files. 3. Add COMMON_CPUID_INDEX_7_ECX_1 for AVX512_BF16. 4. Detect ENQCMD, PKS, AVX512_VP2INTERSECT, MD_CLEAR, SERIALIZE, HYBRID, TSXLDTRK, L1D_FLUSH, CORE_CAPABILITIES and AVX512_BF16. 5. Rename CAPABILITIES to ARCH_CAPABILITIES. 6. Check if AVX512_VP2INTERSECT, AVX512_BF16 and PKU are usable. 7. Update CPU feature detection test.
* i386: Use builtin sqrtlAdhemerval Zanella2020-06-221-0/+27
| | | | Checked on i686-linux-gnu.
* x86: Update F16C detection [BZ #26133]H.J. Lu2020-06-182-3/+7
| | | | Since F16C requires AVX, set F16C usable only when AVX is usable.
* x86: Correct bit_cpu_CLFLUSHOPT [BZ #26128]H.J. Lu2020-06-171-1/+1
| | | | bit_cpu_CLFLUSHOPT should be (1u << 23), not (1u << 22).
* x86: Update Intel Atom processor family optimizationH.J. Lu2020-05-211-1/+19
| | | | | | Enable Intel Silvermont optimization for Intel Goldmont Plus. Detect more Intel Airmont processors. Optimize Intel Tremont like Intel Silvermont with rep string instructions.
* x86: Add --enable-cet=permissiveH.J. Lu2020-05-186-43/+69
| | | | | | | | | | | | | | | | | | | | | | | | | When CET is enabled, it is an error to dlopen a non CET enabled shared library in CET enabled application. It may be desirable to make CET permissive, that is disable CET when dlopening a non CET enabled shared library. With the new --enable-cet=permissive configure option, CET is disabled when dlopening a non CET enabled shared library. Add DEFAULT_DL_X86_CET_CONTROL to config.h.in: /* The default value of x86 CET control. */ #define DEFAULT_DL_X86_CET_CONTROL cet_elf_property which enables CET features based on ELF property note. --enable-cet=permissive it to /* The default value of x86 CET control. */ #define DEFAULT_DL_X86_CET_CONTROL cet_permissive which enables CET features permissively. Update tst-cet-legacy-5a, tst-cet-legacy-5b, tst-cet-legacy-6a and tst-cet-legacy-6b to check --enable-cet and --enable-cet=permissive.
* x86: Move CET control to _dl_x86_feature_control [BZ #25887]H.J. Lu2020-05-186-66/+77
| | | | | | | | | | 1. Include <dl-procruntime.c> to get architecture specific initializer in rtld_global. 2. Change _dl_x86_feature_1[2] to _dl_x86_feature_1. 3. Add _dl_x86_feature_control after _dl_x86_feature_1, which is a struct of 2 bitfields for IBT and SHSTK control This fixes [BZ #25887].
* semaphore: consolidate arch headers into a generic oneVineet Gupta2020-05-061-40/+0
| | | | | | | | | | | | | | | | | | This consolidates the copy-pasted arch specific semaphore header into single version (based on s390) which suffices 32-bit and and 64-bit arch/ABI based on the canonical WORDSIZE. For now I've left out arches which use alternate defines to choose for 32 vs 64-bit builds (aarch64, mips) which in theory can also use the same header. Passes build-many for aarch64-linux-gnu arm-linux-gnueabi arm-linux-gnueabihf riscv64-linux-gnu-rv64imac-lp64 riscv64-linux-gnu-rv64imafdc-lp64 x86_64-linux-gnu microblaze-linux-gnu nios2-linux-gnu Suggested-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* i386: Remove unused variable in sysdeps/x86/cacheinfo.cFlorian Weimer2020-04-301-1/+1
| | | | | | | | | | | | Commit a98dc92dd1e278df4c501deb07985018bc2b06de ("x86: Add cache information support for Zhaoxin processors") introduced an unused variable warning in the default i686-linux-gnu build: In file included from ../sysdeps/i386/cacheinfo.c:3: ../sysdeps/x86/cacheinfo.c: In function 'init_cacheinfo': ../sysdeps/x86/cacheinfo.c:762:16: error: unused variable 'eax' [-Werror=unused-variable] 762 | unsigned int eax; | ^~~
* x86: Add the test case of __get_cpu_features support for Zhaoxin processorsmayshao-oc2020-04-301-0/+2
| | | | | For the test case of the __get_cpu_features interface, add an item in cpu_kinds and a switch case for Zhaoxin support.
* x86: Add cache information support for Zhaoxin processorsmayshao-oc2020-04-301-196/+282
| | | | | | | | | | | To obtain Zhaoxin CPU cache information, add a new function handle_zhaoxin(). Add a new function get_common_cache_info() that extracts the code in init_cacheinfo() to get the value of the variable shared, threads. Add Zhaoxin branch in init_cacheinfo() for initializing variables, such as __x86_shared_cache_size.
* x86: Add CPU Vendor ID detection support for Zhaoxin processorsmayshao2020-04-302-0/+55
| | | | | To recognize Zhaoxin CPU Vendor ID, add a new architecture type arch_kind_zhaoxin for Vendor Zhaoxin detection.
* Revert "x86_64: Add SSE sfp-exceptions"Adhemerval Zanella2020-04-201-57/+0
| | | | | | | | | | | The __sfp_handle_exceptions is not fully correct regarding raising exceptions, since there is no direct way to raise only FP_EX_OVERFLOW nor FP_EX_UNDERFLOW for SSE mode. Both libgcc and feraiseexcept rely on x87 mode to accomplish it. This reverts commit 460ee50de054396cc9791ff4cfdc2f5029fb923d. Checked on x86_64.
* x86_64: Add SSE sfp-exceptionsAdhemerval Zanella2020-04-171-0/+57
| | | | | | | | | | | | | | | | | | | | | | | | | | The exported x86_64 fenv.h functions operate on both i387 and SSE (since they should work on both float, double, and long double) while the internal libc_fe* set either SSE (float, double, and float128) or i387 (long double). The libgcc __sfp_handle_exceptions (used on float128 implementation), however, will set either SEE or i387 exception depending of the exception to raise. This broke the internal assumption of float128 where only SSE operations will be used. This patch reimplements the libgcc __sfp_handle_exceptions to use only SSE operations and sets libgcc to use it instead of its own implementation. And I think we should fix libgcc in a similar manner, since checking on config/i386/64/sfp-machine.h it already only supports SSE rounding mode and x86_64 ABI also expectes float128 to use SSE registers [1] (although it is not clear on how future implementation might implement it). Checked on x86_64-linux-gnu. [1] https://github.com/hjl-tools/x86-psABI/wiki/X86-psABI
* x86: Remove feraiseexcept optimizationAdhemerval Zanella2020-03-302-111/+0
| | | | | | | | | | | | | | Similar to fenvinline.h removal, this kind of optimization is better implemented by the compiler. Also newer code avoid setting exceptions directly (for instance the code to make new logf, log2f and powf implementatation to now support SVID compat). The BZ#94194 [1] the corresponding GCC bug for adding replacements for these on x86. Checked on x86_64-linux-gnu and i686-linux-gnu. [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94194
* x86: Remove ARCH_CET_LEGACY_BITMAP [BZ #25397]H.J. Lu2020-03-188-183/+165
| | | | | | | | | | | Since legacy bitmap doesn't cover jitted code generated by legacy JIT engine, it isn't very useful. This patch removes ARCH_CET_LEGACY_BITMAP and treats indirect branch tracking similar to shadow stack by removing legacy bitmap support. Tested on CET Linux/x86-64 and non-CET Linux/x86-64. Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* Introduce <elf-initfini.h> and ELF_INITFINI for all architecturesFlorian Weimer2020-02-181-0/+20
| | | | | | | | | | | | | | | | This supersedes the init_array sysdeps directory. It allows us to check for ELF_INITFINI in both C and assembler code, and skip DT_INIT and DT_FINI processing completely on newer architectures. A new header file is needed because <dl-machine.h> is incompatible with assembler code. <sysdep.h> is compatible with assembler code, but it cannot be included in all assembler files because on some architectures, it redefines register names, and some assembler files conflict with that. <elf-initfini.h> is replicated for legacy architectures which need DT_INIT/DT_FINI support. New architectures follow the generic default and disable it.
* x86: Remove <bits/select.h> and use the generic versionFlorian Weimer2020-02-091-63/+0
| | | | | | Particularly on CPUs without ERMS, the string instructions are slow, so it is unclear whether this architecture-specific implementation is in fact an optimization.
* x86: Don't make 2 calls to dlerror () in a rowH.J. Lu2020-02-012-2/+2
| | | | | | | | | | We shouldn't make 2 calls to dlerror () in a row since the first call will clear the error. We should just use the return value from the first call. Tested on Linux/x86-64. Reviewed-by: Florian Weimer <fweimer@redhat.com>
* Add libm_alias_finite for _finite symbolsWilco Dijkstra2020-01-031-1/+2
| | | | | | | | | | | | | | | | | | | This patch adds a new macro, libm_alias_finite, to define all _finite symbol. It sets all _finite symbol as compat symbol based on its first version (obtained from the definition at built generated first-versions.h). The <fn>f128_finite symbols were introduced in GLIBC 2.26 and so need special treatment in code that is shared between long double and float128. It is done by adding a list, similar to internal symbol redifinition, on sysdeps/ieee754/float128/float128_private.h. Alpha also needs some tricky changes to ensure we still emit 2 compat symbols for sqrt(f). Passes buildmanyglibc. Co-authored-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>