path: root/sysdeps/x86
Commit message | Author | Age | Files | Lines
* x86: Add BMI1/BMI2 checks for ISA_V3 check | Noah Goldstein | 2022-07-18 | 1 | -1/+2
  BMI1/BMI2 are part of the ISA V3 requirements:
  https://en.wikipedia.org/wiki/X86-64
  They are defined by GCC when building with `-march=x86-64-v3`.
  (cherry picked from commit 8da9f346cb2051844348785b8a932ec44489e0b7)
* x86: Add bounds `x86_non_temporal_threshold` | Noah Goldstein | 2022-07-18 | 1 | -1/+7
  The lower-bound (16448) and upper-bound (SIZE_MAX / 16) are assumed by
  memmove-vec-unaligned-erms. The lower-bound is needed because
  memmove-vec-unaligned-erms unrolls the loop aggressively in the
  L(large_memset_4x) case. The upper-bound is needed because
  memmove-vec-unaligned-erms right-shifts the value of
  `x86_non_temporal_threshold` by LOG_4X_MEMCPY_THRESH (4), which without
  a bound may overflow. The lack of a lower-bound can be a correctness
  issue; the lack of an upper-bound cannot.
  (cherry picked from commit b446822b6ae4e8149902a78cdd4a886634ad6321)
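  A minimal sketch of the clamping described above (the helper name is
  illustrative; glibc applies these bounds through its tunables
  machinery, not a standalone function):

      #include <stddef.h>
      #include <stdint.h>

      /* Bounds assumed by memmove-vec-unaligned-erms, per the commit
         message above.  */
      #define NT_THRESHOLD_LOWER 16448
      #define NT_THRESHOLD_UPPER (SIZE_MAX / 16)

      static size_t
      clamp_non_temporal_threshold (size_t value)
      {
        /* Below the lower bound, the aggressively unrolled large-copy
           loop makes assumptions that no longer hold.  */
        if (value < NT_THRESHOLD_LOWER)
          return NT_THRESHOLD_LOWER;
        /* Above the upper bound, rescaling the threshold by the factor
           of 16 implied by LOG_4X_MEMCPY_THRESH (4) could overflow.  */
        if (value > NT_THRESHOLD_UPPER)
          return NT_THRESHOLD_UPPER;
        return value;
      }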
* x86: Fix misordered logic for setting `rep_movsb_stop_threshold` | Noah Goldstein | 2022-07-18 | 1 | -12/+12
  Move the setting of `rep_movsb_stop_threshold` to after the tunables
  have been collected, so that `rep_movsb_stop_threshold` (which is used
  to redirect control flow to the non-temporal case) will use any
  user-supplied value for `non_temporal_threshold` (set with
  glibc.cpu.x86_non_temporal_threshold).
  (cherry picked from commit 035591551400cfc810b07244a015c9411e8bff7c)
* x86: Improve L to support L(XXX_SYMBOL (YYY, ZZZ)) | H.J. Lu | 2022-05-16 | 1 | -1/+2
  (cherry picked from commit 1283948f236f209b7d3f44b69a42b96806fa6da0)
* x86: Fix fallback for wcsncmp_avx2 in strcmp-avx2.S [BZ #28896] | Noah Goldstein | 2022-05-05 | 1 | -0/+15
  The overflow case for __wcsncmp_avx2_rtm should fall back to
  __wcscmp_avx2_rtm, not __wcscmp_avx2.

  commit ddf0992cf57a93200e0c782e2a94d0733a5a0b87
  Author: Noah Goldstein <goldstein.w.n@gmail.com>
  Date:   Sun Jan 9 16:02:21 2022 -0600

      x86: Fix __wcsncmp_avx2 in strcmp-avx2.S [BZ# 28755]

  set the wrong fallback function for `__wcsncmp_avx2_rtm`: it was set to
  fall back to `__wcscmp_avx2` instead of `__wcscmp_avx2_rtm`, which can
  cause spurious aborts. This change will need to be backported.

  All string/memory tests pass.
  Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
  (cherry picked from commit 9fef7039a7d04947bc89296ee0d187bc8d89b772)
* x86: Don't set Prefer_No_AVX512 for processors with AVX512 and AVX-VNNI | H.J. Lu | 2022-05-02 | 1 | -2/+5
  Don't set Prefer_No_AVX512 on processors with AVX512 and AVX-VNNI,
  since they won't lower the CPU frequency when ZMM load and store
  instructions are used.
  (cherry picked from commit ceeffe968c01b1202e482f4855cb6baf5c6cb713)
* x86: Double size of ERMS rep_movsb_threshold in dl-cacheinfo.h | Noah Goldstein | 2022-05-02 | 2 | -14/+20
  No bug. This patch doubles the rep_movsb_threshold when using ERMS.
  Based on benchmarks, the vector copy loop, especially now that it
  handles 4k aliasing, is better for these medium-range sizes.

  On Skylake with ERMS:

  Size, Align1, Align2, dst>src, (rep movsb) / (vec copy)
  4096, 0, 0, 0, 0.975
  4096, 0, 0, 1, 0.953
  4096, 12, 0, 0, 0.969
  4096, 12, 0, 1, 0.872
  4096, 44, 0, 0, 0.979
  4096, 44, 0, 1, 0.83
  4096, 0, 12, 0, 1.006
  4096, 0, 12, 1, 0.989
  4096, 0, 44, 0, 0.739
  4096, 0, 44, 1, 0.942
  4096, 12, 12, 0, 1.009
  4096, 12, 12, 1, 0.973
  4096, 44, 44, 0, 0.791
  4096, 44, 44, 1, 0.961
  4096, 2048, 0, 0, 0.978
  4096, 2048, 0, 1, 0.951
  4096, 2060, 0, 0, 0.986
  4096, 2060, 0, 1, 0.963
  4096, 2048, 12, 0, 0.971
  4096, 2048, 12, 1, 0.941
  4096, 2060, 12, 0, 0.977
  4096, 2060, 12, 1, 0.949
  8192, 0, 0, 0, 0.85
  8192, 0, 0, 1, 0.845
  8192, 13, 0, 0, 0.937
  8192, 13, 0, 1, 0.939
  8192, 45, 0, 0, 0.932
  8192, 45, 0, 1, 0.927
  8192, 0, 13, 0, 0.621
  8192, 0, 13, 1, 0.62
  8192, 0, 45, 0, 0.53
  8192, 0, 45, 1, 0.516
  8192, 13, 13, 0, 0.664
  8192, 13, 13, 1, 0.659
  8192, 45, 45, 0, 0.593
  8192, 45, 45, 1, 0.575
  8192, 2048, 0, 0, 0.854
  8192, 2048, 0, 1, 0.834
  8192, 2061, 0, 0, 0.863
  8192, 2061, 0, 1, 0.857
  8192, 2048, 13, 0, 0.63
  8192, 2048, 13, 1, 0.629
  8192, 2061, 13, 0, 0.627
  8192, 2061, 13, 1, 0.62

  Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
  Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
  (cherry picked from commit 475b63702ef38b69558fc3d31a0b66776a70f1d3)
* x86-64: Remove Prefer_AVX2_STRCMP | H.J. Lu | 2022-05-02 | 3 | -11/+0
  Remove Prefer_AVX2_STRCMP to enable EVEX strcmp. When comparing 2
  32-byte strings, EVEX strcmp has been improved to require 1 load, 1
  VPTESTM, 1 VPCMP, 1 KMOVD and 1 INCL instead of 2 loads, 3 VPCMPs, 2
  KORDs, 1 KMOVD and 1 TESTL, while AVX2 strcmp requires 1 load, 2
  VPCMPEQs, 1 VPMINU, 1 VPMOVMSKB and 1 TESTL. EVEX strcmp is now faster
  than AVX2 strcmp by up to 40% on Tiger Lake and Ice Lake.
  (cherry picked from commit 14dbbf46a007ae5df36646b51ad0c9e5f5259f30)
* x86: Modify ENTRY in sysdep.h so that p2align can be specified | Noah Goldstein | 2022-05-02 | 1 | -2/+5
  No bug. This change adds a new macro ENTRY_P2ALIGN which takes a second
  argument, the log2 of the desired function alignment. The old
  ENTRY(name) macro is just ENTRY_P2ALIGN(name, 4), so this doesn't
  affect any existing functionality.
  Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
  (cherry picked from commit fc5bd179ef3a953dff8d1655bd530d0e230ffe71)
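  A simplified sketch of such a macro pair for an assembly-level
  sysdep.h (illustrative only; the real header also emits CFI,
  profiling hooks and other bookkeeping):

      /* Align the function entry to 1 << p2align bytes.  */
      #define ENTRY_P2ALIGN(name, p2align) \
        .globl name;                       \
        .type name, @function;             \
        .p2align p2align;                  \
        name:

      /* The classic 16-byte alignment is just the p2align == 4 case,
         so existing ENTRY users keep their behavior.  */
      #define ENTRY(name) ENTRY_P2ALIGN (name, 4)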
* x86-64: Add Avoid_Short_Distance_REP_MOVSB | H.J. Lu | 2022-05-02 | 4 | -0/+20
  commit 3ec5d83d2a237d39e7fd6ef7a0bc8ac4c171a4a5
  Author: H.J. Lu <hjl.tools@gmail.com>
  Date:   Sat Jan 25 14:19:40 2020 -0800

      x86-64: Avoid rep movsb with short distance [BZ #27130]

  introduced some regressions on Intel processors without Fast Short REP
  MOV (FSRM). Add Avoid_Short_Distance_REP_MOVSB to avoid rep movsb with
  short distance only on Intel processors with FSRM. bench-memmove-large
  on a Skylake server shows that cycles of __memmove_evex_unaligned_erms
  improve for the following data sizes:

                                      before    after    Improvement
  length=4127, align1=3, align2=0:    479.38    349.25       27%
  length=4223, align1=9, align2=5:    405.62    333.25       18%
  length=8223, align1=3, align2=0:    786.12    496.38       37%
  length=8319, align1=9, align2=5:    727.50    501.38       31%
  length=16415, align1=3, align2=0:  1436.88    840.00       41%
  length=16511, align1=9, align2=5:  1375.50    836.38       39%
  length=32799, align1=3, align2=0:  2890.00   1860.12       36%
  length=32895, align1=9, align2=5:  2891.38   1931.88       33%

  (cherry picked from commit 91cc803d27bda34919717b496b53cf279e44a922)
* x86: Set rep_movsb_threshold to 2112 on processors with FSRM | H.J. Lu | 2022-05-02 | 1 | -0/+4
  The glibc memcpy benchmark on Intel Core i7-1065G7 (Ice Lake) showed
  that REP MOVSB became faster after 2112 bytes:

                                      Vector Move   REP MOVSB
  length=2112, align1=0, align2=0:       24.20        24.40
  length=2112, align1=1, align2=0:       26.07        23.13
  length=2112, align1=0, align2=1:       27.18        28.13
  length=2112, align1=1, align2=1:       26.23        25.16
  length=2176, align1=0, align2=0:       23.18        22.52
  length=2176, align1=2, align2=0:       25.45        22.52
  length=2176, align1=0, align2=2:       27.14        27.82
  length=2176, align1=2, align2=2:       22.73        25.56
  length=2240, align1=0, align2=0:       24.62        24.25
  length=2240, align1=3, align2=0:       29.77        27.15
  length=2240, align1=0, align2=3:       35.55        29.93
  length=2240, align1=3, align2=3:       34.49        25.15
  length=2304, align1=0, align2=0:       34.75        26.64
  length=2304, align1=4, align2=0:       32.09        22.63
  length=2304, align1=0, align2=4:       28.43        31.24

  Use REP MOVSB for data size > 2112 bytes in memcpy on processors with
  fast short REP MOVSB (FSRM).

  * sysdeps/x86/dl-cacheinfo.h (dl_init_cacheinfo): Set
    rep_movsb_threshold to 2112 on processors with fast short REP MOVSB
    (FSRM).
  (cherry picked from commit cf2c57526ba4b57e6863ad4db8a868e2678adce8)
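  A sketch of the resulting selection logic (self-contained and
  illustrative; the non-FSRM default of 2048 here is made up, since
  glibc derives that default from the vector register width):

      #include <stdbool.h>

      /* Pick the size at which memcpy switches to "rep movsb".  */
      static unsigned long int
      select_rep_movsb_threshold (bool has_fsrm)
      {
        /* Per the benchmark above, REP MOVSB only wins this early on
           processors with Fast Short REP MOVSB (FSRM).  */
        if (has_fsrm)
          return 2112;
        return 2048; /* illustrative default for non-FSRM CPUs */
      }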
* x86: Adding an upper bound for Enhanced REP MOVSB. | Sajan Karumanchi | 2022-05-02 | 3 | -1/+20
  In the process of optimizing memcpy for AMD machines, we have found
  that vector move operations outperform enhanced REP MOVSB for data
  transfers above the L2 cache size on Zen3 architectures. To handle this
  use case, we are adding an upper bound parameter on enhanced REP MOVSB:
  '__x86_rep_movsb_stop_threshold'. As per large-bench results, we are
  configuring this parameter to the L2 cache size for AMD machines, and
  it applies from the Zen3 architecture onward for machines supporting
  the ERMS feature. For architectures other than AMD, it is the computed
  value of the non-temporal threshold parameter.
  Reviewed-by: Premachandra Mallappa <premachandra.mallappa@amd.com>
  (cherry picked from commit 6e02b3e9327b7dbb063958d2b124b64fcb4bbe3f)
* x86: Fix TEST_NAME to make it a string in tst-strncmp-rtm.c | Noah Goldstein | 2022-02-18 | 1 | -2/+2
  Previously TEST_NAME was passing a function pointer. This didn't fail
  because of the -Wno-error flag (used to allow for overflow sizes passed
  to strncmp/wcsncmp).
  Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
  (cherry picked from commit b98d0bbf747f39770e0caba7e984ce9f8f900330)
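  The underlying C idiom is preprocessor stringification; a standalone
  illustration (not the actual test harness):

      #include <stdio.h>

      /* Two-level expansion so macro arguments are expanded before
         being stringized.  */
      #define STR(x) #x
      #define XSTR(x) STR (x)

      #define TEST_FUNC strncmp

      int
      main (void)
      {
        /* XSTR (TEST_FUNC) yields the string "strncmp", not a pointer
           to the strncmp function.  */
        printf ("testing %s\n", XSTR (TEST_FUNC));
        return 0;
      }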
* x86: Test wcscmp RTM in the wcsncmp overflow case [BZ #28896] | Noah Goldstein | 2022-02-18 | 3 | -10/+48
  In the overflow fallback, strncmp-avx2-rtm and wcsncmp-avx2-rtm would
  call strcmp-avx2 and wcscmp-avx2 respectively. These have no checks
  around vzeroupper and would trigger spurious aborts. This commit fixes
  that.

  test-strcmp, test-strncmp, test-wcscmp, and test-wcsncmp all pass on
  AVX2 machines with and without RTM.
  Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
  (cherry picked from commit 7835d611af0854e69a0c71e3806f8fe379282d6f)
* x86: Fallback {str|wcs}cmp RTM in the ncmp overflow case [BZ #28896] | Noah Goldstein | 2022-02-18 | 2 | -2/+17
  In the overflow fallback, strncmp-avx2-rtm and wcsncmp-avx2-rtm would
  call strcmp-avx2 and wcscmp-avx2 respectively. These have no checks
  around vzeroupper and would trigger spurious aborts. This commit fixes
  that.

  test-strcmp, test-strncmp, test-wcscmp, and test-wcsncmp all pass on
  AVX2 machines with and without RTM.
  Co-authored-by: H.J. Lu <hjl.tools@gmail.com>
  (cherry picked from commit c6272098323153db373f2986c67786ea8c85f1cf)
* <bits/platform/x86.h>: Correct x86_cpu_TBM | H.J. Lu | 2022-02-16 | 1 | -1/+1
  x86_cpu_TBM should be x86_cpu_index_80000001_ecx + 21.
  (cherry picked from commit ba230b6387fc0ccba60d2ff6759f7e326ba7bf3e)
* x86: Use CHECK_FEATURE_PRESENT to check HLE [BZ #27398] | H.J. Lu | 2022-02-01 | 1 | -1/+1
  HLE is disabled on blacklisted CPUs. Use CHECK_FEATURE_PRESENT, instead
  of CHECK_FEATURE_ACTIVE, to check HLE.
  (cherry picked from commit 501246c5e2dfcc278f0ebbdb72345cdd239521c7)
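  The distinction matters because a CPU can enumerate a feature that
  glibc has deliberately turned off; a schematic sketch (the types and
  helpers are illustrative, not glibc's internals):

      /* Two bitmaps per feature word: what CPUID reports, and what
         glibc is willing to use.  */
      struct feature_word
      {
        unsigned int cpuid;  /* set if the hardware enumerates it */
        unsigned int active; /* set only if glibc will also use it */
      };

      #define FEATURE_BIT_HLE (1u << 4)

      /* "Present": HLE is enumerated, even on TSX-blacklisted CPUs
         where glibc has disabled it.  */
      static int
      check_feature_present (const struct feature_word *w)
      {
        return (w->cpuid & FEATURE_BIT_HLE) != 0;
      }

      /* "Active": HLE is enumerated and not disabled by glibc.  */
      static int
      check_feature_active (const struct feature_word *w)
      {
        return (w->active & FEATURE_BIT_HLE) != 0;
      }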
* x86: Black list more Intel CPUs for TSX [BZ #27398] | H.J. Lu | 2022-02-01 | 1 | -3/+31
  Disable TSX and enable RTM_ALWAYS_ABORT for Intel CPUs listed in:
  https://www.intel.com/content/www/us/en/support/articles/000059422/processors.html
  This fixes BZ #27398.
  Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
  (cherry picked from commit 1e000d3d33211d5a954300e2a69b90f93f18a1a1)
* x86: Check RTM_ALWAYS_ABORT for RTM [BZ #28033] | H.J. Lu | 2022-02-01 | 5 | -6/+11
  From
  https://www.intel.com/content/www/us/en/support/articles/000059422/processors.html:

  * Intel TSX will be disabled by default.
  * The processor will force abort all Restricted Transactional Memory
    (RTM) transactions by default.
  * A new CPUID bit CPUID.07H.0H.EDX[11] (RTM_ALWAYS_ABORT) will be
    enumerated, which is set to indicate to updated software that the
    loaded microcode is forcing RTM abort.
  * On processors that enumerate support for RTM, the CPUID enumeration
    bits for Intel TSX (CPUID.07H.0H.EBX[11] and CPUID.07H.0H.EBX[4])
    continue to be set by default after microcode update.
  * Workloads that benefited from Intel TSX might experience a change in
    performance.
  * System software may use a new bit in Model-Specific Register (MSR)
    0x10F TSX_FORCE_ABORT[TSX_CPUID_CLEAR] to clear the Hardware Lock
    Elision (HLE) and RTM bits, to indicate to software that Intel TSX is
    disabled.

  1. Add RTM_ALWAYS_ABORT to CPUID features.
  2. Set RTM usable only if RTM_ALWAYS_ABORT isn't set. This skips the
     string/tst-memchr-rtm etc. testcases on the affected processors,
     which always fail after a microcode update.
  3. Check the RTM feature, instead of usability, against /proc/cpuinfo.

  This fixes BZ #28033.
  (cherry picked from commit ea8e465a6b8d0f26c72bcbe453a854de3abf68ec)
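  Item 2 boils down to a one-line rule; a minimal sketch (the helper is
  an illustrative stand-in for glibc's CPU feature bitmaps):

      /* RTM that always aborts is worse than no RTM: every transaction
         takes the abort path, so treat the feature as unusable.  */
      static int
      rtm_usable (int has_rtm, int has_rtm_always_abort)
      {
        return has_rtm && !has_rtm_always_abort;
      }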
* x86: Copy IBT and SHSTK usable only if CET is enabled | H.J. Lu | 2022-02-01 | 1 | -2/+5
  The IBT and SHSTK usable bits are copied from the CPUID feature bits
  and later cleared if the kernel doesn't support CET. Copy the IBT and
  SHSTK usable bits only if CET is enabled, so that they aren't set on
  CET-capable processors running a glibc built without CET enabled.
  (cherry picked from commit ea26ff03227d7cacef5de6036df57734373449b4)
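  A schematic of the guard (hypothetical types and names; the real code
  lives in glibc's CPU feature initialization):

      #include <stdbool.h>

      struct cet_bits { bool ibt; bool shstk; };

      /* Copy the CPUID bits into the "usable" bits only when glibc was
         built with CET enabled; the kernel check may still clear them
         later.  Otherwise leave them clear so IBT/SHSTK never appear
         usable under a non-CET glibc.  */
      static void
      copy_cet_usable_bits (const struct cet_bits *cpuid,
                            struct cet_bits *usable, bool cet_enabled)
      {
        if (cet_enabled)
          {
            usable->ibt = cpuid->ibt;
            usable->shstk = cpuid->shstk;
          }
      }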
* x86: Add string/memory function tests in RTM region | H.J. Lu | 2022-01-27 | 12 | -0/+618
  At function exit, AVX-optimized string/memory functions execute
  VZEROUPPER, which triggers an RTM abort. When such functions are called
  inside a transactionally executing RTM region, the RTM abort causes
  severe performance degradation. Add tests to verify that string/memory
  functions won't cause an RTM abort in an RTM region.
  (cherry picked from commit 4bd660be40967cd69072f69ebc2ad32bfcc1f206)
* x86: Set Prefer_No_VZEROUPPER and add Prefer_AVX2_STRCMP | H.J. Lu | 2022-01-27 | 3 | -2/+21
  1. Set Prefer_No_VZEROUPPER if RTM is usable, to avoid the RTM abort
     triggered by VZEROUPPER inside a transactionally executing RTM
     region.
  2. Since, to compare 2 32-byte strings, 256-bit EVEX strcmp requires
     2 loads, 3 VPCMPs and 2 KORDs while AVX2 strcmp requires 1 load,
     2 VPCMPEQs, 1 VPMINU and 1 VPMOVMSKB, AVX2 strcmp is faster than
     EVEX strcmp. Add Prefer_AVX2_STRCMP to prefer the AVX2 strcmp
     family of functions.
  (cherry picked from commit 1da50d4bda07f04135dca39f40e79fc9eabed1f8)
* x86: use default cache size if it cannot be determined [BZ #28784] | Aurelien Jarno | 2022-01-17 | 1 | -4/+10
  In some cases (e.g. QEMU, non-Intel/AMD CPUs) the cache information
  cannot be retrieved and the corresponding values are set to 0. Commit
  2d651eb9265d ("x86: Move x86 processor cache info to cpu_features")
  changed the behaviour in such cases by defining the
  __x86_shared_cache_size and __x86_data_cache_size variables to 0
  instead of using the default values. This causes an issue with the
  i686 SSE2 optimized bzero/memset routine, which assumes that the cache
  size is at least 128 bytes and otherwise tries to zero/set the whole
  address space minus 128 bytes.

  Fix that by restoring the original code, which only updates
  __x86_shared_cache_size and __x86_data_cache_size if the corresponding
  cache sizes are not zero.

  Fixes bug 28784
  Fixes commit 2d651eb9265d
  Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
  (cherry picked from commit c242fcce06e3102ca663b2f992611d0bda4f2668)
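  The shape of the restored guard, sketched as a helper (glibc does this
  inline during cache-info initialization; the function is illustrative):

      /* Only override the compiled-in defaults when detection
         succeeded.  DATA and SHARED are the detected cache sizes, or 0
         when they could not be determined (e.g. under QEMU).  */
      static void
      update_cache_sizes (long int data, long int shared,
                          long int *data_cache_size,
                          long int *shared_cache_size)
      {
        if (data > 0)
          *data_cache_size = data;
        if (shared > 0)
          *shared_cache_size = shared;
      }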
* x86: tst-cpu-features-supports.c: Update AMX check | H.J. Lu | 2021-04-22 | 1 | -3/+3
  Pass "amx-bf16", "amx-int8" and "amx-tile", instead of "amx_bf16",
  "amx_int8" and "amx_tile", to __builtin_cpu_supports for GCC 11.
  (cherry picked from commit 7fc9152e831fb24091c0ceabdcecb9b07dd29dd6)
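  A standalone check using the corrected spellings (needs GCC 11 or
  newer):

      #include <stdio.h>

      int
      main (void)
      {
        /* Explicit init is only strictly required in ifunc resolvers
           and early constructors; it is harmless here.  */
        __builtin_cpu_init ();

        /* GCC 11 spells the AMX feature names with dashes, matching
           the -mamx-bf16/-mamx-int8/-mamx-tile option names.  */
        if (__builtin_cpu_supports ("amx-tile")
            && __builtin_cpu_supports ("amx-int8")
            && __builtin_cpu_supports ("amx-bf16"))
          puts ("AMX tile/int8/bf16 supported");
        else
          puts ("AMX not fully supported");
        return 0;
      }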
* x86: Handle _SC_LEVEL1_ICACHE_LINESIZE [BZ #27444] | H.J. Lu | 2021-03-15 | 7 | -0/+79
  commit 2d651eb9265d1366d7b9e881bfddd46db9c1ecc4
  Author: H.J. Lu <hjl.tools@gmail.com>
  Date:   Fri Sep 18 07:55:14 2020 -0700

      x86: Move x86 processor cache info to cpu_features

  missed _SC_LEVEL1_ICACHE_LINESIZE.

  1. Add level1_icache_linesize to struct cpu_features.
  2. Initialize level1_icache_linesize by calling handle_intel,
     handle_zhaoxin and handle_amd with _SC_LEVEL1_ICACHE_LINESIZE.
  3. Return level1_icache_linesize for _SC_LEVEL1_ICACHE_LINESIZE.

  Reviewed-by: Carlos O'Donell <carlos@redhat.com>
  (cherry picked from commit f53ffc9b90cbd92fa5518686daf4091bdd1d4889)
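  With this fix, the value is reachable through the usual sysconf
  interface:

      #include <stdio.h>
      #include <unistd.h>

      int
      main (void)
      {
        /* L1 instruction cache line size in bytes; may be reported as
           0 or -1 when it cannot be determined.  */
        long linesize = sysconf (_SC_LEVEL1_ICACHE_LINESIZE);
        printf ("L1i cache line size: %ld\n", linesize);
        return 0;
      }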
* x86: Set minimum x86-64 level marker [BZ #27318] | H.J. Lu | 2021-03-06 | 3 | -11/+58
  Since the full ISA set used in an ELF binary is unknown to the
  compiler, an x86-64 ISA level marker indicates the minimum, not
  maximum, ISA set required to run such an ELF binary. We never guarantee
  that a library with an x86-64 ISA level v3 marker doesn't contain other
  ISAs beyond x86-64 ISA level v3, like AVX VNNI. We check the x86-64 ISA
  level marker for the minimum ISA set. Since -march=sandybridge enables
  only some ISAs in x86-64 ISA level v3, we should set the needed ISA
  marker to v2. Otherwise, a libc compiled with -march=sandybridge will
  fail to run on Sandy Bridge:

  $ ./elf/ld.so ./libc.so
  ./libc.so: (p) CPU ISA level is lower than required: needed: 7; got: 3

  Setting the minimum, instead of maximum, x86-64 ISA level marker should
  have no impact on the glibc-hwcaps directory assignment logic in
  ldconfig or ld.so.
  (cherry picked from commit 339bf918ea4830fb35614632e96f3aab3237adce)
* x86: Add CPU-specific diagnostics to ld.so --list-diagnostics | Florian Weimer | 2021-03-02 | 2 | -0/+118
  (cherry picked from commit 01a5746b6c8a44dc29d33e056b63485075a6a3cc)
  Adjusted to not print the rep_movsb_stop_threshold field, which does
  not exist in glibc 2.33.
* x86: Automate generation of PREFERRED_FEATURE_INDEX_1 bitfield | Florian Weimer | 2021-03-02 | 2 | -34/+51
  Use a .def file to define the bitfield layout, so that it is possible
  to iterate over field members using the preprocessor.
  (cherry picked from commit e4933c8a92ea08eecdf3ab45e7f76c95dc3d20ac)
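  A generic illustration of the .def/X-macro technique (the field names
  here are made up, not the actual PREFERRED_FEATURE_INDEX_1 list):

      #include <stdio.h>

      /* Normally this list lives in a .def file that is #included
         twice; it is inlined here to keep the example self-contained.  */
      #define FEATURE_FIELDS      \
        FIELD (avx2_preferred)    \
        FIELD (erms_preferred)    \
        FIELD (fsrm_preferred)

      struct preferred
      {
      #define FIELD(name) unsigned int name : 1;
        FEATURE_FIELDS
      #undef FIELD
      };

      int
      main (void)
      {
        struct preferred p = { .avx2_preferred = 1 };
        /* The same list now drives iteration over the members.  */
      #define FIELD(name) printf (#name " = %u\n", (unsigned int) p.name);
        FEATURE_FIELDS
      #undef FIELD
        return 0;
      }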
* x86: Use SIZE_MAX instead of (long int)-1 for tunable range value | Siddhesh Poyarekar | 2021-02-10 | 1 | -5/+5
  The tunable types are SIZE_T, so set the ranges to the correct maximum
  value, i.e. SIZE_MAX.
  (cherry picked from commit a1b8b06a55c1ee581d5ef860cec214b0c27a66f0)
* tunables: Simplify TUNABLE_SET interface | Siddhesh Poyarekar | 2021-02-10 | 1 | -9/+6
  The TUNABLE_SET interface took a primitive C type argument, which
  resulted in inconsistent type conversions internally due to incorrect
  dereferencing of types, especially on 32-bit architectures. This change
  simplifies the TUNABLE setting logic along with the interfaces.

  Now all numeric tunable values are stored as signed numbers in
  tunable_num_t, which is intmax_t. All calls to set tunables cast the
  input value to its primitive type and then to tunable_num_t for
  storage. This relies on gcc-specific (although I suspect other
  compilers would also do the same) unsigned-to-signed integer conversion
  semantics, i.e. the bit pattern is conserved. The reverse conversion is
  guaranteed by the standard.
  (cherry picked from commit 61117bfa1b08ca048e6512c0652c568300fedf6a)
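  The conversion round trip the message relies on, demonstrated
  standalone (the signed-to-unsigned direction is guaranteed by ISO C;
  unsigned-to-signed is implementation-defined but bit-preserving on
  GCC):

      #include <assert.h>
      #include <stddef.h>
      #include <stdint.h>

      typedef intmax_t tunable_num_t;

      int
      main (void)
      {
        size_t in = SIZE_MAX - 5;

        /* Store: unsigned -> signed keeps the bit pattern on GCC,
           even when the value doesn't fit.  */
        tunable_num_t stored = (tunable_num_t) in;

        /* Load: signed -> unsigned is reduction modulo 2^N per the
           standard, so the original value comes back.  */
        size_t out = (size_t) stored;

        assert (out == in);
        return 0;
      }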
* x86: Properly set usable CET feature bits [BZ #26625] | H.J. Lu | 2021-01-29 | 10 | -13/+120
  commit 94cd37ebb293321115a36a422b091fdb72d2fb08
  Author: H.J. Lu <hjl.tools@gmail.com>
  Date:   Wed Sep 16 05:27:32 2020 -0700

      x86: Use HAS_CPU_FEATURE with IBT and SHSTK [BZ #26625]

  broke GLIBC_TUNABLES=glibc.cpu.hwcaps=-IBT,-SHSTK since it could no
  longer disable IBT nor SHSTK. Handle IBT and SHSTK with:

  1. Revert commit 94cd37ebb293321115a36a422b091fdb72d2fb08.
  2. Clear the usable CET feature bits if the kernel doesn't support CET.
  3. Add GLIBC_TUNABLES tests without dlopen.
  4. Add tests to verify that CPU_FEATURE_USABLE on IBT and SHSTK matches
     _get_ssp.
  5. Update the GLIBC_TUNABLES tests with dlopen to verify that CET is
     disabled with GLIBC_TUNABLES.

  Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* Fix misplaced const | Andreas Schwab | 2021-01-25 | 2 | -2/+2
  Constify __x86_cacheinfo_p and __x86_cpu_features_p, not their pointer
  target types.
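  The general C rule, illustrated (the struct and variable names are
  made up, not the actual glibc declarations):

      struct cpu_features { int stub; };
      static struct cpu_features features;

      /* Pointer to const data: the pointee may not be modified
         through the pointer.  */
      static const struct cpu_features *pointee_is_const = &features;

      /* Const pointer: the pointer itself may not be reseated.  This
         is the placement the fix applies to the *_p variables.  */
      static struct cpu_features *const pointer_is_const = &features;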
* x86: Properly match CPU features in /proc/cpuinfo [BZ #27222] | H.J. Lu | 2021-01-22 | 1 | -13/+30
  Search " YYY " and " YYY\n", instead of "YYY", to avoid matching
  "XXXYYYZZZ" with "YYY". Update the /proc/cpuinfo CPU feature names:

  /proc/cpuinfo        glibc
  ------------------------------------------------
  avx512vbmi           AVX512_VBMI
  dts                  DS
  pni                  SSE3
  tsc_deadline_timer   TSC_DEADLINE
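  A minimal sketch of the whole-word match (illustrative helper, not the
  actual test code):

      #include <stdio.h>
      #include <string.h>

      /* Return nonzero if FEATURE appears as a whole, space-delimited
         word in the cpuinfo line BUF (which ends with '\n').  */
      static int
      has_cpuinfo_flag (const char *buf, const char *feature)
      {
        char padded[64];

        /* " feature " catches words in the middle of the line...  */
        snprintf (padded, sizeof padded, " %s ", feature);
        if (strstr (buf, padded) != NULL)
          return 1;

        /* ...and " feature\n" catches the last word on the line.  */
        snprintf (padded, sizeof padded, " %s\n", feature);
        return strstr (buf, padded) != NULL;
      }

      int
      main (void)
      {
        const char *line = "flags : fpu vme avx512vbmi avx512_vbmi2\n";
        printf ("%d\n", has_cpuinfo_flag (line, "avx512vbmi")); /* 1 */
        printf ("%d\n", has_cpuinfo_flag (line, "vbmi"));       /* 0 */
        return 0;
      }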
* x86: Check ifunc resolver with CPU_FEATURE_USABLE [BZ #27072] | H.J. Lu | 2021-01-21 | 6 | -0/+184
  Check ifunc resolvers with CPU_FEATURE_USABLE and tunables in dynamic
  and static executables to verify that CPUID features are initialized
  early in static PIE.
  Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* Use hidden visibility for early static PIE code | Szabolcs Nagy | 2021-01-21 | 1 | -0/+5
  Extern symbol access in position-independent code usually involves GOT
  indirection, which needs a RELATIVE reloc in a statically linked PIE.
  (On some targets this is avoided, e.g. because the linker can relax a
  GOT access to a pc-relative access, but this is not generally true.)
  Code that runs before static PIE self-relocation must avoid relying on
  dynamic relocations, which can be ensured by using hidden visibility.
  However, we cannot just make all symbols hidden:

  On i386, all calls to IFUNC functions must go through the PLT, and
  calls to hidden functions CANNOT go through the PLT in a PIE, since EBX
  as used in the PIE PLT may not be set up for local calls to hidden
  IFUNC functions.

  This patch aims to make symbol references hidden in code that is used
  before and by _dl_relocate_static_pie when building a static PIE libc.

  Note: for an object that is used in the startup code, its references
  and definition may not have consistent visibility: it is only forced
  hidden in the startup code.

  This is needed for fixing bug 27072.

  Co-authored-by: H.J. Lu <hjl.tools@gmail.com>
  Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* <sys/platform/x86.h>: Remove the C preprocessor magic | H.J. Lu | 2021-01-21 | 12 | -832/+1153
  In <sys/platform/x86.h>, define CPU features as an enum instead of
  using C preprocessor magic, to make it easier to wrap this
  functionality in other languages. Move the C preprocessor magic to an
  internal header for better GCC codegen when more than one feature is
  checked in a single expression, as in x86-64 dl-hwcaps-subdirs.c.

  1. Rename COMMON_CPUID_INDEX_XXX to CPUID_INDEX_XXX.
  2. Move CPUID_INDEX_MAX to sysdeps/x86/include/cpu-features.h.
  3. Remove struct cpu_features and __x86_get_cpu_features from
     <sys/platform/x86.h>.
  4. Add __x86_get_cpuid_feature_leaf to <sys/platform/x86.h> and put it
     in libc.
  5. Make __get_cpu_features() private to glibc.
  6. Replace __x86_get_cpu_features(N) with __get_cpu_features().
  7. Add _dl_x86_get_cpu_features to GLIBC_PRIVATE.
  8. Use a single enum index for each CPU feature detection.
  9. Pass the CPUID feature leaf to __x86_get_cpuid_feature_leaf.
  10. Return a zero struct cpuid_feature for older glibc binaries with a
      smaller CPUID_INDEX_MAX [BZ #27104].
  11. Inside glibc, use the C preprocessor magic so that cpu_features
      data can be loaded just once, leading to more compact code for
      glibc.

  256 bits are used for each CPUID leaf. Some leaves only contain a few
  features. We could add exceptions for such leaves, but that would
  increase code size and make it harder to provide backward/forward
  compatibility when new features are added to such leaves in the future.

  When new leaves are added, _rtld_global_ro offsets will change, which
  leads to a race condition during in-place updates. We may avoid
  in-place updates by:

  1. Rename the old glibc.
  2. Install the new glibc.
  3. Remove the old glibc.

  NB: A function, __x86_get_cpuid_feature_leaf, is used to avoid the copy
  relocation issue with IFUNC resolvers, as shown in the IFUNC resolver
  tests.
* x86: Move x86 processor cache info to cpu_features | H.J. Lu | 2021-01-14 | 5 | -412/+551
  1. Move x86 processor cache info to _dl_x86_cpu_features in ld.so.
  2. Update tunable bounds with TUNABLE_SET_WITH_BOUNDS.
  3. Move x86 cache info initialization to dl-cacheinfo.h and initialize
     x86 cache info in init_cpu_features ().
  4. Put x86 cache info for libc in cacheinfo.h, which is included in
     libc-start.c in libc.a and is included in cacheinfo.c in libc.so.
  Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* Fix x86 build with --enable-tunable=no | Adhemerval Zanella | 2021-01-14 | 1 | -0/+1
  Checked on x86_64-linux-gnu.
* ldconfig/x86: Store ISA level in cache and aux cache | H.J. Lu | 2021-01-13 | 1 | -0/+31
  Store the ISA level in the unused upper 32 bits of the hwcaps field in
  the cache, and in the unused pad field in the aux cache. The ISA level
  is stored and checked only for shared objects in glibc-hwcaps
  subdirectories. Shared objects in the default directories aren't
  checked, since there are no fallbacks for those shared objects.

  Tested on x86-64-v2, x86-64-v3 and x86-64-v4 machines with
  --disable-hardcoded-path-in-tests and --enable-hardcoded-path-in-tests.
* x86: Set header.feature_1 in TCB for always-on CET [BZ #27177] | H.J. Lu | 2021-01-13 | 3 | -1/+11
  Update dl_cet_check() to set header.feature_1 in the TCB when both IBT
  and SHSTK are always on.
* x86: Support GNU_PROPERTY_X86_ISA_1_V[234] marker [BZ #26717] | H.J. Lu | 2021-01-07 | 17 | -41/+607
  GCC 11 supports -march=x86-64-v[234] to enable x86 micro-architecture
  ISA levels:

  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97250

  and -mneeded to emit the GNU_PROPERTY_X86_ISA_1_NEEDED property with
  the GNU_PROPERTY_X86_ISA_1_V[234] marker:

  https://gitlab.com/x86-psABIs/x86-64-ABI/-/merge_requests/13

  Binutils support for the GNU_PROPERTY_X86_ISA_1_V[234] marker was added
  by

  commit b0ab06937385e0ae25cebf1991787d64f439bf12
  Author: H.J. Lu <hjl.tools@gmail.com>
  Date:   Fri Oct 30 06:49:57 2020 -0700

      x86: Support GNU_PROPERTY_X86_ISA_1_BASELINE marker

  and

  commit 32930e4edbc06bc6f10c435dbcc63131715df678
  Author: H.J. Lu <hjl.tools@gmail.com>
  Date:   Fri Oct 9 05:05:57 2020 -0700

      x86: Support GNU_PROPERTY_X86_ISA_1_V[234] marker

  The GNU_PROPERTY_X86_ISA_1_NEEDED property in x86 ELF binaries
  indicates the micro-architecture ISA level required to execute the
  binary. The marker must be added by programmers explicitly in one of
  3 ways:

  1. Pass -mneeded to GCC.
  2. Add the marker in the linker inputs, as this patch does.
  3. Pass -z x86-64-v[234] to the linker.

  Add GNU_PROPERTY_X86_ISA_1_BASELINE and GNU_PROPERTY_X86_ISA_1_V[234]
  marker support to ld.so if binutils 2.32 or newer is used to build
  glibc:

  1. Add the GNU_PROPERTY_X86_ISA_1_BASELINE and
     GNU_PROPERTY_X86_ISA_1_V[234] markers to elf.h.
  2. Add the GNU_PROPERTY_X86_ISA_1_BASELINE and
     GNU_PROPERTY_X86_ISA_1_V[234] markers to abi-note.o based on the
     ISA level used to compile abi-note.o, assuming that the same ISA
     level is used to compile the whole glibc.
  3. Add isa_1 to cpu_features to record the supported x86 ISA level.
  4. Rename _dl_process_cet_property_note to _dl_process_property_note
     and add GNU_PROPERTY_X86_ISA_1_V[234] marker detection.
  5. Update _rtld_main_check and _dl_open_check to check loaded objects
     with an incompatible ISA level.
  6. Add a testcase to verify that dlopen of an x86-64-v4 shared object
     fails on lesser platforms.
  7. Use <get-isa-level.h> in dl-hwcaps-subdirs.c and tst-glibc-hwcaps.c.

  Tested under i686, x32 and x86-64 modes on x86-64-v2, x86-64-v3 and
  x86-64-v4 machines.

  Marked elf/tst-isa-level-1 with x86-64-v4, ran it on an x86-64-v3
  machine and got:

  [hjl@gnu-cfl-2 build-x86_64-linux]$ ./elf/tst-isa-level-1
  ./elf/tst-isa-level-1: CPU ISA level is lower than required
  [hjl@gnu-cfl-2 build-x86_64-linux]$
* Drop nan-pseudo-number.h usage from tests | Siddhesh Poyarekar | 2021-01-04 | 1 | -1/+0
  Make the tests use TEST_COND_intel96 to decide whether to build the
  unnormal tests, instead of the macro in nan-pseudo-number.h, and then
  drop the header inclusion. This unbreaks test runs on all architectures
  that do not have ldbl-96.

  Also drop the HANDLE_PSEUDO_NUMBERS macro since it is not used
  anywhere.
* Update copyright dates with scripts/update-copyrights | Paul Eggert | 2021-01-02 | 81 | -81/+81
  I used these shell commands:

      ../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright
      (cd ../glibc && git commit -am"[this commit message]")

  and then ignored the output, which consisted of lines saying "FOO:
  warning: copyright statement not found" for each of 6694 files FOO.

  I then removed trailing white space from
  benchtests/bench-pthread-locks.c and
  iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c, to work around this
  diagnostic from Savannah:

      remote: *** pre-commit check failed ...
      remote: *** error: lines with trailing whitespace found
      remote: error: hook declined to update refs/heads/master
* x86 long double: Consider pseudo numbers as signaling | Siddhesh Poyarekar | 2020-12-30 | 1 | -0/+30
  Add support to treat pseudo-numbers specially and implement the x86
  version to consider all of them as signaling.
  Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* Remove _ISOMAC check from <cpu-features.h> | H.J. Lu | 2020-12-24 | 1 | -81/+75
  Remove the _ISOMAC check from <cpu-features.h> since it isn't an
  installed header file.
* x86: Remove the duplicated CPU_FEATURE_CPU_P | H.J. Lu | 2020-12-24 | 1 | -2/+0
  CPU_FEATURE_CPU_P is defined in sysdeps/x86/sys/platform/x86.h. Remove
  the duplicated CPU_FEATURE_CPU_P in sysdeps/x86/include/cpu-features.h.
* Partially revert 681900d29683722b1cb0a8e565a0585846ec5a61 | Siddhesh Poyarekar | 2020-12-24 | 2 | -12/+1
  Do not attempt to fix the significand top bit in long double input
  received in printf. The code should never reach this point, because
  isnan should now detect unnormals as NaN. This is already a NOP for
  glibc, since it uses the gcc __builtin_isnan, which detects unnormals
  as NaN.
  Reviewed-by: Florian Weimer <fweimer@redhat.com>
* x86 long double: Support pseudo numbers in isnanl | Siddhesh Poyarekar | 2020-12-24 | 1 | -0/+45
  This syncs up isnanl behaviour with gcc. Also move the isnanl
  implementation to sysdeps/x86 and remove the sysdeps/x86_64 version.
  Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
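  A sketch of how a pseudo-number can be detected in the 80-bit x86
  format (union-based for illustration; glibc uses its own word-access
  macros instead):

      #include <stdint.h>

      /* x86 extended precision: 1 sign bit, 15 exponent bits, and a
         64-bit significand whose integer bit (bit 63) is explicit.  */
      union ldbl_bits
      {
        long double ld;
        struct
        {
          uint64_t mantissa;
          uint16_t sign_exponent;
        } parts;
      };

      /* Pseudo-numbers (unnormals, pseudo-NaNs, pseudo-infinities)
         have a nonzero exponent but a clear explicit integer bit; the
         FPU has treated such encodings as invalid since the 387.  */
      static int
      is_pseudo_number (long double x)
      {
        union ldbl_bits u = { .ld = x };
        uint16_t exponent = u.parts.sign_exponent & 0x7fff;
        int integer_bit = (int) ((u.parts.mantissa >> 63) & 1);
        return exponent != 0 && !integer_bit;
      }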
* x86 long double: Support pseudo numbers in fpclassifyl | Siddhesh Poyarekar | 2020-12-24 | 1 | -0/+46
  Also move sysdeps/i386/fpu/s_fpclassifyl.c to
  sysdeps/x86/fpu/s_fpclassifyl.c and remove
  sysdeps/x86_64/fpu/s_fpclassifyl.c.
  Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* <sys/platform/x86.h>: Add Intel LAM support | H.J. Lu | 2020-12-22 | 2 | -0/+4
  Add Intel Linear Address Masking (LAM) support to
  <sys/platform/x86.h>. HAS_CPU_FEATURE (LAM) can be used to detect
  whether LAM is enabled in the CPU. LAM modifies the checking that is
  applied to 64-bit linear addresses, allowing software to use the
  untranslated address bits for metadata.
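  A minimal consumer of the new interface (requires a glibc whose
  <sys/platform/x86.h> includes this change):

      #include <stdio.h>
      #include <sys/platform/x86.h>

      int
      main (void)
      {
        /* Reports whether the CPU enumerates Linear Address Masking.  */
        if (HAS_CPU_FEATURE (LAM))
          puts ("LAM supported");
        else
          puts ("LAM not supported");
        return 0;
      }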