path: root/sysdeps/x86_64
* elf: Remove lazy tlsdesc relocation related code (Szabolcs Nagy, 2021-04-21; 1 file, -1/+0)
  Remove generic tlsdesc code related to lazy tlsdesc processing, since lazy tlsdesc relocation is no longer supported. This includes removing GL(dl_load_lock) from _dl_make_tlsdesc_dynamic, which is only called at load time when that lock is already held. Added a documentation comment too.
  Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* x86: Optimize strlen-avx2.S (Noah Goldstein, 2021-04-19; 2 files, -214/+334)
  No bug. This commit optimizes strlen-avx2.S. The optimizations are mostly small things, but they add up to a roughly 10-30% performance improvement for strlen. The results for strnlen are a bit more ambiguous. test-strlen, test-strnlen, test-wcslen, and test-wcsnlen are all passing.
  Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
* x86: Optimize strlen-evex.S (Noah Goldstein, 2021-04-19; 1 file, -264/+317)
  No bug. This commit optimizes strlen-evex.S. The optimizations are mostly small things, but they add up to a roughly 10-30% performance improvement for strlen. The results for strnlen are a bit more ambiguous. test-strlen, test-strnlen, test-wcslen, and test-wcsnlen are all passing.
  Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
* x86: Optimize less_vec evex and avx512 memset-vec-unaligned-erms.S (Noah Goldstein, 2021-04-19; 5 files, -27/+74)
  No bug. This commit adds an optimized implementation for the less_vec memset case that uses the avx512vl/avx512bw mask store, avoiding excessive branches. test-memset and test-wmemset are passing.
  Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
* x86-64: Require BMI2 for strchr-avx2.S (H.J. Lu, 2021-04-19; 2 files, -5/+11)
  Since strchr-avx2.S, updated by

      commit 1f745ecc2109890886b161d4791e1406fdfc29b8
      Author: noah <goldstein.w.n@gmail.com>
      Date:   Wed Feb 3 00:38:59 2021 -0500

          x86-64: Refactor and improve performance of strchr-avx2.S

  uses sarx:

      c4 e2 72 f7 c0          sarx   %ecx,%eax,%eax

  for the strchr-avx2 family of functions, require BMI2 in ifunc-impl-list.c and ifunc-avx2.h. A sketch of the guard follows this entry.
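  For illustration, the BMI2 guard in ifunc-impl-list.c looks roughly like the following. This is a sketch from memory, not the verbatim patch; IFUNC_IMPL_ADD and CPU_FEATURE_USABLE are glibc-internal macros:

      /* Sketch: register __strchr_avx2 only when both AVX2 and BMI2 are
         usable, since the AVX2 code now emits sarx, a BMI2 instruction.  */
      IFUNC_IMPL_ADD (array, i, strchr,
                      (CPU_FEATURE_USABLE (AVX2)
                       && CPU_FEATURE_USABLE (BMI2)),
                      __strchr_avx2)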
* x86-64: Require BMI2 for __strlen_evex and __strnlen_evex (H.J. Lu, 2021-04-19; 1 file, -2/+4)
  Since __strlen_evex and __strnlen_evex, added by

      commit 1fd8c163a83d96ace1ff78fa6bac7aee084f6f77
      Author: H.J. Lu <hjl.tools@gmail.com>
      Date:   Fri Mar 5 06:24:52 2021 -0800

          x86-64: Add ifunc-avx2.h functions with 256-bit EVEX

  use sarx:

      c4 e2 6a f7 c0          sarx   %edx,%eax,%eax

  require BMI2 for __strlen_evex and __strnlen_evex in ifunc-impl-list.c. ifunc-avx2.h already requires BMI2 for the EVEX implementation.
* x86: Update large memcpy case in memmove-vec-unaligned-erms.S (noah, 2021-04-16; 1 file, -73/+265)
  No bug. This commit updates the large memcpy case (no overlap). The update is to perform memcpy on either 2 or 4 contiguous pages at once. This 1) helps to alleviate the effects of false memory aliasing when destination and source have a close 4k alignment, and 2) is, in most cases and for most DRAM units, a modestly more efficient access pattern. These changes are a clear performance improvement for VEC_SIZE = 16/32, though more ambiguous for VEC_SIZE = 64. test-memcpy, test-memccpy, test-mempcpy, test-memmove, and tst-memmove-overflow all pass. A sketch of the striped access pattern follows this entry.
  Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
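  For intuition, copying two pages "at once" means interleaving the accesses across the pages rather than finishing one page before starting the next, so source and destination streams with a close 4k alignment do not keep colliding at the same page offset. A minimal C sketch of the idea (illustrative only; the real code is hand-written assembly and the names here are invented):

      #include <stddef.h>
      #include <string.h>

      /* Illustrative sketch: stripe a copy across two 4 KiB pages,
         one cache-line-sized block per page per iteration.  */
      enum { PAGE = 4096, BLOCK = 64 };

      static void
      copy_two_pages_striped (char *dst, const char *src)
      {
        for (size_t off = 0; off < PAGE; off += BLOCK)
          {
            memcpy (dst + off, src + off, BLOCK);               /* page 0 */
            memcpy (dst + PAGE + off, src + PAGE + off, BLOCK); /* page 1 */
          }
      }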
* x86_64: Remove lazy tlsdesc relocation related code (Szabolcs Nagy, 2021-04-15; 4 files, -219/+2)
  _dl_tlsdesc_resolve_rela and _dl_tlsdesc_resolve_hold are only used for lazy tlsdesc relocation processing, which is no longer supported.
* x86_64: Avoid lazy relocation of tlsdesc [BZ #27137] (Szabolcs Nagy, 2021-04-15; 1 file, -5/+14)
  Lazy tlsdesc relocation is racy because the static tls optimization and tlsdesc management operations are done without holding the dlopen lock. This is similar to commit b7cf203b5c17dd6d9878537d41e0c7cc3d270a67 for aarch64, but it fixes a different race: bug 27137.
  Another issue is that ld auditing ignores DT_BIND_NOW and thus tries to relocate tlsdesc lazily, but that does not work in a BIND_NOW module due to the missing DT_TLSDESC_PLT. Unconditionally relocating tlsdesc at load time fixes bug 27721 too.
* Improve the accuracy of tgamma (BZ #26983) (Paul Zimmermann, 2021-04-07; 1 file, -3/+3)
  With this patch, the maximal known error for tgamma is now reduced to 9 ulps for dbl-64, for all rounding modes. Since exhaustive testing is not possible for dbl-64, there might still be cases with an error larger than 9 ulps, but all known cases are fixed (intensive tests were done to find cases with large errors).
  Tested on x86_64 and powerpc (and by Adhemerval Zanella on aarch64, arm, s390x, sparc, and i686).
  Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* Fix the inaccuracy of j0f/j1f/y0f/y1f [BZ #14469, #14470, #14471, #14472] (Paul Zimmermann, 2021-04-02; 1 file, -38/+38)
  For j0f/j1f/y0f/y1f, the largest error for all binary32 inputs is reduced to at most 9 ulps for all rounding modes. The new code is enabled only when there is a cancellation at the very end of the j0f/j1f/y0f/y1f computation, or for very large inputs, and thus should not give any visible slowdown on average. Two different algorithms are used (the large-input formula is sketched after this entry):

  * around the first 64 zeros of j0/j1/y0/y1, approximation polynomials of degree 3 are used, computed using the Sollya tool (https://www.sollya.org/)
  * for large inputs, an asymptotic formula from [1] is used

  [1] Fast and Accurate Bessel Function Computation, John Harrison, Proceedings of Arith 19, 2009.

  Inputs yielding the new largest errors are added to auto-libm-test-in, and ulps are regenerated for various targets (thanks Adhemerval Zanella). Tested on x86_64 with --disable-multi-arch and on powerpc64le-linux-gnu.
  Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
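  For context, the classical large-argument asymptotic form on which such formulas are built is (the textbook form, e.g. Abramowitz & Stegun 9.2.5; Harrison's paper refines how these terms are evaluated accurately, so treat this as background rather than the implemented formula):

      J_0(x) \sim \sqrt{\frac{2}{\pi x}}
        \left[ P_0(x) \cos\!\left(x - \frac{\pi}{4}\right)
             - Q_0(x) \sin\!\left(x - \frac{\pi}{4}\right) \right],
      \qquad x \to \infty

  where P_0 and Q_0 are asymptotic series in 1/x, with P_0(x) -> 1 and Q_0(x) ~ -1/(8x).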
* x86-64: Fix ifdef indentation in strlen-evex.S (Sunil K Pandey, 2021-04-01; 1 file, -8/+8)
  Fix some ifdef indentation in strlen-evex.S that is off by one and confusing to read.
* x86_64: Correct THREAD_SETMEM/THREAD_SETMEM_NC for movq [BZ #27591] (H.J. Lu, 2021-04-01; 3 files, -2/+74)
  config/i386/constraints.md in GCC has

      (define_constraint "e"
        "32-bit signed integer constant, or a symbolic reference known
         to fit that range (for immediate operands in sign-extending x86-64
         instructions)."
        (match_operand 0 "x86_64_immediate_operand"))

  Since movq takes a signed 32-bit immediate or a register source operand, use the "er" constraint, instead of "nr"/"ir", for a 32-bit signed integer constant or register on movq. A sketch of the resulting inline asm follows this entry.
  Reviewed-by: Carlos O'Donell <carlos@redhat.com>
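  The point is easiest to see in a reduced THREAD_SETMEM-style macro (a sketch, not the verbatim glibc code; SET_FS_QWORD is an invented name): the "e" in "er" only admits immediates representable as a sign-extended 32-bit value, which is exactly what the movq encoding can hold, while "nr"/"ir" would wrongly accept any 64-bit constant.

      #include <stdint.h>

      /* Sketch: store VALUE at byte offset OFS from %fs.  "er" means
         "sign-extendable 32-bit immediate, or register".  */
      #define SET_FS_QWORD(ofs, value)                          \
        asm volatile ("movq %q0, %%fs:%P1"                      \
                      : : "er" ((uint64_t) (value)), "i" (ofs))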
* x86-64: Use ZMM16-ZMM31 in AVX512 memmove family functions (H.J. Lu, 2021-03-29; 3 files, -19/+42)
  Update ifunc-memmove.h to select the function optimized with AVX512 instructions using ZMM16-ZMM31 registers, to avoid RTM abort with usable AVX512VL, since VZEROUPPER isn't needed at function exit.
* x86-64: Use ZMM16-ZMM31 in AVX512 memset family functions (H.J. Lu, 2021-03-29; 4 files, -24/+31)
  Update ifunc-memset.h/ifunc-wmemset.h to select the function optimized with AVX512 instructions using ZMM16-ZMM31 registers, to avoid RTM abort with usable AVX512VL and AVX512BW, since VZEROUPPER isn't needed at function exit.
* x86-64: Add AVX optimized string/memory functions for RTM (H.J. Lu, 2021-03-29; 52 files, -248/+670)
  Since VZEROUPPER triggers RTM abort while VZEROALL won't, select AVX optimized string/memory functions with

      xtest
      jz      1f
      vzeroall
      ret
      1:
      vzeroupper
      ret

  at function exit on processors with usable RTM, but without 256-bit EVEX instructions, to avoid VZEROUPPER inside a transactionally executing RTM region.
* x86-64: Add memcmp family functions with 256-bit EVEX (H.J. Lu, 2021-03-29; 5 files, -4/+467)
  Update ifunc-memcmp.h to select the function optimized with 256-bit EVEX instructions using YMM16-YMM31 registers, to avoid RTM abort with usable AVX512VL, AVX512BW and MOVBE, since VZEROUPPER isn't needed at function exit.
* x86-64: Add memset family functions with 256-bit EVEX (H.J. Lu, 2021-03-29; 6 files, -14/+90)
  Update ifunc-memset.h/ifunc-wmemset.h to select the function optimized with 256-bit EVEX instructions using YMM16-YMM31 registers, to avoid RTM abort with usable AVX512VL and AVX512BW, since VZEROUPPER isn't needed at function exit.
* x86-64: Add memmove family functions with 256-bit EVEX (H.J. Lu, 2021-03-29; 5 files, -11/+104)
  Update ifunc-memmove.h to select the function optimized with 256-bit EVEX instructions using YMM16-YMM31 registers, to avoid RTM abort with usable AVX512VL, since VZEROUPPER isn't needed at function exit.
* x86-64: Add strcpy family functions with 256-bit EVEX (H.J. Lu, 2021-03-29; 9 files, -3/+1339)
  Update ifunc-strcpy.h to select the function optimized with 256-bit EVEX instructions using YMM16-YMM31 registers, to avoid RTM abort with usable AVX512VL and AVX512BW, since VZEROUPPER isn't needed at function exit.
* x86-64: Add ifunc-avx2.h functions with 256-bit EVEX (H.J. Lu, 2021-03-29; 24 files, -17/+2995)
  Update ifunc-avx2.h, strchr.c, strcmp.c, strncmp.c and wcsnlen.c to select the function optimized with 256-bit EVEX instructions using YMM16-YMM31 registers, to avoid RTM abort with usable AVX512VL, AVX512BW and BMI2, since VZEROUPPER isn't needed at function exit. For strcmp/strncmp, prefer AVX2 strcmp/strncmp if Prefer_AVX2_STRCMP is set. A sketch of this selection logic follows this entry.
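  Taken together, the EVEX commits above make the ifunc selectors follow roughly this shape. This is a sketch from memory of the glibc-internal style, not any one selector verbatim: CPU_FEATURE_USABLE_P, CPU_FEATURES_ARCH_P and OPTIMIZE are internal helpers, and the exact conditions vary per function.

      /* Sketch of an IFUNC_SELECTOR after these commits: the EVEX
         variant (YMM16-31, no VZEROUPPER, RTM-safe) is preferred when
         AVX512VL/AVX512BW/BMI2 are usable; otherwise fall back to
         AVX2, then SSE2.  */
      static inline void *
      IFUNC_SELECTOR (void)
      {
        const struct cpu_features *cpu_features = __get_cpu_features ();

        if (CPU_FEATURE_USABLE_P (cpu_features, AVX2)
            && CPU_FEATURES_ARCH_P (cpu_features, AVX_Fast_Unaligned_Load))
          {
            if (CPU_FEATURE_USABLE_P (cpu_features, AVX512VL)
                && CPU_FEATURE_USABLE_P (cpu_features, AVX512BW)
                && CPU_FEATURE_USABLE_P (cpu_features, BMI2))
              return OPTIMIZE (evex);
            return OPTIMIZE (avx2);
          }

        return OPTIMIZE (sse2);
      }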
* nptl: Remove MULTI_PAGE_ALIASING [BZ #23554] (H.J. Lu, 2021-03-19; 1 file, -1/+0)
  MULTI_PAGE_ALIASING was introduced to mitigate an aliasing issue on Pentium 4. It is no longer needed for processors after Pentium 4.
* math: Remove mpa files [BZ #15267] (Wilco Dijkstra, 2021-03-11; 18 files, -183/+3)
  Finally remove all mpa-related files, headers, declarations, probes and unused tables, and update makefiles.
  Reviewed-By: Paul Zimmermann <Paul.Zimmermann@inria.fr>
* math: Remove slow paths from asin and acos [BZ #15267] (Wilco Dijkstra, 2021-03-11; 1 file, -1/+3)
  This patch series removes all remaining slow paths and related code. First the asin/acos, tan, atan and atan2 implementations are updated, and the final patch removes the unused mpa files, headers and probes. Passes buildmanyglibc.
  Remove slow paths from asin/acos. Add ULP annotations based on previous slow path checks (which are approximate). Update AArch64 and x86_64 libm-test-ulps.
  Reviewed-By: Paul Zimmermann <Paul.Zimmermann@inria.fr>
* Add inputs that generate larger error bounds (Paul Zimmermann, 2021-02-27; 1 file, -24/+26)
  (Using values from https://members.loria.fr/PZimmermann/papers/accuracy.pdf)
* Reduce the statically linked startup code [BZ #23323] (Florian Weimer, 2021-02-25; 1 file, -8/+4)
  It turns out the startup code in csu/elf-init.c has a perfect pair of ROP gadgets (see Marco-Gisbert and Ripoll-Ripoll, "return-to-csu: A New Method to Bypass 64-bit Linux ASLR"). These functions are not needed in dynamically-linked binaries because DT_INIT/DT_INIT_ARRAY are already processed by the dynamic linker. However, the dynamic linker skipped the main program for some reason. For maximum backwards compatibility, this is not changed, and instead, the main map is consulted from __libc_start_main if the init function argument is a NULL pointer.
  For statically linked binaries, the old approach based on linker symbols is still used because there is nothing else available.
  A new symbol version __libc_start_main@@GLIBC_2.34 is introduced because new binaries running on an old libc would not run their ELF constructors, leading to difficult-to-debug issues.
* x86: Use x86/nptl/pthreaddef.h (H.J. Lu, 2021-02-22; 1 file, -47/+0)
  1. Move sysdeps/i386/nptl/pthreaddef.h to sysdeps/x86/nptl/pthreaddef.h.
  2. Remove sysdeps/x86_64/nptl/pthreaddef.h.
  Reviewed-by: DJ Delorie <dj@redhat.com>
* x86-64: Refactor and improve performance of strchr-avx2.S (noah, 2021-02-08; 2 files, -113/+113)
  No bug. Just seemed the performance could be improved a bit. Observed and expected behavior are unchanged. Optimized the body of the main loop. Updated page cross logic and optimized accordingly. Made a few minor instruction selection modifications. No regressions in the test suite. Both test-strchrnul and test-strchr passed.
* x86: Adding an upper bound for Enhanced REP MOVSB. (Sajan Karumanchi, 2021-02-02; 1 file, -2/+5)
  In the process of optimizing memcpy for AMD machines, we have found that the vector move operations outperform enhanced REP MOVSB for data transfers above the L2 cache size on Zen3 architectures. To handle this use case, we are adding an upper bound parameter on enhanced REP MOVSB: '__x86_rep_movsb_stop_threshold'. As per large-bench results, we are configuring this parameter to the L2 cache size for AMD machines supporting the ERMS feature, applicable from the Zen3 architecture onward. For architectures other than AMD, it is the computed value of the non-temporal threshold parameter. A sketch of the resulting size dispatch follows this entry.
  Reviewed-by: Premachandra Mallappa <premachandra.mallappa@amd.com>
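  The effect on the copy dispatch can be pictured like this (illustrative C, not the actual assembly; the helper names are invented, while the thresholds mirror the glibc tunables __x86_rep_movsb_threshold and the new __x86_rep_movsb_stop_threshold):

      #include <stddef.h>

      /* Invented helpers standing in for the real assembly paths.  */
      extern void copy_small_vec (void *, const void *, size_t);
      extern void copy_rep_movsb (void *, const void *, size_t);
      extern void copy_large_vec (void *, const void *, size_t);

      static void
      dispatch_copy (void *dst, const void *src, size_t len,
                     size_t movsb_start, size_t movsb_stop)
      {
        if (len < movsb_start)
          copy_small_vec (dst, src, len);
        else if (len < movsb_stop)
          copy_rep_movsb (dst, src, len);   /* ERMS window */
        else
          copy_large_vec (dst, src, len);   /* above L2 on Zen3: vectors win */
      }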
* configure: Check for static PIE support (Szabolcs Nagy, 2021-01-21; 2 files, -0/+6)
  Add SUPPORT_STATIC_PIE that targets can define if they support static PIE. This requires PI_STATIC_AND_HIDDEN support and various linker features, as described in

      commit 9d7a3741c9e59eba87fb3ca6b9f979befce07826
      Add --enable-static-pie configure option to build static PIE [BZ #19574]

  Currently defined on x86_64, i386 and aarch64, where static PIE is known to work.
  Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* <sys/platform/x86.h>: Remove the C preprocessor magic (H.J. Lu, 2021-01-21; 2 files, -4/+2)
  In <sys/platform/x86.h>, define CPU features as an enum instead of using C preprocessor magic, to make it easier to wrap this functionality in other languages. Move the C preprocessor magic to an internal header for better GCC codegen when more than one feature is checked in a single expression, as in the x86-64 dl-hwcaps-subdirs.c. A usage sketch follows this entry.

  1. Rename COMMON_CPUID_INDEX_XXX to CPUID_INDEX_XXX.
  2. Move CPUID_INDEX_MAX to sysdeps/x86/include/cpu-features.h.
  3. Remove struct cpu_features and __x86_get_cpu_features from <sys/platform/x86.h>.
  4. Add __x86_get_cpuid_feature_leaf to <sys/platform/x86.h> and put it in libc.
  5. Make __get_cpu_features() private to glibc.
  6. Replace __x86_get_cpu_features(N) with __get_cpu_features().
  7. Add _dl_x86_get_cpu_features to GLIBC_PRIVATE.
  8. Use a single enum index for each CPU feature detection.
  9. Pass the CPUID feature leaf to __x86_get_cpuid_feature_leaf.
  10. Return a zero struct cpuid_feature for an older glibc binary with a smaller CPUID_INDEX_MAX [BZ #27104].
  11. Inside glibc, use the C preprocessor magic so that cpu_features data can be loaded just once, leading to more compact code for glibc.

  256 bits are used for each CPUID leaf. Some leaves only contain a few features. We can add exceptions to such leaves. But it will increase code sizes and it is harder to provide backward/forward compatibility when new features are added to such leaves in the future.

  When new leaves are added, _rtld_global_ro offsets will change, which leads to a race condition during in-place updates. We may avoid in-place updates by:

  1. Rename the old glibc.
  2. Install the new glibc.
  3. Remove the old glibc.

  NB: A function, __x86_get_cpuid_feature_leaf, is used to avoid the copy relocation issue with IFUNC resolvers, as shown in the IFUNC resolver tests.
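  As a usage sketch of the resulting public interface (hedged: the macro spelling here follows the installed <sys/platform/x86.h> of this era, which later releases rename, so verify against your glibc version):

      #include <stdio.h>
      #include <sys/platform/x86.h>

      int
      main (void)
      {
        /* HAS_CPU_FEATURE: the CPU reports the feature bit;
           CPU_FEATURE_USABLE: it is also usable in the current context
           (e.g. OS-enabled).  Both resolve through
           __x86_get_cpuid_feature_leaf in libc.  */
        if (CPU_FEATURE_USABLE (AVX2))
          puts ("AVX2 is usable");
        return 0;
      }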
* libmvec: Add extra-test-objs to test-extras (H.J. Lu, 2021-01-19; 1 file, -0/+8)
  Add extra-test-objs to test-extras so that they are compiled with -DMODULE_NAME=testsuite instead of -DMODULE_NAME=libc.
  Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* Fix x86 build with --enable-tunable=no (Adhemerval Zanella, 2021-01-14; 1 file, -0/+1)
  Checked on x86_64-linux-gnu.
* x86: Support GNU_PROPERTY_X86_ISA_1_V[234] marker [BZ #26717] (H.J. Lu, 2021-01-07; 2 files, -52/+19)
  GCC 11 supports -march=x86-64-v[234] to enable x86 micro-architecture ISA levels:

      https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97250

  and -mneeded to emit the GNU_PROPERTY_X86_ISA_1_NEEDED property with the GNU_PROPERTY_X86_ISA_1_V[234] marker:

      https://gitlab.com/x86-psABIs/x86-64-ABI/-/merge_requests/13

  Binutils support for the GNU_PROPERTY_X86_ISA_1_V[234] markers was added by

      commit b0ab06937385e0ae25cebf1991787d64f439bf12
      Author: H.J. Lu <hjl.tools@gmail.com>
      Date:   Fri Oct 30 06:49:57 2020 -0700

          x86: Support GNU_PROPERTY_X86_ISA_1_BASELINE marker

  and

      commit 32930e4edbc06bc6f10c435dbcc63131715df678
      Author: H.J. Lu <hjl.tools@gmail.com>
      Date:   Fri Oct 9 05:05:57 2020 -0700

          x86: Support GNU_PROPERTY_X86_ISA_1_V[234] marker

  The GNU_PROPERTY_X86_ISA_1_NEEDED property in x86 ELF binaries indicates the micro-architecture ISA level required to execute the binary. The marker must be added by programmers explicitly in one of 3 ways:

  1. Pass -mneeded to GCC.
  2. Add the marker in the linker inputs, as this patch does.
  3. Pass -z x86-64-v[234] to the linker.

  Add GNU_PROPERTY_X86_ISA_1_BASELINE and GNU_PROPERTY_X86_ISA_1_V[234] marker support to ld.so if binutils 2.32 or newer is used to build glibc:

  1. Add the GNU_PROPERTY_X86_ISA_1_BASELINE and GNU_PROPERTY_X86_ISA_1_V[234] markers to elf.h.
  2. Add the GNU_PROPERTY_X86_ISA_1_BASELINE or GNU_PROPERTY_X86_ISA_1_V[234] marker to abi-note.o, based on the ISA level used to compile abi-note.o, assuming that the same ISA level is used to compile the whole glibc.
  3. Add isa_1 to cpu_features to record the supported x86 ISA level.
  4. Rename _dl_process_cet_property_note to _dl_process_property_note and add GNU_PROPERTY_X86_ISA_1_V[234] marker detection.
  5. Update _rtld_main_check and _dl_open_check to check loaded objects with an incompatible ISA level.
  6. Add a testcase to verify that dlopening an x86-64-v4 shared object fails on lesser platforms.
  7. Use <get-isa-level.h> in dl-hwcaps-subdirs.c and tst-glibc-hwcaps.c.

  Tested under i686, x32 and x86-64 modes on x86-64-v2, x86-64-v3 and x86-64-v4 machines. Marked elf/tst-isa-level-1 with x86-64-v4, ran it on an x86-64-v3 machine, and got:

      [hjl@gnu-cfl-2 build-x86_64-linux]$ ./elf/tst-isa-level-1
      ./elf/tst-isa-level-1: CPU ISA level is lower than required
      [hjl@gnu-cfl-2 build-x86_64-linux]$
* Remove dbl-64/wordsize-64 (part 2) (Wilco Dijkstra, 2021-01-07; 1 file, -1/+0)
  Remove the wordsize-64 implementations by merging them into the main dbl-64 directory. The second patch just moves all wordsize-64 files and removes a few wordsize-64 uses in comments and Implies files.
  Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* x86: Check IFUNC definition in unrelocated executable [BZ #20019] (H.J. Lu, 2021-01-04; 1 file, -5/+11)
  Calling an IFUNC function defined in an unrelocated executable also leads to a segfault. Issue a fatal error message when calling an IFUNC function defined in the unrelocated executable from a shared library.
* x86-64: Avoid rep movsb with short distance [BZ #27130] (H.J. Lu, 2021-01-04; 1 file, -0/+21)
  When copying with "rep movsb", if the distance between source and destination is N*4GB + [1..63] with N >= 0, performance may be very slow. This patch updates memmove-vec-unaligned-erms.S for the AVX and AVX512 versions with the distance in RCX:

      cmpl    $63, %ecx
      // Don't use "rep movsb" if ECX <= 63
      jbe     L(Don't use rep movsb")
      Use "rep movsb"

  Benchtest data with bench-memcpy, bench-memcpy-large, bench-memcpy-random and bench-memcpy-walk on Skylake, Ice Lake and Tiger Lake show that the performance impact is within the noise range, as "rep movsb" is only used for data size >= 4KB. A C rendering of the check follows this entry.
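  In C terms the guard is just a test on the low 32 bits of the source/destination distance (a sketch of the condition, not actual glibc code):

      #include <stdbool.h>
      #include <stdint.h>

      /* Sketch: "rep movsb" is avoided when the distance between dst and
         src, reduced mod 4GB, is at most 63, i.e. when the low 32 bits
         of (dst - src) are <= 63 (these low bits land in ECX above).  */
      static bool
      rep_movsb_is_risky (const void *dst, const void *src)
      {
        uint32_t low = (uint32_t) ((uintptr_t) dst - (uintptr_t) src);
        return low <= 63;
      }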
* Update copyright dates with scripts/update-copyrights (Paul Eggert, 2021-01-02; 515 files, -515/+515)
  I used these shell commands:

      ../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright
      (cd ../glibc && git commit -am"[this commit message]")

  and then ignored the output, which consisted of lines saying "FOO: warning: copyright statement not found" for each of 6694 files FOO.
  I then removed trailing white space from benchtests/bench-pthread-locks.c and iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c, to work around this diagnostic from Savannah:

      remote: *** pre-commit check failed ...
      remote: *** error: lines with trailing whitespace found
      remote: error: hook declined to update refs/heads/master
* x86 long double: Support pseudo numbers in isnanl (Siddhesh Poyarekar, 2020-12-24; 1 file, -1/+0)
  This syncs up isnanl behaviour with gcc. Also move the isnanl implementation to sysdeps/x86 and remove the sysdeps/x86_64 version.
  Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* x86 long double: Support pseudo numbers in fpclassifyl (Siddhesh Poyarekar, 2020-12-24; 1 file, -2/+0)
  Also move sysdeps/i386/fpu/s_fpclassifyl.c to sysdeps/x86/fpu/s_fpclassifyl.c and remove sysdeps/x86_64/fpu/s_fpclassifyl.c.
  Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* add inputs to auto-libm-test-in yielding larger errors (binary64, x86_64) (Paul Zimmermann, 2020-12-21; 1 file, -11/+13)
* ieee754: Remove unused __sin32 and __cos32 (Anssi Hannula, 2020-12-18; 4 files, -8/+0)
  The __sin32 and __cos32 functions were only used in the now-removed slow path of asin and acos.
* x86_64: Add glibc-hwcaps support (Florian Weimer, 2020-12-04; 3 files, -0/+181)
  The subdirectories match those in the x86-64 psABI:

      https://gitlab.com/x86-psABIs/x86-64-ABI/-/commit/77566eb03bc6a326811cb7e9a6b9396884b67c7c

  Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* x86: Fix THREAD_SELF definition to avoid ld.so crash (bug 27004) (Jakub Jelinek, 2020-12-03; 1 file, -1/+6)
  The previous definition of THREAD_SELF did not tell the compiler that %fs (or %gs) usage is invalid for the !DL_LOOKUP_GSCOPE_LOCK case in _dl_lookup_symbol_x. As a result, ld.so could try to use the TCB before it was initialized. As the comment in tls.h explains, asm volatile is undesirable here. Using the __seg_fs (or __seg_gs) namespace does not interfere with optimization, and expresses that THREAD_SELF is potentially trapping. A sketch of the named-address-space form follows this entry.
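  For readers unfamiliar with GCC named address spaces: __seg_fs qualifies a pointer so that dereferences go through the %fs segment, and the compiler treats them as ordinary (potentially trapping) memory accesses rather than opaque asm. The x86_64 definition is roughly as follows (a sketch from memory; struct pthread and header.self are glibc-internal):

      #include <stddef.h>

      /* Sketch: load the "self" pointer stored in the TCB at a known
         offset from %fs, via GCC's __seg_fs named address space
         instead of asm volatile.  */
      # define THREAD_SELF \
        (*(struct pthread *__seg_fs *) offsetof (struct pthread, header.self))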
* nptl: Move stack list variables into _rtld_global (Florian Weimer, 2020-11-16; 1 file, -2/+0)
  Now __thread_gscope_wait (the function behind THREAD_GSCOPE_WAIT, formerly __wait_lookup_done) can be implemented directly in ld.so, eliminating the unprotected GL (dl_wait_lookup_done) function pointer.
  Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* Remove tls.h inclusion from internal errno.h (Adhemerval Zanella, 2020-11-13; 1 file, -2/+8)
  The tls.h inclusion is not really required and limits the possible definitions in more arch-specific headers. This is a cleanup to allow inline functions in sysdep.h, more specifically on i386 and ia64, which require access to some tls definitions of their own.
  No semantic changes expected; checked with a build against all affected ABIs.
* x86: Remove UP macro. Define LOCK_PREFIX unconditionally. (Florian Weimer, 2020-11-13; 2 files, -14/+2)
  The UP macro is never defined. Also define LOCK_PREFIX unconditionally, to the same string. What this amounts to is sketched after this entry.
  Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
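  Historically, UP selected an empty lock prefix for uniprocessor builds; with it gone, the atomic helpers always emit the bus lock. Roughly (a sketch, not the verbatim header; atomic_inc_int is an invented name):

      /* Previously (sketch):
           #ifdef UP
           # define LOCK_PREFIX            (uniprocessor: no bus lock)
           #else
           # define LOCK_PREFIX "lock;"
           #endif
         Now UP is gone and the prefix is unconditional.  */
      #define LOCK_PREFIX "lock;"

      /* Illustrative use in an atomic helper on an int *.  */
      #define atomic_inc_int(mem) \
        __asm __volatile (LOCK_PREFIX "incl %0" : "+m" (*(mem)))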
* hurd: keep only required PLTs in ld.so (Samuel Thibault, 2020-11-11; 1 file, -4/+0)
  We need NO_RTLD_HIDDEN because of the need for PLT calls in ld.so. See Roland's comment in https://sourceware.org/bugzilla/show_bug.cgi?id=15605: "in the Hurd it's crucial that calls like __mmap be the libc ones instead of the rtld-local ones after the bootstrap phase, when the dynamic linker is being used for dlopen and the like."
  We used to just avoid all hidden use in the rtld; this commit switches to keeping only those that should use PLT calls, i.e. essentially those defined in sysdeps/mach/hurd/dl-sysdep.c:

      __assert_fail
      __assert_perror_fail
      __*stat64
      _exit

  This fixes a few startup issues, notably the call to __tunable_get_val that is made before PLTs are set up.
* x86: Initialize CPU info via IFUNC relocation [BZ 26203] (H.J. Lu, 2020-10-16; 1 file, -4/+3)
  X86 CPU features in ld.so are initialized by init_cpu_features, which is invoked by DL_PLATFORM_INIT from _dl_sysdep_start. But when ld.so is loaded by a static executable, DL_PLATFORM_INIT is never called. Also, x86 cache info in libc.o and libc.a is initialized by a constructor which may be called too late. Since some fields in _rtld_global_ro in ld.so are initialized by dynamic relocation, we can also initialize x86 CPU features in _rtld_global_ro in ld.so and cache info in libc.so by initializing dummy function pointers in ld.so and libc.so via IFUNC relocation (the mechanism is sketched after this entry).

  Key points:

  1. IFUNC is always supported, independent of --enable-multi-arch or --disable-multi-arch. The linker generates IFUNC relocations from input IFUNC objects and ld.so performs IFUNC relocations.
  2. There are no IFUNC dependencies in ld.so before dynamic relocations have been performed.
  3. The x86 CPU features in ld.so are initialized by DL_PLATFORM_INIT in dynamic executables and by IFUNC relocation in dlopen in static executables.
  4. The x86 cache info in libc.o is initialized by IFUNC relocation.
  5. In libc.a, both x86 CPU features and cache info are initialized from ARCH_INIT_CPU_FEATURES, not by IFUNC relocation, before __libc_early_init is called.

  Note: _dl_x86_init_cpu_features can be called more than once from DL_PLATFORM_INIT and during relocation in ld.so.
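  The trick relies on the compiler's ifunc attribute: the resolver runs while ld.so processes IRELATIVE relocations, so side effects in the resolver happen at relocation time, early enough for both cases above. A minimal, self-contained sketch of the pattern (names invented; not the glibc source):

      /* Sketch of "initialize via IFUNC relocation": the resolver runs
         when the IRELATIVE relocation for init_hook is processed, and
         sets up CPU/cache info as a side effect.  */
      static void
      cpu_features_state_init (void)
      {
        /* ... fill in CPU feature / cache info here ... */
      }

      static void (*resolve_init_hook (void)) (void)
      {
        cpu_features_state_init ();   /* side effect at relocation time */
        return cpu_features_state_init;
      }

      /* Dummy function whose only job is to force the resolver to run
         during relocation.  */
      void init_hook (void) __attribute__ ((ifunc ("resolve_init_hook")));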
* aarch64: enforce >=64K guard size [BZ #26691] (Szabolcs Nagy, 2020-10-02; 1 file, -0/+3)
  There are several compiler implementations that allow large stack allocations to jump over the guard page at the end of the stack and corrupt memory beyond that. See CVE-2017-1000364.
  Compilers can emit code to probe the stack such that the guard page cannot be skipped, but on aarch64 the probe interval is 64K by default instead of the minimum supported page size (4K).
  This patch enforces an at least 64K guard on aarch64, unless the guard is disabled by setting its size to 0. For backward compatibility reasons the increased guard is not reported, so it is only observable by exhausting the address space or parsing /proc/self/maps on Linux.
  On other targets the patch has no effect. If the stack probe interval is larger than a page size on a target, then ARCH_MIN_GUARD_SIZE can be defined to get a large enough stack guard on libc-allocated stacks (see the sketch after this entry).
  The patch does not affect threads with user-allocated stacks.
  Fixes bug 26691.
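  A sketch of how such a per-target knob plugs into guard sizing at thread creation (illustrative C; the actual pthread-internal code differs):

      #include <stddef.h>

      /* Per-target minimum guard: aarch64 would set 64K here so that a
         64K-interval stack probe cannot jump over the guard region.  */
      #define ARCH_MIN_GUARD_SIZE (64 * 1024)

      /* Illustrative: a guard of 0 stays 0 (explicitly disabled); any
         nonzero request is raised to the enforced minimum.  */
      static size_t
      effective_guard_size (size_t requested)
      {
        if (requested > 0 && requested < ARCH_MIN_GUARD_SIZE)
          return ARCH_MIN_GUARD_SIZE;
        return requested;
      }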