about summary refs log tree commit diff
Commit message (Collapse)AuthorAgeFilesLines
* benchtests: Add additional cases to bench-memcpy.c and bench-memmove.cNoah Goldstein2021-11-062-9/+66
| | | | | | | | | | | | This commit adds more benchmarks for the common memcpy/memmove benchmarks. The most signifcant cases are the half page offsets. The current versions leaves dst and src near page aligned which leads to false 4k aliasing on x86_64. This can add noise due to false dependencies from one run to the next. As well, this seems like more of an edge case that common case so it shouldn't be the only thing Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
* string: Make tests birdirectional test-memcpy.cNoah Goldstein2021-11-062-28/+214
| | | | | | | | This commit updates the memcpy tests to test both dst > src and dst < src. This is because there is logic in the code based on the Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
* Remove the last trace of generate-md5 [BZ #28554]H.J. Lu2021-11-061-1/+1
| | | | | | | | | | | | generate-md5 was removed by commit d73f5331ce5370ca5a879229e3842f5de98689cd Author: Roland McGrath <roland@gnu.org> Date: Fri May 2 02:20:45 2003 +0000 2003-05-01 Roland McGrath <roland@redhat.com> Remove its last trace. This fixes BZ #28554.
* Revert "benchtests: Add acosf function to bench-math"Sunil K Pandey2021-11-052-2710/+0
| | | | This reverts commit 79d0fc65395716c1d95931064c7bf37852203c66.
* Configure GCC with --enable-initfini-array [BZ #27945]H.J. Lu2021-11-051-0/+1
| | | | | | | | | | | | | | | | Starting from GCC 12, the .init_array and .fini_array sections are enabled unconditionally by commit 13a39886940331149173b25d6ebde0850668d8b9 Author: H.J. Lu <hjl.tools@gmail.com> Date: Tue Jun 8 16:09:24 2021 -0700 Always enable DT_INIT_ARRAY/DT_FINI_ARRAY on Linux configure GCC with --enable-initfini-array to enable them when using GCC release branches. Fixes BZ #27945.
* elf: Earlier missing dynamic segment check in _dl_map_object_from_fdFlorian Weimer2021-11-051-10/+12
| | | | | | | | | Separated debuginfo files have PT_DYNAMIC with p_filesz == 0. We need to check for that before the _dl_map_segments call because that could attempt to write to mappings that extend beyond the end of the file, resulting in SIGBUS. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
* gconv: Do not emit spurious NUL character in ISO-2022-JP-3 (bug 28524)Nikita Popov2021-11-043-9/+84
| | | | | | | | | | | | | | | | | | | | | | | | | | | Bugfix 27256 has introduced another issue: In conversion from ISO-2022-JP-3 encoding, it is possible to force iconv to emit extra NUL character on internal state reset. To do this, it is sufficient to feed iconv with escape sequence which switches active character set. The simplified check 'data->__statep->__count != ASCII_set' introduced by the aforementioned bugfix picks that case and behaves as if '\0' character has been queued thus emitting it. To eliminate this issue, these steps are taken: * Restore original condition '(data->__statep->__count & ~7) != ASCII_set'. It is necessary since bits 0-2 may contain number of buffered input characters. * Check that queued character is not NUL. Similar step is taken for main conversion loop. Bundled test case follows following logic: * Try to convert ISO-2022-JP-3 escape sequence switching active character set * Reset internal state by providing NULL as input buffer * Ensure that nothing has been converted. Signed-off-by: Nikita Popov <npv1310@gmail.com>
* [powerpc] Tighten contraints for asm constant parametersPaul A. Clarke2021-11-033-9/+9
| | | | | | | | | | | | There are a few places where only known numeric values are acceptable for `asm` parameters, yet the constraint "i" is used. "i" can include "symbolic constants whose values will be known only at assembly time or later." Use "n" instead of "i" where known numeric values are required. Suggested-by: Segher Boessenkool <segher@kernel.crashing.org> Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
* elf: Do not run DSO sorting if tunables is not enabledAdhemerval Zanella2021-11-031-0/+2
| | | | | | Since the argorithm selection requires tunables. Checked on x86_64-linux-gnu with --enable-tunables=no.
* riscv: Build with -mno-relax if linker does not support R_RISCV_ALIGNAdhemerval Zanella2021-11-033-0/+52
| | | | | | | | | | | It allows build both glibc and tests with lld (Since lld does not support R_RISCV_ALIGN linker relaxation). Checked with a build for riscv32-linux-gnu-rv32imafdc-ilp32d and riscv64-linux-gnu-rv64imafdc-lp64d. Reviewed-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Fangrui Song <maskray@google.com>
* x86-64: Replace movzx with movzblFangrui Song2021-11-022-4/+4
| | | | | | | | | | | | | Clang cannot assemble movzx in the AT&T dialect mode. ../sysdeps/x86_64/strcmp.S:2232:16: error: invalid operand for instruction movzx (%rsi), %ecx ^~~~ Change movzx to movzbl, which follows the AT&T dialect and is used elsewhere in the file. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
* regex: Unnest nested functions in regcomp.cFangrui Song2021-11-021-223/+241
| | | | | | | | | | | | | | | | | This refactor moves four functions out of a nested scope and converts them into static always_inline functions. collseqwc, table_size, symb_table, extra are now initialized to zero because they are passed as function arguments. On x86-64, .text is 16 byte larger likely due to the 4 stores. This is nothing compared to the amount of work that regcomp has to do looking up the collation weights, or other functions. If the non-buildable `sysdeps/generic/dl-machine.h` doesn't count, this patch removes the last `auto inline` usage from glibc. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* Use Linux 5.15 in build-many-glibcs.pyJoseph Myers2021-11-021-1/+1
| | | | | | | This patch makes build-many-glibcs.py use Linux 5.15. Tested with build-many-glibcs.py (host-libraries, compilers and glibcs builds).
* elf: Assume disjointed .rela.dyn and .rela.plt for loaderAdhemerval Zanella2021-11-021-23/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | The patch removes the the ELF_DURING_STARTUP optimization and assume both .rel.dyn and .rel.plt might not be subsequent. This allows some code simplification since relocation will be handled independently where it is done on bootstrap. At least on x86_64_64, I can not measure any performance implications. Running 10000 time the command LD_DEBUG=statistics ./elf/ld.so ./libc.so And filtering the "total startup time in dynamic loader" result, the geometric mean is: patched master Ryzen 7 5900x 24140 24952 i7-4510U 45957 45982 (The results do show some variation, I did not make any statistical analysis). It also allows build arm with lld, since it inserts ".ARM.exidx" between ".rel.dyn" and ".rel.plt" for the loader. Checked on x86_64-linux-gnu and arm-linux-gnueabihf. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
* i386: Explain why __HAVE_64B_ATOMICS has to be 0Florian Weimer2021-11-021-0/+4
|
* benchtests: Add hypotfAdhemerval Zanella2021-11-012-0/+1008
| | | | | | | Based on random input arguments. About 85% tuples have exponents of the two arguments close together (+-1 range). Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
* benchtests: Make hypot input randomAdhemerval Zanella2021-11-011-12/+1003
| | | | | | | | Instead of inputs based on the algorithm implementation details. About 85% tuples have exponents of the two arguments close together (+-1 range). Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
* arm: Use have-mtls-dialect-gnu2 to check for ARM TLS descriptors supportAdhemerval Zanella2021-11-011-6/+1
| | | | | | | The lld linker does not support TLSDESC for arm. The have-arm-tls-desc is a leftover of 56583289b1 to support NaCL. Reviewed-by: Fangrui Song <maskray@google.com>
* arm: Use internal symbol for _dl_argv on _dl_start_userAdhemerval Zanella2021-11-011-1/+1
| | | | | | | | | | | | The lld does not support R_ARM_GOTOFF32 to preemptible symbol (_dl_argv has default visibility). Use the internal alias instead (one option would to use HIDDEN_JUMPTARGET, bu the macro is not defined for !__ASSEMBLER__ and I made this patch arm-specific to avoid require to check extensivelly on other architecture it this might break something). Checked on arm-linux-gnueabihf. Reviewed-by: Fangrui Song <maskray@google.com>
* x86-64: Remove Prefer_AVX2_STRCMPH.J. Lu2021-11-015-15/+2
| | | | | | | | | Remove Prefer_AVX2_STRCMP to enable EVEX strcmp. When comparing 2 32-byte strings, EVEX strcmp has been improved to require 1 load, 1 VPTESTM, 1 VPCMP, 1 KMOVD and 1 INCL instead of 2 loads, 3 VPCMPs, 2 KORDs, 1 KMOVD and 1 TESTL while AVX2 strcmp requires 1 load, 2 VPCMPEQs, 1 VPMINU, 1 VPMOVMSKB and 1 TESTL. EVEX strcmp is now faster than AVX2 strcmp by up to 40% on Tiger Lake and Ice Lake.
* x86-64: Improve EVEX strcmp with masked loadH.J. Lu2021-11-011-218/+243
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In strcmp-evex.S, to compare 2 32-byte strings, replace VMOVU (%rdi, %rdx), %YMM0 VMOVU (%rsi, %rdx), %YMM1 /* Each bit in K0 represents a mismatch in YMM0 and YMM1. */ VPCMP $4, %YMM0, %YMM1, %k0 VPCMP $0, %YMMZERO, %YMM0, %k1 VPCMP $0, %YMMZERO, %YMM1, %k2 /* Each bit in K1 represents a NULL in YMM0 or YMM1. */ kord %k1, %k2, %k1 /* Each bit in K1 represents a NULL or a mismatch. */ kord %k0, %k1, %k1 kmovd %k1, %ecx testl %ecx, %ecx jne L(last_vector) with VMOVU (%rdi, %rdx), %YMM0 VPTESTM %YMM0, %YMM0, %k2 /* Each bit cleared in K1 represents a mismatch or a null CHAR in YMM0 and 32 bytes at (%rsi, %rdx). */ VPCMP $0, (%rsi, %rdx), %YMM0, %k1{%k2} kmovd %k1, %ecx incl %ecx jne L(last_vector) It makes EVEX strcmp faster than AVX2 strcmp by up to 40% on Tiger Lake and Ice Lake. Co-Authored-By: Noah Goldstein <goldstein.w.n@gmail.com>
* benchtests: Add acosf function to bench-mathSunil K Pandey2021-10-292-0/+2710
| | | | | | | | | | | | | | | | | | Add acosf function to bench-math and copy acosf-inputs to benchtests. Motivation for this patch is to prepare for upcoming libmvec new functions. Float and double version of libmvec functions stays together. acosf-inputs file generated from acos-inputs file using following scaling formula: f = d * (FLT_MAX/DBL_MAX) Where d is input(double) and f is output(float). If scaled float value is duplicate in new input file, nextafterf() function used to find next float value, ensuring no duplicates. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
* benchtests: Improve bench-memcpy-randomWilco Dijkstra2021-10-291-26/+28
| | | | | | | | Improve the random memcpy benchmark. Double the number of tests and increase the size of the memory region to test between 32KB and 1024KB. This improves accuracy on modern cores. Clean up formatting of the frequency array. Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
* Disable -Waggressive-loop-optimizations warnings in tst-dynarray.cJoseph Myers2021-10-291-0/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | My build-many-glibcs.py bot shows -Waggressive-loop-optimizations errors building the glibc testsuite for 32-bit architectures with GCC mainline, which seem to have appeared between GCC commits 4abc0c196b10251dc80d0743ba9e8ab3e56c61ed and d8edfadfc7a9795b65177a50ce44fd348858e844: In function 'dynarray_long_noscratch_resize', inlined from 'test_long_overflow' at tst-dynarray.c:489:5, inlined from 'do_test' at tst-dynarray.c:571:3: ../malloc/dynarray-skeleton.c:391:36: error: iteration 1073741823 invokes undefined behavior [-Werror=aggressive-loop-optimizations] 391 | DYNARRAY_ELEMENT_INIT (&list->u.dynarray_header.array[i]); tst-dynarray.c:39:37: note: in definition of macro 'DYNARRAY_ELEMENT_INIT' 39 | #define DYNARRAY_ELEMENT_INIT(e) (*(e) = 23) | ^ In file included from tst-dynarray.c:42: ../malloc/dynarray-skeleton.c:389:37: note: within this loop 389 | for (size_t i = old_size; i < size; ++i) | ~~^~~~~~ In function 'dynarray_long_resize', inlined from 'test_long_overflow' at tst-dynarray.c:479:5, inlined from 'do_test' at tst-dynarray.c:571:3: ../malloc/dynarray-skeleton.c:391:36: error: iteration 1073741823 invokes undefined behavior [-Werror=aggressive-loop-optimizations] 391 | DYNARRAY_ELEMENT_INIT (&list->u.dynarray_header.array[i]); tst-dynarray.c:27:37: note: in definition of macro 'DYNARRAY_ELEMENT_INIT' 27 | #define DYNARRAY_ELEMENT_INIT(e) (*(e) = 17) | ^ In file included from tst-dynarray.c:28: ../malloc/dynarray-skeleton.c:389:37: note: within this loop 389 | for (size_t i = old_size; i < size; ++i) | ~~^~~~~~ I don't know what GCC change made these errors appear, or why they only appear for 32-bit architectures. However, the warnings appear to be both true (that iteration would indeed involve undefined behavior if executed) and useless in this particular case (that iteration is never executed, because the allocation size overflows and so the allocation fails - but the check for allocation size overflow is in a separate source file and so can't be seen by the compiler when compiling this test). So use the DIAG_* macros to disable -Waggressive-loop-optimizations around the calls in question to dynarray_long_resize and dynarray_long_noscratch_resize in this test. Tested with build-many-glibcs.py (GCC mainline) for arm-linux-gnueabi, where it restores a clean testsuite build.
* Fix compiler issue with mmap_internalStafford Horne2021-10-291-0/+2
| | | | | | | | | | | | | | Compiling mmap_internal fails to compile when we use -1 for MMAP2_PAGE_UNIT on 32 bit architectures. The error is as follows: ../sysdeps/unix/sysv/linux/mmap_internal.h:30:8: error: unknown type name 'uint64_t' | 30 | static uint64_t page_unit; | | ^~~~~~~~ Fix by adding including stdint.h.
* Check if linker also support -mtls-dialect=gnu2Adhemerval Zanella2021-10-292-4/+4
| | | | | | | | | Since some linkers (for instance lld for i386) does not support it for all architectures. Checked on i686-linux-gnu. Reviewed-by: Fangrui Song <maskray@google.com>
* Fix LIBC_PROG_BINUTILS for -fuse-ld=lldAdhemerval Zanella2021-10-292-2/+10
| | | | | | | | | | | | | | | | | | | | | | | | | GCC does not print the correct linker when -fuse-ld=lld is used with the -print-prog-name=ld: $ gcc -v 2>&1 | tail -n 1 gcc version 11.2.0 (Ubuntu 11.2.0-7ubuntu2) $ gcc ld This is different than for gold: $ gcc -fuse-ld=gold -print-prog-name=ld ld.gold Using ld.lld as the static linker name prints the expected result. This is only required when -fuse-ld=lld is used, if lld is used as the 'ld' programs (through a symlink) LIBC_PROG_BINUTILS works as expected. Checked on x86_64-linux-gnu. Reviewed-by: Fangrui Song <maskray@google.com>
* elf: Disable ifuncmain{1,5,5pic,5pie} when using LLDAdhemerval Zanella2021-10-291-4/+13
| | | | | | | | | These tests takes the address of a protected symbol (foo_protected) and lld does not support copy relocations on protected data symbols. Checked on x86_64-linux-gnu. Reviewed-by: Fangrui Song <maskray@google.com>
* Handle NULL input to malloc_usable_size [BZ #28506]Siddhesh Poyarekar2021-10-293-35/+25
| | | | | | | | | | Hoist the NULL check for malloc_usable_size into its entry points in malloc-debug and malloc and assume non-NULL in all callees. This fixes BZ #28506 Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org> Reviewed-by: Florian Weimer <fweimer@redhat.com> Reviewed-by: Richard W.M. Jones <rjones@redhat.com>
* x86_64: Add memcmpeq.S to fix disable-multi-arch buildNoah Goldstein2021-10-281-0/+21
| | | | | | | | | | | | | | | | The following commit: commit cf4fd28ea453d1a9cec93939bc88b58ccef5437a Author: Noah Goldstein <goldstein.w.n@gmail.com> Date: Tue Oct 26 19:43:18 2021 -0500 Broke --disable-multi-arch build for x86_64 because x86_64/memcmpeq.S was not defined outside of multiarch and the alias for __memcmpeq in x86_64/memcmp.S was removed. This commit fixes that issue by adding x86_64/memcmpeq.S. make xcheck passes on x86_64 with and without --disable-multi-arch
* login: Add back libutil as an empty libraryStafford Horne2021-10-291-1/+3
| | | | | | There are several packages like sysvinit and buildroot that expect -lutil to work. Rather than impacting them with having to change the linker flags provide an empty libutil.a.
* riscv: Fix incorrect jal with HIDDEN_JUMPTARGETFangrui Song2021-10-282-3/+4
| | | | | | | | | | | | | | | | A non-local STV_DEFAULT defined symbol is by default preemptible in a shared object. j/jal cannot target a preemptible symbol. On other architectures, such a jump instruction either causes PLT [BZ #18822], or if short-ranged, sometimes rejected by the linker (but not by GNU ld's riscv port [ld PR/28509]). Use HIDDEN_JUMPTARGET to target a non-preemptible symbol instead. With this patch, ld.so and libc.so can be linked with LLD if source files are compiled/assembled with -mno-relax/-Wa,-mno-relax. Acked-by: Palmer Dabbelt <palmer@dabbelt.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* x86_64: Add evex optimized __memcmpeq in memcmpeq-evex.SNoah Goldstein2021-10-273-6/+304
| | | | | | | | | | | | | No bug. This commit adds new optimized __memcmpeq implementation for evex. The primary optimizations are: 1) skipping the logic to find the difference of the first mismatched byte. 2) not updating src/dst addresses as the non-equals logic does not need to be reused by different areas.
* x86_64: Add avx2 optimized __memcmpeq in memcmpeq-avx2.SNoah Goldstein2021-10-274-9/+308
| | | | | | | | | | | | | No bug. This commit adds new optimized __memcmpeq implementation for avx2. The primary optimizations are: 1) skipping the logic to find the difference of the first mismatched byte. 2) not updating src/dst addresses as the non-equals logic does not need to be reused by different areas.
* x86_64: Add sse2 optimized __memcmpeq in memcmp-sse2.SNoah Goldstein2021-10-271-4/+51
| | | | | | No bug. This commit does not modify any of the memcmp implementation. It just adds __memcmpeq ifdefs to skip obvious cases where computing the proper 1/-1 required by memcmp is not needed.
* x86_64: Add support for __memcmpeq using sse2, avx2, and evexNoah Goldstein2021-10-2712-9/+202
| | | | | | No bug. This commit adds support for __memcmpeq to be implemented seperately from memcmp. Support is added for versions optimized with sse2, avx2, and evex.
* Benchtests: Add benchtests for __memcmpeqNoah Goldstein2021-10-273-7/+29
| | | | | | No bug. This commit adds __memcmpeq benchmarks. The benchmarks just use the existing ones in memcmp. This will be useful for testing implementations of __memcmpeq that do not just alias memcmp.
* String: Add __memcmpeq as build targetNoah Goldstein2021-10-272-1/+25
| | | | | | No bug. This commit just adds __memcmpeq as a build target so that implementations for __memcmpeq that are not just aliases to memcmp can be supported.
* NEWS: Add item for __memcmpeqNoah Goldstein2021-10-261-0/+4
|
* String: Add tests for __memcmpeqNoah Goldstein2021-10-263-14/+45
| | | | | | | | | | No bug. This commit adds tests for the new function __memcmpeq. The new tests use the existing tests in 'test-memcmp.c' but relax the result requirement to only check for zero or non-zero returns. All string tests include test-memcmpeq are passing.
* String: Add hidden defs for __memcmpeq() to enable internal usageNoah Goldstein2021-10-2627-0/+38
| | | | | | | | No bug. This commit adds hidden defs for all declarations of __memcmpeq. This enables usage of __memcmpeq without the PLT for usage internal to GLIBC.
* String: Add support for __memcmpeq() ABI on all targetsNoah Goldstein2021-10-2664-0/+120
| | | | | | | | | | | | | | | | | | | | | | | | | No bug. This commit adds support for __memcmpeq() as a new ABI for all targets. In this commit __memcmpeq() is implemented only as an alias to the corresponding targets memcmp() implementation. __memcmpeq() is added as a new symbol starting with GLIBC_2.35 and defined in string.h with comments explaining its behavior. Basic tests that it is callable and works where added in string/tester.c As discussed in the proposal "Add new ABI '__memcmpeq()' to libc" __memcmpeq() is essentially a reserved namespace for bcmp(). The means is shares the same specifications as memcmp() except the return value for non-equal byte sequences is any non-zero value. This is less strict than memcmp()'s return value specification and can be better optimized when a boolean return is all that is needed. __memcmpeq() is meant to only be called by compilers if they can prove that the return value of a memcmp() call is only used for its boolean value. All tests in string/tester.c passed. As well build succeeds on x86_64-linux-gnu target.
* configure: Don't check LD -v --help for LIBC_LINKER_FEATUREFangrui Song2021-10-253-66/+48
| | | | | | | | | When LIBC_LINKER_FEATURE is used to check a linker option with the equal sign, it will likely fail because the LD -v --help output may look like `-z lam-report=[none|warning|error]` while the needle is something like `-z lam-report=warning`. The LD -v --help filter doesn't save much time, so just remove it.
* elf: Make global.out depend on reldepmod4.so [BZ #28457]H.J. Lu2021-10-251-1/+1
| | | | | | The global test is linked with globalmod1.so which dlopens reldepmod4.so. Make global.out depend on reldepmod4.so. This fixes BZ #28457. Reviewed-by: Florian Weimer <fweimer@redhat.com>
* x86: Replace sse2 instructions with avx in memcmp-evex-movbe.SNoah Goldstein2021-10-231-2/+2
| | | | | | | | | | | | | | This commit replaces two usages of SSE2 'movups' with AVX 'vmovdqu'. it could potentially be dangerous to use SSE2 if this function is ever called without using 'vzeroupper' beforehand. While compilers appear to use 'vzeroupper' before function calls if AVX2 has been used, using SSE2 here is more brittle. Since it is not absolutely necessary it should be avoided. It costs 2-extra bytes but the extra bytes should only eat into alignment padding. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
* bench-math: Sort and put each bench per lineH.J. Lu2021-10-231-6/+62
| | | | | | Sort and put each math bench per line to prepare for new math benches. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
* x86_64: Add missing libmvec ABI testsSunil K Pandey2021-10-2243-8/+152
| | | | | | Add vector ABI tests for cos, exp, log, pow and sin functions. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
* elf: Fix e6fd79f379 build with --enable-tunables=noAdhemerval Zanella2021-10-211-0/+9
| | | | | | The _dl_sort_maps_init() is not defined when tunables is not enabled. Checked on x86_64-linux-gnu.
* elf: Fix slow DSO sorting behavior in dynamic loader (BZ #17645)Chung-Lin Tang2021-10-2114-42/+269
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This second patch contains the actual implementation of a new sorting algorithm for shared objects in the dynamic loader, which solves the slow behavior that the current "old" algorithm falls into when the DSO set contains circular dependencies. The new algorithm implemented here is simply depth-first search (DFS) to obtain the Reverse-Post Order (RPO) sequence, a topological sort. A new l_visited:1 bitfield is added to struct link_map to more elegantly facilitate such a search. The DFS algorithm is applied to the input maps[nmap-1] backwards towards maps[0]. This has the effect of a more "shallow" recursion depth in general since the input is in BFS. Also, when combined with the natural order of processing l_initfini[] at each node, this creates a resulting output sorting closer to the intuitive "left-to-right" order in most cases. Another notable implementation adjustment related to this _dl_sort_maps change is the removing of two char arrays 'used' and 'done' in _dl_close_worker to represent two per-map attributes. This has been changed to simply use two new bit-fields l_map_used:1, l_map_done:1 added to struct link_map. This also allows discarding the clunky 'used' array sorting that _dl_sort_maps had to sometimes do along the way. Tunable support for switching between different sorting algorithms at runtime is also added. A new tunable 'glibc.rtld.dynamic_sort' with current valid values 1 (old algorithm) and 2 (new DFS algorithm) has been added. At time of commit of this patch, the default setting is 1 (old algorithm). Signed-off-by: Chung-Lin Tang <cltang@codesourcery.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* elf: Testing infrastructure for ld.so DSO sorting (BZ #17645)Chung-Lin Tang2021-10-2110-1/+1884
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is the first of a 2-part patch set that fixes slow DSO sorting behavior in the dynamic loader, as reported in BZ #17645. In order to facilitate such a large modification to the dynamic loader, this first patch implements a testing framework for validating shared object sorting behavior, to enable comparison between old/new sorting algorithms, and any later enhancements. This testing infrastructure consists of a Python script scripts/dso-ordering-test.py' which takes in a description language, consisting of strings that describe a set of link dependency relations between DSOs, and generates testcase programs and Makefile fragments to automatically test the described situation, for example: a->b->c->d # four objects linked one after another a->[bc]->d;b->c # a depends on b and c, which both depend on d, # b depends on c (b,c linked to object a in fixed order) a->b->c;{+a;%a;-a} # a, b, c serially dependent, main program uses # dlopen/dlsym/dlclose on object a a->b->c;{}!->[abc] # a, b, c serially dependent; multiple tests generated # to test all permutations of a, b, c ordering linked # to main program (Above is just a short description of what the script can do, more documentation is in the script comments.) Two files containing several new tests, elf/dso-sort-tests-[12].def are added, including test scenarios for BZ #15311 and Redhat issue #1162810 [1]. Due to the nature of dynamic loader tests, where the sorting behavior and test output occurs before/after main(), generating testcases to use support/test-driver.c does not suffice to control meaningful timeout for ld.so. Therefore a new utility program 'support/test-run-command', based on test-driver.c/support_test_main.c has been added. This does the same testcase control, but for a program specified through a command-line rather than at the source code level. This utility is used to run the dynamic loader testcases generated by dso-ordering-test.py. [1] https://bugzilla.redhat.com/show_bug.cgi?id=1162810 Signed-off-by: Chung-Lin Tang <cltang@codesourcery.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>