mirror/glibc - mirror of git://sourceware.org/git/glibc.git

	Commit message (Collapse)	Author	Age	Files	Lines
*	nptl: Add backoff mechanism to spinlock loop	Wangyang Guo	2022-05-09	3	-0/+75
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When mutiple threads waiting for lock at the same time, once lock owner releases the lock, waiters will see lock available and all try to lock, which may cause an expensive CAS storm. Binary exponential backoff with random jitter is introduced. As try-lock attempt increases, there is more likely that a larger number threads compete for adaptive mutex lock, so increase wait time in exponential. A random jitter is also added to avoid synchronous try-lock from other threads. v2: Remove read-check before try-lock for performance. v3: 1. Restore read-check since it works well in some platform. 2. Make backoff arch dependent, and enable it for x86_64. 3. Limit max backoff to reduce latency in large critical section. v4: Fix strict-prototypes error in sysdeps/nptl/pthread_mutex_backoff.h v5: Commit log updated for regression in large critical section. Result of pthread-mutex-locks bench Test Platform: Xeon 8280L (2 socket, 112 CPUs in total) First Row: thread number First Col: critical section length Values: backoff vs upstream, time based, low is better non-critical-length: 1 1 2 4 8 16 32 64 112 140 0 0.99 0.58 0.52 0.49 0.43 0.44 0.46 0.52 0.54 1 0.98 0.43 0.56 0.50 0.44 0.45 0.50 0.56 0.57 2 0.99 0.41 0.57 0.51 0.45 0.47 0.48 0.60 0.61 4 0.99 0.45 0.59 0.53 0.48 0.49 0.52 0.64 0.65 8 1.00 0.66 0.71 0.63 0.56 0.59 0.66 0.72 0.71 16 0.97 0.78 0.91 0.73 0.67 0.70 0.79 0.80 0.80 32 0.95 1.17 0.98 0.87 0.82 0.86 0.89 0.90 0.90 64 0.96 0.95 1.01 1.01 0.98 1.00 1.03 0.99 0.99 128 0.99 1.01 1.01 1.17 1.08 1.12 1.02 0.97 1.02 non-critical-length: 32 1 2 4 8 16 32 64 112 140 0 1.03 0.97 0.75 0.65 0.58 0.58 0.56 0.70 0.70 1 0.94 0.95 0.76 0.65 0.58 0.58 0.61 0.71 0.72 2 0.97 0.96 0.77 0.66 0.58 0.59 0.62 0.74 0.74 4 0.99 0.96 0.78 0.66 0.60 0.61 0.66 0.76 0.77 8 0.99 0.99 0.84 0.70 0.64 0.66 0.71 0.80 0.80 16 0.98 0.97 0.95 0.76 0.70 0.73 0.81 0.85 0.84 32 1.04 1.12 1.04 0.89 0.82 0.86 0.93 0.91 0.91 64 0.99 1.15 1.07 1.00 0.99 1.01 1.05 0.99 0.99 128 1.00 1.21 1.20 1.22 1.25 1.31 1.12 1.10 0.99 non-critical-length: 128 1 2 4 8 16 32 64 112 140 0 1.02 1.00 0.99 0.67 0.61 0.61 0.61 0.74 0.73 1 0.95 0.99 1.00 0.68 0.61 0.60 0.60 0.74 0.74 2 1.00 1.04 1.00 0.68 0.59 0.61 0.65 0.76 0.76 4 1.00 0.96 0.98 0.70 0.63 0.63 0.67 0.78 0.77 8 1.01 1.02 0.89 0.73 0.65 0.67 0.71 0.81 0.80 16 0.99 0.96 0.96 0.79 0.71 0.73 0.80 0.84 0.84 32 0.99 0.95 1.05 0.89 0.84 0.85 0.94 0.92 0.91 64 1.00 0.99 1.16 1.04 1.00 1.02 1.06 0.99 0.99 128 1.00 1.06 0.98 1.14 1.39 1.26 1.08 1.02 0.98 There is regression in large critical section. But adaptive mutex is aimed for "quick" locks. Small critical section is more common when users choose to use adaptive pthread_mutex. Signed-off-by: Wangyang Guo <wangyang.guo@intel.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
*	Linux: Implement a useful version of _startup_fatal	Florian Weimer	2022-05-09	3	-19/+65
\| \| \| \| \| \|	On i386 and ia64, the TCB is not available at this point. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
*	ia64: Always define IA64_USE_NEW_STUB as a flag macro	Florian Weimer	2022-05-09	2	-13/+15
\| \| \| \| \| \| \|	And keep the previous definition if it exists. This allows disabling IA64_USE_NEW_STUB while keeping USE_DL_SYSINFO defined. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
*	linux: Fix posix_spawn return code if clone fails (BZ#29109)	Adhemerval Zanella	2022-05-06	1	-1/+1
\| \| \| \| \| \|	The __clone_internal returns the error on errno. Checked on x86_64-linux-gnu.
*	clock_adjtime: Use __nonnull to avoid null pointer	Xiaoming Ni	2022-05-05	2	-3/+3
\| \| \| \| \| \| \| \| \| \|	clock_adjtime()/clock_adjtime64() Add __nonnull((2)) to avoid null pointer access. Link: https://sourceware.org/bugzilla/show_bug.cgi?id=27662 Link: https://sourceware.org/bugzilla/show_bug.cgi?id=29084 Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com> Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
*	ntp_xxxtimex: Use __nonnull to avoid null pointer	Xiaoming Ni	2022-05-05	2	-8/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	ntp_gettime() ntp_gettime64() ntp_gettimex() ntp_gettimex64() ntp_adjtime() Add __nonnull((1)) to avoid null pointer access. Link: https://sourceware.org/bugzilla/show_bug.cgi?id=27662 Link: https://sourceware.org/bugzilla/show_bug.cgi?id=29084 Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com> Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
*	adjtimex/adjtimex64: Use __nonnull to avoid null pointer	Xiaoming Ni	2022-05-05	2	-4/+4
\| \| \| \| \| \| \| \| \| \|	Add __nonnull((1)) to the adjtimex()/adjtimex64() function declaration to avoid null pointer access. Link: https://sourceware.org/bugzilla/show_bug.cgi?id=27662 Link: https://sourceware.org/bugzilla/show_bug.cgi?id=29084 Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com> Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
*	hurd spawni: Fix reauthenticating closed fds	Samuel Thibault	2022-05-05	1	-1/+1
\| \| \| \| \|	When an fd is closed, the port cell remains, but the port becomes MACH_PORT_NULL, so we have to guard against it.
*	Linux: Define MMAP_CALL_INTERNAL	Florian Weimer	2022-05-04	3	-12/+30
\| \| \| \| \| \| \| \| \| \| \| \|	Unlike MMAP_CALL, this avoids a TCB dependency for an errno update on failure. <mmap_internal.h> cannot be included as is on several architectures due to the definition of page_unit, so introduce a separate header file for the definition of MMAP_CALL and MMAP_CALL_INTERNAL, <mmap_call.h>. Reviewed-by: Stefan Liebler <stli@linux.ibm.com>
*	i386: Honor I386_USE_SYSENTER for 6-argument Linux system calls	Florian Weimer	2022-05-04	3	-3/+37
\| \| \| \| \| \| \|	Introduce an int-80h-based version of __libc_do_syscall and use it if I386_USE_SYSENTER is defined as 0. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
*	i386: Remove OPTIMIZE_FOR_GCC_5 from Linux libc-do-syscall.S	Florian Weimer	2022-05-04	1	-3/+0
\| \| \| \| \| \| \| \|	After commit a78e6a10d0b50d0ca80309775980fc99944b1727 ("i386: Remove broken CAN_USE_REGISTER_ASM_EBP (bug 28771)"), it is never defined. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
*	powerpc32: Remove unused HAVE_PPC_SECURE_PLT	Fangrui Song	2022-05-02	2	-41/+0
\| \| \| \| \| \| \|	82a79e7d1843f9d90075a0bf2f04557040829bb0 removed the only user of HAVE_PPC_SECURE_PLT. Reviewed-by: Florian Weimer <fweimer@redhat.com>
*	benchtests: Better libmvec integration	Siddhesh Poyarekar	2022-04-29	1	-4/+0
\| \| \| \| \| \| \| \|	Improve libmvec benchmark integration so that in future other architectures may be able to run their libmvec benchmarks as well. This now allows libmvec benchmarks to be run with `make BENCHSET=bench-math`. Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
*	benchtests: Add UNSUPPORTED benchmark status	Siddhesh Poyarekar	2022-04-29	1	-6/+6
\| \| \| \| \| \| \| \| \| \| \| \|	The libmvec benchmarks print a message indicating that a certain CPU feature is unsupported and exit prematurelyi, which breaks the JSON in bench.out. Handle this more elegantly in the bench makefile target by adding support for an UNSUPPORTED exit status (77) so that bench.out continues to have output for valid tests. Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
*	linux: Fix fchmodat with AT_SYMLINK_NOFOLLOW for 64 bit time_t (BZ#29097)	Adhemerval Zanella	2022-04-28	1	-2/+2
\| \| \| \| \| \| \| \|	The AT_SYMLINK_NOFOLLOW emulation ues the default 32 bit stat internal calls, which fails with EOVERFLOW if the file constains timestamps beyond 2038. Checked on i686-linux-gnu.
*	sysdeps: Add 'get_fast_jitter' interace in fast-jitter.h	Noah Goldstein	2022-04-27	1	-0/+42
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	'get_fast_jitter' is meant to be used purely for performance purposes. In all cases it's used it should be acceptable to get no randomness (see default case). An example use case is in setting jitter for retries between threads at a lock. There is a performance benefit to having jitter, but only if the jitter can be generated very quickly and ultimately there is no serious issue if no jitter is generated. The implementation generally uses 'HP_TIMING_NOW' iff it is inlined (avoid any potential syscall paths). Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
*	posix/glob.c: update from gnulib	DJ Delorie	2022-04-27	1	-0/+1
\| \| \| \| \| \| \| \|	Copied from gnulib/lib/glob.c in order to fix rhbz 1982608 Also fixes swbz 25659 Reviewed-by: Carlos O'Donell <carlos@redhat.com> Tested-by: Carlos O'Donell <carlos@redhat.com>
*	linux: Fix missing internal 64 bit time_t stat usage	Adhemerval Zanella	2022-04-27	2	-4/+4
\| \| \| \| \| \|	These are two missing spots initially done by 52a5fe70a2c77935. Checked on i686-linux-gnu.
*	posix: Remove unused definition on _Fork	Adhemerval Zanella	2022-04-26	1	-3/+0
\| \| \| \|	Checked on x86_64-linux-gnu.
*	elf: Replace PI_STATIC_AND_HIDDEN with opposite HIDDEN_VAR_NEEDS_DYNAMIC_RELOC	Fangrui Song	2022-04-26	39	-78/+42
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	PI_STATIC_AND_HIDDEN indicates whether accesses to internal linkage variables and hidden visibility variables in a shared object (ld.so) need dynamic relocations (usually R_*_RELATIVE). PI (position independent) in the macro name is a misnomer: a code sequence using GOT is typically position-independent as well, but using dynamic relocations does not meet the requirement. Not defining PI_STATIC_AND_HIDDEN is legacy and we expect that all new ports will define PI_STATIC_AND_HIDDEN. Current ports defining PI_STATIC_AND_HIDDEN are more than the opposite. Change the configure default. No functional change. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
*	i386: Regenerate ulps	Carlos O'Donell	2022-04-26	2	-2/+2
\| \| \| \| \| \| \|	These failures were caught while building glibc master for Fedora Rawhide which is built with '-mtune=generic -msse2 -mfpmath=sse' using gcc 11.3 (gcc-11.3.1-2.fc35) on a Cascadelake Intel Xeon processor.
*	elf: Remove unused enum allowmask	Fangrui Song	2022-04-25	1	-11/+0
\| \| \| \| \| \| \|	Unused since 52a01100ad011293197637e42b5be1a479a2f4ae ("elf: Remove ad-hoc restrictions on dlopen callers [BZ #22787]"). Reviewed-by: Florian Weimer <fweimer@redhat.com>
*	x86: Optimize {str\|wcs}rchr-evex	Noah Goldstein	2022-04-22	1	-181/+290
\| \| \| \| \| \| \| \| \| \| \|	The new code unrolls the main loop slightly without adding too much overhead and minimizes the comparisons for the search CHAR. Geometric Mean of all benchmarks New / Old: 0.755 See email for all results. Full xcheck passes on x86_64 with and without multiarch enabled. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
*	x86: Optimize {str\|wcs}rchr-avx2	Noah Goldstein	2022-04-22	1	-157/+269
\| \| \| \| \| \| \| \| \| \| \|	The new code unrolls the main loop slightly without adding too much overhead and minimizes the comparisons for the search CHAR. Geometric Mean of all benchmarks New / Old: 0.832 See email for all results. Full xcheck passes on x86_64 with and without multiarch enabled. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
*	x86: Optimize {str\|wcs}rchr-sse2	Noah Goldstein	2022-04-22	4	-444/+339
\| \| \| \| \| \| \| \| \| \| \|	The new code unrolls the main loop slightly without adding too much overhead and minimizes the comparisons for the search CHAR. Geometric Mean of all benchmarks New / Old: 0.741 See email for all results. Full xcheck passes on x86_64 with and without multiarch enabled. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
*	x86-64: Fix SSE2 memcmp and SSSE3 memmove for x32	H.J. Lu	2022-04-22	2	-0/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Clear the upper 32 bits in RDX (memory size) for x32 to fix FAIL: string/tst-size_t-memcmp FAIL: string/tst-size_t-memcmp-2 FAIL: string/tst-size_t-memcpy FAIL: wcsmbs/tst-size_t-wmemcmp on x32 introduced by 8804157ad9 x86: Optimize memcmp SSE2 in memcmp.S 26b2478322 x86: Reduce code size of mem{move\|pcpy\|cpy}-ssse3 Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
*	Default to --with-default-link=no (bug 25812)	Florian Weimer	2022-04-22	1	-0/+6
\| \| \| \| \| \| \| \| \|	This is necessary to place the libio vtables into the RELRO segment. New tests elf/tst-relro-ldso and elf/tst-relro-libc are added to verify that this is what actually happens. The new tests fail on ia64 due to lack of (default) RELRO support inbutils, so they are XFAILed there.
*	m68k: Handle fewer relocations for RTLD_BOOTSTRAP (#BZ29071)	Fangrui Song	2022-04-20	1	-6/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	m68k is a non-PI_STATIC_AND_HIDDEN arch which uses a GOT relocation when loading the address of a jump table. The GOT load may be reordered before processing R_68K_RELATIVE relocations, leading to an unrelocated/incorrect jump table, which will cause a crash. The foolproof approach is to add an optimization barrier (e.g. calling an non-inlinable function after relative relocations are resolved). That is non-trivial given the current code structure, so just use the simple approach to avoid the jump table: handle only the essential reloctions for RTLD_BOOTSTRAP code. This is based on Andreas Schwab's patch and fixed ld.so crash on m68k. Reviewed-by: Adheemrval Zanella <adhemerval.zanella@linaro.org>
*	x86: Fix missing __wmemcmp def for disable-multiarch build	Noah Goldstein	2022-04-19	2	-8/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	commit 8804157ad9da39631703b92315460808eac86b0c Author: Noah Goldstein <goldstein.w.n@gmail.com> Date: Fri Apr 15 12:27:59 2022 -0500 x86: Optimize memcmp SSE2 in memcmp.S Only defined wmemcmp and missed __wmemcmp. This commit fixes that by defining __wmemcmp and setting wmemcmp as a weak alias to __wmemcmp. Both multiarch and disable-multiarch builds succeed and full xchecks pass. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
*	elf: Remove __libc_init_secure	Fangrui Song	2022-04-19	4	-82/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	After 73fc4e28b9464f0e13edc719a5372839970e7ddb, __libc_enable_secure_decided is always 0 and a statically linked executable may overwrite __libc_enable_secure without considering AT_SECURE. The __libc_enable_secure has been correctly initialized in _dl_aux_init, so just remove __libc_enable_secure_decided and __libc_init_secure. This allows us to remove some startup_get*id functions from 22b79ed7f413cd980a7af0cf258da5bf82b6d5e5. Reviewed-by: Florian Weimer <fweimer@redhat.com>
*	mips: Fix mips64n32 64 bit time_t stat support (BZ#29069)	=Joshua Kinard	2022-04-18	1	-15/+23
\| \| \| \| \|	Add missing support initially added by 4e8521333bea6e89fcef1020 (which missed n32 stat).
*	x86: Cleanup page cross code in memcmp-avx2-movbe.S	Noah Goldstein	2022-04-15	1	-37/+61
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Old code was both inefficient and wasted code size. New code (-62 bytes) and comparable or better performance in the page cross case. geometric_mean(N=20) of page cross cases New / Original: 0.960 size, align0, align1, ret, New Time/Old Time 1, 4095, 0, 0, 1.001 1, 4095, 0, 1, 0.999 1, 4095, 0, -1, 1.0 2, 4094, 0, 0, 1.0 2, 4094, 0, 1, 1.0 2, 4094, 0, -1, 1.0 3, 4093, 0, 0, 1.0 3, 4093, 0, 1, 1.0 3, 4093, 0, -1, 1.0 4, 4092, 0, 0, 0.987 4, 4092, 0, 1, 1.0 4, 4092, 0, -1, 1.0 5, 4091, 0, 0, 0.984 5, 4091, 0, 1, 1.002 5, 4091, 0, -1, 1.005 6, 4090, 0, 0, 0.993 6, 4090, 0, 1, 1.001 6, 4090, 0, -1, 1.003 7, 4089, 0, 0, 0.991 7, 4089, 0, 1, 1.0 7, 4089, 0, -1, 1.001 8, 4088, 0, 0, 0.875 8, 4088, 0, 1, 0.881 8, 4088, 0, -1, 0.888 9, 4087, 0, 0, 0.872 9, 4087, 0, 1, 0.879 9, 4087, 0, -1, 0.883 10, 4086, 0, 0, 0.878 10, 4086, 0, 1, 0.886 10, 4086, 0, -1, 0.873 11, 4085, 0, 0, 0.878 11, 4085, 0, 1, 0.881 11, 4085, 0, -1, 0.879 12, 4084, 0, 0, 0.873 12, 4084, 0, 1, 0.889 12, 4084, 0, -1, 0.875 13, 4083, 0, 0, 0.873 13, 4083, 0, 1, 0.863 13, 4083, 0, -1, 0.863 14, 4082, 0, 0, 0.838 14, 4082, 0, 1, 0.869 14, 4082, 0, -1, 0.877 15, 4081, 0, 0, 0.841 15, 4081, 0, 1, 0.869 15, 4081, 0, -1, 0.876 16, 4080, 0, 0, 0.988 16, 4080, 0, 1, 0.99 16, 4080, 0, -1, 0.989 17, 4079, 0, 0, 0.978 17, 4079, 0, 1, 0.981 17, 4079, 0, -1, 0.98 18, 4078, 0, 0, 0.981 18, 4078, 0, 1, 0.98 18, 4078, 0, -1, 0.985 19, 4077, 0, 0, 0.977 19, 4077, 0, 1, 0.979 19, 4077, 0, -1, 0.986 20, 4076, 0, 0, 0.977 20, 4076, 0, 1, 0.986 20, 4076, 0, -1, 0.984 21, 4075, 0, 0, 0.977 21, 4075, 0, 1, 0.983 21, 4075, 0, -1, 0.988 22, 4074, 0, 0, 0.983 22, 4074, 0, 1, 0.994 22, 4074, 0, -1, 0.993 23, 4073, 0, 0, 0.98 23, 4073, 0, 1, 0.992 23, 4073, 0, -1, 0.995 24, 4072, 0, 0, 0.989 24, 4072, 0, 1, 0.989 24, 4072, 0, -1, 0.991 25, 4071, 0, 0, 0.99 25, 4071, 0, 1, 0.999 25, 4071, 0, -1, 0.996 26, 4070, 0, 0, 0.993 26, 4070, 0, 1, 0.995 26, 4070, 0, -1, 0.998 27, 4069, 0, 0, 0.993 27, 4069, 0, 1, 0.999 27, 4069, 0, -1, 1.0 28, 4068, 0, 0, 0.997 28, 4068, 0, 1, 1.0 28, 4068, 0, -1, 0.999 29, 4067, 0, 0, 0.996 29, 4067, 0, 1, 0.999 29, 4067, 0, -1, 0.999 30, 4066, 0, 0, 0.991 30, 4066, 0, 1, 1.001 30, 4066, 0, -1, 0.999 31, 4065, 0, 0, 0.988 31, 4065, 0, 1, 0.998 31, 4065, 0, -1, 0.998 Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
*	x86: Remove memcmp-sse4.S	Noah Goldstein	2022-04-15	4	-813/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Code didn't actually use any sse4 instructions since `ptest` was removed in: commit 2f9062d7171850451e6044ef78d91ff8c017b9c0 Author: Noah Goldstein <goldstein.w.n@gmail.com> Date: Wed Nov 10 16:18:56 2021 -0600 x86: Shrink memcmp-sse4.S code size The new memcmp-sse2 implementation is also faster. geometric_mean(N=20) of page cross cases SSE2 / SSE4: 0.905 Note there are two regressions preferring SSE2 for Size = 1 and Size = 65. Size = 1: size, align0, align1, ret, New Time/Old Time 1, 1, 1, 0, 1.2 1, 1, 1, 1, 1.197 1, 1, 1, -1, 1.2 This is intentional. Size == 1 is significantly less hot based on profiles of GCC11 and Python3 than sizes [4, 8] (which is made hotter). Python3 Size = 1 -> 13.64% Python3 Size = [4, 8] -> 60.92% GCC11 Size = 1 -> 1.29% GCC11 Size = [4, 8] -> 33.86% size, align0, align1, ret, New Time/Old Time 4, 4, 4, 0, 0.622 4, 4, 4, 1, 0.797 4, 4, 4, -1, 0.805 5, 5, 5, 0, 0.623 5, 5, 5, 1, 0.777 5, 5, 5, -1, 0.802 6, 6, 6, 0, 0.625 6, 6, 6, 1, 0.813 6, 6, 6, -1, 0.788 7, 7, 7, 0, 0.625 7, 7, 7, 1, 0.799 7, 7, 7, -1, 0.795 8, 8, 8, 0, 0.625 8, 8, 8, 1, 0.848 8, 8, 8, -1, 0.914 9, 9, 9, 0, 0.625 Size = 65: size, align0, align1, ret, New Time/Old Time 65, 0, 0, 0, 1.103 65, 0, 0, 1, 1.216 65, 0, 0, -1, 1.227 65, 65, 0, 0, 1.091 65, 0, 65, 1, 1.19 65, 65, 65, -1, 1.215 This is because A) the checks in range [65, 96] are now unrolled 2x and B) because smaller values <= 16 are now given a hotter path. By contrast the SSE4 version has a branch for Size = 80. The unrolled version has get better performance for returns which need both comparisons. size, align0, align1, ret, New Time/Old Time 128, 4, 8, 0, 0.858 128, 4, 8, 1, 0.879 128, 4, 8, -1, 0.888 As well, out of microbenchmark environments that are not full predictable the branch will have a real-cost. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
*	x86: Optimize memcmp SSE2 in memcmp.S	Noah Goldstein	2022-04-15	8	-376/+575
\| \| \| \| \| \| \| \|	New code save size (-303 bytes) and has significantly better performance. geometric_mean(N=20) of page cross cases New / Original: 0.634 Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
*	stdio: Split __get_errname definition from errlist.c	Adhemerval Zanella	2022-04-15	1	-0/+21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The loader does not need to pull all __get_errlist definitions and its size is decreased: Before: $ size elf/ld.so text data bss dec hex filename 197774 11024 456 209254 33166 elf/ld.so After: $ size elf/ld.so text data bss dec hex filename 191510 9936 456 201902 314ae elf/ld.so Checked on x86_64-linux-gnu.
*	x86: Reduce code size of mem{move\|pcpy\|cpy}-ssse3	Noah Goldstein	2022-04-14	3	-3156/+380
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The goal is to remove most SSSE3 function as SSE4, AVX2, and EVEX are generally preferable. memcpy/memmove is one exception where avoiding unaligned loads with `palignr` is important for some targets. This commit replaces memmove-ssse3 with a better optimized are lower code footprint verion. As well it aliases memcpy to memmove. Aside from this function all other SSSE3 functions should be safe to remove. The performance is not changed drastically although shows overall improvements without any major regressions or gains. bench-memcpy geometric_mean(N=50) New / Original: 0.957 bench-memcpy-random geometric_mean(N=50) New / Original: 0.912 bench-memcpy-large geometric_mean(N=50) New / Original: 0.892 Benchmarks where run on Zhaoxin KX-6840@2000MHz See attached numbers for all results. More important this saves 7246 bytes of code size in memmove an additional 10741 bytes by reusing memmove code for memcpy (total 17987 bytes saves). As well an additional 896 bytes of rodata for the jump table entries.
*	x86: Remove mem{move\|cpy}-ssse3-back	Noah Goldstein	2022-04-14	5	-3212/+6
\| \| \| \| \| \| \|	With SSE2, SSE4.1, AVX2, and EVEX versions very few targets prefer SSSE3. As a result it is no longer worth it to keep the SSSE3 versions given the code size cost. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
*	x86: Remove str{p}{n}cpy-ssse3	Noah Goldstein	2022-04-14	6	-3572/+0
\| \| \| \| \| \| \|	With SSE2, SSE4.1, AVX2, and EVEX versions very few targets prefer SSSE3. As a result it is no longer worth it to keep the SSSE3 versions given the code size cost. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
*	x86: Remove str{n}cat-ssse3	Noah Goldstein	2022-04-14	5	-879/+0
\| \| \| \| \| \| \|	With SSE2, SSE4.1, AVX2, and EVEX versions very few targets prefer SSSE3. As a result it is no longer worth it to keep the SSSE3 versions given the code size cost. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
*	x86: Remove str{n}{case}cmp-ssse3	Noah Goldstein	2022-04-14	10	-202/+30
\| \| \| \| \| \| \|	With SSE2, SSE4.1, AVX2, and EVEX versions very few targets prefer SSSE3. As a result it is no longer worth it to keep the SSSE3 versions given the code size cost. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
*	x86: Remove {w}memcmp-ssse3	Noah Goldstein	2022-04-14	5	-2006/+0
\| \| \| \| \| \| \|	With SSE2, SSE4.1, AVX2, and EVEX versions very few targets prefer SSSE3. As a result it is no longer worth it to keep the SSSE3 versions given the code size cost. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
*	nptl: Handle spurious EINTR when thread cancellation is disabled (BZ#29029)	Adhemerval Zanella	2022-04-14	4	-4/+209
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Some Linux interfaces never restart after being interrupted by a signal handler, regardless of the use of SA_RESTART [1]. It means that for pthread cancellation, if the target thread disables cancellation with pthread_setcancelstate and calls such interfaces (like poll or select), it should not see spurious EINTR failures due the internal SIGCANCEL. However recent changes made pthread_cancel to always sent the internal signal, regardless of the target thread cancellation status or type. To fix it, the previous semantic is restored, where the cancel signal is only sent if the target thread has cancelation enabled in asynchronous mode. The cancel state and cancel type is moved back to cancelhandling and atomic operation are used to synchronize between threads. The patch essentially revert the following commits: 8c1c0aae20 nptl: Move cancel type out of cancelhandling 2b51742531 nptl: Move cancel state out of cancelhandling 26cfbb7162 nptl: Remove CANCELING_BITMASK However I changed the atomic operation to follow the internal C11 semantic and removed the MACRO usage, it simplifies a bit the resulting code (and removes another usage of the old atomic macros). Checked on x86_64-linux-gnu, i686-linux-gnu, aarch64-linux-gnu, and powerpc64-linux-gnu. [1] https://man7.org/linux/man-pages/man7/signal.7.html Reviewed-by: Florian Weimer <fweimer@redhat.com> Tested-by: Aurelien Jarno <aurelien@aurel32.net>
*	S390: Add new s390 platform z16.	Stefan Liebler	2022-04-14	6	-10/+47
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The new IBM z16 is added to platform string array. The macro _DL_PLATFORMS_COUNT is incremented. _dl_hwcaps_subdir is extended by "z16" if HWCAP_S390_VXRS_PDE2 is set. HWCAP_S390_NNPA is not tested in _dl_hwcaps_subdirs_active as those instructions may be replaced or removed in future. tst-glibc-hwcaps.c is extended in order to test z16 via new marker5. A fatal glibc error is dumped if glibc was build with architecture level set for z16, but run on an older machine. (See dl-hwcap-check.h)
*	Replace {u}int_fast{16\|32} with {u}int32_t	Noah Goldstein	2022-04-13	2	-2/+2
\| \| \| \| \| \| \| \| \|	On 32-bit machines this has no affect. On 64-bit machines {u}int_fast{16\|32} are set as {u}int64_t which is often not ideal. Particularly x86_64 this change both saves code size and may save instruction cost. Full xcheck passes on x86_64.
*	hurd: Define ELIBEXEC	Samuel Thibault	2022-04-12	1	-0/+2
\| \| \| \|	So we can implement it in the exec server.
*	Remove _dl_skip_args_internal declaration	Szabolcs Nagy	2022-04-12	1	-5/+0
\| \| \| \| \| \|	It does not seem to be used. Reviewed-by: Florian Weimer <fweimer@redhat.com>
*	manual: Avoid name collision in libm ULP table [BZ #28956]	Tom Coldrick	2022-04-11	2	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The 32-bit and 64-bit variants of RISC-V share the same name - "RISC-V" - when generating the libm error table for the info pages. This collision, and the way how the table is generated, mean that the values in the final table for "RISC-V" may be either for the 32- or 64-bit variant, with no indication as to which. As an additional side-effect, this makes the build non-reproducible, as the error table generated is dependent upon the host filesystem implementation. To solve this issue, the libm-test-ulps-name files for both variants have been modified to include their word size, so as to remove the collision and provide more accurate information in the table. An alternative proposed was to merge the two variants' ULP values into a single file, but this would mean that information about error values is lost, as the two variants are not identical. Some differences are considerable, notably the values for the exp() function are large. Reviewed-by: Carlos O'Donell <carlos@redhat.com> Tested-by: Carlos O'Donell <carlos@redhat.com>
*	powerpc: Relocate stinfo->main	Alan Modra	2022-04-10	2	-2/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	start_addresses in sysdeps/powerpc/powerpc64/start.S is historical baggage that should disappear. Until someone does that, relocating stinfo->main by hand is one solution to the fact that the field may be unrelocated at the time it is accessed. This is similar to what is done for dynamic tags via the D_PTR macro. stinfo->init and stinfo->fini are zero in both powerpc64/start.S and powerpc32/start.S, so make it a little more obvious they are unused by passing NULLs to LIBC_START_MAIN. The makefile change is needed to pick up elf/dl-static-tls.h from dl-machine.h. Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
*	powerpc64: Set up thread register for _dl_relocate_static_pie	Alan Modra	2022-04-10	6	-11/+33
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	libgcc ifunc resolvers that access hwcap via a field in the tcb can't be called until the thread pointer is set up. Other ifunc resolvers might need access to at_platform. This patch sets up a fake thread pointer early to a copy of tcbhead_t. hwcapinfo.c already had local variables for hwcap and at_platform, replace them with an entire tcbhead_t. It's not that large and this way we easily ensure hwcap and at_platform are at the same relative offsets as they are in the real thread block. The patch also conditionally disables part of tst-tlsifunc-static, "bar address read from IFUNC resolver is incorrect". We can't get a proper address for a thread variable before glibc initialises tls. Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
*	powerpc64: Use medium model toc accesses throughout	Alan Modra	2022-04-10	6	-15/+30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The PowerPC64 linker edits medium model toc-indirect code to toc-pointer relative: addis r9,r2,tc_entry_for_var@toc@ha ld r9,tc_entry_for_var@toc@l(r9) becomes addis r9,r2,(var-.TOC.)@ha addi r9,r9,(var-.TOC.)@l when "var" is known to be local to the binary. This isn't done for small-model toc-indirect code, because "var" is almost guaranteed to be too far away from .TOC. for a 16-bit signed offset. And, because the analysis of which .toc entry can be removed becomes much more complicated in objects that mix code models, they aren't removed if any small-model toc sequence appears in an object file. Unfortunately, glibc's build of ld.so smashes the needed objects together in a ld -r linking stage. This means the GOT/TOC is left with a whole lot of relative relocations which is untidy, but in itself is not a serious problem. However, static-pie on powerpc64 bombs due to a segfault caused by one of the small-model accesses before _dl_relocate_static_pie. (The very first one in rcrt1.o passing start_addresses in r8 to __libc_start_main.) So this patch makes all the toc/got accesses in assembly medium code model, and a couple of functions hidden. By itself this is not enough to give us working static-pie, but it is useful in isolation to enable better linker optimisation. There's a serious problem in libgcc too. libgcc ifuncs access the AT_HWCAP words stored in the tcb with an offset from the thread pointer (r13), but r13 isn't set at the time _dl_relocate_static_pie. A followup patch will fix that. Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>