mirror/glibc - mirror of git://sourceware.org/git/glibc.git

	Commit message	Author	Age	Files	Lines
*	x86: Optimize strlen-avx2.S	Noah Goldstein	2021-04-19	2	-214/+334
\| \| \| \| \| \| \| \| \| \|	No bug. This commit optimizes strlen-avx2.S. The optimizations are mostly small things, but they add up to roughly a 10-30% performance improvement for strlen. The results for strnlen are a bit more ambiguous. test-strlen, test-strnlen, test-wcslen, and test-wcsnlen are all passing. Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
*	x86: Optimize strlen-evex.S	Noah Goldstein	2021-04-19	1	-264/+317
\| \| \| \| \| \| \| \| \| \|	No bug. This commit optimizes strlen-evex.S. The optimizations are mostly small things, but they add up to roughly a 10-30% performance improvement for strlen. The results for strnlen are a bit more ambiguous. test-strlen, test-strnlen, test-wcslen, and test-wcsnlen are all passing. Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
*	x86: Optimize less_vec evex and avx512 memset-vec-unaligned-erms.S	Noah Goldstein	2021-04-19	5	-27/+74
\| \| \| \| \| \| \| \|	No bug. This commit adds an optimized case for the less_vec memset case that uses the avx512vl/avx512bw mask store, avoiding the excessive branches. test-memset and test-wmemset are passing. Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
*	x86-64: Require BMI2 for strchr-avx2.S	H.J. Lu	2021-04-19	2	-5/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since strchr-avx2.S, updated by commit 1f745ecc2109890886b161d4791e1406fdfc29b8 (Author: noah <goldstein.w.n@gmail.com>, Date: Wed Feb 3 00:38:59 2021 -0500, "x86-64: Refactor and improve performance of strchr-avx2.S"), uses sarx (c4 e2 72 f7 c0  sarx %ecx,%eax,%eax) for strchr-avx2 family functions, require BMI2 in ifunc-impl-list.c and ifunc-avx2.h.
*	x86-64: Require BMI2 for __strlen_evex and __strnlen_evex	H.J. Lu	2021-04-19	1	-2/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since __strlen_evex and __strnlen_evex, added by commit 1fd8c163a83d96ace1ff78fa6bac7aee084f6f77 (Author: H.J. Lu <hjl.tools@gmail.com>, Date: Fri Mar 5 06:24:52 2021 -0800, "x86-64: Add ifunc-avx2.h functions with 256-bit EVEX"), use sarx (c4 e2 6a f7 c0  sarx %edx,%eax,%eax), require BMI2 for __strlen_evex and __strnlen_evex in ifunc-impl-list.c. ifunc-avx2.h already requires BMI2 for the EVEX implementation.
*	x86: Update large memcpy case in memmove-vec-unaligned-erms.S	noah	2021-04-16	1	-73/+265
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	No bug. This commit updates the large memcpy case (no overlap). The update is to perform memcpy on either 2 or 4 contiguous pages at once. This 1) helps to alleviate the effects of false memory aliasing when destination and source have a close 4k alignment and 2) in most cases and for most DRAM units is a modestly more efficient access pattern. These changes are a clear performance improvement for VEC_SIZE=16/32, though more ambiguous for VEC_SIZE=64. test-memcpy, test-memccpy, test-mempcpy, test-memmove, and tst-memmove-overflow all pass. Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
*	powerpc: Add missing registers to clobbers list for syscalls [BZ #27623]	Matheus Castanho	2021-04-16	1	-3/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Some registers that can be clobbered by the kernel during a syscall are not listed on the clobbers list in sysdeps/unix/sysv/linux/powerpc/sysdep.h. For syscalls using sc: XER is zeroed by the kernel on exit. For syscalls using scv: XER is zeroed by the kernel on exit and, unlike the sc case, most CR fields can be clobbered (according to the ELF ABI and the Linux kernel's syscall ABI for powerpc, linux/Documentation/powerpc/syscall64-abi.rst). The same should apply to vsyscalls, which effectively execute a function call but are not currently adding these registers as clobbers either. These are likely not causing issues today, but they should be added to the clobbers list just in case things change on the kernel side in the future. Reported-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Raphael M Zinsly <rzinsly@linux.ibm.com>
*	misc: syslog: Assume MSG_NOSIGNAL support (BZ #17144)	Adhemerval Zanella	2021-04-15	1	-4/+0
\| \| \| \| \| \| \| \|	MSG_NOSIGNAL was added in POSIX 2008 and Hurd seems to support it. The SIGPIPE handling also makes the implementation not thread-safe (due to the sigaction usage). Checked on x86_64-linux-gnu.
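As an illustration of why MSG_NOSIGNAL removes the need for SIGPIPE handling, here is a minimal sketch. The helper names send_without_sigpipe and demo_closed_peer are hypothetical and are not taken from glibc's syslog code; the sketch only assumes a POSIX system with AF_UNIX socket pairs.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper, not glibc's syslog code: send LEN bytes on FD
   without raising SIGPIPE.  Returns 0 on success, or the errno value
   reported by send on failure.  */
int
send_without_sigpipe (int fd, const void *buf, size_t len)
{
  if (send (fd, buf, len, MSG_NOSIGNAL) < 0)
    return errno;
  return 0;
}

/* Demonstration: write to a stream socket whose peer is already closed.
   Without MSG_NOSIGNAL this would deliver SIGPIPE to the process; with
   it, the call simply fails with EPIPE.  Returns the errno value
   observed, or -1 if the socket pair could not be created.  */
int
demo_closed_peer (void)
{
  int sv[2];
  if (socketpair (AF_UNIX, SOCK_STREAM, 0, sv) < 0)
    return -1;
  close (sv[1]);
  int err = send_without_sigpipe (sv[0], "x", 1);
  close (sv[0]);
  return err;
}
```

Because the flag is passed per call, no process-wide sigaction state has to be saved and restored around the send, which is the part the commit message describes as not thread-safe.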
*	io: Move file timestamps tests out of Linux	Adhemerval Zanella	2021-04-15	5	-229/+1
\| \| \| \| \| \| \| \| \| \| \| \|	Now that libsupport abstracts possibly missing Linux support (either due to a filesystem limitation that cannot handle 64-bit timestamps or architectures that do not handle values larger than unsigned 32-bit values), the tests can be made generic. Checked on x86_64-linux-gnu and i686-linux-gnu. I also built the tests for i686-gnu. Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
*	s390: Update ulps	Stefan Liebler	2021-04-15	1	-3/+3
\| \| \| \| \|	Required after 9acda61d94acc "Fix the inaccuracy of j0f/j1f/y0f/y1f [BZ #14469, #14470, #14471, #14472]".
*	i386: Remove lazy tlsdesc relocation related code	Szabolcs Nagy	2021-04-15	3	-391/+2
\| \| \| \| \| \| \| \| \| \|	Like in commit e75711ebfa976d5468ec292282566a18b07e4d67 for x86_64, remove unused lazy tlsdesc relocation processing code: _dl_tlsdesc_resolve_abs_plus_addend, _dl_tlsdesc_resolve_rel, _dl_tlsdesc_resolve_rela, and _dl_tlsdesc_resolve_hold.
*	x86_64: Remove lazy tlsdesc relocation related code	Szabolcs Nagy	2021-04-15	4	-219/+2
\| \| \| \| \|	_dl_tlsdesc_resolve_rela and _dl_tlsdesc_resolve_hold are only used for lazy tlsdesc relocation processing which is no longer supported.
*	i386: Avoid lazy relocation of tlsdesc [BZ #27137]	Szabolcs Nagy	2021-04-15	1	-42/+34
\| \| \| \| \| \| \| \| \| \| \|	Lazy tlsdesc relocation is racy because the static tls optimization and tlsdesc management operations are done without holding the dlopen lock. This is similar to commit b7cf203b5c17dd6d9878537d41e0c7cc3d270a67 for aarch64, but it fixes a different race: bug 27137. On i386 the code is a bit more complicated than on x86_64 because both rel and rela relocs are supported.
*	x86_64: Avoid lazy relocation of tlsdesc [BZ #27137]	Szabolcs Nagy	2021-04-15	1	-5/+14
\| \| \| \| \| \| \| \| \| \| \| \| \|	Lazy tlsdesc relocation is racy because the static tls optimization and tlsdesc management operations are done without holding the dlopen lock. This is similar to commit b7cf203b5c17dd6d9878537d41e0c7cc3d270a67 for aarch64, but it fixes a different race: bug 27137. Another issue is that ld auditing ignores DT_BIND_NOW and thus tries to relocate tlsdesc lazily, but that does not work in a BIND_NOW module due to the missing DT_TLSDESC_PLT. Unconditionally relocating tlsdesc at load time fixes this bug (27721) too.
*	ARC: Update ulps	Vineet Gupta	2021-04-14	2	-25/+29
\| \| \| \| \| \|	Needed after 43576de04afc6. Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
*	Remove PR_TAGGED_ADDR_ENABLE from sys/prctl.h	Szabolcs Nagy	2021-04-14	1	-4/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The value of PR_TAGGED_ADDR_ENABLE was incorrect in the installed headers, and the prctl command macros that are needed for it to be useful (PR_SET_TAGGED_ADDR_CTRL) were missing. Linux headers have had the definitions since 5.4, so they are widely available and we don't need to repeat them. The remaining definitions are from Linux 5.10. To build glibc with --enable-memory-tagging, Linux 5.4 headers and binutils 2.33.1 or newer are needed. Reviewed-by: DJ Delorie <dj@redhat.com>
*	linux: sysconf: Use a more explicit maximum_ARG_MAX	Adhemerval Zanella	2021-04-13	1	-1/+1
*	linux: sysconf: limit _SC_MAX_ARG to 6 MiB (BZ #25305)	Michal Nazarewicz	2021-04-13	1	-1/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since Linux 4.13, the kernel limits the maximum command line arguments length to 6 MiB [1]. Normally the limit is still a quarter of the maximum stack size, but if that limit exceeds 6 MiB it's clamped down.

glibc's __sysconf implementation for the Linux platform is not aware of this limitation, and for stack sizes of over 24 MiB it returns a higher ARG_MAX than Linux will actually accept. This can be verified by executing the following application on Linux 4.13 or newer:

    #include <stdio.h>
    #include <string.h>
    #include <sys/resource.h>
    #include <sys/time.h>
    #include <unistd.h>

    int main(void) {
            const struct rlimit rlim = { 40 * 1024 * 1024,
                                         40 * 1024 * 1024 };
            if (setrlimit(RLIMIT_STACK, &rlim) < 0) {
                    perror("setrlimit: RLIMIT_STACK");
                    return 1;
            }

            printf("ARG_MAX     : %8ld\n", sysconf(_SC_ARG_MAX));
            printf("63 * 100 KiB: %8ld\n", 63L * 100 * 1024);
            printf("6 MiB       : %8ld\n", 6L * 1024 * 1024);

            char str[100 * 1024], *argv[64], *envp[1];
            memset(&str, 'A', sizeof str);
            str[sizeof str - 1] = '\0';
            for (size_t i = 0; i < sizeof argv / sizeof *argv - 1; ++i) {
                    argv[i] = str;
            }
            argv[sizeof argv / sizeof *argv - 1] = envp[0] = 0;

            execve("/bin/true", argv, envp);
            perror("execve");
            return 1;
    }

On affected systems the program will report ARG_MAX as 10 MiB, but despite that, executing /bin/true with a bit over 6 MiB of command line arguments will fail with an E2BIG error. The expected result is that ARG_MAX is reported as 6 MiB. Update the __sysconf function to clamp the ARG_MAX value to 6 MiB if it would otherwise exceed it.

This resolves bug #25305, which was marked WONTFIX as the suggested solution was to cap ARG_MAX at 128 KiB.

As an aside and point of comparison, bionic (a libc implementation for Android systems) decided to resolve this issue by always returning 128 KiB, ignoring any potential xargs regressions [2].

On older kernels this results in returning an overly conservative value, but that's a safer option than being aggressive and returning an invalid value on recent systems. It's also worth noting that at this point all supported Linux releases have the 6 MiB barrier, so only someone running an unsupported kernel version would get an incorrectly truncated result.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

[1] See https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=da029c11e6b12f321f36dac8771e833b65cec962
[2] See https://android.googlesource.com/platform/bionic/+/baed51ee3a13dae4b87b11870bdf7f10bdc9efc1
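The clamping rule described in this message (a quarter of the stack limit, capped at 6 MiB) can be sketched as below. This is not the code from the patch: the function name clamp_arg_max and the constant name ARG_MAX_CEILING are hypothetical, and the sketch leaves out any lower bound the real __sysconf implementation may also apply.

```c
#include <sys/resource.h>

/* Hypothetical name; the 6 MiB value is the kernel limit cited in the
   commit message.  */
#define ARG_MAX_CEILING (6UL * 1024 * 1024)

/* Sketch only: derive an ARG_MAX value from the RLIMIT_STACK soft limit.
   The result is a quarter of the stack limit, clamped to 6 MiB.  An
   unlimited stack (RLIM_INFINITY) therefore also yields 6 MiB.  */
unsigned long
clamp_arg_max (rlim_t stack_limit)
{
  rlim_t quarter = stack_limit / 4;
  if (quarter > ARG_MAX_CEILING)
    return ARG_MAX_CEILING;
  return (unsigned long) quarter;
}
```

With the 40 MiB stack limit used by the test program above, the unclamped quarter is 10 MiB, which is the value the affected systems report; the sketch returns 6 MiB instead.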
*	s390: Update ulps	Adhemerval Zanella	2021-04-13	1	-1/+1
\| \| \| \| \|	Required after 43576de04afc6 "Improve the accuracy of tgamma (BZ #26983)"
*	i386: Update ulps	Adhemerval Zanella	2021-04-13	2	-4/+4
\| \| \| \| \|	Required after 43576de04afc6 "Improve the accuracy of tgamma (BZ #26983)"
*	linux: always update select timeout (BZ #27706)	Adhemerval Zanella	2021-04-12	1	-2/+2
\| \| \| \| \| \|	The timeout should be updated even on failure for time64 support. Checked on i686-linux-gnu.
*	linux: Normalize and return timeout on select (BZ #27651)	Adhemerval Zanella	2021-04-12	1	-9/+32
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit 2433d39b697, which added time64 support to select, changed the function to use __NR_pselect6 (or __NR_pselect6_time64) on all architectures. However, on architectures where the symbol was implemented with __NR_select, the kernel normalizes the passed timeout instead of returning EINVAL. For instance, the input timeval { 0, 5000000 } is interpreted as { 5, 0 }. As indicated by BZ #27651, this semantic seems to be expected, and changing it results in some performance issues (most likely the program does not check the return code and keeps issuing select with an unnormalized tv_usec argument). To avoid semantics that differ depending on which syscall the architecture used, select now always normalizes the timeout input. This is a slight change for some ABIs (for instance aarch64). Checked on x86_64-linux-gnu and i686-linux-gnu.
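The normalization mentioned here, where { 0, 5000000 } becomes { 5, 0 }, can be sketched as follows. The struct simple_timeval and the function normalize_timeout are hypothetical stand-ins, not the glibc code, and the sketch assumes a non-negative microsecond field and ignores overflow of the seconds field.

```c
#include <stdint.h>

#define USEC_PER_SEC 1000000

/* Hypothetical stand-in for struct timeval with fixed-width fields.  */
struct simple_timeval
{
  int64_t sec;
  int64_t usec;
};

/* Sketch only: carry whole seconds out of the microsecond field so that
   0 <= usec < USEC_PER_SEC afterwards.  Assumes tv.usec >= 0 and that
   tv.sec does not overflow.  */
struct simple_timeval
normalize_timeout (struct simple_timeval tv)
{
  tv.sec += tv.usec / USEC_PER_SEC;
  tv.usec %= USEC_PER_SEC;
  return tv;
}
```

A caller that passes an out-of-range microsecond value then gets the same effective timeout on every architecture, rather than EINVAL on some of them.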
*	arm: Fix an incorrect check in ____longjmp_chk [BZ #27709]	Szabolcs Nagy	2021-04-12	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	An incorrect check in __longjmp_chk could fail on valid code, causing FAIL: debug/tst-longjmp_chk2. The original check was altstack_sp + altstack_size - setjmp_sp > altstack_size, i.e. sp at setjmp was outside of the altstack range. Here we know that longjmp is called from a signal handler on the altstack (SS_ONSTACK), and that it jumps in the wrong direction (sp decreases), so the check wants to ensure the jump goes to another stack. The check is wrong when altstack_sp == setjmp_sp, which can happen when the altstack is a local buffer in the function that calls setjmp, so the patch allows == too. This fixes bug 27709. Note that the generic __longjmp_chk check seems to be different (it checks if longjmp was on the altstack but does not check setjmp, so it would not catch incorrect longjmp use within the signal handler).
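The corrected comparison can be written out in C as a sketch. The real check lives in arm assembly; the function name setjmp_sp_outside_altstack is hypothetical, and the sketch relies on unsigned wraparound so that a setjmp sp above the altstack also counts as outside.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch only: returns true when the stack pointer saved by setjmp is
   considered outside the alternate signal stack, i.e. the longjmp goes
   to another stack.  The original check used '>' here; the fix also
   accepts equality, which covers setjmp_sp == altstack_sp.  */
bool
setjmp_sp_outside_altstack (uintptr_t altstack_sp, size_t altstack_size,
                            uintptr_t setjmp_sp)
{
  return altstack_sp + (uintptr_t) altstack_size - setjmp_sp
         >= (uintptr_t) altstack_size;
}
```

With '>' in place of '>=', the case setjmp_sp == altstack_sp evaluates to altstack_size > altstack_size, which is false, and that is the failure the commit message describes.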
*	hurd: Export _hurd_libc_proc_init	Samuel Thibault	2021-04-12	1	-0/+1
\| \| \| \| \|	hurd's libdiskfs needs to be able to call _hurd_init + _hurd_libc_proc_init for bootstrap initialization.
*	powerpc: Update libm test ulps	Tulio Magno Quites Machado Filho	2021-04-09	1	-10/+10
\| \| \| \|	Update after commit 43576de04afc6a0896a3ecc094e1581069a0652a.
*	arm: update libm test ulps	Szabolcs Nagy	2021-04-08	1	-25/+25
\| \| \| \| \|	Updated after commits 9acda61d94acc5348c2330f2519a14d1a4a37e73 and 43576de04afc6a0896a3ecc094e1581069a0652a.
*	aarch64: update libm test ulps	Szabolcs Nagy	2021-04-08	1	-1/+1
\| \| \| \|	Update after commit 43576de04afc6a0896a3ecc094e1581069a0652a.
*	Improve the accuracy of tgamma (BZ #26983)	Paul Zimmermann	2021-04-07	2	-14/+29
\| \| \| \| \| \| \| \| \| \| \| \|	With this patch, the maximal known error for tgamma is now reduced to 9 ulps for dbl-64, for all rounding modes. Since exhaustive testing is not possible for dbl-64, it might be that there are still cases with an error larger than 9 ulps, but all known cases are fixed (intensive tests were done to find cases with large errors). Tested on x86_64 and powerpc (and by Adhemerval Zanella on aarch64, arm, s390x, sparc, and i686). Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
*	Update hppa libm-test-ulps	John David Anglin	2021-04-06	1	-25/+27
*	m68: Fix build after 9acda61d94ac	Adhemerval Zanella	2021-04-06	1	-1/+0
\| \| \| \|	j0f/j1f/y0f/y1f now use __inv_pio4.
*	aarch64: free tlsdesc data on dlclose [BZ #27403]	Szabolcs Nagy	2021-04-06	1	-0/+27
\| \| \| \| \| \| \| \|	DL_UNMAP_IS_SPECIAL and DL_UNMAP were not defined. The definitions are now copied from arm, since the same is needed on aarch64. The cleanup of tlsdesc data is handled by the custom _dl_unmap. Fixes bug 27403.
*	ia64: Update ulps	Adhemerval Zanella	2021-04-05	1	-48/+49
\| \| \| \| \| \|	Required after 9acda61d94acc "Fix the inaccuracy of j0f/j1f/y0f/y1f [BZ #14469, #14470, #14471, #14472]" and db3f7bb558 "math: Remove slow paths from asin and acos [BZ #15267]".
*	ia64: Fix build after 9acda61d94ac	Adhemerval Zanella	2021-04-05	2	-4/+3
\| \| \| \| \|	j0f/j1f/y0f/y1f now use __inv_pio4 and call roundf (which turns into __roundf on ia64).
*	i386: Update ulps	Adhemerval Zanella	2021-04-05	2	-37/+37
\| \| \| \| \|	Required after 9acda61d94acc "Fix the inaccuracy of j0f/j1f/y0f/y1f [BZ #14469, #14470, #14471, #14472]".
*	Fix the inaccuracy of j0f/j1f/y0f/y1f [BZ #14469, #14470, #14471, #14472]	Paul Zimmermann	2021-04-02	8	-242/+1193
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	For j0f/j1f/y0f/y1f, the largest error for all binary32 inputs is reduced to at most 9 ulps for all rounding modes. The new code is enabled only when there is a cancellation at the very end of the j0f/j1f/y0f/y1f computation, or for very large inputs, and thus should not give any visible slowdown on average. Two different algorithms are used: (1) around the first 64 zeros of j0/j1/y0/y1, approximation polynomials of degree 3 are used, computed using the Sollya tool (https://www.sollya.org/); (2) for large inputs, an asymptotic formula from [1] is used. [1] Fast and Accurate Bessel Function Computation, John Harrison, Proceedings of Arith 19, 2009. Inputs yielding the new largest errors are added to auto-libm-test-in, and ulps are regenerated for various targets (thanks Adhemerval Zanella). Tested on x86_64 with --disable-multi-arch and on powerpc64le-linux-gnu. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
*	x86-64: Fix ifdef indentation in strlen-evex.S	Sunil K Pandey	2021-04-01	1	-8/+8
\| \| \| \| \|	Fix some indentations of ifdef in file strlen-evex.S which are off by 1 and confusing to read.
*	Update Nios II libm-test-ulps.	Joseph Myers	2021-04-01	1	-6/+11
*	Update arm libm-tests-ulps	Adhemerval Zanella	2021-04-01	1	-1/+3
\| \| \| \| \|	Required after db3f7bb558 "math: Remove slow paths from asin and acos [BZ #15267]".
*	x86_64: Correct THREAD_SETMEM/THREAD_SETMEM_NC for movq [BZ #27591]	H.J. Lu	2021-04-01	3	-2/+74
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	config/i386/constraints.md in GCC has (define_constraint "e" "32-bit signed integer constant, or a symbolic reference known to fit that range (for immediate operands in sign-extending x86-64 instructions)." (match_operand 0 "x86_64_immediate_operand")). Since movq takes a signed 32-bit immediate or a register source operand, use the "er" constraint, instead of "nr"/"ir", for a 32-bit signed integer constant or register on movq. Reviewed-by: Carlos O'Donell <carlos@redhat.com>
*	powerpc64le: Use ifunc for _Float128 functions also in libc	Andreas Schwab	2021-04-01	3	-8/+17
\| \| \| \| \| \|	This fixes missing definition of math functions in libc in a static link that are no longer built for libm after commit 4898d9712b ("Avoid adding duplicated symbols into static libraries").
*	S390: Allow "v" constraint for long double math_opt_barrier and math_force_eval with GCC 11.	Stefan Liebler	2021-04-01	1	-2/+19
\| \| \| \| \| \| \| \| \| \| \|	Starting with GCC 11, long double values can also be processed in vector registers if built with -march >= z14. GCC then defines the __LONG_DOUBLE_VX__ macro. FYI: GCC commit "IBM Z: Introduce __LONG_DOUBLE_VX__ macro" https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=f47df2af313d2ce7f9149149010a142c2237beda
*	Fix conform linknamespace tests due to gnu_dev_makedev	Stefan Liebler	2021-03-31	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If building on s390 / i686 with -Os, various conformance tests fail with e.g. conform/ISO/assert.h/linknamespace.out: [initial] __assert_fail -> [libc.a(assert.o)] __dcgettext -> [libc.a(dcgettext.o)] __dcigettext -> [libc.a(dcigettext.o)] __getcwd -> [libc.a(getcwd.o)] __fstatat64 -> [libc.a(fstatat64.o)] gnu_dev_makedev. The use of gnu_dev_makedev was recently introduced through the makedev macro in commit 5b980d4809913088729982865188b754939bcd39 ("linux: Use statx for MIPSn64"). This patch now links against __gnu_dev_makedev, as was also done in commit 8b4a118222c7ed41bc653943b542915946dff1dd ("Fix -Os gnu_dev_* linknamespace, localplt issues (bug 15105, bug 19463)").
*	Update sparc libm-tests-ulps	Adhemerval Zanella	2021-03-30	1	-1/+3
\| \| \| \| \|	Required after db3f7bb558 "math: Remove slow paths from asin and acos [BZ #15267]".
*	Move __isnanf128 to libc.so	Siddhesh Poyarekar	2021-03-30	18	-7/+39
\| \| \| \| \| \| \| \|	All of the isnan functions are in libc.so due to printf_fp, so move __isnanf128 there too for consistency. Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@ascii.art.br> Reviewed-by: Florian Weimer <fweimer@redhat.com>
*	fork.h: replace with register-atfork.h	Samuel Thibault	2021-03-29	6	-83/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	UNREGISTER_ATFORK is now defined for all ports in register-atfork.h, so most previous includes of fork.h actually only need register-atfork.h now, and cxa_finalize.c does not need an ifdef UNREGISTER_ATFORK any more. The nptl-specific fork generation counters can then go to pthreadP.h, and fork.h be removed. Checked on x86_64-linux-gnu and i686-gnu. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
*	x86-64: Use ZMM16-ZMM31 in AVX512 memmove family functions	H.J. Lu	2021-03-29	3	-19/+42
\| \| \| \| \| \|	Update ifunc-memmove.h to select the function optimized with AVX512 instructions using ZMM16-ZMM31 registers to avoid RTM abort with usable AVX512VL since VZEROUPPER isn't needed at function exit.
*	x86-64: Use ZMM16-ZMM31 in AVX512 memset family functions	H.J. Lu	2021-03-29	4	-24/+31
\| \| \| \| \| \| \|	Update ifunc-memset.h/ifunc-wmemset.h to select the function optimized with AVX512 instructions using ZMM16-ZMM31 registers to avoid RTM abort with usable AVX512VL and AVX512BW since VZEROUPPER isn't needed at function exit.
*	x86: Add string/memory function tests in RTM region	H.J. Lu	2021-03-29	12	-0/+618
\| \| \| \| \| \| \| \|	At function exit, AVX optimized string/memory functions have VZEROUPPER which triggers RTM abort. When such functions are called inside a transactionally executing RTM region, RTM abort causes severe performance degradation. Add tests to verify that string/memory functions won't cause RTM abort in RTM region.
*	x86-64: Add AVX optimized string/memory functions for RTM	H.J. Lu	2021-03-29	52	-248/+670
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since VZEROUPPER triggers RTM abort while VZEROALL won't, select AVX optimized string/memory functions that end with the sequence "xtest; jz 1f; vzeroall; ret; 1: vzeroupper; ret" at function exit on processors with usable RTM, but without 256-bit EVEX instructions, to avoid VZEROUPPER inside a transactionally executing RTM region.
*	x86-64: Add memcmp family functions with 256-bit EVEX	H.J. Lu	2021-03-29	5	-4/+467
\| \| \| \| \| \| \|	Update ifunc-memcmp.h to select the function optimized with 256-bit EVEX instructions using YMM16-YMM31 registers to avoid RTM abort with usable AVX512VL, AVX512BW and MOVBE since VZEROUPPER isn't needed at function exit.