path: root/sysdeps

* x86: Fix TEST_NAME to make it a string in tst-strncmp-rtm.c (Noah Goldstein, 2022-02-18; 1 file, -2/+2)

  Previously TEST_NAME was passed a function pointer. This did not fail
  only because of the -Wno-error flag (used to allow overflow sizes to
  be passed to strncmp/wcsncmp).

  Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
  (cherry picked from commit b98d0bbf747f39770e0caba7e984ce9f8f900330)
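  The distinction matters because the harness wants a printable name,
  not a callable. A minimal C sketch of the pattern, assuming a
  hypothetical harness run_test (the names here are illustrative, not
  the glibc test macros):

      #include <stdio.h>
      #include <string.h>

      /* A harness that needs both the function and a printable name.  */
      static int
      run_test (const char *name, char *(*fn) (char *, const char *))
      {
        char buf[16];
        printf ("testing %s\n", name);
        return fn (buf, "abc") == buf ? 0 : 1;
      }

      int
      main (void)
      {
        /* Buggy form: run_test (strcpy, strcpy) passes a function
           pointer where a string is expected; it only sneaks past the
           compiler under lax warning flags such as -Wno-error.  */
        return run_test ("strcpy", strcpy);   /* fixed form */
      }
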
* x86: Test wcscmp RTM in the wcsncmp overflow case [BZ #28896] (Noah Goldstein, 2022-02-18; 3 files, -10/+48)

  In the overflow fallback, strncmp-avx2-rtm and wcsncmp-avx2-rtm would
  call strcmp-avx2 and wcscmp-avx2 respectively. Those have no checks
  around vzeroupper and would trigger spurious aborts. This commit adds
  tests covering that case.

  test-strcmp, test-strncmp, test-wcscmp, and test-wcsncmp all pass on
  AVX2 machines with and without RTM.

  Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
  (cherry picked from commit 7835d611af0854e69a0c71e3806f8fe379282d6f)
* x86: Fallback {str|wcs}cmp RTM in the ncmp overflow case [BZ #28896] (Noah Goldstein, 2022-02-18; 7 files, -5/+22)

  In the overflow fallback, strncmp-avx2-rtm and wcsncmp-avx2-rtm would
  call strcmp-avx2 and wcscmp-avx2 respectively. Those have no checks
  around vzeroupper and would trigger spurious aborts. This commit fixes
  that by falling back to the RTM-safe variants instead.

  test-strcmp, test-strncmp, test-wcscmp, and test-wcsncmp all pass on
  AVX2 machines with and without RTM.

  Co-authored-by: H.J. Lu <hjl.tools@gmail.com>
  (cherry picked from commit c6272098323153db373f2986c67786ea8c85f1cf)
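  Conceptually the fix swaps the fallback target; a C-level sketch of
  the corrected dispatch (the real change is in assembly, and the
  wrapper name below is ours; the extern symbols are glibc-internal, so
  this is a sketch rather than a linkable program):

      #include <wchar.h>

      /* Symbol names mirror the ifunc variants: the plain AVX2 compare
         ends in a bare VZEROUPPER, while the _rtm variant guards its
         exit with xtest/vzeroall.  */
      extern int __wcscmp_avx2 (const wchar_t *, const wchar_t *);
      extern int __wcscmp_avx2_rtm (const wchar_t *, const wchar_t *);

      int
      wcsncmp_overflow_fallback (const wchar_t *s1, const wchar_t *s2)
      {
        /* Before the fix the tail call went to __wcscmp_avx2, which can
           abort a surrounding RTM transaction; after the fix it goes to
           the RTM-safe sibling.  */
        return __wcscmp_avx2_rtm (s1, s2);
      }
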
* x86-64: Test strlen and wcslen with 0 in the RSI register [BZ #28064] (H.J. Lu, 2022-02-01; 3 files, -0/+108)

  commit 6f573a27b6c8b4236445810a44660612323f5a73
  Author: Noah Goldstein <goldstein.w.n@gmail.com>
  Date:   Wed Jun 23 01:19:34 2021 -0400

      x86-64: Add wcslen optimize for sse4.1

  added wcsnlen-sse4.1 to the wcslen ifunc implementation list. Since
  the random value in the RSI register was larger than the
  wide-character string length in the existing wcslen test, it didn't
  trigger a wcslen test failure. Add a test that forces 0 into the RSI
  register before calling wcslen.

  (cherry picked from commit a6e7c3745d73ff876b4ba6991fb00768a938aef5)
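  Forcing a register across a call needs inline assembly; a minimal
  sketch of the idea on x86-64 with GCC extended asm (the real tests are
  tst-rsi-strlen.c/tst-rsi-wcslen.c; this wrapper is illustrative and
  glosses over stack-alignment and red-zone details a real test must
  handle):

      #include <wchar.h>
      #include <stddef.h>

      /* Call wcslen with RSI zeroed.  A wcslen that wrongly consumes a
         length from RSI (i.e. behaves like wcsnlen) returns 0 here and
         the test fails.  */
      static size_t
      wcslen_with_rsi_zero (const wchar_t *s)
      {
        size_t result;
        asm volatile ("xorl %%esi, %%esi\n\t"    /* RSI = 0 */
                      "call wcslen@PLT"
                      : "=a" (result)
                      : "D" (s)                  /* 1st arg in RDI */
                      : "rsi", "rcx", "rdx", "r8", "r9", "r10", "r11",
                        "cc", "memory");
        return result;
      }
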
* x86: Remove wcsnlen-sse4_1 from wcslen ifunc-impl-list [BZ #28064] (Noah Goldstein, 2022-02-01; 1 file, -2/+2)

  The following commit

  commit 6f573a27b6c8b4236445810a44660612323f5a73
  Author: Noah Goldstein <goldstein.w.n@gmail.com>
  Date:   Wed Jun 23 01:19:34 2021 -0400

      x86-64: Add wcslen optimize for sse4.1

  added wcsnlen-sse4.1 to the wcslen ifunc implementation list instead
  of wcslen-sse4.1. This commit fixes that by removing wcsnlen-sse4.1
  from the wcslen ifunc implementation list and adding wcslen-sse4.1 in
  its place.

  Testing: test-wcslen.c, test-rsi-wcslen.c, and test-rsi-strlen.c pass,
  as do all other tests in wcsmbs and string.

  Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
  Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
  (cherry picked from commit 0679442defedf7e52a94264975880ab8674736b2)
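  The implementation list only affects which candidates the test harness
  enumerates; dispatch itself is ordinary ELF ifunc resolution. For
  readers unfamiliar with the mechanism, a self-contained sketch of
  ifunc dispatch using the GCC attribute (function names here are made
  up; glibc's internal IFUNC_IMPL macros differ in detail):

      #include <string.h>
      #include <stddef.h>

      static size_t my_len_sse2 (const char *s)  { return strlen (s); }
      static size_t my_len_sse41 (const char *s) { return strlen (s); }

      /* The resolver runs once at load time and picks an
         implementation, the way glibc picks wcslen-sse4.1 vs. sse2.  */
      static size_t (*resolve_my_len (void)) (const char *)
      {
        __builtin_cpu_init ();
        return __builtin_cpu_supports ("sse4.1")
               ? my_len_sse41 : my_len_sse2;
      }

      size_t my_len (const char *)
        __attribute__ ((ifunc ("resolve_my_len")));
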
* x86: Black list more Intel CPUs for TSX [BZ #27398] (H.J. Lu, 2022-02-01; 1 file, -3/+32)

  Disable TSX and enable RTM_ALWAYS_ABORT for Intel CPUs listed in:

  https://www.intel.com/content/www/us/en/support/articles/000059422/processors.html

  This fixes BZ #27398.

  Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
  (cherry picked from commit 1e000d3d33211d5a954300e2a69b90f93f18a1a1)
* x86: Check RTM_ALWAYS_ABORT for RTM [BZ #28033] (H.J. Lu, 2022-02-01; 2 files, -0/+6)

  From
  https://www.intel.com/content/www/us/en/support/articles/000059422/processors.html:

  * Intel TSX will be disabled by default.
  * The processor will force abort all Restricted Transactional Memory
    (RTM) transactions by default.
  * A new CPUID bit CPUID.07H.0H.EDX[11] (RTM_ALWAYS_ABORT) will be
    enumerated, which is set to indicate to updated software that the
    loaded microcode is forcing RTM abort.
  * On processors that enumerate support for RTM, the CPUID enumeration
    bits for Intel TSX (CPUID.07H.0H.EBX[11] and CPUID.07H.0H.EBX[4])
    continue to be set by default after microcode update.
  * Workloads that benefited from Intel TSX might experience a change
    in performance.
  * System software may use a new bit in Model-Specific Register (MSR)
    0x10F TSX_FORCE_ABORT[TSX_CPUID_CLEAR] to clear the Hardware Lock
    Elision (HLE) and RTM bits, to indicate to software that Intel TSX
    is disabled.

  1. Add RTM_ALWAYS_ABORT to CPUID features.
  2. Set RTM usable only if RTM_ALWAYS_ABORT isn't set. This skips the
     string/tst-memchr-rtm etc. test cases on the affected processors,
     which always fail after a microcode update.
  3. Check the RTM feature, instead of usability, against /proc/cpuinfo.

  This fixes BZ #28033.

  (cherry picked from commit ea8e465a6b8d0f26c72bcbe453a854de3abf68ec)
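  Reading that bit takes one CPUID leaf-7 query; a minimal standalone
  sketch using GCC's cpuid.h helper (the bit position follows the Intel
  article above; the helper function name is ours):

      #include <cpuid.h>
      #include <stdio.h>

      /* RTM_ALWAYS_ABORT is CPUID.(EAX=7,ECX=0):EDX bit 11 per the
         Intel article cited above.  */
      static int
      rtm_always_abort (void)
      {
        unsigned int eax, ebx, ecx, edx;
        if (!__get_cpuid_count (7, 0, &eax, &ebx, &ecx, &edx))
          return 0;                 /* leaf 7 not supported */
        return (edx >> 11) & 1;
      }

      int
      main (void)
      {
        printf ("RTM_ALWAYS_ABORT: %d\n", rtm_always_abort ());
        return 0;
      }
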
* x86: Optimize strlen-evex.S (Noah Goldstein, 2022-01-27; 1 file, -264/+317)

  No bug. This commit optimizes strlen-evex.S. The optimizations are
  mostly small things, but they add up to a roughly 10-30% performance
  improvement for strlen. The results for strnlen are a bit more
  ambiguous.

  test-strlen, test-strnlen, test-wcslen, and test-wcsnlen all pass.

  Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
  (cherry picked from commit 4ba65586847751372520a36757c17f114588794e)
* x86: Fix overflow bug in wcsnlen-sse4_1 and wcsnlen-avx2 [BZ #27974] (Noah Goldstein, 2022-01-27; 2 files, -38/+107)

  This commit fixes the bug mentioned in the previous commit. The
  previous implementations of wmemchr in these files relied on
  maxlen * sizeof(wchar_t), which can overflow and is not guaranteed by
  the standard. The new overflow tests added in the previous commit now
  pass (as do all the other tests).

  Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
  Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
  (cherry picked from commit a775a7a3eb1e85b54af0b4ee5ff4dcf66772a1fb)
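  The overflow-safe pattern in C, for reference (a sketch of the bug
  class, not the vectorized code):

      #include <stdint.h>
      #include <stddef.h>

      /* maxlen * sizeof (wchar_t) wraps for maxlen > SIZE_MAX / 4,
         which is exactly the bug class fixed here: convert the element
         count to bytes only after checking it.  */
      static size_t
      wchar_count_to_bytes (size_t maxlen)
      {
        if (maxlen > SIZE_MAX / sizeof (wchar_t))
          return SIZE_MAX;          /* clamp: effectively unbounded */
        return maxlen * sizeof (wchar_t);
      }
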
* x86-64: Add wcslen optimize for sse4.1 (Noah Goldstein, 2022-01-27; 6 files, -36/+63)

  No bug. This commit adds the ifunc / build infrastructure necessary
  for wcslen to prefer the sse4.1 implementation in strlen-vec.S.

  test-wcslen.c passes.

  Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
  Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
  (cherry picked from commit 6f573a27b6c8b4236445810a44660612323f5a73)
* x86-64: Move strlen.S to multiarch/strlen-vec.S (H.J. Lu, 2022-01-27; 4 files, -242/+262)

  Since strlen.S contains the SSE2 version of strlen/strnlen and the
  SSE4.1 version of wcslen/wcsnlen, move strlen.S to
  multiarch/strlen-vec.S and include multiarch/strlen-vec.S from the
  SSE2 and SSE4.1 variants. This also removes the unused symbols
  __GI___strlen_sse2 and __GI___wcsnlen_sse4_1.

  (cherry picked from commit a0db678071c60b6c47c468d231dd0b3694ba7a98)
* x86-64: Fix an unknown vector operation in memchr-evex.S (Alice Xu, 2022-01-27; 1 file, -1/+1)

  An unknown vector operation occurred in commit 2a76821c308. Fix it by
  using "ymm{k1}{z}" instead of "ymm {k1} {z}".

  Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
  (cherry picked from commit 6ea916adfa0ab9af6e7dc6adcf6f977dfe017835)
* x86: Optimize memchr-evex.S (Noah Goldstein, 2022-01-27; 1 file, -225/+322)

  No bug. This commit optimizes memchr-evex.S. The optimizations include
  replacing some branches with cmovcc, avoiding some branches entirely
  in the less_4x_vec case, making the page-cross logic less strict,
  saving some ALU work in the alignment process, and most importantly
  increasing ILP in the 4x loop.

  test-memchr, test-rawmemchr, and test-wmemchr all pass.

  Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
  Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
  (cherry picked from commit 2a76821c3081d2c0231ecd2618f52662cb48fccd)
* x86: Optimize strlen-avx2.S (Noah Goldstein, 2022-01-27; 2 files, -214/+334)

  No bug. This commit optimizes strlen-avx2.S. The optimizations are
  mostly small things, but they add up to a roughly 10-30% performance
  improvement for strlen. The results for strnlen are a bit more
  ambiguous.

  test-strlen, test-strnlen, test-wcslen, and test-wcsnlen all pass.

  Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
  (cherry picked from commit aaa23c35071537e2dcf5807e956802ed215210aa)
* x86: Fix overflow bug with wmemchr-sse2 and wmemchr-avx2 [BZ #27974] (Noah Goldstein, 2022-01-27; 2 files, -37/+98)

  This commit fixes the bug mentioned in the previous commit. The
  previous implementations of wmemchr in these files relied on
  n * sizeof(wchar_t), which can overflow and is not guaranteed by the
  standard. The new overflow tests added in the previous commit now
  pass (as do all the other tests).

  Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
  Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
  (cherry picked from commit 645a158978f9520e74074e8c14047503be4db0f0)
* x86: Optimize memchr-avx2.S (Noah Goldstein, 2022-01-27; 1 file, -178/+247)

  No bug. This commit optimizes memchr-avx2.S. The optimizations include
  replacing some branches with cmovcc, avoiding some branches entirely
  in the less_4x_vec case, making the page-cross logic less strict, and
  saving a few instructions in the loop-return path.

  test-memchr, test-rawmemchr, and test-wmemchr all pass.

  Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
  Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
  (cherry picked from commit acfd088a1963ba51cd83c78f95c0ab25ead79e04)
* x86-64: Require BMI2 for __strlen_evex and __strnlen_evex (H.J. Lu, 2022-01-27; 1 file, -2/+4)

  Since __strlen_evex and __strnlen_evex added by

  commit 1fd8c163a83d96ace1ff78fa6bac7aee084f6f77
  Author: H.J. Lu <hjl.tools@gmail.com>
  Date:   Fri Mar 5 06:24:52 2021 -0800

      x86-64: Add ifunc-avx2.h functions with 256-bit EVEX

  use sarx:

      c4 e2 6a f7 c0          sarx   %edx,%eax,%eax

  require BMI2 for __strlen_evex and __strnlen_evex in
  ifunc-impl-list.c. ifunc-avx2.h already requires BMI2 for the EVEX
  implementation.

  (cherry picked from commit 55bf411b451c13f0fb7ff3d3bf9a820020b45df1)
* x86-64: Fix ifdef indentation in strlen-evex.S (Sunil K Pandey, 2022-01-27; 1 file, -8/+8)

  Fix some ifdef indentation in strlen-evex.S that is off by one and
  confusing to read.

  (cherry picked from commit 595c22ecd8e87a27fd19270ed30fdbae9ad25426)
* x86-64: Use ZMM16-ZMM31 in AVX512 memmove family functions (H.J. Lu, 2022-01-27; 3 files, -19/+35)

  Update ifunc-memmove.h to select the function optimized with AVX512
  instructions using the ZMM16-ZMM31 registers to avoid RTM aborts with
  usable AVX512VL, since VZEROUPPER isn't needed at function exit.

  (cherry picked from commit e4fda4631017e49d4ee5a2755db34289b6860fa4)
* x86-64: Use ZMM16-ZMM31 in AVX512 memset family functions (H.J. Lu, 2022-01-27; 4 files, -24/+31)

  Update ifunc-memset.h/ifunc-wmemset.h to select the function optimized
  with AVX512 instructions using the ZMM16-ZMM31 registers to avoid RTM
  aborts with usable AVX512VL and AVX512BW, since VZEROUPPER isn't
  needed at function exit.

  (cherry picked from commit 4e2d8f352774b56078c34648b14a2412c38384f4)
* x86: Add string/memory function tests in RTM region (H.J. Lu, 2022-01-27; 12 files, -0/+622)

  At function exit, AVX optimized string/memory functions have
  VZEROUPPER, which triggers an RTM abort. When such functions are
  called inside a transactionally executing RTM region, the RTM abort
  causes severe performance degradation. Add tests to verify that
  string/memory functions won't cause an RTM abort in an RTM region.

  (cherry picked from commit 4bd660be40967cd69072f69ebc2ad32bfcc1f206)
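  Such a test has to run the function inside an actual transaction; a
  minimal sketch using the RTM intrinsics from immintrin.h (compile with
  -mrtm and run on RTM hardware; counting a single commit/abort this way
  is our simplification of the glibc harness):

      #include <immintrin.h>
      #include <string.h>
      #include <stdio.h>

      /* Execute memcpy inside an RTM transaction and report whether the
         transaction committed.  An implementation ending in a bare
         VZEROUPPER aborts the transaction; a VZEROALL/xtest-guarded
         one can commit.  */
      static int
      memcpy_in_rtm_region (void)
      {
        char src[256] = "payload", dst[256];
        unsigned int status = _xbegin ();
        if (status == _XBEGIN_STARTED)
          {
            memcpy (dst, src, sizeof src);
            _xend ();
            return 1;                /* committed */
          }
        return 0;                    /* aborted; status has the cause */
      }

      int
      main (void)
      {
        printf ("RTM transaction %s\n",
                memcpy_in_rtm_region () ? "committed" : "aborted");
        return 0;
      }
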
* x86-64: Add AVX optimized string/memory functions for RTM (H.J. Lu, 2022-01-27; 48 files, -190/+594)

  Since VZEROUPPER triggers RTM abort while VZEROALL won't, select AVX
  optimized string/memory functions with

        xtest
        jz      1f
        vzeroall
        ret
  1:
        vzeroupper
        ret

  at function exit on processors with usable RTM, but without 256-bit
  EVEX instructions, to avoid VZEROUPPER inside a transactionally
  executing RTM region.

  (cherry picked from commit 7ebba91361badf7531d4e75050627a88d424872f)
* x86-64: Add memcmp family functions with 256-bit EVEX (H.J. Lu, 2022-01-27; 5 files, -4/+467)

  Update ifunc-memcmp.h to select the function optimized with 256-bit
  EVEX instructions using the YMM16-YMM31 registers to avoid RTM aborts
  with usable AVX512VL, AVX512BW and MOVBE, since VZEROUPPER isn't
  needed at function exit.

  (cherry picked from commit 91264fe3577fe887b4860923fa6142b5274c8965)
* x86-64: Add memset family functions with 256-bit EVEX (H.J. Lu, 2022-01-27; 6 files, -14/+90)

  Update ifunc-memset.h/ifunc-wmemset.h to select the function optimized
  with 256-bit EVEX instructions using the YMM16-YMM31 registers to
  avoid RTM aborts with usable AVX512VL and AVX512BW, since VZEROUPPER
  isn't needed at function exit.

  (cherry picked from commit 1b968b6b9b3aac702ac2f133e0dd16cfdbb415ee)
* x86-64: Add memmove family functions with 256-bit EVEX (H.J. Lu, 2022-01-27; 5 files, -11/+97)

  Update ifunc-memmove.h to select the function optimized with 256-bit
  EVEX instructions using the YMM16-YMM31 registers to avoid RTM aborts
  with usable AVX512VL, since VZEROUPPER isn't needed at function exit.

  (cherry picked from commit 63ad43566f7a25d140dc723598aeb441ad657eed)
* x86-64: Add strcpy family functions with 256-bit EVEX (H.J. Lu, 2022-01-27; 9 files, -0/+1338)

  Update ifunc-strcpy.h to select the function optimized with 256-bit
  EVEX instructions using the YMM16-YMM31 registers to avoid RTM aborts
  with usable AVX512VL and AVX512BW, since VZEROUPPER isn't needed at
  function exit.

  (cherry picked from commit 525bc2a32c9710df40371f951217c6ae7a923aee)
* x86-64: Add ifunc-avx2.h functions with 256-bit EVEX (H.J. Lu, 2022-01-27; 24 files, -17/+2996)

  Update ifunc-avx2.h, strchr.c, strcmp.c, strncmp.c and wcsnlen.c to
  select the function optimized with 256-bit EVEX instructions using the
  YMM16-YMM31 registers to avoid RTM aborts with usable AVX512VL,
  AVX512BW and BMI2, since VZEROUPPER isn't needed at function exit.

  For strcmp/strncmp, prefer AVX2 strcmp/strncmp if Prefer_AVX2_STRCMP
  is set.

  (cherry picked from commit 1fd8c163a83d96ace1ff78fa6bac7aee084f6f77)
* x86: Add AVX512VL_Usable and AVX512BW_Usable (H.J. Lu, 2022-01-27; 2 files, -0/+12)

  Add AVX512VL_Usable and AVX512BW_Usable for backporting string/memory
  functions optimized with 256-bit EVEX.
* x86: Set Prefer_No_VZEROUPPER and add Prefer_AVX2_STRCMP (H.J. Lu, 2022-01-27; 3 files, -2/+23)

  1. Set Prefer_No_VZEROUPPER if RTM is usable, to avoid RTM aborts
     triggered by VZEROUPPER inside a transactionally executing RTM
     region.
  2. Add Prefer_AVX2_STRCMP to prefer the AVX2 strcmp family: to compare
     two 32-byte strings, 256-bit EVEX strcmp requires 2 loads, 3 VPCMPs
     and 2 KORDs, while AVX2 strcmp requires 1 load, 2 VPCMPEQs, 1
     VPMINU and 1 VPMOVMSKB, so AVX2 strcmp is faster than EVEX strcmp.

  (cherry picked from commit 1da50d4bda07f04135dca39f40e79fc9eabed1f8)
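  In the ifunc selector this becomes one extra preference test before
  choosing EVEX; a self-contained C model of the decision order (the
  flag names echo glibc's cpu-features bits, but this function is
  illustrative):

      #include <stdbool.h>

      enum strcmp_impl { IMPL_SSE2, IMPL_AVX2, IMPL_EVEX };

      static enum strcmp_impl
      select_strcmp (bool avx2, bool avx512vl, bool avx512bw,
                     bool prefer_avx2_strcmp)
      {
        if (avx2)
          {
            /* EVEX wins only when usable and not pessimized by the
               instruction counts above (Prefer_AVX2_STRCMP).  */
            if (avx512vl && avx512bw && !prefer_avx2_strcmp)
              return IMPL_EVEX;
            return IMPL_AVX2;
          }
        return IMPL_SSE2;
      }
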
* x86: Fix __wcsncmp_avx2 in strcmp-avx2.S [BZ# 28755] (Noah Goldstein, 2022-01-27; 1 file, -0/+10)

  Fixes [BZ# 28755] for wcsncmp by redirecting length >= 2^56 to
  __wcscmp_avx2. For x86_64 this covers the entire address range, so no
  larger length could possibly be used to bound `s1` or `s2`.

  test-strcmp, test-strncmp, test-wcscmp, and test-wcsncmp all pass.

  Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
  (cherry picked from commit ddf0992cf57a93200e0c782e2a94d0733a5a0b87)
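  The address-space argument in C form; a hedged sketch (the real
  redirect is in assembly, the extern symbol is glibc-internal, and the
  bounded path below is a placeholder):

      #include <wchar.h>
      #include <stddef.h>

      extern int __wcscmp_avx2 (const wchar_t *, const wchar_t *);

      int
      wcsncmp_model (const wchar_t *s1, const wchar_t *s2, size_t n)
      {
        /* 2^56 wchar_t elements are 2^58 bytes, more than x86-64 can
           address, so such an n cannot actually bound s1 or s2 and the
           unbounded compare is safe (it also avoids overflowing
           n * sizeof (wchar_t)).  */
        if (n >= ((size_t) 1 << 56))
          return __wcscmp_avx2 (s1, s2);
        return wcsncmp (s1, s2, n);   /* ordinary bounded path */
      }
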
* x86: Check IFUNC definition in unrelocated executable [BZ #20019] (H.J. Lu, 2021-01-13; 2 files, -10/+22)

  Calling an IFUNC function defined in an unrelocated executable also
  leads to a segfault. Issue a fatal error message when an IFUNC
  function defined in the unrelocated executable is called from a
  shared library. On x86, ifuncmain6pie failed with:

  [hjl@gnu-cfl-2 build-i686-linux]$ ./elf/ifuncmain6pie --direct
  ./elf/ifuncmain6pie: IFUNC symbol 'foo' referenced in
  '/export/build/gnu/tools-build/glibc-32bit/build-i686-linux/elf/ifuncmod6.so'
  is defined in the executable and creates an unsatisfiable circular
  dependency.
  [hjl@gnu-cfl-2 build-i686-linux]$ readelf -rW elf/ifuncmod6.so | grep foo
  00003ff4  00000706 R_386_GLOB_DAT  0000400c  foo_ptr
  00003ff8  00000406 R_386_GLOB_DAT  00000000  foo
  0000400c  00000401 R_386_32        00000000  foo
  [hjl@gnu-cfl-2 build-i686-linux]$

  Remove the non-JUMP_SLOT relocations against foo in ifuncmod6.so,
  which trigger the circular IFUNC dependency, and build ifuncmain6pie
  with -Wl,-z,lazy.

  (cherry picked from commits 6ea5b57afa5cdc9ce367d2b69a2cebfb273e4617
  and 7137d682ebfcb6db5dfc5f39724718699922f06c)
* x86: Set header.feature_1 in TCB for always-on CET [BZ #27177] (H.J. Lu, 2021-01-13; 3 files, -1/+12)

  Update dl_cet_check() to set header.feature_1 in the TCB when both IBT
  and SHSTK are always on.

  (cherry picked from commit 2ef23b520597f4ea1790a669b83e608f24f4cf12)
* x86-64: Avoid rep movsb with short distance [BZ #27130] (H.J. Lu, 2021-01-12; 1 file, -0/+21)

  When copying with "rep movsb", if the distance between source and
  destination is N*4GB + [1..63] with N >= 0, performance may be very
  slow. This patch updates memmove-vec-unaligned-erms.S for the AVX and
  AVX512 versions with the distance in RCX:

        cmpl    $63, %ecx
        // Don't use "rep movsb" if ECX <= 63
        jbe     L(Don't use rep movsb")
        Use "rep movsb"

  Benchtest data with bench-memcpy, bench-memcpy-large,
  bench-memcpy-random and bench-memcpy-walk on Skylake, Ice Lake and
  Tiger Lake show that the performance impact is within noise range, as
  "rep movsb" is only used for data sizes >= 4KB.

  (cherry picked from commit 3ec5d83d2a237d39e7fd6ef7a0bc8ac4c171a4a5)
* aarch64: Fix DT_AARCH64_VARIANT_PCS handling [BZ #26798] (Szabolcs Nagy, 2020-11-04; 1 file, -8/+4)

  The variant PCS support was ineffective because in the common case
  linkmap->l_mach.plt == 0, but then the symbol table flags were ignored
  and normal lazy binding was used instead of resolving the relocs
  early. (This was a misunderstanding about how GOT[1] is set up by the
  linker.)

  In practice this mainly affects SVE calls when the vector length is
  more than 128 bits: the top bits of the argument registers get
  clobbered during lazy binding.

  Fixes bug 26798.

  (cherry picked from commit 558251bd8785760ad40fcbfeaaee5d27fa5b0fe4)
* AArch64: Use __memcpy_simd on Neoverse N2/V1 (Wilco Dijkstra, 2020-10-14; 3 files, -2/+8)

  Add CPU detection of Neoverse N2 and Neoverse V1, and select
  __memcpy_simd as the memcpy/memmove ifunc.

  Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
  (cherry picked from commit e11ed9d2b4558eeacff81557dc9557001af42a6b)
* AArch64: Rename IS_ARES to IS_NEOVERSE_N1 (Wilco Dijkstra, 2020-10-14; 3 files, -4/+8)

  Rename IS_ARES to IS_NEOVERSE_N1 since that is a bit clearer.

  Reviewed-by: Carlos O'Donell <carlos@redhat.com>
  (cherry picked from commit 0f6278a8793a5d04ea31878119eccf99f469a02d)
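  CPU identity on AArch64 comes from the MIDR_EL1 register; a sketch of
  how such an IS_NEOVERSE_N1 check decodes it (field layout per the Arm
  ARM; the macro names here are illustrative and glibc's versions
  differ in detail):

      #include <stdint.h>

      /* MIDR_EL1: implementer in bits [31:24], part number in
         bits [15:4].  */
      #define MIDR_IMPLEMENTER(midr)  (((midr) >> 24) & 0xff)
      #define MIDR_PARTNUM(midr)      (((midr) >> 4) & 0xfff)

      /* Arm Ltd.'s implementer code is 0x41; the Neoverse N1 part
         number is 0xd0c.  N2/V1 detection works the same way with
         their own part numbers.  */
      static inline int
      is_neoverse_n1 (uint64_t midr)
      {
        return MIDR_IMPLEMENTER (midr) == 0x41
               && MIDR_PARTNUM (midr) == 0xd0c;
      }
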
* [AArch64] Improve integer memcpy (Wilco Dijkstra, 2020-10-14; 1 file, -96/+101)

  Further optimize integer memcpy. Small cases now include copies up to
  32 bytes. 64-128 byte copies are split into two cases to improve
  performance of 64-96 byte copies. Comments have been rewritten.

  (cherry picked from commit 700065132744e0dfa6d4d9142d63f6e3a1934726)
* aarch64: Increase small and medium cases for __memcpy_generic (Krzysztof Koch, 2020-10-14; 1 file, -35/+47)

  Increase the upper bound on medium cases from 96 to 128 bytes. Now, up
  to 128 bytes are copied unrolled.

  Increase the upper bound on small cases from 16 to 32 bytes so that
  copies of 17-32 bytes are not impacted by the larger medium case.

  Benchmarking: The attached figures show the relative timing difference
  with respect to 'memcpy_generic', which is the existing
  implementation. 'memcpy_med_128' denotes the version of
  memcpy_generic with only the medium case enlarged. The
  'memcpy_med_128_small_32' numbers are for the version of
  memcpy_generic submitted in this patch, which has both medium and
  small cases enlarged. The figures were generated using the script
  from:
  https://www.sourceware.org/ml/libc-alpha/2019-10/msg00563.html

  Depending on the platform, the performance improvement in the
  bench-memcpy-random.c benchmark ranges from 6% to 20% between the
  original and final version of memcpy.S

  Tested against the GLIBC testsuite and randomized tests.

  (cherry picked from commit b9f145df85145506f8e61bac38b792584a38d88f)
* AArch64: Improve backwards memmove performance (Wilco Dijkstra, 2020-10-14; 1 file, -3/+4)

  On some microarchitectures performance of the backwards memmove
  improves if the stores use STR with decreasing addresses. So change
  the memmove loop in memcpy_advsimd.S to use 2x STR rather than STP.

  Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
  (cherry picked from commit bd394d131c10c9ec22c6424197b79410042eed99)
* AArch64: Add optimized Q-register memcpy (Wilco Dijkstra, 2020-10-14; 5 files, -4/+255)

  Add a new memcpy using 128-bit Q registers - this is faster on modern
  cores and reduces code size. Similar to the generic memcpy, small
  cases include copies up to 32 bytes. 64-128 byte copies are split
  into two cases to improve performance of 64-96 byte copies. Large
  copies align the source rather than the destination.

  bench-memcpy-random is ~9% faster than memcpy_falkor on Neoverse N1,
  so make this memcpy the default on N1 (on Centriq it is 15% faster
  than memcpy_falkor).

  Passes GLIBC regression tests.

  Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
  (cherry picked from commit 4a733bf375238a6a595033b5785cea7f27d61307)
* AArch64: Align ENTRY to a cacheline (Wilco Dijkstra, 2020-10-14; 1 file, -1/+1)

  Given almost all uses of ENTRY are for string/memory functions, align
  ENTRY to a cacheline to simplify things.

  Reviewed-by: Carlos O'Donell <carlos@redhat.com>
  (cherry picked from commit 34f0d01d5e43c7dedd002ab47f6266dfb5b79c22)
* Fix avx2 strncmp offset compare condition check [BZ #25933] (Sunil K Pandey, 2020-07-04; 1 file, -0/+15)

  strcmp-avx2.S: In the avx2 strncmp function, strings are compared in
  chunks of 4 vector sizes (i.e. 32x4 = 128 bytes for avx2). After the
  first 4-vector-size comparison, the code must check whether it has
  already passed the given offset. This patch implements the avx2
  offset check condition for strncmp, for the case where both strings
  compare equal over the first 4 vector sizes.

  (cherry picked from commit 75870237ff3bb363447b03f4b0af100227570910)
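  The shape of the bug is easiest to see in scalar form; a simplified C
  model of the block loop (this ignores null-terminator handling and
  assumes reads within a block are in bounds, as the vectorized code
  effectively arranges):

      #include <string.h>
      #include <stddef.h>

      enum { VEC = 32, BLOCK = 4 * VEC };   /* AVX2: 128-byte blocks */

      static int
      strncmp_block_model (const char *s1, const char *s2, size_t n)
      {
        size_t offset = 0;
        for (;;)
          {
            int r = memcmp (s1 + offset, s2 + offset, BLOCK);
            if (r != 0)
              return r;              /* difference inside the block */
            offset += BLOCK;
            if (offset >= n)         /* the missing check: stop once
                                        the n bound has been passed */
              return 0;
          }
      }
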
* Fix array overflow in backtrace on PowerPC (bug 25423) (Andreas Schwab, 2020-03-20; 2 files, -0/+4)

  When unwinding through a signal frame, the backtrace function on
  PowerPC didn't check array bounds when storing the frame address.
  Fixes commit d400dcac5e ("PowerPC: fix backtrace to handle signal
  trampolines").

  (cherry picked from commit d93769405996dfc11d216ddbe415946617b5a494)
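  The fix is a classic bounds check before the extra store; a minimal
  sketch of the pattern (illustrative, not the PowerPC source):

      #include <stddef.h>

      /* Store one more frame address only if the caller-provided array
         still has room; the signal-trampoline path was missing exactly
         this check.  */
      static int
      store_frame (void **array, int size, int count, void *frame_addr)
      {
        if (count < size)
          array[count++] = frame_addr;
        return count;
      }
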
* riscv: Do not use __has_include__ (Florian Weimer, 2020-01-21; 1 file, -1/+1)

  The user-visible preprocessor construct is called __has_include.

  (cherry picked from commit 28dd3939221ab26c6774097e9596e30d9753f758)
* misc/test-errno-linux: Handle EINVAL from quotactl (Florian Weimer, 2019-12-05; 1 file, -2/+3)

  In commit 3dd4d40b420846dd35869ccc8f8627feef2cff32 ("xfs: Sanity check
  flags of Q_XQUOTARM call"), Linux 5.4 added checking for the flags
  argument, causing the test to fail due to too restrictive test
  expectations.

  Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
  (cherry picked from commit 1f7525d924b608a3e43b10fcfb3d46b8a6e9e4f9)
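  The test relaxation amounts to accepting one more errno value; a
  sketch of the idea, assuming a hypothetical result-checking helper
  (the real test's structure differs, and other already-accepted errors
  are elided here):

      #include <errno.h>

      /* A quotactl (Q_XQUOTARM, ...) probe may now legitimately fail
         with EINVAL on Linux >= 5.4, which validates the flags
         argument.  */
      static int
      quotactl_errno_ok (int err)
      {
        return err == 0 || err == EINVAL;
      }
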
* rtld: Check __libc_enable_secure before honoring LD_PREFER_MAP_32BIT_EXEC (CVE-2019-19126) [BZ #25204] (Marcin Kościelnicki, 2019-11-22; 1 file, -1/+2)

  The problem was introduced in glibc 2.23, in commit
  b9eb92ab05204df772eb4929eccd018637c9f3e9 ("Add Prefer_MAP_32BIT_EXEC
  to map executable pages with MAP_32BIT").

  (cherry picked from commit d5dfad4326fc683c813df1e37bbf5cf920591c8e)
* mips: Force RWX stack for hard-float builds that can run on pre-4.8 kernels (Dragan Mladjenovic, 2019-11-05; 3 files, -5/+89)

  Linux/MIPS kernels prior to 4.8 could potentially crash the user
  process when doing FPU emulation while running on a non-executable
  user stack.

  Currently, gcc doesn't emit .note.GNU-stack for mips, but that will
  change in the future. To ensure that glibc can be used with such
  future gcc, without silently resulting in binaries that might crash
  at runtime, this patch forces an RWX stack for all built objects if
  configured to run against a minimum kernel version less than 4.8.

  * sysdeps/unix/sysv/linux/mips/Makefile
    (test-xfail-check-execstack): Move under mips-has-gnustack != yes.
    (CFLAGS-.o*, ASFLAGS-.o*): New rules. Apply -Wa,-execstack if
    mips-force-execstack == yes.
  * sysdeps/unix/sysv/linux/mips/configure: Regenerated.
  * sysdeps/unix/sysv/linux/mips/configure.ac (mips-force-execstack):
    New var. Set to yes for hard-float builds with
    minimum_kernel < 4.8.0 or minimum_kernel not set at all.
    (mips-has-gnustack): New var. Use value of libc_cv_as_noexecstack
    if mips-force-execstack != yes, otherwise set to no.

  (cherry picked from commit 33bc9efd91de1b14354291fc8ebd5bce96379f12)
* Call _dl_open_check after relocation [BZ #24259] (H.J. Lu, 2019-10-31; 16 files, -3/+332)

  This is a workaround for [BZ #20839], which doesn't remove the
  NODELETE object when _dl_open_check throws an exception. Move it
  after relocation in dl_open_worker to avoid leaving the NODELETE
  object mapped without relocation.

  [BZ #24259]
  * elf/dl-open.c (dl_open_worker): Call _dl_open_check after
    relocation.
  * sysdeps/x86/Makefile (tests): Add tst-cet-legacy-5a,
    tst-cet-legacy-5b, tst-cet-legacy-6a and tst-cet-legacy-6b.
    (modules-names): Add tst-cet-legacy-mod-5a, tst-cet-legacy-mod-5b,
    tst-cet-legacy-mod-5c, tst-cet-legacy-mod-6a, tst-cet-legacy-mod-6b
    and tst-cet-legacy-mod-6c.
    (CFLAGS-tst-cet-legacy-5a.c): New.
    (CFLAGS-tst-cet-legacy-5b.c): Likewise.
    (CFLAGS-tst-cet-legacy-mod-5a.c): Likewise.
    (CFLAGS-tst-cet-legacy-mod-5b.c): Likewise.
    (CFLAGS-tst-cet-legacy-mod-5c.c): Likewise.
    (CFLAGS-tst-cet-legacy-6a.c): Likewise.
    (CFLAGS-tst-cet-legacy-6b.c): Likewise.
    (CFLAGS-tst-cet-legacy-mod-6a.c): Likewise.
    (CFLAGS-tst-cet-legacy-mod-6b.c): Likewise.
    (CFLAGS-tst-cet-legacy-mod-6c.c): Likewise.
    ($(objpfx)tst-cet-legacy-5a): Likewise.
    ($(objpfx)tst-cet-legacy-5a.out): Likewise.
    ($(objpfx)tst-cet-legacy-mod-5a.so): Likewise.
    ($(objpfx)tst-cet-legacy-mod-5b.so): Likewise.
    ($(objpfx)tst-cet-legacy-5b): Likewise.
    ($(objpfx)tst-cet-legacy-5b.out): Likewise.
    (tst-cet-legacy-5b-ENV): Likewise.
    ($(objpfx)tst-cet-legacy-6a): Likewise.
    ($(objpfx)tst-cet-legacy-6a.out): Likewise.
    ($(objpfx)tst-cet-legacy-mod-6a.so): Likewise.
    ($(objpfx)tst-cet-legacy-mod-6b.so): Likewise.
    ($(objpfx)tst-cet-legacy-6b): Likewise.
    ($(objpfx)tst-cet-legacy-6b.out): Likewise.
    (tst-cet-legacy-6b-ENV): Likewise.
  * sysdeps/x86/tst-cet-legacy-5.c: New file.
  * sysdeps/x86/tst-cet-legacy-5a.c: Likewise.
  * sysdeps/x86/tst-cet-legacy-5b.c: Likewise.
  * sysdeps/x86/tst-cet-legacy-6.c: Likewise.
  * sysdeps/x86/tst-cet-legacy-6a.c: Likewise.
  * sysdeps/x86/tst-cet-legacy-6b.c: Likewise.
  * sysdeps/x86/tst-cet-legacy-mod-5.c: Likewise.
  * sysdeps/x86/tst-cet-legacy-mod-5a.c: Likewise.
  * sysdeps/x86/tst-cet-legacy-mod-5b.c: Likewise.
  * sysdeps/x86/tst-cet-legacy-mod-5c.c: Likewise.
  * sysdeps/x86/tst-cet-legacy-mod-6.c: Likewise.
  * sysdeps/x86/tst-cet-legacy-mod-6a.c: Likewise.
  * sysdeps/x86/tst-cet-legacy-mod-6b.c: Likewise.
  * sysdeps/x86/tst-cet-legacy-mod-6c.c: Likewise.

  (cherry picked from commit d0093c5cefb7f7a4143f3bb03743633823229cc6)
* [AArch64] Add ifunc support for Ares (Wilco Dijkstra, 2019-09-06; 3 files, -1/+4)

  Add Ares to the midr_el0 list and support ifunc dispatch. Since Ares
  supports 2x 128-bit loads/stores, use Neon registers for memcpy by
  selecting __memcpy_falkor by default (we should rename this to
  __memcpy_simd or similar).

  * manual/tunables.texi (glibc.cpu.name): Add ares tunable.
  * sysdeps/aarch64/multiarch/memcpy.c (__libc_memcpy): Use
    __memcpy_falkor for ares.
  * sysdeps/unix/sysv/linux/aarch64/cpu-features.h (IS_ARES): Add new
    define.
  * sysdeps/unix/sysv/linux/aarch64/cpu-features.c (cpu_list): Add
    ares cpu.

  (cherry picked from commit 02f440c1ef5d5d79552a524065aa3e2fabe469b9)
* posix: Fix large mmap64 offset for mips64n32 (BZ#24699) (Adhemerval Zanella, 2019-07-12; 3 files, -1/+37)

  The fix for BZ#21270 (commit 158d5fa0e19) added a mask to avoid
  offsets larger than 2^44 being used with __NR_mmap2. However,
  mips64n32 uses __NR_mmap, like mips64n64, but still defines off_t as
  the old non-LFS type (other ILP32 ABIs, such as x32, define off_t
  equal to off64_t). This leads to the mask meant only for __NR_mmap2
  calls being applied to __NR_mmap as well, limiting the maximum offset
  mmap64 can use.

  This patch fixes it by setting the high mask only for __NR_mmap2
  usage. The posix/tst-mmap-offset.c test already covers it and also
  failed for mips64n32. The patch also changes the test to check for an
  arch-specific header that defines the maximum supported offset.

  Checked on x86_64-linux-gnu, i686-linux-gnu, and I also ran
  tst-mmap-offset on qemu-simulated mips64 with a 3.2.0 kernel for both
  mips-linux-gnu and mips64-n32-linux-gnu.

  [BZ #24699]
  * posix/tst-mmap-offset.c: Mention BZ #24699.
    (do_test_bz21270): Rename to do_test_large_offset and use
    mmap64_maximum_offset to check for maximum expected offset value.
  * sysdeps/generic/mmap_info.h: New file.
  * sysdeps/unix/sysv/linux/mips/mmap_info.h: Likewise.
  * sysdeps/unix/sysv/linux/mmap64.c (MMAP_OFF_HIGH_MASK): Define
    iff __NR_mmap2 is used.

  (cherry picked from commit a008c76b56e4f958cf5a0d6f67d29fade89421b7)
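  A sketch of the resulting conditional mask, simplified from the shape
  the ChangeLog describes (the macro names follow the commit text, but
  the arithmetic here assumes 4 KiB pages and a 32-bit page-offset
  register, and is illustrative rather than the real mmap64.c code):

      #include <stdint.h>

      /* __NR_mmap2 passes the offset as a 32-bit count of 4 KiB pages,
         so only 2^(32+12) = 2^44 bytes are addressable; plain __NR_mmap
         takes the full 64-bit byte offset and needs no extra mask.  */
      #ifdef __NR_mmap2
      # define MMAP_OFF_HIGH_MASK  (~(((uint64_t) 1 << 44) - 1))
      #else
      # define MMAP_OFF_HIGH_MASK  0
      #endif

      /* An offset is rejected when it has bits set in the high mask.  */
      #define MMAP_OFFSET_INVALID(off) \
        (((uint64_t) (off)) & MMAP_OFF_HIGH_MASK)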