about summary refs log tree commit diff
path: root/sysdeps/x86_64/multiarch
Commit message (Collapse)AuthorAgeFilesLines
* x86: Fix wcsnlen-avx2 page cross length comparison [BZ #29591]Noah Goldstein2022-11-241-5/+2
| | | | | | | | | | | | | Previous implementation was adjusting length (rsi) to match bytes (eax), but since there is no bound to length this can cause overflow. Fix is to just convert the byte-count (eax) to length by dividing by sizeof (wchar_t) before the comparison. Full check passes on x86-64 and build succeeds w/ and w/o multiarch. (cherry picked from commit b0969fa53a28b4ab2159806bf6c99a98999502ee)
* x86-64: Require BMI2 for avx2 functions [BZ #29611]Sunil K Pandey2022-09-281-10/+28
| | | | This patch fixes BZ #29611
* x86-64: Require BMI2 for strchr-avx2.S [BZ #29611]H.J. Lu2022-09-282-5/+11
| | | | | | | | | | | | | | | | | | | | | Since strchr-avx2.S updated by commit 1f745ecc2109890886b161d4791e1406fdfc29b8 Author: noah <goldstein.w.n@gmail.com> Date: Wed Feb 3 00:38:59 2021 -0500 x86-64: Refactor and improve performance of strchr-avx2.S uses sarx: c4 e2 72 f7 c0 sarx %ecx,%eax,%eax for strchr-avx2 family functions, require BMI2 in ifunc-impl-list.c and ifunc-avx2.h. This fixes BZ #29611. (cherry picked from commit 83c5b368226c34a2f0a5287df40fc290b2b34359)
* x86: Fallback {str|wcs}cmp RTM in the ncmp overflow case [BZ #28896]Noah Goldstein2022-02-185-3/+5
| | | | | | | | | | | | | | In the overflow fallback strncmp-avx2-rtm and wcsncmp-avx2-rtm would call strcmp-avx2 and wcscmp-avx2 respectively. This would have not checks around vzeroupper and would trigger spurious aborts. This commit fixes that. test-strcmp, test-strncmp, test-wcscmp, and test-wcsncmp all pass on AVX2 machines with and without RTM. Co-authored-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit c6272098323153db373f2986c67786ea8c85f1cf)
* x86: Remove wcsnlen-sse4_1 from wcslen ifunc-impl-list [BZ #28064]Noah Goldstein2022-02-011-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | The following commit commit 6f573a27b6c8b4236445810a44660612323f5a73 Author: Noah Goldstein <goldstein.w.n@gmail.com> Date: Wed Jun 23 01:19:34 2021 -0400 x86-64: Add wcslen optimize for sse4.1 Added wcsnlen-sse4.1 to the wcslen ifunc implementation list and did not add wcslen-sse4.1 to wcslen ifunc implementation list. This commit fixes that by removing wcsnlen-sse4.1 from the wcslen ifunc implementation list and adding wcslen-sse4.1 to the ifunc implementation list. Testing: test-wcslen.c, test-rsi-wcslen.c, and test-rsi-strlen.c are passing as well as all other tests in wcsmbs and string. Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit 0679442defedf7e52a94264975880ab8674736b2)
* x86: Optimize strlen-evex.SNoah Goldstein2022-01-271-264/+317
| | | | | | | | | | | No bug. This commit optimizes strlen-evex.S. The optimizations are mostly small things but they add up to roughly 10-30% performance improvement for strlen. The results for strnlen are bit more ambiguous. test-strlen, test-strnlen, test-wcslen, and test-wcsnlen are all passing. Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com> (cherry picked from commit 4ba65586847751372520a36757c17f114588794e)
* x86: Fix overflow bug in wcsnlen-sse4_1 and wcsnlen-avx2 [BZ #27974]Noah Goldstein2022-01-272-38/+107
| | | | | | | | | | | | | | This commit fixes the bug mentioned in the previous commit. The previous implementations of wmemchr in these files relied on maxlen * sizeof(wchar_t) which was not guranteed by the standard. The new overflow tests added in the previous commit now pass (As well as all the other tests). Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit a775a7a3eb1e85b54af0b4ee5ff4dcf66772a1fb)
* x86-64: Add wcslen optimize for sse4.1Noah Goldstein2022-01-276-36/+63
| | | | | | | | | | No bug. This comment adds the ifunc / build infrastructure necessary for wcslen to prefer the sse4.1 implementation in strlen-vec.S. test-wcslen.c is passing. Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit 6f573a27b6c8b4236445810a44660612323f5a73)
* x86-64: Move strlen.S to multiarch/strlen-vec.SH.J. Lu2022-01-273-2/+259
| | | | | | | | | | Since strlen.S contains SSE2 version of strlen/strnlen and SSE4.1 version of wcslen/wcsnlen, move strlen.S to multiarch/strlen-vec.S and include multiarch/strlen-vec.S from SSE2 and SSE4.1 variants. This also removes the unused symbols, __GI___strlen_sse2 and __GI___wcsnlen_sse4_1. (cherry picked from commit a0db678071c60b6c47c468d231dd0b3694ba7a98)
* x86-64: Fix an unknown vector operation in memchr-evex.SAlice Xu2022-01-271-1/+1
| | | | | | | | An unknown vector operation occurred in commit 2a76821c308. Fixed it by using "ymm{k1}{z}" but not "ymm {k1} {z}". Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit 6ea916adfa0ab9af6e7dc6adcf6f977dfe017835)
* x86: Optimize memchr-evex.SNoah Goldstein2022-01-271-225/+322
| | | | | | | | | | | | | No bug. This commit optimizes memchr-evex.S. The optimizations include replacing some branches with cmovcc, avoiding some branches entirely in the less_4x_vec case, making the page cross logic less strict, saving some ALU in the alignment process, and most importantly increasing ILP in the 4x loop. test-memchr, test-rawmemchr, and test-wmemchr are all passing. Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit 2a76821c3081d2c0231ecd2618f52662cb48fccd)
* x86: Optimize strlen-avx2.SNoah Goldstein2022-01-272-214/+334
| | | | | | | | | | | No bug. This commit optimizes strlen-avx2.S. The optimizations are mostly small things but they add up to roughly 10-30% performance improvement for strlen. The results for strnlen are bit more ambiguous. test-strlen, test-strnlen, test-wcslen, and test-wcsnlen are all passing. Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com> (cherry picked from commit aaa23c35071537e2dcf5807e956802ed215210aa)
* x86: Fix overflow bug with wmemchr-sse2 and wmemchr-avx2 [BZ #27974]Noah Goldstein2022-01-271-18/+40
| | | | | | | | | | | | | | This commit fixes the bug mentioned in the previous commit. The previous implementations of wmemchr in these files relied on n * sizeof(wchar_t) which was not guranteed by the standard. The new overflow tests added in the previous commit now pass (As well as all the other tests). Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit 645a158978f9520e74074e8c14047503be4db0f0)
* x86: Optimize memchr-avx2.SNoah Goldstein2022-01-271-178/+247
| | | | | | | | | | | | No bug. This commit optimizes memchr-avx2.S. The optimizations include replacing some branches with cmovcc, avoiding some branches entirely in the less_4x_vec case, making the page cross logic less strict, asaving a few instructions the in loop return loop. test-memchr, test-rawmemchr, and test-wmemchr are all passing. Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit acfd088a1963ba51cd83c78f95c0ab25ead79e04)
* x86-64: Require BMI2 for __strlen_evex and __strnlen_evexH.J. Lu2022-01-271-2/+4
| | | | | | | | | | | | | | | | | | | Since __strlen_evex and __strnlen_evex added by commit 1fd8c163a83d96ace1ff78fa6bac7aee084f6f77 Author: H.J. Lu <hjl.tools@gmail.com> Date: Fri Mar 5 06:24:52 2021 -0800 x86-64: Add ifunc-avx2.h functions with 256-bit EVEX use sarx: c4 e2 6a f7 c0 sarx %edx,%eax,%eax require BMI2 for __strlen_evex and __strnlen_evex in ifunc-impl-list.c. ifunc-avx2.h already requires BMI2 for EVEX implementation. (cherry picked from commit 55bf411b451c13f0fb7ff3d3bf9a820020b45df1)
* x86-64: Fix ifdef indentation in strlen-evex.SSunil K Pandey2022-01-271-8/+8
| | | | | | | Fix some indentations of ifdef in file strlen-evex.S which are off by 1 and confusing to read. (cherry picked from commit 595c22ecd8e87a27fd19270ed30fdbae9ad25426)
* x86-64: Use ZMM16-ZMM31 in AVX512 memmove family functionsH.J. Lu2022-01-273-19/+35
| | | | | | | | Update ifunc-memmove.h to select the function optimized with AVX512 instructions using ZMM16-ZMM31 registers to avoid RTM abort with usable AVX512VL since VZEROUPPER isn't needed at function exit. (cherry picked from commit e4fda4631017e49d4ee5a2755db34289b6860fa4)
* x86-64: Use ZMM16-ZMM31 in AVX512 memset family functionsH.J. Lu2022-01-274-24/+31
| | | | | | | | | Update ifunc-memset.h/ifunc-wmemset.h to select the function optimized with AVX512 instructions using ZMM16-ZMM31 registers to avoid RTM abort with usable AVX512VL and AVX512BW since VZEROUPPER isn't needed at function exit. (cherry picked from commit 4e2d8f352774b56078c34648b14a2412c38384f4)
* x86-64: Add AVX optimized string/memory functions for RTMH.J. Lu2022-01-2747-190/+572
| | | | | | | | | | | | | | | | | | | Since VZEROUPPER triggers RTM abort while VZEROALL won't, select AVX optimized string/memory functions with xtest jz 1f vzeroall ret 1: vzeroupper ret at function exit on processors with usable RTM, but without 256-bit EVEX instructions to avoid VZEROUPPER inside a transactionally executing RTM region. (cherry picked from commit 7ebba91361badf7531d4e75050627a88d424872f)
* x86-64: Add memcmp family functions with 256-bit EVEXH.J. Lu2022-01-275-4/+467
| | | | | | | | | Update ifunc-memcmp.h to select the function optimized with 256-bit EVEX instructions using YMM16-YMM31 registers to avoid RTM abort with usable AVX512VL, AVX512BW and MOVBE since VZEROUPPER isn't needed at function exit. (cherry picked from commit 91264fe3577fe887b4860923fa6142b5274c8965)
* x86-64: Add memset family functions with 256-bit EVEXH.J. Lu2022-01-276-14/+90
| | | | | | | | | Update ifunc-memset.h/ifunc-wmemset.h to select the function optimized with 256-bit EVEX instructions using YMM16-YMM31 registers to avoid RTM abort with usable AVX512VL and AVX512BW since VZEROUPPER isn't needed at function exit. (cherry picked from commit 1b968b6b9b3aac702ac2f133e0dd16cfdbb415ee)
* x86-64: Add memmove family functions with 256-bit EVEXH.J. Lu2022-01-275-11/+97
| | | | | | | | Update ifunc-memmove.h to select the function optimized with 256-bit EVEX instructions using YMM16-YMM31 registers to avoid RTM abort with usable AVX512VL since VZEROUPPER isn't needed at function exit. (cherry picked from commit 63ad43566f7a25d140dc723598aeb441ad657eed)
* x86-64: Add strcpy family functions with 256-bit EVEXH.J. Lu2022-01-279-0/+1338
| | | | | | | | Update ifunc-strcpy.h to select the function optimized with 256-bit EVEX instructions using YMM16-YMM31 registers to avoid RTM abort with usable AVX512VL and AVX512BW since VZEROUPPER isn't needed at function exit. (cherry picked from commit 525bc2a32c9710df40371f951217c6ae7a923aee)
* x86-64: Add ifunc-avx2.h functions with 256-bit EVEXH.J. Lu2022-01-2724-17/+2996
| | | | | | | | | | | | Update ifunc-avx2.h, strchr.c, strcmp.c, strncmp.c and wcsnlen.c to select the function optimized with 256-bit EVEX instructions using YMM16-YMM31 registers to avoid RTM abort with usable AVX512VL, AVX512BW and BMI2 since VZEROUPPER isn't needed at function exit. For strcmp/strncmp, prefer AVX2 strcmp/strncmp if Prefer_AVX2_STRCMP is set. (cherry picked from commit 1fd8c163a83d96ace1ff78fa6bac7aee084f6f77)
* x86: Fix __wcsncmp_avx2 in strcmp-avx2.S [BZ# 28755]Noah Goldstein2022-01-271-0/+10
| | | | | | | | | | | Fixes [BZ# 28755] for wcsncmp by redirecting length >= 2^56 to __wcscmp_avx2. For x86_64 this covers the entire address range so any length larger could not possibly be used to bound `s1` or `s2`. test-strcmp, test-strncmp, test-wcscmp, and test-wcsncmp all pass. Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com> (cherry picked from commit ddf0992cf57a93200e0c782e2a94d0733a5a0b87)
* x86-64: Avoid rep movsb with short distance [BZ #27130]H.J. Lu2021-01-121-0/+21
| | | | | | | | | | | | | | | | | | | When copying with "rep movsb", if the distance between source and destination is N*4GB + [1..63] with N >= 0, performance may be very slow. This patch updates memmove-vec-unaligned-erms.S for AVX and AVX512 versions with the distance in RCX: cmpl $63, %ecx // Don't use "rep movsb" if ECX <= 63 jbe L(Don't use rep movsb") Use "rep movsb" Benchtests data with bench-memcpy, bench-memcpy-large, bench-memcpy-random and bench-memcpy-walk on Skylake, Ice Lake and Tiger Lake show that its performance impact is within noise range as "rep movsb" is only used for data size >= 4KB. (cherry picked from commit 3ec5d83d2a237d39e7fd6ef7a0bc8ac4c171a4a5)
* Fix avx2 strncmp offset compare condition check [BZ #25933]Sunil K Pandey2020-07-041-0/+15
| | | | | | | | | | | strcmp-avx2.S: In avx2 strncmp function, strings are compared in chunks of 4 vector size(i.e. 32x4=128 byte for avx2). After first 4 vector size comparison, code must check whether it already passed the given offset. This patch implement avx2 offset check condition for strncmp function, if both string compare same for first 4 vector size. (cherry picked from commit 75870237ff3bb363447b03f4b0af100227570910)
* x86-64 strnlen/wcsnlen: Properly handle the length parameter [BZ #24097]H.J. Lu2019-02-011-3/+6
| | | | | | | | | | | | | | | | | | | | | | | On x32, the size_t parameter may be passed in the lower 32 bits of a 64-bit register with the non-zero upper 32 bits. The string/memory functions written in assembly can only use the lower 32 bits of a 64-bit register as length or must clear the upper 32 bits before using the full 64-bit register for length. This pach fixes strnlen/wcsnlen for x32. Tested on x86-64 and x32. On x86-64, libc.so is the same with and withou the fix. [BZ #24097] CVE-2019-6488 * sysdeps/x86_64/multiarch/strlen-avx2.S: Use RSI_LP for length. Clear the upper 32 bits of RSI register. * sysdeps/x86_64/strlen.S: Use RSI_LP for length. * sysdeps/x86_64/x32/Makefile (tests): Add tst-size_t-strnlen and tst-size_t-wcsnlen. * sysdeps/x86_64/x32/tst-size_t-strnlen.c: New file. * sysdeps/x86_64/x32/tst-size_t-wcsnlen.c: Likewise. (cherry picked from commit 5165de69c0908e28a380cbd4bb054e55ea4abc95)
* x86-64 strncpy: Properly handle the length parameter [BZ #24097]H.J. Lu2019-02-012-5/+5
| | | | | | | | | | | | | | | | | | | | | On x32, the size_t parameter may be passed in the lower 32 bits of a 64-bit register with the non-zero upper 32 bits. The string/memory functions written in assembly can only use the lower 32 bits of a 64-bit register as length or must clear the upper 32 bits before using the full 64-bit register for length. This pach fixes strncpy for x32. Tested on x86-64 and x32. On x86-64, libc.so is the same with and withou the fix. [BZ #24097] CVE-2019-6488 * sysdeps/x86_64/multiarch/strcpy-sse2-unaligned.S: Use RDX_LP for length. * sysdeps/x86_64/multiarch/strcpy-ssse3.S: Likewise. * sysdeps/x86_64/x32/Makefile (tests): Add tst-size_t-strncpy. * sysdeps/x86_64/x32/tst-size_t-strncpy.c: New file. (cherry picked from commit c7c54f65b080affb87a1513dee449c8ad6143c8b)
* x86-64 strncmp family: Properly handle the length parameter [BZ #24097]H.J. Lu2019-02-012-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | On x32, the size_t parameter may be passed in the lower 32 bits of a 64-bit register with the non-zero upper 32 bits. The string/memory functions written in assembly can only use the lower 32 bits of a 64-bit register as length or must clear the upper 32 bits before using the full 64-bit register for length. This pach fixes the strncmp family for x32. Tested on x86-64 and x32. On x86-64, libc.so is the same with and withou the fix. [BZ #24097] CVE-2019-6488 * sysdeps/x86_64/multiarch/strcmp-avx2.S: Use RDX_LP for length. * sysdeps/x86_64/multiarch/strcmp-sse42.S: Likewise. * sysdeps/x86_64/strcmp.S: Likewise. * sysdeps/x86_64/x32/Makefile (tests): Add tst-size_t-strncasecmp, tst-size_t-strncmp and tst-size_t-wcsncmp. * sysdeps/x86_64/x32/tst-size_t-strncasecmp.c: New file. * sysdeps/x86_64/x32/tst-size_t-strncmp.c: Likewise. * sysdeps/x86_64/x32/tst-size_t-wcsncmp.c: Likewise. (cherry picked from commit ee915088a0231cd421054dbd8abab7aadf331153)
* x86-64 memset/wmemset: Properly handle the length parameter [BZ #24097]H.J. Lu2019-02-012-14/+26
| | | | | | | | | | | | | | | | | | | | | | On x32, the size_t parameter may be passed in the lower 32 bits of a 64-bit register with the non-zero upper 32 bits. The string/memory functions written in assembly can only use the lower 32 bits of a 64-bit register as length or must clear the upper 32 bits before using the full 64-bit register for length. This pach fixes memset/wmemset for x32. Tested on x86-64 and x32. On x86-64, libc.so is the same with and withou the fix. [BZ #24097] CVE-2019-6488 * sysdeps/x86_64/multiarch/memset-avx512-no-vzeroupper.S: Use RDX_LP for length. Clear the upper 32 bits of RDX register. * sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: Likewise. * sysdeps/x86_64/x32/Makefile (tests): Add tst-size_t-wmemset. * sysdeps/x86_64/x32/tst-size_t-memset.c: New file. * sysdeps/x86_64/x32/tst-size_t-wmemset.c: Likewise. (cherry picked from commit 82d0b4a4d76db554eb6757acb790fcea30b19965)
* x86-64 memrchr: Properly handle the length parameter [BZ #24097]H.J. Lu2019-02-011-2/+2
| | | | | | | | | | | | | | | | | | | | On x32, the size_t parameter may be passed in the lower 32 bits of a 64-bit register with the non-zero upper 32 bits. The string/memory functions written in assembly can only use the lower 32 bits of a 64-bit register as length or must clear the upper 32 bits before using the full 64-bit register for length. This pach fixes memrchr for x32. Tested on x86-64 and x32. On x86-64, libc.so is the same with and withou the fix. [BZ #24097] CVE-2019-6488 * sysdeps/x86_64/memrchr.S: Use RDX_LP for length. * sysdeps/x86_64/multiarch/memrchr-avx2.S: Likewise. * sysdeps/x86_64/x32/Makefile (tests): Add tst-size_t-memrchr. * sysdeps/x86_64/x32/tst-size_t-memrchr.c: New file. (cherry picked from commit ecd8b842cf37ea112e59cd9085ff1f1b6e208ae0)
* x86-64 memcpy: Properly handle the length parameter [BZ #24097]H.J. Lu2019-02-014-41/+63
| | | | | | | | | | | | | | | | | | | | | | | | | | On x32, the size_t parameter may be passed in the lower 32 bits of a 64-bit register with the non-zero upper 32 bits. The string/memory functions written in assembly can only use the lower 32 bits of a 64-bit register as length or must clear the upper 32 bits before using the full 64-bit register for length. This pach fixes memcpy for x32. Tested on x86-64 and x32. On x86-64, libc.so is the same with and withou the fix. [BZ #24097] CVE-2019-6488 * sysdeps/x86_64/multiarch/memcpy-ssse3-back.S: Use RDX_LP for length. Clear the upper 32 bits of RDX register. * sysdeps/x86_64/multiarch/memcpy-ssse3.S: Likewise. * sysdeps/x86_64/multiarch/memmove-avx512-no-vzeroupper.S: Likewise. * sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S: Likewise. * sysdeps/x86_64/x32/Makefile (tests): Add tst-size_t-memcpy. tst-size_t-wmemchr. * sysdeps/x86_64/x32/tst-size_t-memcpy.c: New file. (cherry picked from commit 231c56760c1e2ded21ad96bbb860b1f08c556c7a)
* x86-64 memcmp/wmemcmp: Properly handle the length parameter [BZ #24097]H.J. Lu2019-02-013-7/+16
| | | | | | | | | | | | | | | | | | | | | | | | On x32, the size_t parameter may be passed in the lower 32 bits of a 64-bit register with the non-zero upper 32 bits. The string/memory functions written in assembly can only use the lower 32 bits of a 64-bit register as length or must clear the upper 32 bits before using the full 64-bit register for length. This pach fixes memcmp/wmemcmp for x32. Tested on x86-64 and x32. On x86-64, libc.so is the same with and withou the fix. [BZ #24097] CVE-2019-6488 * sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S: Use RDX_LP for length. Clear the upper 32 bits of RDX register. * sysdeps/x86_64/multiarch/memcmp-sse4.S: Likewise. * sysdeps/x86_64/multiarch/memcmp-ssse3.S: Likewise. * sysdeps/x86_64/x32/Makefile (tests): Add tst-size_t-memcmp and tst-size_t-wmemcmp. * sysdeps/x86_64/x32/tst-size_t-memcmp.c: New file. * sysdeps/x86_64/x32/tst-size_t-wmemcmp.c: Likewise. (cherry picked from commit b304fc201d2f6baf52ea790df8643e99772243cd)
* x86-64 memchr/wmemchr: Properly handle the length parameter [BZ #24097]H.J. Lu2019-02-011-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | On x32, the size_t parameter may be passed in the lower 32 bits of a 64-bit register with the non-zero upper 32 bits. The string/memory functions written in assembly can only use the lower 32 bits of a 64-bit register as length or must clear the upper 32 bits before using the full 64-bit register for length. This pach fixes memchr/wmemchr for x32. Tested on x86-64 and x32. On x86-64, libc.so is the same with and withou the fix. [BZ #24097] CVE-2019-6488 * sysdeps/x86_64/memchr.S: Use RDX_LP for length. Clear the upper 32 bits of RDX register. * sysdeps/x86_64/multiarch/memchr-avx2.S: Likewise. * sysdeps/x86_64/x32/Makefile (tests): Add tst-size_t-memchr and tst-size_t-wmemchr. * sysdeps/x86_64/x32/test-size_t.h: New file. * sysdeps/x86_64/x32/tst-size_t-memchr.c: Likewise. * sysdeps/x86_64/x32/tst-size_t-wmemchr.c: Likewise. (cherry picked from commit 97700a34f36721b11a754cf37a1cc40695ece1fd)
* x86-64: Use _CET_NOTRACK in memcmp-sse4.SH.J. Lu2018-07-181-1/+1
| | | | | * sysdeps/x86_64/multiarch/memcmp-sse4.S (BRANCH_TO_JMPTBL_ENTRY): Add _CET_NOTRACK before indirect jump to jump table.
* x86-64: Use _CET_NOTRACK in memcpy-ssse3.SH.J. Lu2018-07-181-62/+62
| | | | | | | * sysdeps/x86_64/multiarch/memcpy-ssse3.S (BRANCH_TO_JMPTBL_ENTRY): Add _CET_NOTRACK before indirect jump to jump table. (MEMCPY): Likewise.
* x86-64: Use _CET_NOTRACK in memcpy-ssse3-back.SH.J. Lu2018-07-181-3/+3
| | | | | | | * sysdeps/x86_64/multiarch/memcpy-ssse3-back.S (BRANCH_TO_JMPTBL_ENTRY): Add _CET_NOTRACK before indirect jump to jump table. (MEMCPY): Likewise.
* x86-64: Use _CET_NOTRACK in strcmp-sse42.SH.J. Lu2018-07-181-1/+1
| | | | | * sysdeps/x86_64/multiarch/strcmp-sse42.S (STRCMP_SSE42): Add _CET_NOTRACK before indirect jump to jump table.
* x86-64: Use _CET_NOTRACK in strcpy-sse2-unaligned.SH.J. Lu2018-07-181-1/+1
| | | | | | * sysdeps/x86_64/multiarch/strcpy-sse2-unaligned.S (BRANCH_TO_JMPTBL_ENTRY): Add _CET_NOTRACK before indirect jump to jump table.
* x86-64: Add _CET_ENDBR to STRCMP_SSE42H.J. Lu2018-07-171-0/+1
| | | | | | | | | | Add _CET_ENDBR to STRCMP_SSE42, which is called indirectly, to support IBT. Reviewed-by: Carlos O'Donell <carlos@redhat.com> * sysdeps/x86_64/multiarch/strcmp-sse42.S (STRCMP_SSE42): Add _CET_ENDBR.
* x86: Make strncmp usable from rtldFlorian Weimer2018-06-121-4/+7
| | | | | | | | | Due to the way the conditions were written, the rtld build of strncmp ended up with no definition of the strncmp symbol at all: The implementations were renamed for use within an IFUNC resolver, but the IFUNC resolver itself was missing (because rtld does not use IFUNCs). Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* x86-64: Optimize strcmp/wcscmp and strncmp/wcsncmp with AVX2Leonardo Sandoval2018-06-0112-2/+1006
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Optimize x86-64 strcmp/wcscmp and strncmp/wcsncmp with AVX2. It uses vector comparison as much as possible. Peak performance observed on a SkyLake machine: 9x, 3x, 2.5x and 5.5x for strcmp, strncmp, wcscmp and wcsncmp, respectively. The larger the comparison length, the more benefit using avx2 functions, except on the strcmp, where peak is observed at length == 32 bytes. Select AVX2 strcmp/wcscmp on AVX2 machines where vzeroupper is preferred and AVX unaligned load is fast. NB: It uses TZCNT instead of BSF since TZCNT produces the same result as BSF for non-zero input. TZCNT is faster than BSF and is executed as BSF if machine doesn't support TZCNT. * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add strcmp-avx2, strncmp-avx2, wcscmp-avx2, wcscmp-sse2, wcsncmp-avx2 and wcsncmp-sse2. * sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Add tests for __strcmp_avx2, __strncmp_avx2, __wcscmp_avx2, __wcsncmp_avx2, __wcscmp_sse2 and __wcsncmp_sse2. * sysdeps/x86_64/multiarch/strcmp.c (OPTIMIZE (avx2)): (IFUNC_SELECTOR): Return OPTIMIZE (avx2) on AVX 2 machines if AVX unaligned load is fast and vzeroupper is preferred. * sysdeps/x86_64/multiarch/strncmp.c: Likewise. * sysdeps/x86_64/multiarch/strcmp-avx2.S: New file. * sysdeps/x86_64/multiarch/strncmp-avx2.S: Likewise. * sysdeps/x86_64/multiarch/wcscmp-avx2.S: Likewise. * sysdeps/x86_64/multiarch/wcscmp-sse2.S: Likewise. * sysdeps/x86_64/multiarch/wcscmp.c: Likewise. * sysdeps/x86_64/multiarch/wcsncmp-avx2.S: Likewise. * sysdeps/x86_64/multiarch/wcsncmp-sse2.c: Likewise. * sysdeps/x86_64/multiarch/wcsncmp.c: Likewise. * sysdeps/x86_64/wcscmp.S (__wcscmp): Add alias only if __wcscmp is undefined.
* x86-64: Skip zero length in __mem[pcpy|move|set]_ermsH.J. Lu2018-05-232-0/+11
| | | | | | | | | | | | | This patch skips zero length in __mempcpy_erms, __memmove_erms and __memset_erms. Tested on x86-64. * sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S (__mempcpy_erms): Skip zero length. (__memmove_erms): Likewise. * sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S (__memset_erms): Likewise.
* Don't write beyond destination in __mempcpy_avx512_no_vzeroupper (bug 23196)Andreas Schwab2018-05-231-2/+3
| | | | | When compiled as mempcpy, the return value is the end of the destination buffer, thus it cannot be used to refer to the start of it.
* x86-64: Check Prefer_FSRM in ifunc-memmove.hH.J. Lu2018-05-211-1/+2
| | | | | | | | | | | | | Although the REP MOVSB implementations of memmove, memcpy and mempcpy aren't used by the current processors, this patch adds Prefer_FSRM check in ifunc-memmove.h so that they can be used in the future. * sysdeps/x86/cpu-features.h (bit_arch_Prefer_FSRM): New. (index_arch_Prefer_FSRM): Likewise. * sysdeps/x86/cpu-tunables.c (TUNABLE_CALLBACK (set_hwcaps)): Also check Prefer_FSRM. * sysdeps/x86_64/multiarch/ifunc-memmove.h (IFUNC_SELECTOR): Also return OPTIMIZE (erms) for Prefer_FSRM.
* x86-64: remove duplicate line on PREFETCH_ONE_SET macroLeonardo Sandoval2018-05-171-1/+0
| | | | | | | Tested on 64-bit AVX machine * sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S (PREFETCH_ONE_SET): Remove duplicate line
* x86-64: Use IFUNC strncat inside libc.soH.J. Lu2018-05-162-2/+6
| | | | | | | | | | | | | Unlike i386, we can call hidden IFUNC functions inside libc.so since x86-64 PLT is always PIC. Tested on x86-64. * sysdeps/x86_64/multiarch/strncat-c.c (STRNCAT_PRIMARY): Removed. Include <string/strncat.c>. * sysdeps/x86_64/multiarch/strncat.c (__strncat): New strong alias. (__GI___strncat): New hidden alias.
* x86-64: Remove the unnecessary testl in strlen-avx2.SH.J. Lu2018-05-141-1/+0
| | | | | | | | | Since the result of testl is never used, this patch removes it. Tested on 64-bit AVX2 machine. * sysdeps/x86_64/multiarch/strlen-avx2.S (STRLEN): Remove the unnecessary testl.
* x86-64/memset: Mark the debugger symbol as hiddenH.J. Lu2018-05-071-1/+2
| | | | | | | | | When MEMSET_SYMBOL (__memset, erms) is provided for debugger, mark it as hidden so that it will be local to the library. * sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S (MEMSET_SYMBOL (__memset, erms)): Mark the debugger symbol as hidden.