about summary refs log tree commit diff
Commit message (Collapse)AuthorAgeFilesLines
...
* aarch64: MTE compatible memrchrAlex Butler2020-06-231-116/+83
| | | | | | | | | | | | Add support for MTE to memrchr. Regression tested with xcheck and benchmarked with glibc's benchtests on the Cortex-A53, Cortex-A72, and Neoverse N1. The existing implementation assumes that any access to the pages in which the string resides is safe. This assumption is not true when MTE is enabled. This patch updates the algorithm to ensure that accesses remain within the bounds of an MTE tag (16-byte chunks) and improves overall performance. Co-authored-by: Wilco Dijkstra <wilco.dijkstra@arm.com>
* aarch64: MTE compatible memchrAlex Butler2020-06-231-107/+80
| | | | | | | | | | | | Add support for MTE to memchr. Regression tested with xcheck and benchmarked with glibc's benchtests on the Cortex-A53, Cortex-A72, and Neoverse N1. The existing implementation assumes that any access to the pages in which the string resides is safe. This assumption is not true when MTE is enabled. This patch updates the algorithm to ensure that accesses remain within the bounds of an MTE tag (16-byte chunks) and improves overall performance. Co-authored-by: Gabor Kertesz <gabor.kertesz@arm.com>
* aarch64: MTE compatible strcpyAlex Butler2020-06-231-263/+122
| | | | | | | | | | | | Add support for MTE to strcpy. Regression tested with xcheck and benchmarked with glibc's benchtests on the Cortex-A53, Cortex-A72, and Neoverse N1. The existing implementation assumes that any access to the pages in which the string resides is safe. This assumption is not true when MTE is enabled. This patch updates the algorithm to ensure that accesses remain within the bounds of an MTE tag (16-byte chunks) and improves overall performance. Co-authored-by: Wilco Dijkstra <wilco.dijkstra@arm.com>
* Add MREMAP_DONTUNMAP from Linux 5.7Joseph Myers2020-06-231-0/+1
| | | | | | | Add the new constant MREMAP_DONTUNMAP from Linux 5.7 to bits/mman-shared.h. Tested with build-many-glibcs.py.
* x86: Update CPU feature detection [BZ #26149]H.J. Lu2020-06-225-390/+268
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 1. Divide architecture features into the usable features and the preferred features. The usable features are for correctness and can be exported in a stable ABI. The preferred features are for performance and only for glibc internal use. 2. Change struct cpu_features to struct cpu_features { struct cpu_features_basic basic; unsigned int *usable_p; struct cpuid_registers cpuid[COMMON_CPUID_INDEX_MAX]; unsigned int usable[USABLE_FEATURE_INDEX_MAX]; unsigned int preferred[PREFERRED_FEATURE_INDEX_MAX]; ... }; and initialize usable_p to pointer to the usable arary so that struct cpu_features { struct cpu_features_basic basic; unsigned int *usable_p; struct cpuid_registers cpuid[COMMON_CPUID_INDEX_MAX]; }; can be exported via a stable ABI. The cpuid and usable arrays can be expanded with backward binary compatibility for both .o and .so files. 3. Add COMMON_CPUID_INDEX_7_ECX_1 for AVX512_BF16. 4. Detect ENQCMD, PKS, AVX512_VP2INTERSECT, MD_CLEAR, SERIALIZE, HYBRID, TSXLDTRK, L1D_FLUSH, CORE_CAPABILITIES and AVX512_BF16. 5. Rename CAPABILITIES to ARCH_CAPABILITIES. 6. Check if AVX512_VP2INTERSECT, AVX512_BF16 and PKU are usable. 7. Update CPU feature detection test.
* aarch64: Remove fpu MakefileAdhemerval Zanella2020-06-221-14/+0
| | | | | | | | The -fno-math-errno is already added by default and the minimum required GCC to build glibc (6.2) make the -ffinite-math-only superflous. Checked on aarch64-linux-gnu.
* m68k: Use sqrt{f} builtin for coldfireAdhemerval Zanella2020-06-223-53/+4
| | | | Checked with a build for m68k-linux-gnu-coldfire.
* arm: Use sqrt{f} builtinAdhemerval Zanella2020-06-223-92/+9
| | | | Checked on arm-linux-gnueabi and armv7-linux-gnueabihf
* riscv: Use sqrt{f} builtinAdhemerval Zanella2020-06-223-56/+4
| | | | | | Checked with a build for riscv64-linux-gnu-rv64imac-lp64 (no builtin support), riscv64-linux-gnu-rv64imafdc-lp64, and riscv64-linux-gnu-rv64imafdc-lp64d.
* s390: Use sqrt{f} builtinAdhemerval Zanella2020-06-223-60/+4
| | | | Checked on s390x-linux-gnu.
* sparc: Use sqrt{f} builtinAdhemerval Zanella2020-06-222-34/+4
| | | | | | It also enabled to use fsqrtd on sparc64. Checked on sparcv9-linux-gnu and sparc64-linux-gnu.
* mips: Use sqrt{f} builtinAdhemerval Zanella2020-06-229-82/+6
| | | | | Checked with a build against mips-linux-gnu and mips64-linux-gnu and comparing the resulting binaries.
* alpha: Use builtin sqrt{f}Adhemerval Zanella2020-06-225-275/+13
| | | | | | | | The generic implementation is simplified by removing the 'optimization' for !_IEEE_FP_INEXACT (which does not handle inexact neither some values). Checked on alpha-linux-gnu.
* i386: Use builtin sqrtlAdhemerval Zanella2020-06-222-21/+0
| | | | Checked on i686-linux-gnu.
* x86_64: Use builtin sqrt{f,l}Adhemerval Zanella2020-06-224-65/+31
| | | | Checked on x86_64-linux-gnu.
* powerpc: Use sqrt{f} builtinAdhemerval Zanella2020-06-223-80/+42
| | | | | | | | | | The powerpc sqrt implementation is also simplified: - the static constants are open coded within the implementation. - for !USE_SQRT_BUILTIN the function is implemented directly on __ieee754_sqrt (it avoid an superflous extra jump). Checked on powerpc-linux-gnu and powerpc64le-linux-gnu.
* s390x: Use fma{f} builtinAdhemerval Zanella2020-06-223-64/+4
| | | | Checked on s390x-linux-gnu.
* aarch64: Use math-use-builtins for ceil{f}Adhemerval Zanella2020-06-222-58/+0
| | | | | | | The define is already set on the math-use-builtins-ceil.h, the patch just removes the implementations (it was missed on c9feb1be93). Checked on aarch64-linux-gnu.
* math: Decompose math-use-builtins.hAdhemerval Zanella2020-06-2228-312/+181
| | | | | | | | | | | Each symbol definitions are moved on a separated file and it cover all symbol type definitions (float, double, long double, and float128). It allows to set support for architectures without the boiler place of copying default values. Checked with a build on the affected ABIs.
* hurd: Add mremapSamuel Thibault2020-06-204-1/+185
| | | | | | | * sysdeps/mach/hurd/mremap.c: New file. * sysdeps/mach/hurd/Makefile [misc] (sysdep_routines): Add mremap. * sysdeps/mach/hurd/Versions (libc.GLIBC_2.32): Add mremap. * sysdeps/mach/hurd/i386/libc.abilist: Add mremap.
* ia64: Use generic exp10fAdhemerval Zanella2020-06-197-572/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The generic implementation is slight worse (Itanium(R) Processor 9020): Before new code: "exp10f": { "workload-spec2017.wrf (adapted)": { "duration": 3.61582e+08, "iterations": 2.384e+07, "reciprocal-throughput": 14.8334, "latency": 15.5006, "max-throughput": 6.74153e+07, "min-throughput": 6.45136e+07 } } With new code: "exp10f": { "workload-spec2017.wrf (adapted)": { "duration": 3.85549e+08, "iterations": 2.384e+07, "reciprocal-throughput": 15.8391, "latency": 16.5056, "max-throughput": 6.31348e+07, "min-throughput": 6.05857e+07 } } However it fixes all the issues on both: math/test-float-exp10 math/test-float32-exp10 (all the issues wrong results for non default rounding modes). The existing ia64 libm interface uses matherrf and matherrl in addition to matherr for SVID error handling. However, there is no such error handling support for exp10f in ia64 libm. So replacing it with the generic implementation should be fine. Checked on ia64-linux-gnu.
* New exp10f version without SVID compat wrapperAdhemerval Zanella2020-06-1933-8/+64
| | | | | | | | | | | | | | | | | | | | | | | This patch changes the exp10f error handling semantics to only set errno according to POSIX rules. New symbol version is introduced at GLIBC_2.32. The old wrappers are kept for compat symbols. There are some outliers that need special handling: - ia64 provides an optimized implementation of exp10f that uses ia64 specific routines to set SVID compatibility. The new symbol version is aliased to the exp10f one. - m68k also provides an optimized implementation, and the new version uses it instead of the sysdeps/ieee754/flt32 one. - riscv and csky uses the generic template implementation that does not provide SVID support. For both cases a new exp10f version is not added, but rather the symbols version of the generic sysdeps/ieee754/flt32 is adjusted instead. Checked on aarch64-linux-gnu, x86_64-linux-gnu, i686-linux-gnu, powerpc64le-linux-gnu.
* i386: Use generic exp10fAdhemerval Zanella2020-06-191-54/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The generic implementation is twice as fast. Using the exp10f benchmark: * master: "exp10f": { "workload-spec2017.wrf (adapted)": { "duration": 1.02967e+09, "iterations": 4.768e+07, "reciprocal-throughput": 18.3579, "latency": 24.8331, "max-throughput": 5.44725e+07, "min-throughput": 4.02688e+07 } } * patched: "exp10f": { "workload-spec2017.wrf (adapted)": { "duration": 1.01821e+09, "iterations": 6.1984e+07, "reciprocal-throughput": 13.1975, "latency": 19.6563, "max-throughput": 7.57719e+07, "min-throughput": 5.08743e+07 } } Checked on i686-linux-gnu.
* math: Optimized generic exp10f with wrappersPaul Zimmermann2020-06-193-33/+199
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It is inspired by expf and reuses its tables and internal functions. The error checks are inlined and errno setting is in separate tail called functions, but the wrappers are kept in this patch to handle the _LIB_VERSION==_SVID_ case. Double precision arithmetics is used which is expected to be faster on most targets (including soft-float) than using single precision and it is easier to get good precision result with it. Result for x86_64 (i7-4790K CPU @ 4.00GHz) are: Before new code: "exp10f": { "workload-spec2017.wrf (adapted)": { "duration": 4.0414e+09, "iterations": 1.00128e+08, "reciprocal-throughput": 26.6818, "latency": 54.043, "max-throughput": 3.74787e+07, "min-throughput": 1.85038e+07 } With new code: "exp10f": { "workload-spec2017.wrf (adapted)": { "duration": 4.11951e+09, "iterations": 1.23968e+08, "reciprocal-throughput": 21.0581, "latency": 45.4028, "max-throughput": 4.74876e+07, "min-throughput": 2.20251e+07 } Result for aarch64 (A72 @ 2GHz) are: Before new code: "exp10f": { "workload-spec2017.wrf (adapted)": { "duration": 4.62362e+09, "iterations": 3.3376e+07, "reciprocal-throughput": 127.698, "latency": 149.365, "max-throughput": 7.831e+06, "min-throughput": 6.69501e+06 } With new code: "exp10f": { "workload-spec2017.wrf (adapted)": { "duration": 4.29108e+09, "iterations": 6.6752e+07, "reciprocal-throughput": 51.2111, "latency": 77.3568, "max-throughput": 1.9527e+07, "min-throughput": 1.29271e+07 } Checked on x86_64-linux-gnu, powerpc64le-linux-gnu, aarch64-linux-gnu, and sparc64-linux-gnu.
* benchtests: Add exp10f benchmarkAdhemerval Zanella2020-06-192-1/+2390
| | | | | | It is based on expf one by converting each line with the formula: new_val = (float) log10 (exp ((double) old_val))
* x86: Update F16C detection [BZ #26133]H.J. Lu2020-06-182-3/+7
| | | | Since F16C requires AVX, set F16C usable only when AVX is usable.
* Fix avx2 strncmp offset compare condition check [BZ #25933]Sunil K Pandey2020-06-171-0/+15
| | | | | | | | | strcmp-avx2.S: In avx2 strncmp function, strings are compared in chunks of 4 vector size(i.e. 32x4=128 byte for avx2). After first 4 vector size comparison, code must check whether it already passed the given offset. This patch implement avx2 offset check condition for strncmp function, if both string compare same for first 4 vector size.
* nptl: Remove now-spurious tst-cancelx9 referencesSamuel Thibault2020-06-171-2/+1
| | | | | | | | They were to be moved to sysdeps/pthread/Makefile in 45fce058f ('htl: Enable more cancellation tests') * nptl/Makefile: (tests): Remove tst-cancelx9. (CFLAGS-tst-cancelx9.c): Remove.
* x86_64: Use %xmmN with vpxor to clear a vector registerH.J. Lu2020-06-172-3/+3
| | | | | Since "vpxor %xmmN, %xmmN, %xmmN" clears the whole vector register, use %xmmN, instead of %ymmN, with vpxor to clear a vector register.
* x86: Correct bit_cpu_CLFLUSHOPT [BZ #26128]H.J. Lu2020-06-171-1/+1
| | | | bit_cpu_CLFLUSHOPT should be (1u << 23), not (1u << 22).
* powerpc64le: refactor e_sqrtf128.cPaul E. Murphy2020-06-162-39/+7
| | | | | Combine both implementations into a single file to allow building twice with appropriate multiarch support when possible.
* Update syscall-names.list for Linux 5.7.Joseph Myers2020-06-151-2/+2
| | | | | | | Linux 5.7 has no new syscalls. Update the version number in syscall-names.list to reflect that it is still current for 5.7. Tested with build-many-glibcs.py.
* ieee754/dbl-64: Reduce the scope of temporary storage variablesVineet Gupta2020-06-157-223/+193
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This came to light when adding hard-flaot support to ARC glibc port without hardware sqrt support causing glibc build to fail: | ../sysdeps/ieee754/dbl-64/e_sqrt.c: In function '__ieee754_sqrt': | ../sysdeps/ieee754/dbl-64/e_sqrt.c:58:54: error: unused variable 'ty' [-Werror=unused-variable] | double y, t, del, res, res1, hy, z, zz, p, hx, tx, ty, s; The reason being EMULV() macro uses the hardware provided __builtin_fma() variant, leaving temporary variables 'p, hx, tx, hy, ty' unused hence compiler warning and ensuing error. The intent of the patch was to fix that error, but EMULV is pervasive and used fair bit indirectly via othe rmacros, hence this patch. Functionally it should not result in code gen changes and if at all those would be better since the scope of those temporaries is greatly reduced now Built tested with aarch64-linux-gnu arm-linux-gnueabi arm-linux-gnueabihf hppa-linux-gnu x86_64-linux-gnu arm-linux-gnueabihf riscv64-linux-gnu-rv64imac-lp64 riscv64-linux-gnu-rv64imafdc-lp64 powerpc-linux-gnu microblaze-linux-gnu nios2-linux-gnu hppa-linux-gnu Also as suggested by Joseph [1] used --strip and compared the libs with and w/o patch and they are byte-for-byte unchanged (with gcc 9). | for i in `find . -name libm-2.31.9000.so`; | do | echo $i; diff $i /SCRATCH/vgupta/gnu2/install/glibcs/$i ; echo $?; | done | ./aarch64-linux-gnu/lib64/libm-2.31.9000.so | 0 | ./arm-linux-gnueabi/lib/libm-2.31.9000.so | 0 | ./x86_64-linux-gnu/lib64/libm-2.31.9000.so | 0 | ./arm-linux-gnueabihf/lib/libm-2.31.9000.so | 0 | ./riscv64-linux-gnu-rv64imac-lp64/lib64/lp64/libm-2.31.9000.so | 0 | ./riscv64-linux-gnu-rv64imafdc-lp64/lib64/lp64/libm-2.31.9000.so | 0 | ./powerpc-linux-gnu/lib/libm-2.31.9000.so | 0 | ./microblaze-linux-gnu/lib/libm-2.31.9000.so | 0 | ./nios2-linux-gnu/lib/libm-2.31.9000.so | 0 | ./hppa-linux-gnu/lib/libm-2.31.9000.so | 0 | ./s390x-linux-gnu/lib64/libm-2.31.9000.so [1] https://sourceware.org/pipermail/libc-alpha/2019-November/108267.html
* manual: Add pthread_attr_setsigmask_np, pthread_attr_getsigmask_npFlorian Weimer2020-06-151-0/+72
| | | | And the PTHREAD_ATTR_NO_SIGMASK_NP constant.
* ld.so: Check for new cache format first and enhance corruption checkFlorian Weimer2020-06-151-12/+15
| | | | | | | | | Now that ldconfig defaults to the new format (only), check for it first. Also apply the corruption check added in commit 2954daf00bb4d ("Add more checks for valid ld.so.cache file (bug 18093)") to the new-format-only case. Suggested-by: Josh Triplett <josh@joshtriplett.org>
* hurd: Fix __writev_nocancel_nostatusSamuel Thibault2020-06-145-2/+46
| | | | | | | | | | | | | * sysdeps/mach/hurd/Makefile [subdir=misc] (sysdep_routines): Add writev_nocancel writev_nocancel_nostatus. * sysdeps/mach/hurd/not-cancel.h (__writev_nocancel_nostatus): Replace macro with function declaration (with hidden prototype in libc). (__writev_nocancel): New function declaration (with hidden prototype in libc). * sysdeps/mach/hurd/writev_nocancel_nostatus.c: New file. * sysdeps/posix/writev_nocancel.c: New file, includes writev.c to make a nocancel variant that calls __write_nocancel. * sysdeps/posix/writev.c (writev): Do not define alias if __writev is renamed.
* hurd: Make send* cancellation pointsSamuel Thibault2020-06-143-0/+10
| | | | | | | * sysdeps/mach/hurd/send.c (__send): Make the __socket_send call a cancellation point. * sysdeps/mach/hurd/sendto.c (__sendto): Likewise. * sysdeps/mach/hurd/sendmsg.c (__libc_sendmsg): Likewise.
* htl: Enable more cancellation testsSamuel Thibault2020-06-146-5/+6
| | | | | | | | * nptl/tst-cancel-self-cancelstate.c, tst-cancel-self.c, tst-cancel9.c, tst-cancelx9.c: Move to... * sysdeps/pthread: ... here. * nptl/Makefile: Move corresponding references and rules to... * sysdeps/pthread/Makefile: ... here.
* hurd: Make write and pwrite64 cancellation pointsSamuel Thibault2020-06-149-19/+94
| | | | | | | | | | | | | | | | | | | | | | | | | and add _nocancel variants. * sysdeps/mach/hurd/write.c (__libc_write): Call __write_nocancel surrounded by enabling async cancel, to replace implementation moved to... * sysdeps/mach/hurd/write_nocancel.c (__write_nocancel): ... here. * sysdeps/mach/hurd/pwrite64.c (__libc_pwrite64): Call __pwrite64_nocancel surrounded by enabling async cancel, to replace implementation moved to... * sysdeps/mach/hurd/pwrite64_nocancel.c (__pwrite64_nocancel): ... here. * sysdeps/mach/hurd/Makefile (sysdep_routines): Add write_nocancel and pwrite64_nocancel. * sysdeps/mach/hurd/not-cancel.h (__write_nocancel, __pwrite64_nocancel): Replace macros with prototypes with a hidden proto on libc. * sysdeps/mach/hurd/dl-sysdep.c (__write_nocancel): New alias, check that it is not hidden. * sysdeps/mach/hurd/Versions (libc.GLIBC_PRIVATE): Add __write_nocancel. (ld.GLIBC_PRIVATE): Add __write_nocancel. * sysdeps/mach/hurd/i386/localplt.data (__write_nocancel): Add reference.
* htl: Fix cleanup support for IO lockingSamuel Thibault2020-06-142-0/+90
| | | | | | | | | | * sysdeps/htl/stdio-lock.h: New file, registers locking cleanup to htl. * sysdeps/htl/libc-lockP.h: Include <libc-lock.h>. (__libc_cleanup_region_start, __libc_cleanup_end, __libc_cleanup_region_end): Override macros from <libc-lock.h> with versions which register cleanup to htl. (__pthread_get_cleanup_stack): Make reference weak for skipping registration on in the static non-libpthread case.
* htl: Move cleanup stack to variable shared between libc and pthreadSamuel Thibault2020-06-146-6/+9
| | | | | | | | | | | | | | | | | | | | If libpthread gets loaded dynamically, the stack needs to already contain the cleanup handlers of the main thread. * htl/libc_pthread_init.c (__pthread_cleanup_stack): New per-thread variable. * htl/Versions (libc): Add __pthread_cleanup_stack as private symbol. * htl/pt-internal.h (struct __pthread): Remove cancelation_handlers field. (__pthread_cleanup_stack): Add variable declaration. * htl/pt-alloc.c (initialize_pthread): Remove initialization of cancelation_handlers field. * htl/pt-cleanup.c (__pthread_get_cleanup_stack): Return the address of __pthread_cleanup_stack instead of that of the cancelation_handlers field. * htl/forward.c: Include <pt-internal.h>. (dummy_list): Remove variable. (__pthread_get_cleanup_stack): Return the address of __pthread_cleanup_stack instead of that of dummy_list.
* htl: initialize first and prevent from unloadingSamuel Thibault2020-06-141-0/+1
| | | | | | | libc does not have codepaths for reverting the load of a libpthread. * htl/Makefile (LDFLAGS-pthread.so): Pass -z nodelete -z initfirst to linker.
* htl: Add noreturn attribute on __pthread_exit forwardSamuel Thibault2020-06-141-2/+2
| | | | | | | * sysdeps/htl/pthread-functions.h (__pthread_exit): Add noreturn attribute. (struct pthread_functions): Add noreturn attribute on ptr___pthread_exit field.
* hurd: Make recv* cancellation pointsSamuel Thibault2020-06-143-12/+38
| | | | | | | | | | * sysdeps/mach/hurd/recv.c (__recv): Make the __socket_recv call cancellable. * sysdeps/mach/hurd/recvfrom.c (__recvfrom): Make the __socket_recv and __socket_whatis_address calls cancellable. * sysdeps/mach/hurd/recvmsg.c (__libc_recvmsg): Make the __socket_recv, __socket_whatis_address, __io_reauthenticate, and __auth_user_authenticate calls cancellable.
* powerpc: Automatic CPU detection in preconfigurePaul E. Murphy2020-06-112-7/+113
| | | | | | | | | Added a check to detect the CPU value in preconfigure, so that glibc is built with the correct --with-cpu value. And move existing checks into preconfigure.ac. Co-Authored-By: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com> Co-Authored-By: Tulio Magno Quites Machado Filho <tuliom@linux.vnet.ibm.com>
* Use Linux 5.7 in build-many-glibcs.py.Joseph Myers2020-06-101-1/+1
| | | | | | | This patch makes build-many-glibcs.py use Linux 5.7. Tested with build-many-glibcs.py (host-libraries, compilers and glibcs builds).
* htl: Enable more cancel testsSamuel Thibault2020-06-109-16/+18
| | | | | | | * nptl/tst-cancel11.c, tst-cancel21-static.c, tst-cancel21.c, tst-cancel6.c, tst-cancelx11.c, tst-cancelx21.c, tst-cancelx6.c: Move to... * sysdeps/pthread: ... here. * nptl/Makefile: Move corresponding references and rules to... * sysdeps/pthread/Makefile: ... here.
* htl: Fix linking static tests by factorizing the symbols listSamuel Thibault2020-06-104-45/+30
| | | | | | | | | | | libpthread_syms.a will contain the symbols that libc tries to get from libpthread, to be used by the system, but also by tests. * htl/libpthread.a, htl/libpthread_pic.a: Link libpthread_syms.a and Move EXTERN references to... * htl/libpthread_syms.a: ... new file. Add missing __pthread_enable_asynccancel reference. * htl/Makefile: Install libpthread_syms.a and link it into static tests.
* Add "%d" support to _dl_debug_vdprintfH.J. Lu2020-06-091-2/+29
| | | | "%d" will be used to print out signed value.
* aarch64: MTE compatible strlenAndrea Corallo2020-06-091-183/+56
| | | | | | | | | | | | | | | | Introduce an Arm MTE compatible strlen implementation. The existing implementation assumes that any access to the pages in which the string resides is safe. This assumption is not true when MTE is enabled. This patch updates the algorithm to ensure that accesses remain within the bounds of an MTE tag (16-byte chunks) and improves overall performance on modern cores. On cores with less efficient Advanced SIMD implementation such as Cortex-A53 it can be slower. Benchmarked on Cortex-A72, Cortex-A53, Neoverse N1. Co-authored-by: Wilco Dijkstra <wilco.dijkstra@arm.com>