about summary refs log tree commit diff
Commit message (Collapse)AuthorAgeFilesLines
* htl: Fix includes for lockfileSamuel Thibault2020-06-283-6/+3
| | | | | | | | | | | These only need exactly to use __libc_ptf_call. * sysdeps/htl/flockfile.c: Include <libc-lockP.h> instead of <libc-lock.h> * sysdeps/htl/ftrylockfile.c: Include <libc-lockP.h> instead of <errno.h>, <pthread.h>, <stdio-lock.h> * sysdeps/htl/funlockfile.c: Include <libc-lockP.h> instead of <pthread.h> and <stdio-lock.h>
* htl: avoid cancelling threads inside critical sectionsSamuel Thibault2020-06-271-0/+9
| | | | | | | | Like hurd_thread_cancel does. * sysdeps/mach/hurd/htl/pt-docancel.c: Include <hurd/signal.h> (__pthread_do_cancel): Lock target thread's critical_section_lock and ss lock around thread mangling.
* tst-cancel4-common.c: fix calling socketpairSamuel Thibault2020-06-261-1/+1
| | | | | | | | | | | PF_UNIX was actually never intended to be passed as protocol parameter to socket() calls: it is a protocol family, not a protocol. It happens that Linux introduced accepting it during its 2.0 development, but it shouldn't. OpenBSD kernels accept it as well, but FreeBSD and NetBSD rightfully do not. GNU/Hurd does not either. * nptl/tst-cancel4-common.c (do_test): Pass 0 instead of PF_UNIX as protocol.
* x86: Detect Intel Advanced Matrix ExtensionsH.J. Lu2020-06-263-0/+44
| | | | | | | | | | | | | | Intel Advanced Matrix Extensions (Intel AMX) is a new programming paradigm consisting of two components: a set of 2-dimensional registers (tiles) representing sub-arrays from a larger 2-dimensional memory image, and accelerators able to operate on tiles. Intel AMX is an extensible architecture. New accelerators can be added and the existing accelerator may be enhanced to provide higher performance. The initial features are AMX-BF16, AMX-TILE and AMX-INT8, which are usable only if the operating system supports both XTILECFG state and XTILEDATA state. Add AMX-BF16, AMX-TILE and AMX-INT8 support to HAS_CPU_FEATURE and CPU_FEATURE_USABLE.
* Set width of JUNGSEONG/JONGSEONG characters from UD7B0 to UD7FB to 0 [BZ #26120]Mike FABIAN2020-06-2610-9/+18
| | | | Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* S390: Optimize __memset_z196.Stefan Liebler2020-06-261-10/+9
| | | | | | | | It turned out that an 256b-mvc instruction which depends on the result of a previous 256b-mvc instruction is counterproductive. Therefore this patch adjusts the 256b-loop by storing the first byte with stc and setting the remaining 255b with mvc. Now the 255b-mvc instruction depends on the stc instruction.
* S390: Optimize __memcpy_z196.Stefan Liebler2020-06-261-6/+15
| | | | | | This patch introduces an extra loop without pfd instructions as it turned out that the pfd instructions are usefull for copies >=64KB but are counterproductive for smaller copies.
* elf: Include <stddef.h> (for size_t), <sys/stat.h> in <ldconfig.h>Florian Weimer2020-06-251-0/+2
| | | | Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* nptl: Don't madvise user provided stackSzabolcs Nagy2020-06-251-2/+3
| | | | | | | | | | | | | | | User provided stack should not be released nor madvised at thread exit because it's owned by the user. If the memory is shared or file based then MADV_DONTNEED can have unwanted effects. With memory tagging on aarch64 linux the tags are dropped and thus it may invalidate pointers. Tested on aarch64-linux-gnu with MTE, it fixes FAIL: nptl/tst-stack3 FAIL: nptl/tst-stack3-mem
* S390: Regenerate ULPs.Stefan Liebler2020-06-241-0/+2
| | | | Updates needed after recent exp10f commits.
* htl: Add wrapper header for <semaphore.h> with hidden __sem_postFlorian Weimer2020-06-243-2/+11
| | | | | | | This is required to avoid a check-localplt failure due to a sem_post call through the PLT. Reviewed-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
* elf: Include <stdbool.h> in <dl-tunables.h> because bool is usedFlorian Weimer2020-06-241-0/+2
|
* htl: Fix case when sem_*wait is canceled while holding a tokenSamuel Thibault2020-06-241-2/+13
| | | | | | | | * sysdeps/htl/sem-timedwait.c (struct cancel_ctx): Add cancel_wake field. (cancel_hook): When unblocking thread, set cancel_wake field to 1. (__sem_timedwait_internal): Set cancel_wake field to 0 by default. On cancellation exit, check whether we hold a token, to be put back.
* htl: Make sem_*wait cancellations pointsSamuel Thibault2020-06-245-15/+90
| | | | | | | | | | | | | | By aligning its implementation on pthread_cond_wait. * sysdeps/htl/sem-timedwait.c (cancel_ctx): New structure. (cancel_hook): New function. (__sem_timedwait_internal): Check for cancellation and register cancellation hook that wakes the thread up, and check again for cancellation on exit. * nptl/tst-cancel13.c, nptl/tst-cancelx13.c: Move to... * sysdeps/pthread/: ... here. * nptl/Makefile: Move corresponding references and rules to... * sysdeps/pthread/Makefile: ... here.
* htl: Simplify non-cancel path of __pthread_cond_timedwait_internalSamuel Thibault2020-06-241-20/+21
| | | | | | | | | Since __pthread_exit does not return, we do not need to indent the noncancel path * sysdeps/htl/pt-cond-timedwait.c (__pthread_cond_timedwait_internal): Move cancelled path before non-cancelled path, to avoid "else" indentation.
* htl: Enable tst-cancel25 testSamuel Thibault2020-06-243-2/+4
| | | | | | | | * nptl/tst-cancel25.c: Move to... * sysdeps/pthread/tst-cancel25.c: ... here. (tf2) Do not test for SIGCANCEL when it is not defined. * nptl/Makefile: Move corresponding reference to... * sysdeps/pthread/Makefile: ... here.
* powerpc: Add new hwcap valuesTulio Magno Quites Machado Filho2020-06-232-1/+3
| | | | | | | | | Linux commit ID ee988c11acf6f9464b7b44e9a091bf6afb3b3a49 reserved 2 new bits in AT_HWCAP2: - PPC_FEATURE2_ARCH_3_1 indicates the availability of the POWER ISA 3.1; - PPC_FEATURE2_MMA indicates the availability of the Matrix-Multiply Assist facility.
* aarch64: MTE compatible strncmpAlex Butler2020-06-231-99/+145
| | | | | | | | | | | | | Add support for MTE to strncmp. Regression tested with xcheck and benchmarked with glibc's benchtests on the Cortex-A53, Cortex-A72, and Neoverse N1. The existing implementation assumes that any access to the pages in which the string resides is safe. This assumption is not true when MTE is enabled. This patch updates the algorithm to ensure that accesses remain within the bounds of an MTE tag (16-byte chunks) and improves overall performance. Co-authored-by: Branislav Rankov <branislav.rankov@arm.com> Co-authored-by: Wilco Dijkstra <wilco.dijkstra@arm.com>
* aarch64: MTE compatible strcmpAlex Butler2020-06-231-109/+125
| | | | | | | | | | | | | Add support for MTE to strcmp. Regression tested with xcheck and benchmarked with glibc's benchtests on the Cortex-A53, Cortex-A72, and Neoverse N1. The existing implementation assumes that any access to the pages in which the string resides is safe. This assumption is not true when MTE is enabled. This patch updates the algorithm to ensure that accesses remain within the bounds of an MTE tag (16-byte chunks) and improves overall performance. Co-authored-by: Branislav Rankov <branislav.rankov@arm.com> Co-authored-by: Wilco Dijkstra <wilco.dijkstra@arm.com>
* aarch64: MTE compatible strrchrAlex Butler2020-06-231-114/+91
| | | | | | | | | | | | Add support for MTE to strrchr. Regression tested with xcheck and benchmarked with glibc's benchtests on the Cortex-A53, Cortex-A72, and Neoverse N1. The existing implementation assumes that any access to the pages in which the string resides is safe. This assumption is not true when MTE is enabled. This patch updates the algorithm to ensure that accesses remain within the bounds of an MTE tag (16-byte chunks) and improves overall performance. Co-authored-by: Wilco Dijkstra <wilco.dijkstra@arm.com>
* aarch64: MTE compatible memrchrAlex Butler2020-06-231-116/+83
| | | | | | | | | | | | Add support for MTE to memrchr. Regression tested with xcheck and benchmarked with glibc's benchtests on the Cortex-A53, Cortex-A72, and Neoverse N1. The existing implementation assumes that any access to the pages in which the string resides is safe. This assumption is not true when MTE is enabled. This patch updates the algorithm to ensure that accesses remain within the bounds of an MTE tag (16-byte chunks) and improves overall performance. Co-authored-by: Wilco Dijkstra <wilco.dijkstra@arm.com>
* aarch64: MTE compatible memchrAlex Butler2020-06-231-107/+80
| | | | | | | | | | | | Add support for MTE to memchr. Regression tested with xcheck and benchmarked with glibc's benchtests on the Cortex-A53, Cortex-A72, and Neoverse N1. The existing implementation assumes that any access to the pages in which the string resides is safe. This assumption is not true when MTE is enabled. This patch updates the algorithm to ensure that accesses remain within the bounds of an MTE tag (16-byte chunks) and improves overall performance. Co-authored-by: Gabor Kertesz <gabor.kertesz@arm.com>
* aarch64: MTE compatible strcpyAlex Butler2020-06-231-263/+122
| | | | | | | | | | | | Add support for MTE to strcpy. Regression tested with xcheck and benchmarked with glibc's benchtests on the Cortex-A53, Cortex-A72, and Neoverse N1. The existing implementation assumes that any access to the pages in which the string resides is safe. This assumption is not true when MTE is enabled. This patch updates the algorithm to ensure that accesses remain within the bounds of an MTE tag (16-byte chunks) and improves overall performance. Co-authored-by: Wilco Dijkstra <wilco.dijkstra@arm.com>
* Add MREMAP_DONTUNMAP from Linux 5.7Joseph Myers2020-06-231-0/+1
| | | | | | | Add the new constant MREMAP_DONTUNMAP from Linux 5.7 to bits/mman-shared.h. Tested with build-many-glibcs.py.
* x86: Update CPU feature detection [BZ #26149]H.J. Lu2020-06-225-390/+268
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 1. Divide architecture features into the usable features and the preferred features. The usable features are for correctness and can be exported in a stable ABI. The preferred features are for performance and only for glibc internal use. 2. Change struct cpu_features to struct cpu_features { struct cpu_features_basic basic; unsigned int *usable_p; struct cpuid_registers cpuid[COMMON_CPUID_INDEX_MAX]; unsigned int usable[USABLE_FEATURE_INDEX_MAX]; unsigned int preferred[PREFERRED_FEATURE_INDEX_MAX]; ... }; and initialize usable_p to pointer to the usable arary so that struct cpu_features { struct cpu_features_basic basic; unsigned int *usable_p; struct cpuid_registers cpuid[COMMON_CPUID_INDEX_MAX]; }; can be exported via a stable ABI. The cpuid and usable arrays can be expanded with backward binary compatibility for both .o and .so files. 3. Add COMMON_CPUID_INDEX_7_ECX_1 for AVX512_BF16. 4. Detect ENQCMD, PKS, AVX512_VP2INTERSECT, MD_CLEAR, SERIALIZE, HYBRID, TSXLDTRK, L1D_FLUSH, CORE_CAPABILITIES and AVX512_BF16. 5. Rename CAPABILITIES to ARCH_CAPABILITIES. 6. Check if AVX512_VP2INTERSECT, AVX512_BF16 and PKU are usable. 7. Update CPU feature detection test.
* aarch64: Remove fpu MakefileAdhemerval Zanella2020-06-221-14/+0
| | | | | | | | The -fno-math-errno is already added by default and the minimum required GCC to build glibc (6.2) make the -ffinite-math-only superflous. Checked on aarch64-linux-gnu.
* m68k: Use sqrt{f} builtin for coldfireAdhemerval Zanella2020-06-223-53/+4
| | | | Checked with a build for m68k-linux-gnu-coldfire.
* arm: Use sqrt{f} builtinAdhemerval Zanella2020-06-223-92/+9
| | | | Checked on arm-linux-gnueabi and armv7-linux-gnueabihf
* riscv: Use sqrt{f} builtinAdhemerval Zanella2020-06-223-56/+4
| | | | | | Checked with a build for riscv64-linux-gnu-rv64imac-lp64 (no builtin support), riscv64-linux-gnu-rv64imafdc-lp64, and riscv64-linux-gnu-rv64imafdc-lp64d.
* s390: Use sqrt{f} builtinAdhemerval Zanella2020-06-223-60/+4
| | | | Checked on s390x-linux-gnu.
* sparc: Use sqrt{f} builtinAdhemerval Zanella2020-06-222-34/+4
| | | | | | It also enabled to use fsqrtd on sparc64. Checked on sparcv9-linux-gnu and sparc64-linux-gnu.
* mips: Use sqrt{f} builtinAdhemerval Zanella2020-06-229-82/+6
| | | | | Checked with a build against mips-linux-gnu and mips64-linux-gnu and comparing the resulting binaries.
* alpha: Use builtin sqrt{f}Adhemerval Zanella2020-06-225-275/+13
| | | | | | | | The generic implementation is simplified by removing the 'optimization' for !_IEEE_FP_INEXACT (which does not handle inexact neither some values). Checked on alpha-linux-gnu.
* i386: Use builtin sqrtlAdhemerval Zanella2020-06-222-21/+0
| | | | Checked on i686-linux-gnu.
* x86_64: Use builtin sqrt{f,l}Adhemerval Zanella2020-06-224-65/+31
| | | | Checked on x86_64-linux-gnu.
* powerpc: Use sqrt{f} builtinAdhemerval Zanella2020-06-223-80/+42
| | | | | | | | | | The powerpc sqrt implementation is also simplified: - the static constants are open coded within the implementation. - for !USE_SQRT_BUILTIN the function is implemented directly on __ieee754_sqrt (it avoid an superflous extra jump). Checked on powerpc-linux-gnu and powerpc64le-linux-gnu.
* s390x: Use fma{f} builtinAdhemerval Zanella2020-06-223-64/+4
| | | | Checked on s390x-linux-gnu.
* aarch64: Use math-use-builtins for ceil{f}Adhemerval Zanella2020-06-222-58/+0
| | | | | | | The define is already set on the math-use-builtins-ceil.h, the patch just removes the implementations (it was missed on c9feb1be93). Checked on aarch64-linux-gnu.
* math: Decompose math-use-builtins.hAdhemerval Zanella2020-06-2228-312/+181
| | | | | | | | | | | Each symbol definitions are moved on a separated file and it cover all symbol type definitions (float, double, long double, and float128). It allows to set support for architectures without the boiler place of copying default values. Checked with a build on the affected ABIs.
* hurd: Add mremapSamuel Thibault2020-06-204-1/+185
| | | | | | | * sysdeps/mach/hurd/mremap.c: New file. * sysdeps/mach/hurd/Makefile [misc] (sysdep_routines): Add mremap. * sysdeps/mach/hurd/Versions (libc.GLIBC_2.32): Add mremap. * sysdeps/mach/hurd/i386/libc.abilist: Add mremap.
* ia64: Use generic exp10fAdhemerval Zanella2020-06-197-572/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The generic implementation is slight worse (Itanium(R) Processor 9020): Before new code: "exp10f": { "workload-spec2017.wrf (adapted)": { "duration": 3.61582e+08, "iterations": 2.384e+07, "reciprocal-throughput": 14.8334, "latency": 15.5006, "max-throughput": 6.74153e+07, "min-throughput": 6.45136e+07 } } With new code: "exp10f": { "workload-spec2017.wrf (adapted)": { "duration": 3.85549e+08, "iterations": 2.384e+07, "reciprocal-throughput": 15.8391, "latency": 16.5056, "max-throughput": 6.31348e+07, "min-throughput": 6.05857e+07 } } However it fixes all the issues on both: math/test-float-exp10 math/test-float32-exp10 (all the issues wrong results for non default rounding modes). The existing ia64 libm interface uses matherrf and matherrl in addition to matherr for SVID error handling. However, there is no such error handling support for exp10f in ia64 libm. So replacing it with the generic implementation should be fine. Checked on ia64-linux-gnu.
* New exp10f version without SVID compat wrapperAdhemerval Zanella2020-06-1933-8/+64
| | | | | | | | | | | | | | | | | | | | | | | This patch changes the exp10f error handling semantics to only set errno according to POSIX rules. New symbol version is introduced at GLIBC_2.32. The old wrappers are kept for compat symbols. There are some outliers that need special handling: - ia64 provides an optimized implementation of exp10f that uses ia64 specific routines to set SVID compatibility. The new symbol version is aliased to the exp10f one. - m68k also provides an optimized implementation, and the new version uses it instead of the sysdeps/ieee754/flt32 one. - riscv and csky uses the generic template implementation that does not provide SVID support. For both cases a new exp10f version is not added, but rather the symbols version of the generic sysdeps/ieee754/flt32 is adjusted instead. Checked on aarch64-linux-gnu, x86_64-linux-gnu, i686-linux-gnu, powerpc64le-linux-gnu.
* i386: Use generic exp10fAdhemerval Zanella2020-06-191-54/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The generic implementation is twice as fast. Using the exp10f benchmark: * master: "exp10f": { "workload-spec2017.wrf (adapted)": { "duration": 1.02967e+09, "iterations": 4.768e+07, "reciprocal-throughput": 18.3579, "latency": 24.8331, "max-throughput": 5.44725e+07, "min-throughput": 4.02688e+07 } } * patched: "exp10f": { "workload-spec2017.wrf (adapted)": { "duration": 1.01821e+09, "iterations": 6.1984e+07, "reciprocal-throughput": 13.1975, "latency": 19.6563, "max-throughput": 7.57719e+07, "min-throughput": 5.08743e+07 } } Checked on i686-linux-gnu.
* math: Optimized generic exp10f with wrappersPaul Zimmermann2020-06-193-33/+199
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It is inspired by expf and reuses its tables and internal functions. The error checks are inlined and errno setting is in separate tail called functions, but the wrappers are kept in this patch to handle the _LIB_VERSION==_SVID_ case. Double precision arithmetics is used which is expected to be faster on most targets (including soft-float) than using single precision and it is easier to get good precision result with it. Result for x86_64 (i7-4790K CPU @ 4.00GHz) are: Before new code: "exp10f": { "workload-spec2017.wrf (adapted)": { "duration": 4.0414e+09, "iterations": 1.00128e+08, "reciprocal-throughput": 26.6818, "latency": 54.043, "max-throughput": 3.74787e+07, "min-throughput": 1.85038e+07 } With new code: "exp10f": { "workload-spec2017.wrf (adapted)": { "duration": 4.11951e+09, "iterations": 1.23968e+08, "reciprocal-throughput": 21.0581, "latency": 45.4028, "max-throughput": 4.74876e+07, "min-throughput": 2.20251e+07 } Result for aarch64 (A72 @ 2GHz) are: Before new code: "exp10f": { "workload-spec2017.wrf (adapted)": { "duration": 4.62362e+09, "iterations": 3.3376e+07, "reciprocal-throughput": 127.698, "latency": 149.365, "max-throughput": 7.831e+06, "min-throughput": 6.69501e+06 } With new code: "exp10f": { "workload-spec2017.wrf (adapted)": { "duration": 4.29108e+09, "iterations": 6.6752e+07, "reciprocal-throughput": 51.2111, "latency": 77.3568, "max-throughput": 1.9527e+07, "min-throughput": 1.29271e+07 } Checked on x86_64-linux-gnu, powerpc64le-linux-gnu, aarch64-linux-gnu, and sparc64-linux-gnu.
* benchtests: Add exp10f benchmarkAdhemerval Zanella2020-06-192-1/+2390
| | | | | | It is based on expf one by converting each line with the formula: new_val = (float) log10 (exp ((double) old_val))
* x86: Update F16C detection [BZ #26133]H.J. Lu2020-06-182-3/+7
| | | | Since F16C requires AVX, set F16C usable only when AVX is usable.
* Fix avx2 strncmp offset compare condition check [BZ #25933]Sunil K Pandey2020-06-171-0/+15
| | | | | | | | | strcmp-avx2.S: In avx2 strncmp function, strings are compared in chunks of 4 vector size(i.e. 32x4=128 byte for avx2). After first 4 vector size comparison, code must check whether it already passed the given offset. This patch implement avx2 offset check condition for strncmp function, if both string compare same for first 4 vector size.
* nptl: Remove now-spurious tst-cancelx9 referencesSamuel Thibault2020-06-171-2/+1
| | | | | | | | They were to be moved to sysdeps/pthread/Makefile in 45fce058f ('htl: Enable more cancellation tests') * nptl/Makefile: (tests): Remove tst-cancelx9. (CFLAGS-tst-cancelx9.c): Remove.
* x86_64: Use %xmmN with vpxor to clear a vector registerH.J. Lu2020-06-172-3/+3
| | | | | Since "vpxor %xmmN, %xmmN, %xmmN" clears the whole vector register, use %xmmN, instead of %ymmN, with vpxor to clear a vector register.
* x86: Correct bit_cpu_CLFLUSHOPT [BZ #26128]H.J. Lu2020-06-171-1/+1
| | | | bit_cpu_CLFLUSHOPT should be (1u << 23), not (1u << 22).