mirror/glibc - mirror of git://sourceware.org/git/glibc.git

	Commit message	Author	Age	Files	Lines
*	powerpc: Use generic e_expf	Adhemerval Zanella	2019-06-26	8	-383/+10
	Generic implementation is faster on both power8 and power9:
	POWER9:
	- sysdeps/ieee754/flt-32/e_expf.c
	"expf": { "workload-spec2017.wrf": { "duration": 5.1236e+09, "iterations": 7.53344e+08, "reciprocal-throughput": 5.9436, "latency": 7.65869, "max-throughput": 1.68248e+08, "min-throughput": 1.30571e+08 } }
	- sysdeps/powerpc/powerpc64/power8/fpu/e_expf.S
	"expf": { "workload-spec2017.wrf": { "duration": 5.14429e+09, "iterations": 5.29248e+08, "reciprocal-throughput": 8.05372, "latency": 11.3863, "max-throughput": 1.24166e+08, "min-throughput": 8.78249e+07 } }
	Checked on powerpc-linux-gnu (built without --with-cpu, with --with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch), powerpc64-linux-gnu (built without --with-cpu and with --with-cpu=power5+ and --disable-multi-arch).
	* sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile (libm-sysdep_routines): Remove e_expf-power8 and expf-ppc64.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/e_expf-power8.S: Remove file.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/e_expf-ppc64.c: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/e_expf.c: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/w_expf.c: Likewise.
	* sysdeps/powerpc/powerpc64/power8/fpu/e_expf.S: Likewise.
	* sysdeps/powerpc/powerpc64/power8/fpu/w_expf.c: Likewise.
	Reviewed-by: Gabriel F. T. Gomes <gabriel@inconstante.eti.br>
*	powerpc: Refactor powerpc32 lround/lroundf/llround/llroundf	Adhemerval Zanella	2019-06-26	23	-547/+182
	This patch consolidates all the powerpc llround{f} implementations on the generic sysdeps/powerpc/powerpc32/fpu/s_llround{f}.
	Checked on powerpc-linux-gnu (built without --with-cpu, with --with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch), powerpc64-linux-gnu (built without --with-cpu and with --with-cpu=power5+ and --disable-multi-arch).
	* sysdeps/powerpc/powerpc32/fpu/Makefile [$(subdir) == math] (CFLAGS-s_lround.c): New rule.
	* sysdeps/powerpc/powerpc32/fpu/s_llround.c (__llround): Add power5+ and fctidz optimization.
	* sysdeps/powerpc/powerpc32/fpu/s_lround.S: Remove file.
	* sysdeps/powerpc/powerpc32/fpu/s_lround.c: New file.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile (CFLAGS-s_llround-power6.c, CFLAGS-s_llround-power5+.c, CFLAGS-s_llround-ppc32.c, CFLAGS-s_lround-ppc32.c, CFLAGS-s_lround-power5+.c): New rule.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_llround-power5+.c: New file.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_llround-power6.c: Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_llround-ppc32.c: Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_lround-power5+.c: Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_lround-ppc32.c: Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_llround-power5+.S: Remove file.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_llround-power6.S: Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_llround-ppc32.S: Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_lround-power5+.S: Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_lround-ppc32.S: Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/s_llround.S: Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/s_llroundf.S: Likewise.
	* sysdeps/powerpc/powerpc32/power5+/fpu/s_llround.S: Likewise.
	* sysdeps/powerpc/powerpc32/power5+/fpu/s_llroundf.S: Likewise.
	* sysdeps/powerpc/powerpc32/power5+/fpu/s_lround.S: Likewise.
	* sysdeps/powerpc/powerpc32/power6/fpu/s_llround.S: Likewise.
	* sysdeps/powerpc/powerpc32/power6/fpu/s_llroundf.S: Likewise.
	Reviewed-by: Gabriel F. T. Gomes <gabriel@inconstante.eti.br>
*	Linux: Add nds32 specific syscalls to syscall-names.list	Vincent Chen	2019-06-26	2	-0/+7
	The nds32 port adds two architecture-specific syscalls, udftrap and fp_udfiex_crtl, in kernel v5.0 and v5.2, respectively. Add these two syscalls to syscall-names.list.
*	Fix build warnings in nptl/tst-eintr1.c	Stefan Liebler	2019-06-26	2	-3/+7
	This patch fixes the gcc warnings seen with gcc 9.1 -O3 on s390x:
	tst-eintr1.c: In function ‘tf1’:
	tst-eintr1.c:46:1: error: no return statement in function returning non-void [-Werror=return-type]
	   46 | }
	      | ^
	tst-eintr1.c: In function ‘do_test’:
	tst-eintr1.c:57:17: error: unused variable ‘th’ [-Werror=unused-variable]
	   57 | pthread_t th = xpthread_create (NULL, tf1, NULL);
	      | ^~
	ChangeLog:
	* nptl/tst-eintr1.c (tf1): Add return statement.
	(do_test): Remove unused th variable.
*	Fix build warnings in locale/programs/ld-ctype.c	Stefan Liebler	2019-06-26	2	-1/+7
	This patch fixes the gcc warnings seen with gcc 9 -march>=z13 on s390x:
	programs/ld-ctype.c: In function ‘ctype_read’:
	programs/ld-ctype.c:1392:13: error: ‘wch’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
	 1392 | uint32_t wch;
	      | ^~~
	programs/ld-ctype.c:1401:7: error: ‘seq’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
	 1401 | if (seq != NULL && seq->nbytes == 1)
	      | ^
	programs/ld-ctype.c:1391:20: note: ‘seq’ was declared here
	 1391 | struct charseq seq;
	      | ^~~
	Both seq and wch are uninitialized if get_character fails, so the function now returns with an error in that case.
	ChangeLog:
	* locale/programs/ld-ctype.c (charclass_symbolic_ellipsis): Return error if get_character fails.
*	S390: Regenerate ULPs.	Stefan Liebler	2019-06-25	2	-6/+10
	The update is needed for builds with -O3 and -march>=z13.
	ChangeLog:
	* sysdeps/s390/fpu/libm-test-ulps: Regenerated.
*	szl_PL locale: Fix a typo in the previous commit (bug 24652).	Rafal Luzynski	2019-06-24	2	-1/+7
	The Unicode sequences in the format <Uxxxx> should be used instead of non-ASCII characters.
	Reported by Piotr Drąg: https://sourceware.org/bugzilla/show_bug.cgi?id=24652#c8
	[BZ #24652]
	* localedata/locales/szl_PL (day): Use the correct Unicode sequences instead of non-ASCII characters.
*	szl_PL locale: Spelling corrections (bug 24652).	Grzegorz Kulik	2019-06-24	2	-15/+37
	This commit also provides the correct month names in both the nominative and the genitive case for the Silesian language, as required by the fix for bug 10871.
	[BZ #24652]
	* localedata/locales/szl_PL (abday): Spelling corrections.
	(day): Likewise.
	(abmon): Likewise.
	(mon): Rename to...
	(alt_mon): This, then apply spelling corrections.
	(mon): New entry, month names in the genitive case.
*	nl_{AW,NL}: Correct the thousands separator and grouping (bug 23831).	Rafal Luzynski	2019-06-21	3	-4/+12
	According to CLDR 35.1 and the bug report, the thousands grouping separator should always be "." (a single dot) and digits should be grouped by 3.
	[BZ #23831]
	* localedata/locales/nl_AW (mon_thousands_sep): Set to ".".
	* localedata/locales/nl_NL (mon_thousands_sep): Likewise.
	(thousands_sep): Likewise.
	(grouping): Set to 3;3.
*	Add missing VDSO_{NAME,HASH}_* macros and use them for PREPARE_VERSION_KNOWN	Tobias Klauser	2019-06-21	9	-11/+34
	Define all Linux versions currently used for PREPARE_VERSION{,_KNOWN} in sysdeps/unix/sysv/linux/dl-vdso.h and use them instead of duplicating the versions and precomputed hashes across architecture-specific files.
	* sysdeps/unix/sysv/linux/aarch64/gettimeofday.c (INIT_ARCH): Use PREPARE_VERSION_KNOWN.
	* sysdeps/unix/sysv/linux/aarch64/init-first.c: Likewise.
	* sysdeps/unix/sysv/linux/dl-vdso.h (VDSO_NAME_LINUX_2_6_39): New define.
	(VDSO_HASH_LINUX_2_6_39): Likewise.
	(VDSO_NAME_LINUX_4_9): Likewise.
	(VDSO_HASH_LINUX_4_9): Likewise.
	* sysdeps/unix/sysv/linux/powerpc/gettimeofday.c (INIT_ARCH): Likewise.
	* sysdeps/unix/sysv/linux/powerpc/init-first.c (_libc_vdso_platform_setup): Likewise.
	* sysdeps/unix/sysv/linux/powerpc/time.c (INIT_ARCH): Likewise.
	* sysdeps/unix/sysv/linux/s390/init-first.c (_libc_vdso_platform_setup): Likewise.
	* sysdeps/unix/sysv/linux/x86_64/init-first.c (__vdso_platform_setup): Likewise.
	Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
*	nptl: Convert various tests to use libsupport	Mike Crowe	2019-06-21	10	-409/+126
	* nptl/eintr.c: Use libsupport.
	* nptl/tst-eintr1.c: Likewise.
	* nptl/tst-eintr2.c: Likewise.
	* nptl/tst-eintr3.c: Likewise.
	* nptl/tst-eintr4.c: Likewise.
	* nptl/tst-eintr5.c: Likewise.
	* nptl/tst-mutex-errorcheck.c: Likewise.
	* nptl/tst-mutex5.c: Likewise.
*	support: Invent verbose_printf macro	Mike Crowe	2019-06-21	2	-0/+10
	Make it easier for tests to emit progress messages only when --verbose is specified.
	* support/test-driver.h: Add verbose_printf macro.
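A minimal sketch of how a macro like this could be written, assuming the test driver exposes a verbosity counter that is raised when --verbose is passed. The variable name `test_verbose`, its initial value, and the `run_steps` caller are illustrative assumptions and are not taken from this log.

```c
#include <stdio.h>

/* Hypothetical stand-in for the test driver's verbosity level;
   assumed to be greater than zero when --verbose was given.  */
static unsigned int test_verbose = 0;

/* Print the message only when verbosity is enabled.  The do/while
   wrapper lets the macro be used like a single statement.  */
#define verbose_printf(...)          \
  do                                 \
    {                                \
      if (test_verbose > 0)          \
        printf (__VA_ARGS__);        \
    }                                \
  while (0)

/* Illustrative caller: reports progress for each step and returns
   the number of steps that were run.  */
static int
run_steps (int count)
{
  int done = 0;
  for (int i = 0; i < count; i++)
    {
      verbose_printf ("info: step %d of %d\n", i + 1, count);
      done++;
    }
  return done;
}
```

With `test_verbose` left at 0 the loop runs silently; setting it to 1 makes each step print one progress line.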
*	support: Add xclock_now helper function.	Mike Crowe	2019-06-21	2	-0/+14
	It's easier to read and write tests with:
	  const struct timespec ts = xclock_now(CLOCK_REALTIME);
	than
	  struct timespec ts;
	  xclock_gettime(CLOCK_REALTIME, &ts);
	* support/xtime.h: Add xclock_now() helper function.
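A rough sketch of what a by-value helper of this kind might look like, assuming it wraps `clock_gettime` and terminates the test on failure. The name `xclock_now_sketch` and the error handling are hypothetical; the real helper is declared in support/xtime.h and may differ.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Return the current time of CLOCK by value, so the caller can
   initialize a const struct timespec in a single expression.
   Exits the process if the clock cannot be read.  */
static struct timespec
xclock_now_sketch (clockid_t clock)
{
  struct timespec ts;
  if (clock_gettime (clock, &ts) != 0)
    {
      perror ("clock_gettime");
      exit (EXIT_FAILURE);
    }
  return ts;
}
```

Returning the structure by value is what allows the one-line `const` initialization shown in the commit message.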
*	libio: do not attempt to free wide buffers of legacy streams [BZ #24228]	Dmitry V. Levin	2019-06-20	5	-5/+76
	Commit a601b74d31ca086de38441d316a3dee24c866305 aka glibc-2.23~693 ("In preparation for fixing BZ#16734, fix failure in misc/tst-error1-mem when _G_HAVE_MMAP is turned off.") introduced a regression: _IO_unbuffer_all now invokes _IO_wsetb to free wide buffers of all files, including legacy standard files which are small statically allocated objects that do not have wide buffers and the _mode member, causing memory corruption.
	Another memory corruption in _IO_unbuffer_all happens when -1 is assigned to the _mode member of legacy standard files that do not have it.
	[BZ #24228]
	* libio/genops.c (_IO_unbuffer_all) [SHLIB_COMPAT (libc, GLIBC_2_0, GLIBC_2_1)]: Do not attempt to free wide buffers and access _IO_FILE_complete members of legacy libio streams.
	* libio/tst-bz24228.c: New file.
	* libio/tst-bz24228.map: Likewise.
	* libio/Makefile [build-shared] (tests): Add tst-bz24228.
	[build-shared] (generated): Add tst-bz24228.mtrace and tst-bz24228.check.
	[run-built-tests && build-shared] (tests-special): Add $(objpfx)tst-bz24228-mem.out.
	(LDFLAGS-tst-bz24228, tst-bz24228-ENV): New variables.
	($(objpfx)tst-bz24228-mem.out): New rule.
*	[powerpc] add 'volatile' to asm	Paul A. Clarke	2019-06-19	3	-5/+12
	Add the 'volatile' keyword to a few asm statements to force the compiler to generate the instructions therein. Some instances were implicitly volatile, but the keyword is added for consistency.
	2019-06-19 Paul A. Clarke <pc@us.ibm.com>
	* sysdeps/powerpc/fpu/fenv_libc.h (relax_fenv_state): Add 'volatile'.
	* sysdeps/powerpc/fpu/fpu_control.h (__FPU_MFFS): Likewise.
	(__FPU_MFFSL): Likewise.
	(_FPU_SETCW): Likewise.
*	powerpc: Fix static-linked version of __ppc_get_timebase_freq [BZ #24640]	Stan Shebs	2019-06-19	4	-1/+35
	__ppc_get_timebase_freq() always returns 0 when using a statically linked glibc. This is a minimal example.c to reproduce:
	/****************************/
	#include <inttypes.h>
	#include <stdint.h>
	#include <stdio.h>
	#include <sys/platform/ppc.h>
	int main()
	{
	  uint64_t freq = __ppc_get_timebase_freq();
	  printf("Time Base frequency = %"PRIu64" Hz\n", freq);
	  if (freq == 0)
	    return -1;
	  return 0;
	}
	/***************************/
	Compile command: gcc -static example.c
	This bug has been reproduced, fixed and tested on all powerpc platforms (ppc32, ppc64 and ppc64le).
	The underlying code of __ppc_get_timebase_freq uses __get_timebase_freq, which has a different implementation for the shared and static versions of glibc. In the static version, there is an incorrect sense in the if check for the fd returned when opening /proc/cpuinfo.
	This solution is mostly a cherry-pick from:
	commit 4791e4f773d060c1a37b27aac5b03cdfa9327afc
	Author: Stan Shebs <stanshebs@google.com>
	Date: Fri May 17 12:25:19 2019 -0700
	Subject: Fix sense of a test in the static-linking version of ppc get_clockfreq
	That is in branch glibc/google/grte/v5-2.27/master and was mentioned for inclusion on master here: https://www.sourceware.org/ml/libc-alpha/2019-05/msg00409.html
	Adapted from original fix for get_clockfreq. That code was moved to get_timebase_freq.
	Also added a static-build testcase for __ppc_get_timebase_freq since the underlying function has different implementations for shared and static build.
	[BZ #24640]
	* sysdeps/unix/sysv/linux/powerpc/get_timebase_freq.c [!SHARED] (__get_timebase_freq): Fix sense of a test in the static-linking version.
	* sysdeps/unix/sysv/linux/powerpc/Makefile (tests-static): Add test-gettimebasefreq-static.
	(tests): Likewise.
	* sysdeps/unix/sysv/linux/powerpc/test-gettimebasefreq-static.c: New file.
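The class of bug described in this commit, an inverted test on a file descriptor, can be illustrated with a small generic sketch. This is not the glibc code: `read_file_prefix` is a hypothetical helper, and the only point is which branch of the `fd` check must contain the read.

```c
#include <fcntl.h>
#include <unistd.h>

/* Read up to SIZE bytes from PATH into BUF.  Returns the number of
   bytes read, or 0 if the file could not be opened or read.  */
static ssize_t
read_file_prefix (const char *path, char *buf, size_t size)
{
  ssize_t n = 0;
  int fd = open (path, O_RDONLY);

  /* Correct sense: proceed only when open succeeded.  With the
     inverted test (fd == -1) the read would be skipped for every
     valid descriptor, and the function would always report 0.  */
  if (fd != -1)
    {
      n = read (fd, buf, size);
      close (fd);
      if (n < 0)
        n = 0;
    }
  return n;
}
```

A caller that parses a value out of the buffer sees "nothing read" on every run when the test is inverted, which matches the always-zero symptom reported above.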
*	nl_AW locale: Correct the negative monetary format (bug 24614).	Rafal Luzynski	2019-06-19	2	-2/+9
	Follow the same changes as made in the commit 02d8b5ab1c because the respective entries in nl_NL and nl_AW had been the same before the change so they should be the same after. CLDR does not provide complete data for nl_AW; it reports the data as missing and displays a copy of nl_NL.
	[BZ #24614]
	* localedata/locales/nl_AW (n_sep_by_space): Set to 2 (a space between the currency symbol and the minus sign).
	(n_sign_posn): Set to 4 (the minus sign after the currency symbol).
*	Fix gcc 9 build errors for make xcheck. [BZ #24556]	Stefan Liebler	2019-06-19	7	-6/+21
	This patch fixes the following gcc 9 warnings for "make xcheck" / "make bench":
	-string/tst-strcasestr.c:
	../include/bits/../../misc/bits/error.h:42:5: error: ‘%s’ directive argument is null [-Werror=format-overflow=]
	-argp/argp-test.c:
	argp-test.c:130:20: error: ‘%d’ directive writing between 1 and 11 bytes into a region of size 10 [-Werror=format-overflow=]
	argp-test.c:130:19: note: directive argument in the range [-2147483648, 122]
	argp-test.c:130:5: note: ‘sprintf’ output between 2 and 12 bytes into a destination of size 10
	-nss/tst-field.c:
	tst-field.c:52:7: error: ‘%s’ directive argument is null [-Werror=format-overflow=]
	-benchtests/bench-strstr.c:
	../include/bits/../../misc/bits/error.h:42:5: error: ‘%s’ directive argument is null [-Werror=format-overflow=]
	-benchtests/bench-malloc-simple.c:
	bench-malloc-simple.c:93:16: error: iteration 3 invokes undefined behavior [-Werror=aggressive-loop-optimizations]
	ChangeLog:
	[BZ #24556]
	* string/test-strcasestr.c (check_result): Add NULL check.
	* nss/tst-field.c (check_rewrite): Likewise.
	* benchtests/bench-strstr.c (do_one_test): Likewise.
	* string/test-strstr.c (check_result): Likewise.
	* argp/argp-test.c (popt): Increase size of buf to 12.
	* benchtests/bench-malloc-simple.c (bench): Do not initialize tests array out of bounds.
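The new buffer size of 12 in argp-test.c follows from simple arithmetic: the longest "%d" rendering of a 32-bit int is "-2147483648" (11 characters), plus one byte for the terminating NUL. The sketch below checks this with `snprintf`; the helper name `decimal_bytes_needed` is made up for illustration, and a 32-bit `int` is assumed.

```c
#include <limits.h>
#include <stdio.h>

/* Return the number of bytes, including the terminating NUL, that
   sprintf (buf, "%d", value) would write.  Calling snprintf with a
   null buffer and size 0 only computes the output length.  */
static int
decimal_bytes_needed (int value)
{
  return snprintf (NULL, 0, "%d", value) + 1;
}
```

For the range GCC reports, [-2147483648, 122], the worst case is `INT_MIN`, which needs 12 bytes and therefore overflows the old 10-byte buffer.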
*	dlfcn: Avoid one-element flexible array in Dl_serinfo [BZ #24166]	Florian Weimer	2019-06-19	2	-0/+18
	The dls_serpath path field, as an array of length 1, introduces unexpected array subscript checks with some compilers.
	GCC versions before 3.0 treat the nested anonymous union as a declaration of an unnamed type, and not as a member declaration, so this construct cannot be used for these compilers.
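One way to express the idea, sketched here under assumptions: a zero-length array (a GNU C extension) carries the real, variable-length data, while a one-element padding member in the same anonymous union keeps the historic size and layout. All type and member names below are hypothetical stand-ins, not the actual `Dl_serinfo` declaration, and the sketch requires GCC or a compatible compiler.

```c
#include <stddef.h>

/* Hypothetical element type standing in for a search-path entry.  */
struct path_entry
{
  char *name;
  unsigned int flags;
};

/* Historic layout: a one-element array that is actually longer.
   Indexing past element 0 can trigger array subscript checks.  */
struct legacy_info
{
  size_t size;
  unsigned int count;
  struct path_entry paths[1];
};

/* Sketch of the replacement layout: the zero-length array avoids
   the bounds diagnostics, and the padding member preserves the
   size of the structure.  */
struct info
{
  size_t size;
  unsigned int count;
  __extension__ union
  {
    struct path_entry paths[0];      /* Actually COUNT elements.  */
    struct path_entry paths_pad[1];  /* Keeps the historic size.  */
  };
};
```

Because both layouts place `paths` at the same offset and have the same total size, existing binaries that allocate or index the structure are unaffected.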
*	elf: Refuse to dlopen PIE objects [BZ #24323]	Florian Weimer	2019-06-18	5	-6/+79
	Another executable has already been mapped, so the dynamic linker cannot perform relocations correctly for the second executable.
*	nl_NL locale: Correct the negative monetary format (bug 24614).	Rafal Luzynski	2019-06-17	4	-3/+14
	According to CLDR 35.1 and the bug report the correct monetary format for negative amounts should be "EUR -1 234,56" while previously it was "EUR 1 234,56-". This patch does not change the thousands (grouping) separator.
	[BZ #24614]
	* localedata/Makefile (LOCALES): Add nl_NL.UTF-8.
	* localedata/locales/nl_NL (n_sep_by_space): Set to 2 (a space between the currency symbol and the minus sign).
	(n_sign_posn): Set to 4 (the minus sign after the currency symbol).
	* localedata/tst-strfmon1.c (tests): Add test data for nl_NL.UTF-8.
*	m68k: Remove vDSO support	Adhemerval Zanella	2019-06-17	10	-321/+43
	Although defined in the initial TLS/NPTL ABI for m68k and ColdFire [1], kernel support was never pushed upstream. This patch removes the unused m68k vDSO support.
	Checked with a build against m68k and m68k-coldfire and some basic tests on ARAnyM.
	* sysdeps/unix/sysv/linux/m68k/Makefile (sysdep_routines, sysdep-rtld-routines): Remove rules.
	* sysdeps/unix/sysv/linux/m68k/Versions (libc) [GLIBC_PRIVATE]: Remove __vdso_atomic_cmpxchg_32 and __vdso_atomic_barrier.
	(ld) [GLIBC_PRIVATE]: Remove __rtld___vdso_read_tp, __rtld___vdso_atomic_cmpxchg_32, and __rtld___vdso_atomic_barrier.
	* sysdeps/unix/sysv/linux/m68k/coldfire/atomic-machine.h (atomic_compare_and_exchange_val_acq, atomic_full_barrier): Remove vDSO path for SHARED.
	* sysdeps/unix/sysv/linux/m68k/init-first.c: Remove file.
	* sysdeps/unix/sysv/linux/m68k/libc-m68k-vdso.c: Likewise.
	* sysdeps/unix/sysv/linux/m68k/m68k-helpers.S: Likewise.
	* sysdeps/unix/sysv/linux/m68k/m68k-vdso.c: Likewise.
	* sysdeps/unix/sysv/linux/m68k/m68k-vdso.h: Likewise.
	* sysdeps/unix/sysv/linux/m68k/m68k-helpers.c: New file.
	[1] https://lists.debian.org/debian-68k/2007/11/msg00071.html
*	powerpc: Refactor powerpc64 lround/lroundf/llround/llroundf	Adhemerval Zanella	2019-06-17	31	-490/+240
	This patch consolidates all the powerpc {l}lround{f} implementations on the generic sysdeps/powerpc/fpu/s_{l}lround{f}.c. The IFUNC support is also moved to powerpc64 only, since for powerpc64le the generic implementation already results in optimized code.
	Checked on powerpc-linux-gnu (built without --with-cpu, with --with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch), powerpc64-linux-gnu (built without --with-cpu and with --with-cpu=power5+ and --disable-multi-arch).
	* sysdeps/powerpc/powerpc64/be/fpu/multiarch/Makefile (libm-sysdep_routines): Add s_llround-power8, s_llround-power6x, s_llround-power5+, s_llround-ppc64, and s_llroundf-ppc64.
	(CFLAGS-s_llround-power8.c, CFLAGS-s_llround-power6x.c, CFLAGS-s_llround-power5+.c): New rule.
	* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_llround-power5+.c: New file.
	* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_llround-power6x.c: Likewise.
	* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_llround-power8.c: Likewise.
	* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_llround-ppc64.c: Likewise.
	* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_llroundf-ppc64.c: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_llround.c: Move to ...
	* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_llround.c: ... here.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_llroundf.c: Move to ...
	* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_llroundf.c: ... here.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_lround.c: Move to ...
	* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_lround.c: ... here.
	* sysdeps/powerpc/powerpc64/fpu/Makefile [$(subdir) == math] (CFLAGS-s_llround.c): New rule.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile (libm-sysdep_routines): Remove s_llround-* objects.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_llround-power5+.S: Remove file.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_llround-power6x.S: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_llround-power8.S: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_llround-ppc64.S: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_llroundf-ppc64.S: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/s_llround.S: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/s_llroundf.S: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/s_lround.S: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/s_lroundf.S: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/s_llround.c: New file.
	* sysdeps/powerpc/powerpc64/fpu/s_llroundf.c: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/s_lround.c: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/s_lroundf.c: Likewise.
	* sysdeps/powerpc/powerpc64/power5+/fpu/s_llround.S: Likewise.
	* sysdeps/powerpc/powerpc64/power5+/fpu/s_llroundf.S: Likewise.
	* sysdeps/powerpc/powerpc64/power6x/fpu/s_llround.S: Likewise.
	* sysdeps/powerpc/powerpc64/power6x/fpu/s_llroundf.S: Likewise.
	* sysdeps/powerpc/powerpc64/power8/fpu/s_llround.S: Likewise.
	* sysdeps/powerpc/powerpc64/power8/fpu/s_llroundf.S: Likewise.
	Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
*	powerpc: Refactor powerpc32 lrint/lrintf/llrint/llrintf	Adhemerval Zanella	2019-06-17	23	-342/+116
	This patch consolidates all the powerpc llrint{f} implementations on the generic sysdeps/powerpc/powerpc32/fpu/s_llrint{f}.
	Checked on powerpc-linux-gnu (built without --with-cpu, with --with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch), powerpc64-linux-gnu (built without --with-cpu and with --with-cpu=power5+ and --disable-multi-arch).
	* sysdeps/powerpc/fpu/s_lrintf.S: Remove file.
	* sysdeps/powerpc/powerpc64/fpu/s_lrintf.c: Move to ...
	* sysdeps/powerpc/fpu/s_lrintf.c: ... here.
	* sysdeps/powerpc/powerpc32/fpu/Makefile [$(subdir) == math] (CFLAGS-s_lrint.c): New rule.
	* sysdeps/powerpc/powerpc32/fpu/s_llrint.c (__llrint): Add power4 optimization.
	* sysdeps/powerpc/powerpc32/fpu/s_llrintf.c (__llrintf): Likewise.
	* sysdeps/powerpc/powerpc32/fpu/s_lrint.S: Remove file.
	* sysdeps/powerpc/powerpc32/fpu/s_lrint.c: New file.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile (CFLAGS-s_llrintf-power6.c, CFLAGS-s_llrintf-ppc32.c, CFLAGS-s_llrint-power6.c, CFLAGS-s_llrint-ppc32.c, CFLAGS-s_lrint-ppc32.c): New rule.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_llrint-power6.S: Remove file.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_llrint-ppc32.S: Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_llrintf-power6.S: Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_llrintf-ppc32.S: Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_lrint-ppc32.S: Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/s_llrint.S: Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/s_llrintf.S: Likewise.
	* sysdeps/powerpc/powerpc32/power6/fpu/s_llrint.S: Likewise.
	* sysdeps/powerpc/powerpc32/power6/fpu/s_llrintf.S: Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_llrint-power6.c: New file.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_llrint-ppc32.c: Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_llrintf-power6.c: Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_llrintf-ppc32.c: Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_lrint-ppc32.c: Likewise.
	Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
*	powerpc: refactor powerpc64 lrint/lrintf/llrint/llrintf	Adhemerval Zanella	2019-06-17	22	-225/+113
	This patch consolidates all the powerpc llrint{f} implementations on the generic sysdeps/powerpc/fpu/s_llrint{f}. The IFUNC support is also moved to powerpc64 only, since for powerpc64le the generic implementation already results in optimized code.
	Checked on powerpc-linux-gnu (built without --with-cpu, with --with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch), powerpc64-linux-gnu (built without --with-cpu and with --with-cpu=power5+ and --disable-multi-arch).
	* sysdeps/powerpc/powerpc64/be/fpu/multiarch/Makefile (libm-sysdep_routines): Add s_llrint-power8, s_llrint-power6x, and s_llrint-ppc64.
	(CFLAGS-s_llrint-power8.c, CFLAGS-s_llrint-power6x.c): New rule.
	* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_llrint-power6x.c: New file.
	* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_llrint-power8.c: Likewise.
	* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_llrint-ppc64.c: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_lrint.c: Move to ...
	* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_lrint.c: ... here.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_llrint.c: Move to ...
	* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_llrint.c: ... here.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_llrintf.c: Move to ...
	* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_llrintf.c: ... here.
	* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_lrint.c: New file.
	* sysdeps/powerpc/powerpc64/fpu/Makefile: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile (libm-sysdep_routines): Remove s_llrint-* objects.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_llrint-power6x.S: Remove file.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_llrint-power8.S: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_llrint-ppc64.S: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/s_llrint.c: New file.
	* sysdeps/powerpc/powerpc64/fpu/s_llrintf.c: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/s_lrint.c: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/s_lrintf.c: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/s_llrint.S: Remove file.
	* sysdeps/powerpc/powerpc64/fpu/s_llrintf.S: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/s_lrint.S: Likewise.
	* sysdeps/powerpc/powerpc64/power6x/fpu/s_llrint.S: Likewise.
	* sysdeps/powerpc/powerpc64/power8/fpu/s_llrint.S: Likewise.
	Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
*	Linux: Fix __glibc_has_include use for <sys/stat.h> and statx	Florian Weimer	2019-06-14	2	-2/+10
	The identifier linux is used as a predefined macro, so the actually used path is 1/stat.h or 1/stat64.h. Using the quote-based version triggers a file lookup for /usr/include/bits/linux/stat.h (or whatever directory is used to store bits/statx.h), but since bits/ is pretty much reserved by glibc, this appears to be acceptable.
	This is related to GCC PR 80005: incorrect macro expansion of the argument of __has_include.
	Suggested by Zack Weinberg.
	Reviewed-by: Carlos O'Donell <carlos@redhat.com>
*	<sys/cdefs.h>: Inhibit macro expansion for __glibc_has_include	Florian Weimer	2019-06-14	2	-1/+9
	This is currently ineffective with GCC because of GCC PR 80005, but it makes sense to anticipate a fix for this defect.
	Suggested by Zack Weinberg.
	Reviewed-by: Carlos O'Donell <carlos@redhat.com>
*	Add IPV6_ROUTER_ALERT_ISOLATE from Linux 5.1 to bits/in.h.	Joseph Myers	2019-06-13	2	-0/+4
	This patch adds the new constant IPV6_ROUTER_ALERT_ISOLATE from Linux 5.1 to sysdeps/unix/sysv/linux/bits/in.h.
	Tested for x86_64.
	* sysdeps/unix/sysv/linux/bits/in.h (IPV6_ROUTER_ALERT_ISOLATE): New macro.
*	Allow memset local PLT reference for powerpc soft-float.	Joseph Myers	2019-06-13	2	-0/+6
	Some recent change on GCC mainline resulted in the localplt test failing for powerpc soft-float (not sure exactly when, as the failure appeared when there were other build test failures as well; <https://sourceware.org/ml/libc-testresults/2019-q2/msg00261.html> shows it remaining when other failures went away). The problem is a call to memset that GCC now generates in the libgcc long double code. Since memset is documented as a function GCC may always implicitly generate calls to, it seems reasonable to allow that local PLT reference (just like those for libgcc functions that GCC implicitly generates calls to and that are also exported from libc.so), which this patch does.
	Tested for powerpc soft-float with build-many-glibcs.py.
	* sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/localplt.data: Allow memset in libc.so.
*	aarch64: handle STO_AARCH64_VARIANT_PCS	Szabolcs Nagy	2019-06-13	4	-4/+69
	Avoid lazy binding of symbols that may follow a variant PCS with different register usage convention from the base PCS.
	Currently the lazy binding entry code does not preserve all the registers required for AdvSIMD and SVE vector calls. Saving and restoring all registers unconditionally may break existing binaries, even if they never use vector calls, because of the larger stack requirement for lazy resolution, which can be significant on an SVE system.
	The solution is to mark all symbols in the symbol table that may follow a variant PCS so the dynamic linker can handle them specially. In this patch such symbols are always resolved at load time, not lazily.
	As a result, LD_AUDIT is currently not supported for variant PCS symbols; to support it, the _dl_runtime_profile entry needs to be changed, e.g. to unconditionally save/restore all registers (but pass down arg and retval registers to pltentry/exit callbacks according to the base PCS).
	This patch also removes a __builtin_expect from the modified code because the branch prediction hint did not seem useful.
	* sysdeps/aarch64/dl-dtprocnum.h: New file.
	* sysdeps/aarch64/dl-machine.h (DT_AARCH64): Define.
	(elf_machine_runtime_setup): Handle DT_AARCH64_VARIANT_PCS.
	(elf_machine_lazy_rel): Check STO_AARCH64_VARIANT_PCS and bind such symbols at load time.
	* sysdeps/aarch64/linkmap.h (struct link_map_machine): Add variant_pcs.
*	aarch64: add STO_AARCH64_VARIANT_PCS and DT_AARCH64_VARIANT_PCS	Szabolcs Nagy	2019-06-13	2	-0/+12
	STO_AARCH64_VARIANT_PCS is a non-visibility st_other flag for marking symbols that reference functions that may follow a variant PCS with different register usage convention from the base PCS.
	DT_AARCH64_VARIANT_PCS is a dynamic tag that marks ELF modules that have R__JUMP_SLOT relocations for symbols marked with STO_AARCH64_VARIANT_PCS (i.e. have variant PCS calls via a PLT).
	* elf/elf.h (STO_AARCH64_VARIANT_PCS): Define.
	(DT_AARCH64_VARIANT_PCS): Define.
*	powerpc: Remove optimized finite	Adhemerval Zanella	2019-06-12	20	-654/+29
	The powerpc finite optimizations do not show much gain:
	- GCC will call libm iff -fsignaling-nans is used. This usage pattern is usually not performance oriented and for such calls PLT overhead should dominate execution time.
	- The power7 implementation uses ftdiv to optimize for some input patterns, but at the cost of others. Comparing against the generic C implementation built for powerpc64-linux-gnu-power7 (--with-cpu=power7):
	  - Generic sysdeps/ieee754 implementation:
	    "isfinite": { "": { "duration": 5.0082e+09, "iterations": 2.45299e+09, "max": 43.824, "min": 2.008, "mean": 2.04167 }, "INF": { "duration": 4.66554e+09, "iterations": 2.28288e+09, "max": 35.73, "min": 2.008, "mean": 2.04371 }, "NAN": { "duration": 4.66274e+09, "iterations": 2.28716e+09, "max": 34.161, "min": 2.009, "mean": 2.03866 } }
	  - power7 optimized one:
	    "isfinite": { "": { "duration": 4.99111e+09, "iterations": 2.65566e+09, "max": 25.015, "min": 1.716, "mean": 1.87942 }, "INF": { "duration": 4.6783e+09, "iterations": 2.0999e+09, "max": 35.264, "min": 1.868, "mean": 2.22787 }, "NAN": { "duration": 4.67915e+09, "iterations": 2.08678e+09, "max": 38.099, "min": 1.869, "mean": 2.24228 } }
	  So it basically optimizes marginally for normal numbers while increasing the latency for other kinds of FP values.
	- The power8 implementation is just the generic implementation using the ISA 2.07 mfvsrd instruction (which GCC uses for the generic implementation). So the generic implementation is the best option for powerpc64le.
	Checked on powerpc-linux-gnu (built without --with-cpu, with --with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch), powerpc64-linux-gnu (built without --with-cpu and with --with-cpu=power5+ and --disable-multi-arch).
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile (sysdeps_routines, libm-sysdep_routines): Remove s_finite* objects.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_finite-power7.S: Remove file.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_finite-ppc32.c: Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_finite.c: Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_finitef-ppc32.c: Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_finitef.c: Likewise.
	* sysdeps/powerpc/powerpc32/power7/fpu/s_finite.S: Likewise.
	* sysdeps/powerpc/powerpc32/power7/fpu/s_finitef.S: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile (sysdep_call): Remove s_finite* objects.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_finite-power7.S: Remove file.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_finite-power8.S: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_finite-ppc64.c: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_finite.c: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_finitef-ppc64.c: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_finitef.c: Likewise.
	* sysdeps/powerpc/powerpc64/power7/fpu/s_finite.S: Likewise.
	* sysdeps/powerpc/powerpc64/power7/fpu/s_finitef.S: Likewise.
	* sysdeps/powerpc/powerpc64/power8/fpu/s_finite.S: Likewise.
	* sysdeps/powerpc/powerpc64/power8/fpu/s_finitef.S: Likewise.
	Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
*	math: Use wordsize-64 version for finite	Adhemerval Zanella	2019-06-12	3	-58/+10
- math.h uses the compiler builtin for gcc 4.4 when built without -fsignaling-nans, and the builtin is expanded inline on all supported architectures. As an example, there is no finite call within libm for the architectures I checked: x86, arm, aarch64, and powerpc.
- The resulting binary difference on 32-bit architectures is minimal for this non-hotspot symbol.
- It helps wordsize-64 architectures that use ldbl-opt.
- It simplifies the code by reducing duplicated implementations.
Checked on powerpc-linux-gnu (built without --with-cpu, with --with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch), powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+ and --disable-multi-arch).
* sysdeps/ieee754/dbl-64/wordsize-64/s_finite.c: Move to ... * sysdeps/ieee754/dbl-64/s_finite.c: ... here and format code.
Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
*	powerpc: Remove optimized isinf	Adhemerval Zanella	2019-06-12	20	-634/+28
The powerpc isinf optimizations only add complexity:
- GCC will call libm iff -fsignaling-nans is used. This usage pattern is usually not performance oriented, and for such calls PLT overhead should dominate execution time.
- The power7 implementation uses ftdiv to optimize for some input patterns, with a branch implementation for INF and denormal inputs that does:
  return (ix & UINT64_C (0x7fffffffffffffff)) == UINT64_C (0x7ff0000000000000)
  Although it does show slightly better latency than the generic algorithm (as below), it applies only to power7 and has to be overridden for power8.
- The power8 implementation is just the generic implementation using the ISA 2.07 mfvsrd instruction (which GCC uses for the generic implementation).
So the generic implementation is the best option for powerpc64le.
Checked on powerpc-linux-gnu (built without --with-cpu, with --with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch), powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+ and --disable-multi-arch).
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile (sysdeps_routines, libm-sysdep_routines): Remove s_isinf* and s_isinf* objects. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isinf-power7.S: Remove file. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isinf-ppc32.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isinf.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isinff-ppc32.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isinff.c: Likewise. * sysdeps/powerpc/powerpc32/power7/fpu/s_isinf.S: Likewise. * sysdeps/powerpc/powerpc32/power7/fpu/s_isinff.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile (sysdep_call): Remove s_isinf* and s_isinf* objects. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_isinf-power7.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_isinf-power8.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_isinf-ppc64.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_isinf.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_isinff-ppc64.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_isinff.c: Likewise. * sysdeps/powerpc/powerpc64/power7/fpu/s_isinf.S: Likewise. * sysdeps/powerpc/powerpc64/power7/fpu/s_isinff.S: Likewise. * sysdeps/powerpc/powerpc64/power8/fpu/s_isinf.S: Likewise. * sysdeps/powerpc/powerpc64/power8/fpu/s_isinff.S: Likewise. Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
*	math: Use wordsize-64 version for isinf	Adhemerval Zanella	2019-06-12	3	-43/+9
- math.h uses the compiler builtin for gcc 4.4 when built without -fsignaling-nans, and the builtin is expanded inline on all supported architectures. As an example, there is no isinf call within libm for the architectures I checked: x86, arm, aarch64, and powerpc.
- The resulting binary difference on 32-bit architectures is minimal for this non-hotspot symbol.
- It helps wordsize-64 architectures that use ldbl-opt.
- It simplifies the code by reducing duplicated implementations.
Checked on powerpc-linux-gnu (built without --with-cpu, with --with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch), powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+ and --disable-multi-arch).
* sysdeps/ieee754/dbl-64/wordsize-64/s_isinf.c: Move to ... * sysdeps/ieee754/dbl-64/s_isinf.c: ... here and format code.
Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
*	powerpc: Remove optimized isnan	Adhemerval Zanella	2019-06-12	36	-1381/+48
The powerpc isnan optimizations are not really a gain:
- GCC will call libm iff -fsignaling-nans is used. This usage pattern is usually not performance oriented, and for such calls PLT overhead should dominate execution time.
- The power5, power6, and power6x variants are just micro-optimizations to reduce the Load-Hit-Store hazards from floating-point to general register transfer, and current GCC already has support to minimize them by inserting either extra nops or group dispatch instructions.
- The power7 implementation uses ftdiv to optimize for some input patterns, but at the cost of others. Comparing against the generic C implementation built for powerpc-linux-gnu-power4 (which uses the hp-timing support on benchtests):
  - Generic sysdeps/ieee754 implementation:
    "isnan": { "": { "duration": 4.98415e+09, "iterations": 2.34516e+09, "max": 45.925, "min": 2.052, "mean": 2.12529 }, "INF": { "duration": 4.74057e+09, "iterations": 1.69761e+09, "max": 91.01, "min": 2.052, "mean": 2.79249 }, "NAN": { "duration": 4.74071e+09, "iterations": 1.68768e+09, "max": 282.343, "min": 2.052, "mean": 2.809 } }
  - power7 optimized one:
    $ ./testrun.sh benchtests/bench-isnan
    "isnan": { "": { "duration": 4.96842e+09, "iterations": 2.56297e+09, "max": 50.048, "min": 1.872, "mean": 1.93854 }, "INF": { "duration": 4.76648e+09, "iterations": 1.54213e+09, "max": 373.408, "min": 2.661, "mean": 3.09084 }, "NAN": { "duration": 4.76845e+09, "iterations": 1.54515e+09, "max": 51.016, "min": 2.736, "mean": 3.08607 } }
  So it basically optimizes marginally for normal numbers while increasing the latency for other kinds of FP values.
- The generic implementation requires getting the floating-point status, disabling the invalid operation bit, and restoring the floating-point status. Each operation is costly and requires flushing the FP pipeline. Using the same scenario as in the previous analysis:
    "isnan": { "": { "duration": 5.08284e+09, "iterations": 6.2898e+08, "max": 41.844, "min": 8.057, "mean": 8.08108 }, "INF": { "duration": 4.97904e+09, "iterations": 6.16176e+08, "max": 39.661, "min": 8.057, "mean": 8.08055 }, "NAN": { "duration": 4.98695e+09, "iterations": 5.95866e+08, "max": 29.728, "min": 8.345, "mean": 8.36925 } }
- The power8 implementation is just the generic implementation using the ISA 2.07 mfvsrd instruction (which GCC uses for the generic implementation).
So the generic implementation is the best option for powerpc64le.
Checked on powerpc-linux-gnu (built without --with-cpu, with --with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch), powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+ and --disable-multi-arch). * sysdeps/powerpc/fpu/s_isnan.c: Remove file. * sysdeps/powerpc/fpu/s_isnanf.S: Likewise. * sysdeps/powerpc/powerpc32/fpu/s_isnan.S: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile (sysdeps_routines, libm-sysdep_routines): Remove s_isnan-* and s_isnanf-* objects. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isnan-power5.S: Remove file * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isnan-power6.S: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isnan-power7.S: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isnan-ppc32.S: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isnan.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isnanf-power5.S: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isnanf-power6.S: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isnanf.c: Likewise. * sysdeps/powerpc/powerpc32/power5/fpu/s_isnan.S: Likewise. * sysdeps/powerpc/powerpc32/power5/fpu/s_isnanf.S: Likewise. * sysdeps/powerpc/powerpc32/power6/fpu/s_isnan.S: Likewise. * sysdeps/powerpc/powerpc32/power6/fpu/s_isnanf.S: Likewise. * sysdeps/powerpc/powerpc32/power7/fpu/s_isnan.S: Likewise. * sysdeps/powerpc/powerpc32/power7/fpu/s_isnanf.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile (sysdep_calls): Remove s_isnan-* and s_isnanf-* objects. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_isnan-power5.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_isnan-power6.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_isnan-power6x.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_isnan-power7.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_isnan-power8.S: Likewise. 
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_isnan-ppc64.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_isnan.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_isnanf.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_isnan.S: Likewise. * sysdeps/powerpc/powerpc64/power5/fpu/s_isnan.S: Likewise. * sysdeps/powerpc/powerpc64/power6/fpu/s_isnan.S: Likewise. * sysdeps/powerpc/powerpc64/power6x/fpu/s_isnan.S: Likewise. * sysdeps/powerpc/powerpc64/power7/fpu/s_isnan.S: Likewise. * sysdeps/powerpc/powerpc64/power7/fpu/s_isnanf.S: Likewise. * sysdeps/powerpc/powerpc64/power8/fpu/s_isnan.S: Likewise. * sysdeps/powerpc/powerpc64/power8/fpu/s_isnanf.S: Likewise. Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
*	math: Use wordsize-64 version for isnan	Adhemerval Zanella	2019-06-12	3	-51/+9
- math.h uses the compiler builtin for gcc 4.4 when built without -fsignaling-nans, and the builtin is expanded inline on all supported architectures. As an example, there is no isnan call within libm for the architectures I checked: x86, arm, aarch64, and powerpc.
- The resulting binary difference on 32-bit architectures is minimal for this non-hotspot symbol.
- It helps wordsize-64 architectures that use ldbl-opt.
- It simplifies the code by reducing duplicated implementations.
Checked on powerpc-linux-gnu (built without --with-cpu, with --with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch), powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+ and --disable-multi-arch).
* sysdeps/ieee754/dbl-64/wordsize-64/s_isnan.c: Move to ... * sysdeps/ieee754/dbl-64/s_isnan.c: ... here and format code.
Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
*	benchtests: Add isnan/isinf/isfinite benchmark	Adhemerval Zanella	2019-06-12	5	-1/+74
* benchtests/Makefile (bench-math): Add isnan, isinf, and isfinite. (CFLAGS-bench-isnan.c, CFLAGS-bench-isinf.c, CFLAGS-bench-isfinite.c): New rule. * benchtests/isnan-input: New file. * benchtests/isinf-input: New file. * benchtests/isfinite-input: New file.
Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
*	powerpc: copysign cleanup	Adhemerval Zanella	2019-06-12	21	-504/+94
GCC always expands copysign{f} inline for all possible CPUs, so libm is called only if the user explicitly disables the builtin (which is usually not done for performance reasons). Providing ifunc variants for copysign is therefore just unnecessary complexity, since libm will be called only from non-performance-critical code.
This patch removes both powerpc32 and powerpc64 ifunc variants and consolidates the powerpc implementation on sysdeps/powerpc/fpu/s_copysign{f}.c using compiler builtins.
Checked on powerpc-linux-gnu (built without --with-cpu, with --with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch), powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+ and --disable-multi-arch).
* sysdeps/powerpc/fpu/s_copysign.c: New file. * sysdeps/powerpc/fpu/s_copysignf.c: Likewise. * sysdeps/powerpc/powerpc32/fpu/s_copysign.S: Remove file. * sysdeps/powerpc/powerpc32/fpu/s_copysignf.S: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile (sysdep_routines, libm-sysdep_routines): Remove s_copysign-power6 and s_copysign-ppc32. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_copysign-power6.S: Remove file. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_copysign-ppc32.S: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_copysign.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_copysignf.c: Likewise. * sysdeps/powerpc/powerpc32/power6/fpu/s_copysign.S: Likewise. * sysdeps/powerpc/powerpc32/power6/fpu/s_copysignf.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile (sysdeps_calls): Remove s_copysign-power6 s_copysign-ppc64. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_copysign-power6.S: Remove file. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_copysign-ppc64.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_copysign.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_copysignf.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_copysign.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_copysignf.S: Likewise. * sysdeps/powerpc/powerpc64/power6/fpu/s_copysign.S: Likewise. * sysdeps/powerpc/powerpc64/power6/fpu/s_copysignf.S: Likewise. Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
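As an illustration of what an implementation "using compiler builtins" can look like, here is a minimal sketch. It is not the content of the new s_copysign{f}.c files, which this log does not show; the wrapper names `my_copysign` and `my_copysignf` are hypothetical, while `__builtin_copysign` and `__builtin_copysignf` are existing GCC and Clang builtins.

```c
/* Hypothetical wrappers: the compiler expands each builtin to the
   target's native sign-copy sequence, so no hand-written assembly
   or per-CPU ifunc variant is needed.  */
double
my_copysign (double x, double y)
{
  /* Magnitude of x combined with the sign of y.  */
  return __builtin_copysign (x, y);
}

float
my_copysignf (float x, float y)
{
  return __builtin_copysignf (x, y);
}
```

Calling `my_copysign (3.0, -1.0)` yields `-3.0`: the magnitude comes from the first argument and the sign from the second.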
*	powerpc: consolidate rint	Adhemerval Zanella	2019-06-12	8	-293/+44
This patch consolidates all the powerpc rint{f} implementations on the generic sysdeps/powerpc/fpu/s_rint{f}.
Checked on powerpc-linux-gnu (built without --with-cpu, with --with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch), powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+ and --disable-multi-arch).
* sysdeps/powerpc/fpu/round_to_integer.h (set_fenv_mode, round_to_integer_float, round_mode): Add RINT handling. (reset_fenv_mode): New symbol. * sysdeps/powerpc/fpu/s_rint.c (__rint): Use generic implementation. * sysdeps/powerpc/fpu/s_rintf.c (__rintf): Likewise. * sysdeps/powerpc/powerpc32/fpu/s_rint.S: Remove file. * sysdeps/powerpc/powerpc32/fpu/s_rintf.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_rint.S: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_rintf.S: Likewise.
Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
*	libio: freopen of default streams crashes in old programs [BZ #24632]	Florian Weimer	2019-06-12	3	-1/+12
As seen with very old i386 GCC binaries.
*	Linux: Deprecate <sys/sysctl.h> and sysctl	Florian Weimer	2019-06-12	6	-35/+26
Now that there are no internal users of __sysctl left, it is possible to add an unconditional deprecation warning to <sys/sysctl.h>. To avoid a test failure due to this warning in check-install-headers, skip the test for sys/sysctl.h.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
*	<sys/stat.h>: Use Linux UAPI header for statx if available and useful	Florian Weimer	2019-06-12	11	-71/+219
This will automatically import new STATX_* constants. It also avoids a conflict between <sys/stat.h> and <linux/stat.h>.
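The "if available" part of such a change is typically expressed with the preprocessor's `__has_include` operator. The fragment below is only a sketch of that pattern under stated assumptions: the macro `HAVE_UAPI_STAT_HEADER` and the function `uapi_stat_header_available` are hypothetical names, and the exact condition glibc uses is not shown in this log.

```c
/* __has_include is only usable when the compiler defines it, so the
   test is nested instead of being combined with && in a single #if
   line, which older preprocessors would fail to parse.  */
#if defined __has_include
# if __has_include (<linux/stat.h>)
#  define HAVE_UAPI_STAT_HEADER 1   /* hypothetical macro name */
# endif
#endif
#ifndef HAVE_UAPI_STAT_HEADER
# define HAVE_UAPI_STAT_HEADER 0
#endif

/* Hypothetical helper reporting the compile-time result.  */
int
uapi_stat_header_available (void)
{
  return HAVE_UAPI_STAT_HEADER;
}
```

A header could then include the kernel's definitions when the macro is 1 and fall back to its own copy otherwise.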
*	<sys/cdefs.h>: Add __glibc_has_include macro	Florian Weimer	2019-06-12	2	-0/+10
*	Improve performance of memmem	Wilco Dijkstra	2019-06-12	2	-42/+89
This patch significantly improves performance of memmem using a novel modified Horspool algorithm. Needles up to size 256 use a bad-character table indexed by hashed pairs of characters to quickly skip past mismatches. Long needles use a self-adapting filtering step to avoid comparing the whole needle repeatedly.
By limiting the needle length to 256, the shift table only requires 8 bits per entry, lowering preprocessing overhead and minimizing cache effects. This limit also implies worst-case performance is linear.
Small needles up to size 2 use a dedicated linear search. Very long needles use the Two-Way algorithm (to avoid increasing stack size or slowing down the common case, inlining is disabled).
The performance gain is 6.6 times on English text on AArch64 using random needles with average size 8.
Tested against GLIBC testsuite and randomized tests.
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
* string/memmem.c (__memmem): Rewrite to improve performance.
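For readers unfamiliar with the underlying idea, the following is a sketch of the classic Horspool search that the commit builds on. It is not the new glibc code: it uses the textbook single-character bad-character table instead of the hashed character pairs described above, it has no filtering step, and the function name `horspool_memmem` is hypothetical. It does keep one detail from the description: needles short enough for the shifts to fit in 8-bit table entries.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical name: plain Horspool search, not glibc's memmem.  */
void *
horspool_memmem (const void *haystack, size_t hs_len,
                 const void *needle, size_t ne_len)
{
  const unsigned char *hs = haystack;
  const unsigned char *ne = needle;

  if (ne_len == 0)
    return (void *) hs;
  if (ne_len > hs_len)
    return NULL;

  /* Shifts must fit in 8 bits; longer needles take a naive path in
     this sketch (the commit says glibc uses Two-Way instead).  */
  if (ne_len > 255)
    {
      for (size_t i = 0; i + ne_len <= hs_len; i++)
        if (memcmp (hs + i, ne, ne_len) == 0)
          return (void *) (hs + i);
      return NULL;
    }

  /* Bad-character table: distance from the last occurrence of each
     byte (excluding the final needle byte) to the needle's end.  */
  uint8_t shift[256];
  memset (shift, (int) ne_len, sizeof shift);
  for (size_t i = 0; i + 1 < ne_len; i++)
    shift[ne[i]] = (uint8_t) (ne_len - 1 - i);

  size_t pos = 0;
  while (pos + ne_len <= hs_len)
    {
      if (memcmp (hs + pos, ne, ne_len) == 0)
        return (void *) (hs + pos);
      /* Skip ahead based on the byte aligned with the needle's end.  */
      pos += shift[hs[pos + ne_len - 1]];
    }
  return NULL;
}
```

Every shift is at least 1, so the loop always advances. This plain variant has a quadratic worst case, whereas the commit states that the limits chosen for the new implementation imply linear worst-case performance.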
*	Improve performance of strstr	Wilco Dijkstra	2019-06-12	3	-51/+132
This patch significantly improves performance of strstr using a novel modified Horspool algorithm. Needles up to size 256 use a bad-character table indexed by hashed pairs of characters to quickly skip past mismatches. Long needles use a self-adapting filtering step to avoid comparing the whole needle repeatedly.
By limiting the needle length to 256, the shift table only requires 8 bits per entry, lowering preprocessing overhead and minimizing cache effects. This limit also implies worst-case performance is linear.
Small needles up to size 3 use a dedicated linear search. Very long needles use the Two-Way algorithm.
The performance gain using the improved bench-strstr on Cortex-A72 is 5.8 times basic_strstr and 3.7 times twoway_strstr.
Tested against GLIBC testsuite, randomized tests and the GNULIB strstr test (https://git.savannah.gnu.org/cgit/gnulib.git/tree/tests/test-strstr.c).
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
* string/str-two-way.h (two_way_short_needle): Add inline to avoid warning. (two_way_long_needle): Block inlining. * string/strstr.c (strstr2): Add new function. (strstr3): Likewise. (STRSTR): Completely rewrite strstr to improve performance.
*	Benchmark strstr hard needles	Wilco Dijkstra	2019-06-11	2	-0/+83
Benchmark needles which exhibit worst-case performance. This shows that basic_strstr is quadratic and thus unsuitable for large needles. On the other hand the Two-way and new strstr implementations are linear with increasing needle sizes. The slowest cases of the two implementations are within a factor of 2 on several different microarchitectures. Two-way is slowest on inputs which cause a branch mispredict on almost every character. The new strstr is slowest on inputs which almost match and result in many calls to memcmp. Thanks to Szabolcs for providing various hard needles.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* benchtests/bench-strstr.c (test_hard_needle): New function.
*	Fix malloc tests build with GCC 10.	Joseph Myers	2019-06-10	3	-1/+19
GCC mainline has recently added warn_unused_result attributes to some malloc-like built-in functions, where glibc previously had them in its headers only for __USE_FORTIFY_LEVEL > 0. Those attributes are now in effect when building the glibc testsuite, resulting in new warnings that break the build where tests deliberately call such functions and ignore the result. This patch adds calls to DIAG_* macros around those calls to disable the warning.
Tested with build-many-glibcs.py for aarch64-linux-gnu.
* malloc/tst-calloc.c: Include <libc-diag.h>. (null_test): Ignore -Wunused-result around calls to calloc. * malloc/tst-mallocfork.c: Include <libc-diag.h>. (do_test): Ignore -Wunused-result around call to malloc.
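Outside the glibc tree, the same effect can be sketched with the GCC diagnostic pragmas directly. This is an illustration under stated assumptions, not the patch itself: `<libc-diag.h>` is a glibc-internal header, so the example uses `#pragma GCC diagnostic` instead of the DIAG_* macros, and the function name `call_and_ignore` is hypothetical.

```c
#include <stdlib.h>

/* Hypothetical test helper: deliberately calls calloc and discards
   the result, with -Wunused-result silenced only around that call.  */
int
call_and_ignore (void)
{
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wunused-result"
  /* Result intentionally ignored; a zero-size allocation may leak a
     minimal block, which is acceptable in this sketch.  */
  calloc (0, 0);
#pragma GCC diagnostic pop
  return 0;
}
```

The push/pop pair limits the suppression to the one statement, so the warning stays active for the rest of the file.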
*	Linux: Add getdents64 system call	Florian Weimer	2019-06-07	38	-9/+282
No 32-bit system call wrapper is added because that interface is problematic: it cannot deal with 64-bit inode numbers and 64-bit directory hashes. A future commit will deprecate the undocumented getdirentries and getdirentries64 functions.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
*	[powerpc] get_rounding_mode: utilize faster method to get rounding mode	Paul A. Clarke	2019-06-06	3	-3/+68
Add support to use 'mffsl' instruction if compiled for POWER9 (or later). Also, mask the result to avoid bleeding unrelated bits into the result of _FPU_GET_RC().
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>