mirror/glibc - mirror of git://sourceware.org/git/glibc.git

	Commit message (Collapse)	Author	Age	Files	Lines
...
*	sparc: remove ceil, floor, trunc sparc specific implementations	Aurelien Jarno	2016-08-02	46	-2245/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The ceil, floor and trunc functions on sparc do not fully follow the standard and trigger an inexact exception when presented a value which is not an integer. Since glibc 2.24 this causes a few tests to fail, for instance: testing double (without inline functions) Failure: ceil (lit_pi): Exception "Inexact" set Failure: ceil (-lit_pi): Exception "Inexact" set Failure: ceil (min_subnorm_value): Exception "Inexact" set Failure: ceil (min_value): Exception "Inexact" set Failure: ceil (0.1): Exception "Inexact" set Failure: ceil (0.25): Exception "Inexact" set Failure: ceil (0.625): Exception "Inexact" set Failure: ceil (-min_subnorm_value): Exception "Inexact" set Failure: ceil (-min_value): Exception "Inexact" set Failure: ceil (-0.1): Exception "Inexact" set Failure: ceil (-0.25): Exception "Inexact" set Failure: ceil (-0.625): Exception "Inexact" set I tried to fix that by using the same strategy than used on other architectures, that is by saving the FSR register at the beginning and restoring it at the end of the function. When doing so I noticed a comment that this operation might be very costly, so I decided to do some benchmarks. The benchmarks below represent the time required to run each of the function 60 millions of times with different input value. I have done that in the basic V9 code, the VIS2 code, and using the default C implementation of the libc, for both sparc32 and sparc64, on a Niagara T1 based machine and an UltraSparc IIIi. Given I don't have access to a more recent machine), I haven't been able to test the VIS3 version. Also it should be noted that it doesn't make sense to do this benchmark for V8 or earlier as in that case we use the default C implementation. The results are available in the table below, the "+ fix" version correspond to the one saving and restoring the FSR. Niagara T1 / sparc32 -------------------- ceilf ceil floorf floor truncf trunc V9 19.10 22.48 19.10 22.48 16.59 19.27 V9 + fix 19.77 23.34 19.77 23.33 17.27 20.12 VIS2 16.87 19.62 16.87 19.62 VIS2 + fix 17.55 20.47 17.55 20.47 C impl 11.39 13.80 11.40 13.80 10.88 10.84 Niagara T1 / sparc64 -------------------- ceilf ceil floorf floor truncf trunc V9 18.14 22.23 18.14 22.23 15.64 19.02 V9 + fix 18.82 23.08 18.82 23.08 16.32 19.87 VIS2 15.92 19.37 15.92 19.37 VIS2 + fix 16.59 20.22 16.59 20.22 C impl 11.39 13.60 11.39 15.36 10.88 12.65 UltraSparc IIIi / sparc32 ------------------------- ceilf ceil floorf floor truncf trunc V9 4.81 7.09 6.61 11.64 4.91 7.05 V9 + fix 7.20 10.42 7.14 10.54 6.76 9.47 VIS2 4.81 7.03 4.76 7.13 VIS2 + fix 6.76 9.51 6.71 9.63 C impl 3.88 8.62 3.90 9.45 3.57 6.62 UltraSparc IIIi / sparc64 ------------------------- ceilf ceil floorf floor truncf trunc V9 3.48 4.39 3.48 4.41 3.01 3.85 V9 + fix 4.76 5.90 4.76 5.90 4.86 6.26 VIS2 2.95 3.61 2.95 3.61 VIS2 + fix 4.24 5.37 4.30 7.97 C impl 3.63 4.89 3.62 6.38 3.33 4.03 The first thing that should be noted is that the C implementation is always faster on the Niagara T1 based machine. On the UltraSparc IIIi the float version on sparc32 is also faster. Coming back about the fix saving and restoring the FSR, it appears it has a big impact as expected. In that case the C implementation is always faster than the fixed implementations. This patch therefore removes the sparc specific implementations in favor of the generic ones. Changelog: * sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/Makefile [$(subdir) = math] (libm-sysdep_routines): Remove. [$(subdir) = math && $(have-as-vis3) = yes] (libm-sysdep_routines): Remove s_ceilf-vis3, s_ceil-vis3, s_floorf-vis3, s_floor-vis3, s_truncf-vis3, s_trunc-vis3. * sysdeps/sparc/sparc64/fpu/multiarch/Makefile: Likewise. * sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_ceil-vis2.S: Delete file. * sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_ceil-vis3.S: Likewise. * sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_ceil.S: Likewise. * sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_ceilf-vis2.S: Likewise. * sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_ceilf-vis3.S: Likewise. * sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_ceilf.S: Likewise. * sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_floor-vis2.S: Likewise. * sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_floor-vis3.S: Likewise. * sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_floor.S: Likewise. * sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_floorf-vis2.S: Likewise. * sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_floorf-vis3.S: Likewise. * sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_floorf.S: Likewise. * sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_trunc-vis3.S: Likewise. * sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_trunc.S: Likewise. * sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_truncf-vis3.S: Likewise. * sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_truncf.S: Likewise. * sysdeps/sparc/sparc32/sparcv9/fpu/s_ceil.S: Likewise. * sysdeps/sparc/sparc32/sparcv9/fpu/s_ceilf.S: Likewise. * sysdeps/sparc/sparc32/sparcv9/fpu/s_floor.S: Likewise. * sysdeps/sparc/sparc32/sparcv9/fpu/s_floorf.S: Likewise. * sysdeps/sparc/sparc32/sparcv9/fpu/s_trunc.S: Likewise. * sysdeps/sparc/sparc32/sparcv9/fpu/s_truncf.S: Likewise. * sysdeps/sparc/sparc64/fpu/multiarch/s_ceil-vis2.S: Likewise. * sysdeps/sparc/sparc64/fpu/multiarch/s_ceil-vis3.S: Likewise. * sysdeps/sparc/sparc64/fpu/multiarch/s_ceil.S: Likewise. * sysdeps/sparc/sparc64/fpu/multiarch/s_ceilf-vis2.S: Likewise. * sysdeps/sparc/sparc64/fpu/multiarch/s_ceilf-vis3.S: Likewise. * sysdeps/sparc/sparc64/fpu/multiarch/s_ceilf.S: Likewise. * sysdeps/sparc/sparc64/fpu/multiarch/s_floor-vis2.S: Likewise. * sysdeps/sparc/sparc64/fpu/multiarch/s_floor-vis3.S: Likewise. * sysdeps/sparc/sparc64/fpu/multiarch/s_floor.S: Likewise. * sysdeps/sparc/sparc64/fpu/multiarch/s_floorf-vis2.S: Likewise. * sysdeps/sparc/sparc64/fpu/multiarch/s_floorf-vis3.S: Likewise. * sysdeps/sparc/sparc64/fpu/multiarch/s_floorf.S: Likewise. * sysdeps/sparc/sparc64/fpu/multiarch/s_trunc-vis3.S: Likewise. * sysdeps/sparc/sparc64/fpu/multiarch/s_trunc.S: Likewise. * sysdeps/sparc/sparc64/fpu/multiarch/s_truncf-vis3.S: Likewise. * sysdeps/sparc/sparc64/fpu/multiarch/s_truncf.S: Likewise. * sysdeps/sparc/sparc64/fpu/s_ceil.S: Likewise. * sysdeps/sparc/sparc64/fpu/s_ceilf.S: Likewise. * sysdeps/sparc/sparc64/fpu/s_floor.S: Likewise. * sysdeps/sparc/sparc64/fpu/s_floorf.S: Likewise. * sysdeps/sparc/sparc64/fpu/s_trunc.S: Likewise. * sysdeps/sparc/sparc64/fpu/s_truncf.S: Likewise.
*	Don't compile do_test with -mavx/-mavx/-mavx512	H.J. Lu	2016-07-27	11	-78/+141
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Don't compile do_test with -mavx, -mavx nor -mavx512 since they won't run on non-AVX machines. [BZ #20384] * sysdeps/x86_64/fpu/Makefile (extra-test-objs): Add test-double-libmvec-sincos-avx-main.o, test-double-libmvec-sincos-avx2-main.o, test-double-libmvec-sincos-main.o, test-float-libmvec-sincosf-avx-main.o, test-float-libmvec-sincosf-avx2-main.o and test-float-libmvec-sincosf-main.o. test-float-libmvec-sincosf-avx512-main.o. ($(objpfx)test-double-libmvec-sincos): Also link with $(objpfx)test-double-libmvec-sincos-main.o. ($(objpfx)test-double-libmvec-sincos-avx): Also link with $(objpfx)test-double-libmvec-sincos-avx-main.o. ($(objpfx)test-double-libmvec-sincos-avx2): Also link with $(objpfx)test-double-libmvec-sincos-avx2-main.o. ($(objpfx)test-float-libmvec-sincosf): Also link with $(objpfx)test-float-libmvec-sincosf-main.o. ($(objpfx)test-float-libmvec-sincosf-avx): Also link with $(objpfx)test-float-libmvec-sincosf-avx2-main.o. [$(config-cflags-avx512) == yes] (extra-test-objs): Add test-double-libmvec-sincos-avx512-main.o and ($(objpfx)test-double-libmvec-sincos-avx512): Also link with $(objpfx)test-double-libmvec-sincos-avx512-main.o. ($(objpfx)test-float-libmvec-sincosf-avx512): Also link with $(objpfx)test-float-libmvec-sincosf-avx512-main.o. (CFLAGS-test-double-libmvec-sincos.c): Removed. (CFLAGS-test-float-libmvec-sincosf.c): Likewise. (CFLAGS-test-double-libmvec-sincos-main.c): New. (CFLAGS-test-double-libmvec-sincos-avx-main.c): Likewise. (CFLAGS-test-double-libmvec-sincos-avx2-main.c): Likewise. (CFLAGS-test-float-libmvec-sincosf-main.c): Likewise. (CFLAGS-test-float-libmvec-sincosf-avx-main.c): Likewise. (CFLAGS-test-float-libmvec-sincosf-avx2-main.c): Likewise. (CFLAGS-test-float-libmvec-sincosf-avx512-main.c): Likewise. (CFLAGS-test-double-libmvec-sincos-avx.c): Set to -DREQUIRE_AVX. (CFLAGS-test-float-libmvec-sincosf-avx.c ): Likewise. (CFLAGS-test-double-libmvec-sincos-avx2.c): Set to -DREQUIRE_AVX2. (CFLAGS-test-float-libmvec-sincosf-avx2.c ): Likewise. (CFLAGS-test-double-libmvec-sincos-avx512.c): Set to -DREQUIRE_AVX512F. (CFLAGS-test-float-libmvec-sincosf-avx512.c): Likewise. * sysdeps/x86_64/fpu/test-double-libmvec-sincos.c: Rewritten. * sysdeps/x86_64/fpu/test-float-libmvec-sincosf.c: Likewise. * sysdeps/x86_64/fpu/test-double-libmvec-sincos-avx-main.c: New file. * sysdeps/x86_64/fpu/test-double-libmvec-sincos-avx2-main.c: Likewise. * sysdeps/x86_64/fpu/test-double-libmvec-sincos-avx512-main.c: Likewise. * sysdeps/x86_64/fpu/test-double-libmvec-sincos-main.c: Likewise. * sysdeps/x86_64/fpu/test-float-libmvec-sincosf-avx-main.c: Likewise. * sysdeps/x86_64/fpu/test-float-libmvec-sincosf-avx2-main.c: Likewise. * sysdeps/x86_64/fpu/test-float-libmvec-sincosf-avx512-main.c: Likewise. * sysdeps/x86_64/fpu/test-float-libmvec-sincosf-main.c: Likewise.
*	Nios II localplt.data update: remove __eqsf2	Chung-Lin Tang	2016-07-27	1	-1/+0
\|
*	powerpc: Fix missing verb and typo in comment about AT_HWCAP entry	Gustavo Romero	2016-07-21	1	-2/+2
\| \| \| \| \|	Fix missing verb and typo in comment about AT_HWCAP entry, in the context of mcontext_t struct definition for PPC64 Linux kernels.
*	[AArch64] Update libm-test-ulps	Szabolcs Nagy	2016-07-21	1	-4/+4
\| \| \| \| \| \| \| \|	This partly reverts commit f8238ae3c7701dbd9c04028861916de64e578114 that regenerated the ulps, to make the max ulps good for gcc-5, gcc-6 and gcc-trunk as well. * sysdeps/aarch64/libm-test-ulps: Updated.
*	S390: Do not clobber r13 with memcpy on 31bit with copies >1MB.	Stefan Liebler	2016-07-20	1	-8/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If the default memcpy variant is called with a length of >1MB on 31bit, r13 is clobbered as the algorithm is switching to mvcle. The mvcle code returns without restoring r13. All other cases are restoring r13. If memcpy is called from outside libc the ifunc resolver will only select this variant if running on machines older than z10. Otherwise or if memcpy is called from inside libc, this default memcpy variant is called. The testcase timezone/tst-tzset is triggering this issue in some combinations of gcc versions and optimization levels. This bug was introduced in commit 04bb21ac93e90d7696bcaf8febe2b2dd2d83585a and thus is a regression compared to former glibc 2.23 release. This patch removes the usage of r13 at all. Thus it is not saved and restored. The base address for execute-instruction is now stored in r5 which is obtained after r5 is not needed anymore as 256byte block counter. ChangeLog: * sysdeps/s390/s390-32/memcpy.S (memcpy): Eliminate the usage of r13 as it is not restored in mvcle case.
*	microblaze: fix variable name collision with syscall macros	Mike Frysinger	2016-07-19	1	-21/+21
\| \| \| \| \| \| \| \| \|	If a function passes in a variable named "ret", the code will miscompile when it declares a local ret variable. In some cases, it's even a build failure like so: ../sysdeps/unix/sysv/linux/spawni.c: In function '__spawni_child': ../sysdeps/unix/sysv/linux/spawni.c:289:5: error: address of register variable 'ret' requested while (write_not_cancel (p, &ret, sizeof ret) < 0)
*	i386: Compile rtld-*.os with -mno-sse -mno-mmx -mfpmath=387	H.J. Lu	2016-07-18	1	-2/+5
\| \| \| \| \| \| \| \| \| \| \| \| \|	Compile i386 rtld-.os with -mno-sse -mno-mmx -mfpmath=387 so that no code in ld.so uses mm/xmm/ymm/zmm registers on i386 since the first 3 mm/xmm/ymm/zmm registers are used to pass vector parameters which must be preserved. sysdeps/i386/Makefile (rtld-CFLAGS): New. [subdir == elf] (CFLAGS-.os): Replace -mno-sse -mno-mmx -mfpmath=387 with $(rtld-CFLAGS). [subdir != elf] (CFLAGS-.os): Compile rtld-*.os with $(rtld-CFLAGS).
*	Fix cos computation for multiple precision fallback (bz #20357)	Siddhesh Poyarekar	2016-07-18	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	During the sincos consolidation I made two mistakes, one was a logical error due to which cos(0x1.8475e5afd4481p+0) returned sin(0x1.8475e5afd4481p+0) instead. The second issue was an error in negating inputs for the correct quadrants for sine. I could not find a suitable test case for this despite running a program to search for such an input for a couple of hours. Following patch fixes both issues. Tested on x86_64. Thanks to Matt Clay for identifying the issue. [BZ #20357] * sysdeps/ieee754/dbl-64/s_sin.c (sloww): Fix up condition to call __mpsin/__mpcos and to negate values. * math/auto-libm-test-in: Add test. * math/auto-libm-test-out: Regenerate.
*	[AArch64] Regenerate libm-test-ulps	Szabolcs Nagy	2016-07-18	1	-10/+10
\| \| \| \|	* sysdeps/aarch64/libm-test-ulps: Regenerated.
*	Refactor Linux raise implementation (BZ#15368)	Adhemerval Zanella	2016-07-13	3	-50/+77
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch changes both the nptl and libc Linux raise implementation to avoid the issues described in BZ#15368. The strategy used is summarized in bug report first comment: 1. Block all signals (including internal NPTL ones); 2. Get pid and tid directly from syscall (not relying on cached values); 3. Call tgkill; 4. Restore old signal mask. Tested on x86_64 and i686. [BZ #15368] * sysdeps/unix/sysv/linux/nptl-signals.h (__nptl_clear_internal_signals): New function. (__libc_signal_block_all): Likewise. (__libc_signal_block_app): Likewise. (__libc_signal_restore_set): Likewise. * sysdeps/unix/sysv/linux/pt-raise.c (raise): Use Linux raise.c implementation. * sysdeps/unix/sysv/linux/raise.c (raise): Reimplement to not use the cached pid/tid value in pthread structure.
*	Regenerate i686 libm-test-ulps with GCC 6.1 at -O3 [BZ #20347]	H.J. Lu	2016-07-13	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This fixes with GCC 6.1 and -O3 on i686: Failure: Test: j0_downward (0xap+0) Result: is: -2.45935813e-01 -0x1.f7ad32p-3 should be: -2.45935768e-01 -0x1.f7ad2cp-3 difference: 4.47034835e-08 0x1.800000p-25 ulp : 3.0000 max.ulp : 2.0000 Maximal error of `j0_downward' is : 3 ulp accepted: 2 ulp [BZ #20347] * sysdeps/i386/i686/fpu/multiarch/libm-test-ulps: Regenerated.
*	x86-64: Add p{read,write}[v]64 to syscalls.list [BZ #20348]	H.J. Lu	2016-07-12	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	64-bit off_t in pread64, preadv, pwrite64 and pwritev syscalls is passed in one 64-bit register for both x32 and x86-64. Since the inline asm statement only passes long, which is 32-bit for x32, in registers, 64-bit off_t is truncated to 32-bit on x32. Since __ASSUME_PREADV and __ASSUME_PWRITEV are defined unconditionally, these syscalls can be implemented in syscalls.list to pass 64-bit off_t in one 64-bit register. Tested on x86-64 and x32 with off_t > 4GB on pread64/pwrite64 and preadv64/pwritev64. [BZ #20348] * sysdeps/unix/sysv/linux/x86_64/syscalls.list: Add pread64, preadv64, pwrite64 and pwritev64.
*	x86-64: Properly align stack in _dl_tlsdesc_dynamic [BZ #20309]	H.J. Lu	2016-07-12	1	-6/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since _dl_tlsdesc_dynamic is called via PLT, we need to add 8 bytes for push in the PLT entry to align the stack. [BZ #20309] * configure.ac (have-mtls-dialect-gnu2): Set to yes if -mtls-dialect=gnu2 works. * configure: Regenerated. * elf/Makefile [have-mtls-dialect-gnu2 = yes] (tests): Add tst-gnu2-tls1. (modules-names): Add tst-gnu2-tls1mod. ($(objpfx)tst-gnu2-tls1): New. (tst-gnu2-tls1mod.so-no-z-defs): Likewise. (CFLAGS-tst-gnu2-tls1mod.c): Likewise. * elf/tst-gnu2-tls1.c: New file. * elf/tst-gnu2-tls1mod.c: Likewise. * sysdeps/x86_64/dl-tlsdesc.S (_dl_tlsdesc_dynamic): Add 8 bytes for push in the PLT entry to align the stack.
*	X86-64: Define LO_HI_LONG to skip pos_h [BZ #20349]	H.J. Lu	2016-07-11	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Define LO_HI_LONG to skip pos_h since it is ignored by kernel: static inline loff_t pos_from_hilo(unsigned long high, unsigned long low) { #define HALF_LONG_BITS (BITS_PER_LONG / 2) return (((loff_t)high << HALF_LONG_BITS) << HALF_LONG_BITS) \| low; } where size of loff_t == size of long. [BZ #20349] * sysdeps/unix/sysv/linux/x86_64/sysdep.h (LO_HI_LONG): New.
*	[AArch64] Add bits/hwcap.h for aarch64 linux	Szabolcs Nagy	2016-07-11	1	-0/+34
\| \| \| \| \| \| \| \|	AArch64 uses HWCAP bits but they are not defined in sys/auxv.h. This patch adds a copy of the linux v4.6 arm64 uapi asm/hwcap.h definitions. * sysdeps/unix/sysv/linux/aarch64/bits/hwcap.h: New.
*	[AArch64] Fix libc internal asm profiling code	Szabolcs Nagy	2016-07-11	2	-2/+35
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When glibc is built with --enable-profile, the ENTRY of asm functions includes CALL_MCOUNT for profiling. (matters for binaries static linked against libc_p.a.) CALL_MCOUNT did not save/restore argument registers around the _mcount call so it clobbered them. (it is enough to only save/restore the arguments passed to a given asm function, but that would be too many asm changes so it is simpler to always save all argument registers in this macro.) float args are not saved: mcount does not clobber the float regs and currently no asm function takes float arguments anyway. [BZ #18707] * sysdeps/aarch64/Makefile (CFLAGS-mcount.c): Add -mgeneral-regs-only. * sysdeps/aarch64/sysdep.h (CALL_MCOUNT): Save argument registers.
*	Fix LO_HI_LONG definition	Adhemerval Zanella	2016-07-08	1	-7/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The p{read,write}v{64} consolidation patch [1] added a wrong guard for LO_HI_LONG definition. It currently uses both '__WORDSIZE == 64' and 'defined __ASSUME_WORDSIZE64_ILP32' to set the value to be passed in one argument, otherwise it will be split in two. However it fails on MIPS64n32 where syscalls n32 uses the compat implementation in the kernel meaning the off_t arguments are passed in two separate registers. GLIBC already defines a macro for such cases (__OFF_T_MATCHES_OFF64_T), so this patch uses it instead. Checked on x86_64, i686, x32, aarch64, armhf, and s390. * sysdeps/unix/sysv/linux/sysdep.h [__WORDSIZE == 64 \|\| __ASSUME_WORDSIZE64_ILP32] (LO_HI_LONG): Remove guards. * misc/tst-preadvwritev-common.c: New file. * misc/tst-preadvwritev.c: Use tst-preadvwritev-common.c. * misc/tst-preadvwritev64.c: Use tst-preadwritev-common.c and add a check for files larger than 2GB. [1] 4751bbe2ad4d1bfa05774e29376d553ecfe563b0
*	Remove __ASSUME_OFF_DIFF_OFF64 definition	Adhemerval Zanella	2016-07-08	9	-9/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch removes the __ASSUME_OFF_DIFF_OFF64 define introduced in p{read,write} consolidation patch. This define was added based on the idea 32 bits ports would continue to follow previous off{64}_t definition where off_t size differs from off64_t one. However, with recent AArch64/ILP32 patch submission and also with discussion for RISCV kernel interface, 32 bits ports now may aim to use off_t and off64_t with the same size as 64 bits. So current assumption for both p{read,write} and p{read,write}v are not compatible with new type definition. This patch now makes the syscall wrappers to only depend on __OFF_T_MATCHES_OFF64_T to define the default and 64-suffix variant, as follow: <function>.c #ifndef __OFF_T_MATCHES_OFF64_T /* build <function> / #endif and <function>64.c / build <function>64 / #ifdef __OFF_T_MATCHES_OFF64_T weak_alias (fallocate64, fallocate) #endif Tested on x86_64, i686, x32, and armhf. sysdeps/unix/sysv/linux/mips/kernel-features.h (__ASSUME_OFF_DIFF_OFF64): Remove define. * sysdeps/unix/sysv/linux/pread.c [__WORDSIZE != 64 \|\| __ASSUME_OFF_DIFF_OFF64] (pread): Replace by __OFF_T_MATCHES_OFF64_T. * sysdeps/unix/sysv/linux/pread64.c [__WORDSIZE != 64 \|\| __ASSUME_OFF_DIFF_OFF64] (pread64): Likewise. * sysdeps/unix/sysv/linux/preadv.c [__WORDSIZE != 64 \|\| __ASSUME_OFF_DIFF_OFF64] (preadv): Likewise. * sysdeps/unix/sysv/linux/preadv64.c [__WORDSIZE != 64 \|\| __ASSUME_OFF_DIFF_OFF64] (preadv64): Likewise. * sysdeps/unix/sysv/linux/pwrite.c [__WORDSIZE != 64 \|\| __ASSUME_OFF_DIFF_OFF64] (pwrite): Likewise. * sysdeps/unix/sysv/linux/pwrite64.c [__WORDSIZE != 64 \|\| __ASSUME_OFF_DIFF_OFF64] (pwrite64): Likewise. * sysdeps/unix/sysv/linux/pwritev.c [__WORDSIZE != 64 \|\| __ASSUME_OFF_DIFF_OFF64] (pwritev): Likewise. * sysdeps/unix/sysv/linux/pwritev64.c [__WORDSIZE != 64 \|\| __ASSUME_OFF_DIFF_OFF64] (pwritev64): Likewise.
*	tile: only define __ASSUME_ALIGNED_REGISTER_PAIRS for 32-bit	Chris Metcalf	2016-07-08	1	-1/+3
\| \| \| \| \| \| \| \| \| \|	The previous uses of this symbol were all in wordsize-32 code. In commit eeddfa91cbb1 ("Consolidate off_t/off64_t syscall argument passing") it was expanded to be used in pread/pwrite. Accordingly, we only define it in 32-bit compilation modes now. Both tilepro and tilegx32 follow this convention for the kernel ABI. tilegx64 follows it for passing 128-bit values, but there are no such ABIs in the kernel.
*	ppc: Fix modf (sNaN) for pre-POWER5+ CPU (bug 20240).	Aurelien Jarno	2016-07-08	2	-0/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit a6a4395d fixed modf implementation by compiling s_modf.c and s_modff.c with -fsignaling-nans. However these files are also included from the pre-POWER5+ implementation, and thus these files should also be compiled with -fsignaling-nans. Changelog: [BZ #20240] * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile (CFLAGS-s_modf-ppc32.c): New variable. (CFLAGS-s_modff-ppc32.c): Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile (CFLAGS-s_modf-ppc64.c): Likewise. (CFLAGS-s_modff-ppc64.c): Likewise.
*	S390: Use DT_JUMPREL in prelink undo code.	Stefan Liebler	2016-07-06	3	-10/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	On s390, the current prelink undo code in elf_machine_lazy_rel() has the requirement, that the plt stubs use the first got slots after the 3 reserved ones. In case of undoing prelink, the plt got slots are reset to the correct addresses whithin the corresponding plt-stub. Therefore the address is calculated by the address of the first plt-stub-address which was written by prelink (see l->l_mach.plt) to got[1] and index of current relocation multiplied with 32 (=size of one plt slot). The index was calculated with &current-got-slot - &got[3]. This patch removes the requirement, that the plt-got-slots are starting at got[3]. The index is now calculated with &current-reloc - &reloc[0]. The first struct Elf64_Rela is stored at DT_JMPREL. This patch is needed to prepare for partial relro support. Ulrich Weigand suggested this approach to use DT_JMPREL - Thanks. ChangeLog: * sysdeps/s390/linkmap.h (struct link_map_machine): Remove member gotplt and add member jmprel. * sysdeps/s390/s390-32/dl-machine.h (elf_machine_runtime_setup): Setup member jmprel with DT_JMPREL instead of gotplt with &got[3]. (elf_machine_lazy_rel): Calculate address with reloc and jmprel. * sysdeps/s390/s390-64/dl-machine.h: Likewise.
*	hppa: Update libm-test-ulps.	John David Anglin	2016-07-06	1	-262/+436
\| \| \| \| \|	Changelog: * sysdeps/hppa/fpu/libm-test-ulps: Regenerate.
*	powerpc: Fix return code of strcasecmp for unaligned inputs	Rajalakshmi Srinivasaraghavan	2016-07-05	1	-3/+14
\| \| \| \| \| \| \|	If the input values are unaligned and if there are null characters in the memory before the starting address of the input values, strcasecmp gives incorrect return code. Fixed it by adding mask the bits that are not part of the string.
*	m68k: suppress -Wframe-address warning	Andreas Schwab	2016-07-04	1	-0/+4
\|
*	Treat STV_HIDDEN and STV_INTERNAL symbols as STB_LOCAL	Maciej W. Rozycki	2016-07-01	1	-0/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In a reference to PR ld/19908 make ld.so respect symbol export classes aka visibility and treat STV_HIDDEN and STV_INTERNAL symbols as local, preventing such symbols from preempting exported symbols. According to the ELF gABI[1] neither STV_HIDDEN nor STV_INTERNAL symbols are supposed to be present in linked binaries: "A hidden symbol contained in a relocatable object must be either removed or converted to STB_LOCAL binding by the link-editor when the relocatable object is included in an executable file or shared object." "An internal symbol contained in a relocatable object must be either removed or converted to STB_LOCAL binding by the link-editor when the relocatable object is included in an executable file or shared object." however some GNU binutils versions produce such symbols in some cases. PR ld/19908 is one and we also have this note in scripts/abilist.awk: so clearly there is linked code out there which contains such symbols which is prone to symbol table misinterpretation, and it'll be more productive if we handle this gracefully, under the Robustness Principle: "be liberal in what you accept, and conservative in what you produce", especially as this is a simple (STV_HIDDEN\|STV_INTERNAL) => STB_LOCAL mapping. References: [1] "System V Application Binary Interface - DRAFT - 24 April 2001", The Santa Cruz Operation, Inc., "Symbol Table", <http://www.sco.com/developers/gabi/2001-04-24/ch4.symtab.html> * sysdeps/generic/ldsodefs.h (dl_symbol_visibility_binds_local_p): New inline function. * elf/dl-addr.c (determine_info): Treat hidden and internal symbols as local. * elf/dl-lookup.c (do_lookup_x): Likewise. * elf/dl-reloc.c (RESOLVE_MAP): Likewise.
*	SPARC: fix nearbyint on sNaN input	Aurelien Jarno	2016-07-01	8	-0/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	nearbyint and nearbyintf should not trigger inexact exceptions, but should still trigger an invalid exception for a sNaN input. The SPARC specific implementations of these functions save the FSR at the beginning of the function and restore it at the end to not trigger an inexact exception. This however doesn't work for an sNaN input which need to trigger an invalid exception. Fix that by adding a fcmp instruction using the input value before saving FSR, so that an invalid exception is triggered for a sNaN input. This fixes the math/test-nearbyint-except test on SPARC. Changelog: * sparc/sparc32/sparcv9/fpu/s_nearbyint.S (__nearbyint): Trigger an invalid exception for a sNaN input. * sparc/sparc32/sparcv9/fpu/s_nearbyintf.S (__nearbyintf): Likewise. * sparc/sparc32/sparcv9/fpu/multiarch/s_nearbyint-vis3.S (__nearbyint_vis3): Likewise * sparc/sparc32/sparcv9/fpu/multiarch/s_nearbyintf-vis3.S (__nearbyintf_vis3): Likewise * sparc/sparc64/fpu/s_nearbyint.S (__nearbyint): Likewise. * sparc/sparc64/fpu/s_nearbyintf.S (__nearbyintf): Likewise. * sparc/sparc64/fpu/multiarch/s_nearbyint-vis3.S (__nearbyint_vis3): Likewise. * sparc/sparc64/fpu/multiarch/s_nearbyintf-vis3.S (__nearbyintf_vis3): Likewise.
*	Require binutils 2.24 to build x86-64 glibc [BZ #20139]	H.J. Lu	2016-07-01	28	-103/+136
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If assembler doesn't support AVX512DQ, _dl_runtime_resolve_avx is used to save the first 8 vector registers, which only saves the lower 256 bits of vector register, for lazy binding. When it is called on AVX512 platform, the upper 256 bits of ZMM registers are clobbered. Parameters passed in ZMM registers will be wrong when the function is called the first time. This patch requires binutils 2.24, whose assembler can store and load ZMM registers, to build x86-64 glibc. Since mathvec library needs assembler support for AVX512DQ, we disable mathvec if assembler doesn't support AVX512DQ. [BZ #20139] * config.h.in (HAVE_AVX512_ASM_SUPPORT): Renamed to ... (HAVE_AVX512DQ_ASM_SUPPORT): This. * sysdeps/x86_64/configure.ac: Require assembler from binutils 2.24 or above. (HAVE_AVX512_ASM_SUPPORT): Removed. (HAVE_AVX512DQ_ASM_SUPPORT): New. * sysdeps/x86_64/configure: Regenerated. * sysdeps/x86_64/dl-trampoline.S: Make HAVE_AVX512_ASM_SUPPORT check unconditional. * sysdeps/x86_64/multiarch/ifunc-impl-list.c: Likewise. * sysdeps/x86_64/multiarch/memcpy.S: Likewise. * sysdeps/x86_64/multiarch/memcpy_chk.S: Likewise. * sysdeps/x86_64/multiarch/memmove-avx512-no-vzeroupper.S: Likewise. * sysdeps/x86_64/multiarch/memmove-avx512-unaligned-erms.S: Likewise. * sysdeps/x86_64/multiarch/memmove.S: Likewise. * sysdeps/x86_64/multiarch/memmove_chk.S: Likewise. * sysdeps/x86_64/multiarch/mempcpy.S: Likewise. * sysdeps/x86_64/multiarch/mempcpy_chk.S: Likewise. * sysdeps/x86_64/multiarch/memset-avx512-no-vzeroupper.S: Likewise. * sysdeps/x86_64/multiarch/memset-avx512-unaligned-erms.S: Likewise. * sysdeps/x86_64/multiarch/memset.S: Likewise. * sysdeps/x86_64/multiarch/memset_chk.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_cos8_core_avx512.S: Check HAVE_AVX512DQ_ASM_SUPPORT instead of HAVE_AVX512_ASM_SUPPORT. * sysdeps/x86_64/fpu/multiarch/svml_d_exp8_core_avx512.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_log8_core_avx512.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_pow8_core_avx512.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_sin8_core_avx512.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_sincos8_core_avx512.: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_s_cosf16_core_avx512.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_s_expf16_core_avx512.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_s_logf16_core_avx512.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_s_powf16_core_avx512.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_s_sincosf16_core_avx51: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_s_sinf16_core_avx512.S: Likewise.
*	Fixed wrong vector sincos/sincosf ABI to have it compatible with	Andrew Senkevich	2016-07-01	32	-39/+2548
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	current vector function declaration "#pragma omp declare simd notinbranch", according to which vector sincos should have vector of pointers for second and third parameters. It is fixed with implementation as wrapper to version having second and third parameters as pointers. [BZ #20024] * sysdeps/x86/fpu/test-math-vector-sincos.h: New. * sysdeps/x86_64/fpu/multiarch/svml_d_sincos2_core_sse4.S: Fixed ABI of this implementation of vector function. * sysdeps/x86_64/fpu/multiarch/svml_d_sincos4_core_avx2.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_d_sincos8_core_avx512.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_s_sincosf16_core_avx512.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_s_sincosf4_core_sse4.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_s_sincosf8_core_avx2.S: Likewise. * sysdeps/x86_64/fpu/svml_d_sincos2_core.S: Likewise. * sysdeps/x86_64/fpu/svml_d_sincos4_core.S: Likewise. * sysdeps/x86_64/fpu/svml_d_sincos4_core_avx.S: Likewise. * sysdeps/x86_64/fpu/svml_d_sincos8_core.S: Likewise. * sysdeps/x86_64/fpu/svml_s_sincosf16_core.S: Likewise. * sysdeps/x86_64/fpu/svml_s_sincosf4_core.S: Likewise. * sysdeps/x86_64/fpu/svml_s_sincosf8_core.S: Likewise. * sysdeps/x86_64/fpu/svml_s_sincosf8_core_avx.S: Likewise. * sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c: Use another wrapper for testing vector sincos with fixed ABI. * sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c: Likewise. * sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c: Likewise. * sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c: Likewise. * sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c: Likewise. * sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c: Likewise. * sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c: Likewise. * sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c: Likewise. * sysdeps/x86_64/fpu/test-double-libmvec-sincos-avx.c: New test. * sysdeps/x86_64/fpu/test-double-libmvec-sincos-avx2.c: Likewise. * sysdeps/x86_64/fpu/test-double-libmvec-sincos-avx512.c: Likewise. * sysdeps/x86_64/fpu/test-double-libmvec-sincos.c: Likewise. * sysdeps/x86_64/fpu/test-float-libmvec-sincosf-avx.c: Likewise. * sysdeps/x86_64/fpu/test-float-libmvec-sincosf-avx2.c: Likewise. * sysdeps/x86_64/fpu/test-float-libmvec-sincosf-avx512.c: Likewise. * sysdeps/x86_64/fpu/test-float-libmvec-sincosf.c: Likewise. * sysdeps/x86_64/fpu/Makefile: Added new tests.
*	SPARC64: update localplt.data	Aurelien Jarno	2016-07-01	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commits d81f90cc and 89faa0340 replaced called to __isnan and __isinf by the corresponding GCC builtins. In turns GCC emits calls to _Qp_cmp. We should therefore add _Qp_cmp to localplt.data as otherwise the elf/check-localplt test fails with: Extra PLT reference: libc.so: _Qp_cmp A similar change has already been done for SPARC32 in commit 6ef1cb95. Changelog: * sysdeps/unix/sysv/linux/sparc/sparc64/localplt.data: Add _Qp_cmp.
*	powerpc: Add a POWER8-optimized version of sinf()	Anton Blanchard	2016-06-30	5	-1/+604
\| \| \| \| \|	This uses the implementation of sinf() in sysdeps/x86_64/fpu/s_sinf.S as inspiration.
*	powerpc: Add a POWER8-optimized version of expf()	Tulio Magno Quites Machado Filho	2016-06-30	6	-1/+388
\| \| \| \| \| \| \| \|	This implementation is based on the one already used at sysdeps/x86_64/fpu/e_expf.S. This implementation improves the performance by ~14% on average in synthetic benchmarks at the cost of decreasing accuracy to 1 ULP.
*	hppa: fix loading of global pointer in _start [BZ #20277]	John David Anglin	2016-06-30	1	-0/+2
\| \| \| \| \| \| \|	The patched change fixes a regression for executables compiled with the -p option and linked with gcrt1.o. The executables crash on startup. This regression was introduced in 2.22 and was noticed in the gcc testsuite.
*	Check Prefer_ERMS in memmove/memcpy/mempcpy/memset	H.J. Lu	2016-06-30	6	-1/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Although the Enhanced REP MOVSB/STOSB (ERMS) implementations of memmove, memcpy, mempcpy and memset aren't used by the current processors, this patch adds Prefer_ERMS check in memmove, memcpy, mempcpy and memset so that they can be used in the future. * sysdeps/x86/cpu-features.h (bit_arch_Prefer_ERMS): New. (index_arch_Prefer_ERMS): Likewise. * sysdeps/x86_64/multiarch/memcpy.S (__new_memcpy): Return __memcpy_erms for Prefer_ERMS. * sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S (__memmove_erms): Enabled for libc.a. * ysdeps/x86_64/multiarch/memmove.S (__libc_memmove): Return __memmove_erms or Prefer_ERMS. * sysdeps/x86_64/multiarch/mempcpy.S (__mempcpy): Return __mempcpy_erms for Prefer_ERMS. * sysdeps/x86_64/multiarch/memset.S (memset): Return __memset_erms for Prefer_ERMS.
*	i686/multiarch: Regenerate ulps	Aurelien Jarno	2016-06-30	1	-8/+8
\| \| \| \| \| \| \|	This comes from running “make regen-ulps” on AMD Opteron 6272 CPUs. Changelog: * sysdeps/i386/i686/fpu/multiarch/libm-test-ulps: Regenerated.
*	Avoid array-bounds warning for strncat on i586 (bug 20260)	Andreas Schwab	2016-06-29	1	-2/+1
\|
*	MIPS: run tst-mode-switch-{1,2,3}.c using test-skeleton.c	Aurelien Jarno	2016-06-27	3	-6/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	For some reasons I have not investigated yet, tst-mode-switch-1 hangs on a MIPS UTM-8 machine running an o32 userland and a 3.6.1 kernel. This patch changes the test so that it runs under the test-skeleton framework, causing the test to fail after a timeout instead of hanging the whole testsuite. At the same time, also change the tst-mode-switch-2 and tst-mode-switch-3 tests. Changelog: * sysdeps/mips/tst-mode-switch-1.c (main): Converted to ... (do_test): ... this. (TEST_FUNCTION): New macro. Include test-skeleton.c. * sysdeps/mips/tst-mode-switch-2.c (main): Likewise. * sysdeps/mips/tst-mode-switch-3.c (main): Likewise.
*	Avoid "inexact" exceptions in i386/x86_64 trunc functions (bug 15479).	Joseph Myers	2016-06-27	4	-23/+29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As discussed in <https://sourceware.org/ml/libc-alpha/2016-05/msg00577.html>, TS 18661-1 disallows ceil, floor, round and trunc functions from raising the "inexact" exception, in accordance with general IEEE 754 semantics for when that exception is raised. Fixing this for x87 floating point is more complicated than for the other versions of these functions, because they use the frndint instruction that raises "inexact" and this can only be avoided by saving and restoring the whole floating-point environment. As I noted in <https://sourceware.org/ml/libc-alpha/2016-06/msg00128.html>, I have now implemented a GCC option -fno-fp-int-builtin-inexact for GCC 7, such that GCC will inline these functions on x86, without caring about "inexact", when the default -ffp-int-builtin-inexact is in effect. This allows users to get optimized code depending on the options they pass to the compiler, while making the out-of-line functions follow TS 18661-1 semantics and avoid "inexact". This patch duly fixes the out-of-line trunc function implementations to avoid "inexact", in the same way as the nearbyint implementations. I do not know how the performance of implementations such as these based on saving the environment and changing the rounding mode temporarily compares to that of the C versions or SSE 4.1 versions (of course, for 32-bit x86 SSE implementations still need to get the return value in an x87 register); it's entirely possible other implementations could be faster in some cases. Tested for x86_64 and x86. [BZ #15479] * sysdeps/i386/fpu/s_trunc.S (__trunc): Save and restore floating-point environment rather than just control word. * sysdeps/i386/fpu/s_truncf.S (__truncf): Likewise. * sysdeps/i386/fpu/s_truncl.S (__truncl): Save and restore floating-point environment, with "invalid" exceptions merged in, rather than just control word. * sysdeps/x86_64/fpu/s_truncl.S (__truncl): Likewise. * math/libm-test.inc (trunc_test_data): Do not allow spurious "inexact" exceptions.
*	Avoid "inexact" exceptions in i386/x86_64 floor functions (bug 15479).	Joseph Myers	2016-06-27	4	-23/+33
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As discussed in <https://sourceware.org/ml/libc-alpha/2016-05/msg00577.html>, TS 18661-1 disallows ceil, floor, round and trunc functions from raising the "inexact" exception, in accordance with general IEEE 754 semantics for when that exception is raised. Fixing this for x87 floating point is more complicated than for the other versions of these functions, because they use the frndint instruction that raises "inexact" and this can only be avoided by saving and restoring the whole floating-point environment. As I noted in <https://sourceware.org/ml/libc-alpha/2016-06/msg00128.html>, I have now implemented a GCC option -fno-fp-int-builtin-inexact for GCC 7, such that GCC will inline these functions on x86, without caring about "inexact", when the default -ffp-int-builtin-inexact is in effect. This allows users to get optimized code depending on the options they pass to the compiler, while making the out-of-line functions follow TS 18661-1 semantics and avoid "inexact". This patch duly fixes the out-of-line floor function implementations to avoid "inexact", in the same way as the nearbyint implementations. I do not know how the performance of implementations such as these based on saving the environment and changing the rounding mode temporarily compares to that of the C versions or SSE 4.1 versions (of course, for 32-bit x86 SSE implementations still need to get the return value in an x87 register); it's entirely possible other implementations could be faster in some cases. Tested for x86_64 and x86. [BZ #15479] * sysdeps/i386/fpu/s_floor.S (__floor): Save and restore floating-point environment rather than just control word. * sysdeps/i386/fpu/s_floorf.S (__floorf): Likewise. * sysdeps/i386/fpu/s_floorl.S (__floorl): Save and restore floating-point environment, with "invalid" exceptions merged in, rather than just control word. * sysdeps/x86_64/fpu/s_floorl.S (__floorl): Likewise. * math/libm-test.inc (floor_test_data): Do not allow spurious "inexact" exceptions.
*	Avoid "inexact" exceptions in i386/x86_64 ceil functions (bug 15479).	Joseph Myers	2016-06-27	4	-23/+33
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As discussed in <https://sourceware.org/ml/libc-alpha/2016-05/msg00577.html>, TS 18661-1 disallows ceil, floor, round and trunc functions from raising the "inexact" exception, in accordance with general IEEE 754 semantics for when that exception is raised. Fixing this for x87 floating point is more complicated than for the other versions of these functions, because they use the frndint instruction that raises "inexact" and this can only be avoided by saving and restoring the whole floating-point environment. As I noted in <https://sourceware.org/ml/libc-alpha/2016-06/msg00128.html>, I have now implemented a GCC option -fno-fp-int-builtin-inexact for GCC 7, such that GCC will inline these functions on x86, without caring about "inexact", when the default -ffp-int-builtin-inexact is in effect. This allows users to get optimized code depending on the options they pass to the compiler, while making the out-of-line functions follow TS 18661-1 semantics and avoid "inexact". This patch duly fixes the out-of-line ceil function implementations to avoid "inexact", in the same way as the nearbyint implementations. I do not know how the performance of implementations such as these based on saving the environment and changing the rounding mode temporarily compares to that of the C versions or SSE 4.1 versions (of course, for 32-bit x86 SSE implementations still need to get the return value in an x87 register); it's entirely possible other implementations could be faster in some cases. Tested for x86_64 and x86. [BZ #15479] * sysdeps/i386/fpu/s_ceil.S (__ceil): Save and restore floating-point environment rather than just control word. * sysdeps/i386/fpu/s_ceilf.S (__ceilf): Likewise. * sysdeps/i386/fpu/s_ceill.S (__ceill): Save and restore floating-point environment, with "invalid" exceptions merged in, rather than just control word. * sysdeps/x86_64/fpu/s_ceill.S (__ceill): Likewise. * math/libm-test.inc (ceil_test_data): Do not allow spurious "inexact" exceptions.
*	MIPS, SPARC: more fixes to the vfork aliases in libpthread.so	Aurelien Jarno	2016-06-27	3	-14/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit 43c29487 tried to fix the vfork aliases in libpthread.so on MIPS and SPARC, but failed to do it correctly, introducing an ABI change. This patch does the remaining changes needed to align the MIPS and SPARC vfork implementations with the other architectures. That way the the alpha version of pt-vfork.S works correctly for MIPS and SPARC. The changes for alpha were done in 82aab97c. Changelog: * sysdeps/unix/sysv/linux/mips/vfork.S (__vfork): Rename into __libc_vfork. (__vfork) [IS_IN (libc)]: Remove alias. (__libc_vfork) [IS_IN (libc)]: Define as an alias. * sysdeps/unix/sysv/linux/sparc/sparc32/vfork.S: Likewise. * sysdeps/unix/sysv/linux/sparc/sparc64/vfork.S: Likewise.
*	Remove atomic_compare_and_exchange_bool_rel.	Torvald Riegel	2016-06-24	9	-89/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	atomic_compare_and_exchange_bool_rel and catomic_compare_and_exchange_bool_rel are removed and replaced with the new C11-like atomic_compare_exchange_weak_release. The concurrent code in nscd/cache.c has not been reviewed yet, so this patch does not add detailed comments. * nscd/cache.c (cache_add): Use new C11-like atomic operation instead of atomic_compare_and_exchange_bool_rel. * nptl/pthread_mutex_unlock.c (__pthread_mutex_unlock_full): Likewise. * include/atomic.h (atomic_compare_and_exchange_bool_rel, catomic_compare_and_exchange_bool_rel): Remove. * sysdeps/aarch64/atomic-machine.h (atomic_compare_and_exchange_bool_rel): Likewise. * sysdeps/alpha/atomic-machine.h (atomic_compare_and_exchange_bool_rel): Likewise. * sysdeps/arm/atomic-machine.h (atomic_compare_and_exchange_bool_rel): Likewise. * sysdeps/mips/atomic-machine.h (atomic_compare_and_exchange_bool_rel): Likewise. * sysdeps/tile/atomic-machine.h (atomic_compare_and_exchange_bool_rel): Likewise.
*	Fix i386/x86_64 scalbl with sNaN input (bug 20296).	Joseph Myers	2016-06-23	2	-23/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The x86_64 and i386 versions of scalbl return sNaN for some cases of sNaN input and are missing "invalid" exceptions for other cases. This results from overly complicated code that either returns a NaN input, or discards both inputs when one is NaN and loads a NaN from memory. This patch fixes this by simplifying the code to add the arguments when either one is NaN. Tested for x86_64 and x86. [BZ #20296] * sysdeps/i386/fpu/e_scalbl.S (__ieee754_scalbl): Add arguments when either argument is a NaN. * sysdeps/x86_64/fpu/e_scalbl.S (__ieee754_scalbl): Likewise. * math/libm-test.inc (scalb_test_data): Add sNaN tests.
*	Simplify x86 nearbyint functions.	Joseph Myers	2016-06-22	4	-16/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The i386 implementations of nearbyint functions, and x86_64 nearbyintl, contain code to mask the "inexact" exception. However, the fnstenv instruction has the effect of masking all exceptions, so this masking code has been redundant since fnstenv was added to those implementations (by commit 846d9a4a3acdb4939ca7bf6aed48f9f6f26911be; commit 71d1b0166b4ace0d804af2993b3815758b852efc added the test math/test-nearbyint-except-2.c that verifies these functions do work when called with "inexact" traps enabled); this patch removes the redundant code. Tested for x86_64 and x86. * sysdeps/i386/fpu/s_nearbyint.S (__nearbyint): Do not mask "inexact" exceptions after fnstenv. * sysdeps/i386/fpu/s_nearbyintf.S (__nearbyintf): Likewise. * sysdeps/i386/fpu/s_nearbyintl.S (__nearbyintl): Likewise. * sysdeps/x86_64/fpu/s_nearbyintl.S (__nearbyintl): Likewise.
*	Move sysdeps/generic/bits/hwcap.h to top-level bits/	Zack Weinberg	2016-06-22	1	-23/+0
\| \| \| \| \| \| \| \| \| \|	This file was added to sysdeps/generic/bits in 2012. This appears to have been an oversight, as the entire sysdeps/generic/bits directory was moved to the top level in 2005. Accordingly the generic bits/hwcap.h belongs there too. * sysdeps/generic/bits/hwcap.h: Moved to ... * bits/hwcap.h: Here.
*	This patch further tunes memcpy - avoid one branch for sizes 1-3,	Wilco Dijkstra	2016-06-22	1	-24/+32
\| \| \| \| \| \| \|	add a prefetch and improve small copies that are exact powers of 2. * sysdeps/aarch64/memcpy.S (memcpy): Further tuning for performance.
*	Fix p{readv,writev}{64} consolidation implementation	Adhemerval Zanella	2016-06-21	5	-13/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch fixes the p{readv,writev}{64} consolidation implementation from commits 4e77815 and af5fdf5. Different from pread/pwrite implementation, preadv/pwritev implementation does not require __ALIGNMENT_ARG because kernel syscall prototypes define the high and low part of the off_t, if it is the case, directly (different from pread/pwrite where the architecture ABI for passing 64-bit values must be in consideration for passsing the arguments). It also adds some basic tests for preadv/pwritev. Tested on x86_64, i686, and armhf. * misc/Makefile (tests): Add tst-preadvwritev and tst-preadvwritev64. * misc/tst-preadvwritev.c: New file. * misc/tst-preadvwritev64.c: Likewise. * sysdeps/unix/sysv/linux/preadv.c (preadv): Remove SYSCALL_LL{64} usage. * sysdeps/unix/sysv/linux/preadv64.c (preadv64): Likewise. * sysdeps/unix/sysv/linux/pwritev.c (pwritev): Likewise. * sysdeps/unix/sysv/linux/pwritev64.c (pwritev64): Likewise. * sysdeps/unix/sysv/linux/sysdep.h (LO_HI_LONG): New macro.
*	Added tests to ensure linkage through libmvec *_finite aliases which are	Andrew Senkevich	2016-06-20	26	-0/+307
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	defined in libmvec_nonshared.a (bug 19654). [BZ #19654] * sysdeps/x86_64/fpu/Makefile: Added new tests. * sysdeps/x86_64/fpu/test-double-libmvec-alias-avx-main.c: New. * sysdeps/x86_64/fpu/test-double-libmvec-alias-avx-mod.c: Likewise. * sysdeps/x86_64/fpu/test-double-libmvec-alias-avx.c: Likewise. * sysdeps/x86_64/fpu/test-double-libmvec-alias-avx2-main.c: Likewise. * sysdeps/x86_64/fpu/test-double-libmvec-alias-avx2-mod.c: Likewise. * sysdeps/x86_64/fpu/test-double-libmvec-alias-avx2.c: Likewise. * sysdeps/x86_64/fpu/test-double-libmvec-alias-avx512-main.c: Likewise. * sysdeps/x86_64/fpu/test-double-libmvec-alias-avx512-mod.c: Likewise. * sysdeps/x86_64/fpu/test-double-libmvec-alias-avx512.c: Likewise. * sysdeps/x86_64/fpu/test-double-libmvec-alias-main.c: Likewise. * sysdeps/x86_64/fpu/test-double-libmvec-alias-mod.c: Likewise. * sysdeps/x86_64/fpu/test-double-libmvec-alias.c: Likewise. * sysdeps/x86_64/fpu/test-float-libmvec-alias-avx-main.c: Likewise. * sysdeps/x86_64/fpu/test-float-libmvec-alias-avx-mod.c: Likewise. * sysdeps/x86_64/fpu/test-float-libmvec-alias-avx.c: Likewise. * sysdeps/x86_64/fpu/test-float-libmvec-alias-avx2-main.c: Likewise. * sysdeps/x86_64/fpu/test-float-libmvec-alias-avx2-mod.c: Likewise. * sysdeps/x86_64/fpu/test-float-libmvec-alias-avx2.c: Likewise. * sysdeps/x86_64/fpu/test-float-libmvec-alias-avx512-main.c: Likewise. * sysdeps/x86_64/fpu/test-float-libmvec-alias-avx512-mod.c: Likewise. * sysdeps/x86_64/fpu/test-float-libmvec-alias-avx512.c: Likewise. * sysdeps/x86_64/fpu/test-float-libmvec-alias-main.c: Likewise. * sysdeps/x86_64/fpu/test-float-libmvec-alias-mod.c: Likewise. * sysdeps/x86_64/fpu/test-float-libmvec-alias.c: Likewise. * sysdeps/x86_64/fpu/test-libmvec-alias-mod.c: Likewise.
*	Add a simple rawmemchr implementation. Use strlen for rawmemchr(s, '\0') as it	Wilco Dijkstra	2016-06-20	2	-2/+45
\| \| \| \| \| \| \| \|	is the fastest way to search for '\0'. Otherwise use memchr with an infinite size. This is 3x faster on benchtests for large sizes. Passes GLIBC tests. * sysdeps/aarch64/rawmemchr.S (__rawmemchr): New file. * sysdeps/aarch64/strlen.S (__strlen): Change to __strlen to avoid PLT.
*	This is an optimized memcpy/memmove for AArch64. Copies are split into 3 main	Wilco Dijkstra	2016-06-20	2	-452/+209
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	cases: small copies of up to 16 bytes, medium copies of 17..96 bytes which are fully unrolled. Large copies of more than 96 bytes align the destination and use an unrolled loop processing 64 bytes per iteration. In order to share code with memmove, small and medium copies read all data before writing, allowing any kind of overlap. All memmoves except for the large backwards case fall into memcpy for optimal performance. On a random copy test memcpy/memmove are 40% faster on Cortex-A57 and 28% on Cortex-A53. * sysdeps/aarch64/memcpy.S (memcpy): Rewrite of optimized memcpy and memmove. * sysdeps/aarch64/memmove.S (memmove): Remove memmove code (merged into memcpy.S).