| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
| |
Co-Authored-By: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The rseq area is placed directly into struct pthread. rseq
registration failure is not treated as an error, so it is possible
that threads run with inconsistent registration status.
<sys/rseq.h> is not yet installed as a public header.
Co-Authored-By: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
|
|
|
|
|
|
| |
This will be needed for rseq TCB access.
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
|
|
|
|
|
|
|
| |
These are common between most architectures. Only the x86 targets
are outliers.
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
|
|
|
|
|
|
|
|
|
| |
<tls.h> already contains a definition that is quite similar,
but it is not consistent across architectures.
Only architectures for which rseq support is added are covered.
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
|
|
|
|
|
|
| |
Don't set Prefer_No_AVX512 on processors with AVX512 and AVX-VNNI since
they won't lower CPU frequency when ZMM load and store instructions are
used.
|
|
|
|
| |
The powerpc implementation is refactored to use the default implementation.
|
|
|
|
|
| |
It also allows removing the hppa-specific implementation and
simplifying the riscv implementation a bit.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Current binutils defines __executable_start as the lowest text
address, so using the entry point address as a fallback is no
longer necessary. As a result, overriding <entry.h> is only
necessary if the entry point is not called _start.
The previous approach to define __ASSEMBLY__ to suppress the
declaration breaks if headers included by <entry.h> are not
compatible with __ASSEMBLY__. This happens with rseq integration
because it is necessary to include kernel headers in more places.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Programs without dynamic dependencies and without a program
interpreter are now run via execve.
Previously, the dynamic linker either crashed while attempting to
read a non-existent dynamic segment (looking for DT_AUDIT/DT_DEPAUDIT
data), or the self-relocation in the static PIE executable crashed
because the outer dynamic linker had already applied RELRO protection.
<dl-execve.h> is needed because execve is not available in the
dynamic loader on Hurd.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
|
|
|
|
|
|
|
| |
Must use notl %edi here, as the lower bits are for CHAR comparisons
that are potentially out of range and thus can be 0 without indicating
a mismatch.
This fixes BZ #28646.
Co-Authored-By: H.J. Lu <hjl.tools@gmail.com>
|
|
|
|
|
|
|
|
|
|
| |
rseq support will use a 32-byte aligned field in struct pthread,
so the whole struct needs to have at least that alignment.
nptl/tst-tls3mod.c uses TCB_ALIGNMENT, therefore include <descr.h>
to obtain the fallback definition.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
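As a rough illustration of why one over-aligned member raises the
alignment requirement of the whole struct, here is a minimal C11 sketch.
The names demo_rseq_area and demo_tcb are hypothetical stand-ins and do
not reflect the actual layout of struct pthread.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

/* Hypothetical stand-in for a 32-byte aligned rseq area.  */
struct demo_rseq_area
{
  alignas (32) unsigned char bytes[32];
};

/* Hypothetical stand-in for a thread descriptor embedding the area.  */
struct demo_tcb
{
  void *self;
  struct demo_rseq_area rseq_area;
};

/* The over-aligned member forces the enclosing struct to be at least
   as strictly aligned, and places the member on a 32-byte offset.  */
static_assert (alignof (struct demo_tcb) >= 32,
               "descriptor must be at least 32-byte aligned");
static_assert (offsetof (struct demo_tcb, rseq_area) % 32 == 0,
               "rseq area must sit on a 32-byte boundary");
```

Any allocator that hands out storage for such a descriptor therefore has
to honor the stricter alignment, which is presumably why a constant like
TCB_ALIGNMENT needs to be at least 32 here.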
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
v2 is a complete rewrite of the A64FX memcpy. Performance is improved
by streamlining the code, aligning all large copies and using a single
unrolled loop for all sizes. The code size for memcpy and memmove goes
down from 1796 bytes to 868 bytes. Performance is better in all cases:
bench-memcpy-random is 2.3% faster overall, bench-memcpy-large is ~33%
faster for large sizes, bench-memcpy-walk is 25% faster for small sizes
and 20% for the largest sizes. The geomean of all tests in bench-memcpy
is 5.1% faster, and total time is reduced by 4%.
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
|
|
|
|
|
|
|
|
| |
Rewrite memcmp to improve performance. On small and medium inputs performance
is 10-20% better. Large inputs use a SIMD loop processing 64 bytes per
iteration, which is 30-50% faster depending on the size.
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
|
| |
Syscalls based on the assembly templates are missing CFI for r31, which gets
clobbered when scv is used, and info for LR is inaccurate, placed in the wrong
LOC and not using the proper offset. LR was also being saved to the callee's
frame, while the ABI mandates it to be saved to the caller's frame. These are
fixed by this commit.
After this change:
$ readelf -wF libc.so.6 | grep 0004b9d4.. -A 7 && objdump --disassemble=kill libc.so.6
00004a48 0000000000000020 00004a4c FDE cie=00000000 pc=000000000004b9d4..000000000004ba3c
LOC CFA r31 ra
000000000004b9d4 r1+0 u u
000000000004b9e4 r1+48 u u
000000000004b9e8 r1+48 c-16 u
000000000004b9fc r1+48 c-16 c+16
000000000004ba08 r1+48 c-16
000000000004ba18 r1+48 u
000000000004ba1c r1+0 u
libc.so.6: file format elf64-powerpcle
Disassembly of section .text:
000000000004b9d4 <kill>:
4b9d4: 1f 00 4c 3c addis r2,r12,31
4b9d8: 2c c3 42 38 addi r2,r2,-15572
4b9dc: 25 00 00 38 li r0,37
4b9e0: d1 ff 21 f8 stdu r1,-48(r1)
4b9e4: 20 00 e1 fb std r31,32(r1)
4b9e8: 98 8f ed eb ld r31,-28776(r13)
4b9ec: 10 00 ff 77 andis. r31,r31,16
4b9f0: 1c 00 82 41 beq 4ba0c <kill+0x38>
4b9f4: a6 02 28 7d mflr r9
4b9f8: 40 00 21 f9 std r9,64(r1)
4b9fc: 01 00 00 44 scv 0
4ba00: 40 00 21 e9 ld r9,64(r1)
4ba04: a6 03 28 7d mtlr r9
4ba08: 08 00 00 48 b 4ba10 <kill+0x3c>
4ba0c: 02 00 00 44 sc
4ba10: 00 00 bf 2e cmpdi cr5,r31,0
4ba14: 20 00 e1 eb ld r31,32(r1)
4ba18: 30 00 21 38 addi r1,r1,48
4ba1c: 18 00 96 41 beq cr5,4ba34 <kill+0x60>
4ba20: 01 f0 20 39 li r9,-4095
4ba24: 40 48 23 7c cmpld r3,r9
4ba28: 20 00 e0 4d bltlr+
4ba2c: d0 00 63 7c neg r3,r3
4ba30: 08 00 00 48 b 4ba38 <kill+0x64>
4ba34: 20 00 e3 4c bnslr+
4ba38: c8 32 fe 4b b 2ed00 <__syscall_error>
...
4ba44: 40 20 0c 00 .long 0xc2040
4ba48: 68 00 00 00 .long 0x68
4ba4c: 06 00 5f 5f rlwnm r31,r26,r0,0,3
4ba50: 6b 69 6c 6c xoris r12,r3,26987
|
|
|
|
|
|
|
|
|
| |
The pipe2 syscall was added in Linux 2.6.27 and glibc requires Linux
3.2.0. The patch removes the arch-specific implementations for alpha,
ia64, mips, sh, and sparc, which require a different kernel ABI
than the usual one.
Checked on x86_64-linux-gnu and with a build for the affected ABIs.
|
|
|
|
|
|
|
|
|
| |
Variadic function calls in syscalls.list do not work for all ABIs
(for instance where the arguments are passed on the stack instead of
in registers) and might have underlying issues depending on the variadic
type (for instance if a 64-bit argument is used).
Checked on x86_64-linux-gnu.
|
|
|
|
|
|
|
|
|
|
|
| |
The LFS prlimit64 requires an arch-specific implementation in
syscalls.list. Instead add a generic one that handles the
required symbol alias for __RLIM_T_MATCHES_RLIM64_T.
HPPA is the only outlier which requires a different default
symbol.
Checked on x86_64-linux-gnu and with build for the affected ABIs.
|
|
|
|
|
|
|
| |
Commit f13fb81ad3159 removed the /proc/statm fallback used when sysfs
is not available; reinstate it.
Checked on x86_64-linux-gnu.
|
|
|
|
|
|
|
|
|
| |
Passing 64-bit arguments in syscalls.list is tricky: it requires
reimplementing the expected kernel ABI for each architecture. This
is much better represented in C code, where we already have
macros for this (SYSCALL_LL64).
Checked on x86_64-linux-gnu.
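To make the 64-bit argument problem concrete, the following is a minimal
sketch of how one 64-bit value ends up as two 32-bit words on a 32-bit
ABI. The type demo_ll64_pair and the function demo_split_ll64 are
hypothetical and are not glibc's SYSCALL_LL64 macro; the word order is
passed as a parameter here only because it differs between ABIs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: the two 32-bit words a 32-bit ABI would
   receive for one 64-bit syscall argument.  */
struct demo_ll64_pair
{
  uint32_t first;
  uint32_t second;
};

/* Split VAL into two words.  If HIGH_WORD_FIRST is nonzero, the high
   half is passed first; otherwise the low half is passed first.  */
static struct demo_ll64_pair
demo_split_ll64 (uint64_t val, int high_word_first)
{
  uint32_t hi = (uint32_t) (val >> 32);
  uint32_t lo = (uint32_t) (val & UINT32_C (0xffffffff));
  struct demo_ll64_pair pair;

  if (high_word_first)
    {
      pair.first = hi;
      pair.second = lo;
    }
  else
    {
      pair.first = lo;
      pair.second = hi;
    }
  return pair;
}
```

Keeping this split in one C helper avoids restating the word order (and
any register-pair alignment rules, which this sketch ignores) in every
architecture's syscalls.list entry.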
|
|
|
|
|
|
|
| |
For 32-bit architectures with __ASSUME_STATX there is no need to
build fstatat64_time64_stat.
Checked on i686-linux-gnu.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add vector sin/sinf and input files to libmvec microbenchmark.
libmvec-sin-inputs:
90% Normal random distribution
range: (-DBL_MAX, DBL_MAX)
mean: 0.0
sigma: 5.0
10% uniform random distribution in range (-1000.0, 1000.0)
libmvec-sinf-inputs:
90% Normal random distribution
range: (-FLT_MAX, FLT_MAX)
mean: 0.0f
sigma: 5.0f
10% uniform random distribution in range (-1000.0f, 1000.0f)
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
| |
Add vector pow/powf and input files to libmvec microbenchmark.
libmvec-pow-inputs:
arg1:
90% Normal random distribution
range: (0.0, 256.0)
mean: 0.0
sigma: 32.0
10% uniform random distribution in range (0.0, 256.0)
arg2:
90% Normal random distribution
range: (-127.0, 127.0)
mean: 0.0
sigma: 16.0
10% uniform random distribution in range (-127.0, 127.0)
libmvec-powf-inputs:
arg1:
90% Normal random distribution
range: (0.0f, 100.0f)
mean: 0.0f
sigma: 16.0f
10% uniform random distribution in range (0.0f, 100.0f)
arg2:
90% Normal random distribution
range: (-10.0f, 10.0f)
mean: 0.0f
sigma: 8.0f
10% uniform random distribution in range (-10.0f, 10.0f)
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
| |
Add vector log/logf and input files to libmvec microbenchmark.
libmvec-log-inputs:
70% Normal random distribution
range: (0.0, DBL_MAX)
mean: 1.0
sigma: 50.0
30% uniform random distribution in range (0.0, DBL_MAX)
libmvec-logf-inputs:
70% Normal random distribution
range: (0.0f, FLT_MAX)
mean: 1.0f
sigma: 50.0f
30% uniform random distribution in range (0.0f, FLT_MAX)
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
| |
Add vector exp/expf and input files to libmvec microbenchmark.
libmvec-exp-inputs:
90% Normal random distribution
range: (-708.0, 709.0)
mean: 0.0
sigma: 16.0
10% uniform random distribution in range (-500.0, 500.0)
libmvec-expf-inputs:
90% Normal random distribution
range: (-87.0f, 88.0f)
mean: 0.0f
sigma: 8.0f
10% uniform random distribution in range (-50.0f, 50.0f)
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
| |
Add vector cos/cosf and input files to libmvec microbenchmark.
libmvec-cos-inputs:
90% Normal random distribution
range: (-DBL_MAX, DBL_MAX)
mean: 0.0
sigma: 5.0
10% uniform random distribution in range (-1000.0, 1000.0)
libmvec-cosf-inputs:
90% Normal random distribution
range: (-FLT_MAX, FLT_MAX)
mean: 0.0f
sigma: 5.0f
10% uniform random distribution in range (-1000.0f, 1000.0f)
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
| |
Now that Hurd implements both close_range and closefrom (f2c996597d),
we can make close_range() a base ABI, and implement the default
closefrom() on top of close_range().
The generic closefrom() implementation based on __getdtablesize() is
moved to generic close_range(). On Linux it will be overridden by
the auto-generated syscall, while on Hurd it will be a system-specific
implementation.
The closefrom() now calls close_range() and __closefrom_fallback().
Since on Hurd close_range() does not fail, __closefrom_fallback() is an
empty static inline function set by __ASSUME_CLOSE_RANGE.
The __ASSUME_CLOSE_RANGE also allows optimizing the Linux
__closefrom_fallback() implementation when --enable-kernel=5.9 or
higher is used.
Finally the Linux-specific tst-close_range.c is moved to io and
enabled by default. The Linuxism and CLOSE_RANGE_UNSHARE are
guarded so it can be built for Hurd (I have not actually tested it).
Checked on x86_64-linux-gnu, i686-linux-gnu, and with a i686-gnu
build.
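The idea of a generic, descriptor-table-bounded close_range() can be
sketched roughly as follows. This is not the glibc implementation:
demo_close_range_fallback is a hypothetical name, flags are not handled,
and sysconf (_SC_OPEN_MAX) is used here as a portable stand-in for
__getdtablesize().

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <unistd.h>

/* Hypothetical generic fallback: close every descriptor in
   [FIRST, LAST], clamping LAST to the descriptor table size.  */
static int
demo_close_range_fallback (unsigned int first, unsigned int last)
{
  if (first > last)
    {
      errno = EINVAL;
      return -1;
    }

  long int table_size = sysconf (_SC_OPEN_MAX);
  if (table_size <= 0)
    return -1;
  if (table_size > INT_MAX)
    table_size = INT_MAX;

  /* Descriptors beyond the table cannot be open, so skip them.  */
  unsigned int max_fd = (unsigned int) table_size - 1;
  if (last > max_fd)
    last = max_fd;

  /* EBADF for descriptors that are not open is expected and ignored.  */
  for (unsigned int fd = first; fd <= last; fd++)
    close ((int) fd);
  return 0;
}
```

Under these assumptions, a closefrom (lowfd) style call corresponds to
demo_close_range_fallback (lowfd, UINT_MAX). The loop is linear in the
table size, which is one reason a kernel-provided close_range is
preferable when it is available.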
|
|
|
|
|
|
|
|
|
|
|
|
| |
__libc_signal_restore_set was in the wrong place: It also ran
when setjmp returned the second time (after pthread_exit or
pthread_cancel). This is observable with blocked pending
signals during thread exit.
Fixes commit b3cae39dcbfa2432b3f3aa28854d8ac57f0de1b8
("nptl: Start new threads with all signals blocked [BZ #25098]").
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
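The placement problem can be illustrated with plain setjmp/longjmp.
This is only a sketch with hypothetical names (demo_counts,
demo_setjmp_twice); it uses counters in place of the signal mask restore
and does not reproduce the nptl code.

```c
#include <assert.h>
#include <setjmp.h>

/* Hypothetical counters standing in for "restore the signal mask".  */
struct demo_counts
{
  int unguarded;  /* Runs on every return from setjmp.  */
  int guarded;    /* Runs only on the initial return.  */
};

static struct demo_counts
demo_setjmp_twice (void)
{
  jmp_buf env;
  /* volatile: modified between setjmp and longjmp.  */
  volatile struct demo_counts counts = { 0, 0 };
  volatile int jumped = 0;

  if (setjmp (env) == 0)
    counts.guarded++;   /* Correct placement: first return only.  */

  counts.unguarded++;   /* Wrong placement: also runs after longjmp.  */

  if (!jumped)
    {
      jumped = 1;
      longjmp (env, 1); /* Stands in for unwinding on thread exit.  */
    }

  struct demo_counts result = { counts.unguarded, counts.guarded };
  return result;
}
```

The unguarded statement executes twice and the guarded one once, which
mirrors how a restore placed after setjmp without a check on its return
value also runs on the exit path.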
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The @notoc usage only yields an advantage on ISA 3.1+ machines (power10)
and, for ld.bfd, only when it also sees pcrel relocations used in the code
(generated if the compiler targets ISA 3.1+). In the bfd case, ISA 3.1+
instructions in stubs are used iff the linker also sees the new pc-relative
relocations (for instance R_PPC64_D34); otherwise it generates default
stubs (ppc64_elf_check_relocs:4700).
This patch also helps with linkers that do not implement this optimization,
since building for an older ISA (such as 3.0 / power9) will also trigger
power10 stub generation if the assembly code uses the NOTOC macro.
Checked on powerpc64le-linux-gnu.
Reviewed-by: Fangrui Song <maskray@google.com>
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
It requires less boilerplate code for newer ports. The _Static_assert
checks from internal setjmp are moved to their own internal test since
setjmp.h is included early by multiple headers (to generate
rtld-sizes.sym).
The riscv jmp_buf-macros.h check is also redundant; it is already
done by riscv configure.ac.
Checked with a build for the affected architectures.
|
|
|
|
|
|
|
|
| |
This patch updates the kernel version in the test tst-mman-consts.py
to 5.15. (There are no new MAP_* constants covered by this test in
5.15 that need any other header changes.)
Tested with build-many-glibcs.py.
|
|
|
|
|
|
|
| |
Linux 5.15 adds a new address / protocol family PF_MCTP / AF_MCTP; add
these constants to bits/socket.h.
Tested for x86_64.
|
|
|
|
|
|
|
| |
This will be used to deallocate memory allocated using the non-minimal
malloc.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
And make it an installed header. This addresses a few aliasing
violations (which do not seem to result in miscompilation due to
the use of atomics), and also enables use of wide counters in other
parts of the library.
The debug output in nptl/tst-cond22 has been adjusted to print
the 32-bit values instead because it avoids a big-endian/little-endian
difference.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add a Python script to generate libmvec microbenchmarks from the input
values for each libmvec function using a skeleton benchmark template.
It creates double and float benchmarks with vector lengths 1, 2, 4, 8,
and 16 for each libmvec function. Vector length 1 corresponds to the
scalar version of the function and is included for vector function
performance comparison.
Co-authored-by: Haochen Jiang <haochen.jiang@intel.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
| |
No bug.
This implementation refactors memcmp-sse4.S primarily with minimizing
code size in mind. It does this by removing the lookup table logic and
removing the unrolled check from (256, 512] bytes.
memcmp-sse4 code size reduction : -3487 bytes
wmemcmp-sse4 code size reduction: -1472 bytes
The current memcmp-sse4.S implementation has a large code size
cost. This has serious adverse effects on the ICache / ITLB. While
in micro-benchmarks the implementation appears fast, traces of
real-world code have shown that the speed in micro-benchmarks does not
translate when the ICache/ITLB are not primed, and that the cost
of the code size has measurable negative effects on overall
application performance.
See https://research.google/pubs/pub48320/ for more details.
Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Linux 5.15 has one new syscall, process_mrelease (and also enables the
clone3 syscall for RV32). It also has a macro __NR_SYSCALL_MASK for
Arm, which is not a syscall but matches the pattern used for syscall
macro names.
Add __NR_SYSCALL_MASK to the names filtered out in the code dealing
with syscall lists, update syscall-names.list for the new syscall and
regenerate the arch-syscall.h headers with build-many-glibcs.py
update-syscalls.
Tested with build-many-glibcs.py.
|
|
|
|
|
|
|
|
|
|
| |
Depending on the layout chosen by the linker, the 16-bit displacement
of the jh instruction is insufficient to reach the target label.
Analysis of the linker failure was carried out by Nick Clifton.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Reviewed-by: Stefan Liebler <stli@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since
commit d73f5331ce5370ca5a879229e3842f5de98689cd
Author: Roland McGrath <roland@gnu.org>
Date: Fri May 2 02:20:45 2003 +0000
2003-05-01 Roland McGrath <roland@redhat.com>
dependencies are generated by passing -MD -MF to the compiler. Remove the unused
+mkdep, +make-deps, s-proto.S and s-proto-cancel.S.
This fixes BZ #28554.
|
|
|
|
|
|
|
|
|
|
| |
The include cleanup in dl-minimal.c removed too much for some
targets.
Also for Hurd, __sbrk is removed from localplt.data now that
tunables allocate memory through mmap.
Checked with a build for all affected architectures.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The rtld_malloc functions are moved to their own file so they can be
used in csu code. Also, the functions are renamed to __minimal_*
(since they are now used not only in loader code).
Using __minimal_malloc in tunables_strdup() avoids potential
issues with sbrk() calls while processing the tunables (I see
sporadic elf/tst-dso-ordering9 failures on powerpc64le, with different
tests failing due to ASLR).
Also, using __minimal_malloc over plain mmap optimizes the memory
allocation in both the static and dynamic cases (since it will use any
unused space either in the last page of the data segments, avoiding an
mmap() call, or from the previous mmap() call).
Checked on x86_64-linux-gnu, i686-linux-gnu, and powerpc64le-linux-gnu.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
|
|
|
|
| |
That was just cargo-culted.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The close_range () function implements the same API as the Linux and
FreeBSD syscalls. It operates atomically and reliably. The specified
upper bound is clamped to the actual size of the file descriptor table;
it is expected that the most common use case is with last = UINT_MAX.
Like in the Linux syscall, it is also possible to pass the
CLOSE_RANGE_CLOEXEC flag to mark the file descriptors in the range
cloexec instead of actually closing them.
Also, add a Hurd version of the closefrom () function. Since, unlike on
Linux, close_range () cannot fail due to being unsupported by the
running kernel, a fallback implementation is never necessary.
Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20211106153524.82700-1-bugaevc@gmail.com>
|
| |
No bug.
This patch doubles the rep_movsb_threshold when using ERMS. Based on
benchmarks the vector copy loop, especially now that it handles 4k
aliasing, is better for these medium-range sizes.
On Skylake with ERMS:
Size, Align1, Align2, dst>src,(rep movsb) / (vec copy)
4096, 0, 0, 0, 0.975
4096, 0, 0, 1, 0.953
4096, 12, 0, 0, 0.969
4096, 12, 0, 1, 0.872
4096, 44, 0, 0, 0.979
4096, 44, 0, 1, 0.83
4096, 0, 12, 0, 1.006
4096, 0, 12, 1, 0.989
4096, 0, 44, 0, 0.739
4096, 0, 44, 1, 0.942
4096, 12, 12, 0, 1.009
4096, 12, 12, 1, 0.973
4096, 44, 44, 0, 0.791
4096, 44, 44, 1, 0.961
4096, 2048, 0, 0, 0.978
4096, 2048, 0, 1, 0.951
4096, 2060, 0, 0, 0.986
4096, 2060, 0, 1, 0.963
4096, 2048, 12, 0, 0.971
4096, 2048, 12, 1, 0.941
4096, 2060, 12, 0, 0.977
4096, 2060, 12, 1, 0.949
8192, 0, 0, 0, 0.85
8192, 0, 0, 1, 0.845
8192, 13, 0, 0, 0.937
8192, 13, 0, 1, 0.939
8192, 45, 0, 0, 0.932
8192, 45, 0, 1, 0.927
8192, 0, 13, 0, 0.621
8192, 0, 13, 1, 0.62
8192, 0, 45, 0, 0.53
8192, 0, 45, 1, 0.516
8192, 13, 13, 0, 0.664
8192, 13, 13, 1, 0.659
8192, 45, 45, 0, 0.593
8192, 45, 45, 1, 0.575
8192, 2048, 0, 0, 0.854
8192, 2048, 0, 1, 0.834
8192, 2061, 0, 0, 0.863
8192, 2061, 0, 1, 0.857
8192, 2048, 13, 0, 0.63
8192, 2048, 13, 1, 0.629
8192, 2061, 13, 0, 0.627
8192, 2061, 13, 1, 0.62
Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
| |
No bug.
The optimizations are as follows:
1) Always align entry to 64 bytes. This makes behavior more
predictable and makes other frontend optimizations easier.
2) Make the L(more_8x_vec) cases 4k aliasing aware. This can have
significant benefits in the case that:
0 < (dst - src) < [256, 512]
3) Align before `rep movsb`. For ERMS this is roughly a [0, 30%]
improvement and for FSRM [-10%, 25%].
In addition to these primary changes there is general cleanup
throughout to optimize the aligning routines and control flow logic.
Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are a few places where only known numeric values are acceptable for
`asm` parameters, yet the constraint "i" is used. "i" can include
"symbolic constants whose values will be known only at assembly time or
later."
Use "n" instead of "i" where known numeric values are required.
Suggested-by: Segher Boessenkool <segher@kernel.crashing.org>
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
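A minimal sketch of the "n" constraint in GCC-style extended asm is
shown below. The constant DEMO_FIELD_WIDTH and the function
demo_numeric_operand are hypothetical, and the asm template is left
empty on purpose so that the sketch does not depend on any particular
instruction set; real uses would reference the operand in the template.

```c
#include <assert.h>

/* Hypothetical constant, for illustration only.  */
enum { DEMO_FIELD_WIDTH = 8 };

static int
demo_numeric_operand (void)
{
  /* "n" requires an immediate integer operand whose numeric value is
     known at compile time.  "i" would additionally accept symbolic
     constants that are resolved only by the assembler or linker.  */
  __asm__ volatile ("" : : "n" (DEMO_FIELD_WIDTH));
  return DEMO_FIELD_WIDTH;
}
```

With "n", passing something like the address of a symbol is rejected by
the compiler instead of reaching the assembler as an unresolved
expression, which is the stricter behavior wanted where an instruction
field needs a concrete number.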
|
|
|
|
|
|
|
|
|
|
|
| |
It allows building both glibc and the tests with lld (since lld does
not support R_RISCV_ALIGN linker relaxation).
Checked with a build for riscv32-linux-gnu-rv32imafdc-ilp32d and
riscv64-linux-gnu-rv64imafdc-lp64d.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Fangrui Song <maskray@google.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Clang cannot assemble movzx in the AT&T dialect mode.
../sysdeps/x86_64/strcmp.S:2232:16: error: invalid operand for instruction
movzx (%rsi), %ecx
^~~~
Change movzx to movzbl, which follows the AT&T dialect and is used
elsewhere in the file.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
| |
|
|
|
|
|
|
|
| |
The lld linker does not support TLSDESC for arm. The have-arm-tls-desc
is a leftover of 56583289b1 to support NaCL.
Reviewed-by: Fangrui Song <maskray@google.com>
|