| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
| |
The information is theoretically available via dl_iterate_phdr as
well, but that approach is very slow if there are many shared
objects.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@rehdat.com>
|
|
|
|
|
| |
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@rehdat.com>
|
|
|
|
|
|
|
|
| |
The comment indicates that --hash-style=both was used to maintain
compatibility with static dlopen, but we had many internal ABI
changes since then, so this compatiblity does not add value anymore.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
|
|
|
|
|
|
|
|
| |
Improve libmvec benchmark integration so that in future other
architectures may be able to run their libmvec benchmarks as well. This
now allows libmvec benchmarks to be run with `make BENCHSET=bench-math`.
Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The libmvec benchmarks print a message indicating that a certain CPU
feature is unsupported and exit prematurelyi, which breaks the JSON in
bench.out.
Handle this more elegantly in the bench makefile target by adding
support for an UNSUPPORTED exit status (77) so that bench.out continues
to have output for valid tests.
Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
|
|
|
|
|
|
|
|
| |
The AT_SYMLINK_NOFOLLOW emulation ues the default 32 bit stat internal
calls, which fails with EOVERFLOW if the file constains timestamps
beyond 2038.
Checked on i686-linux-gnu.
|
|
|
|
|
|
|
|
|
|
| |
__ehdr_start is already used in rltld.c:dl_main, and can serve the
same purpose as _begin. Besides tidying the code, using linker
defined section relative symbols rather than "-defsym _begin=0" better
reflects the intent of _dl_start_final use of _begin, which is to
refer to the load address of ld.so rather than absolute address zero.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
'get_fast_jitter' is meant to be used purely for performance
purposes. In all cases it's used it should be acceptable to get no
randomness (see default case). An example use case is in setting
jitter for retries between threads at a lock. There is a
performance benefit to having jitter, but only if the jitter can
be generated very quickly and ultimately there is no serious issue
if no jitter is generated.
The implementation generally uses 'HP_TIMING_NOW' iff it is
inlined (avoid any potential syscall paths).
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
|
|
|
|
|
|
|
| |
Copied from gnulib/lib/glob.c in order to fix rhbz 1982608
Also fixes swbz 25659
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Benchmark for testing pthread mutex locks performance with different
threads and critical sections.
The test configuration consists of 3 parts:
1. thread number
2. critical-section length
3. non-critical-section length
Thread number starts from 1 and increased by 2x until num of CPU cores
(nprocs). An additional over-saturation case (1.25 * nprocs) is also
included.
Critical-section is represented by a loop of shared do_filler(),
length can be determined by the loop iters.
Non-critical-section is similiar to the critical-section, except it's
based on non-shared do_filler().
Currently, adaptive pthread_mutex lock is tested.
|
|
|
|
|
|
| |
These are two missing spots initially done by 52a5fe70a2c77935.
Checked on i686-linux-gnu.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
libraries (BZ #28868)
On _dl_map_object the underlying file is not opened in trace mode
(in other cases where the underlying file can't be opened,
_dl_map_object quits with an error). If there any missing libraries
being processed, they will not be considered on final nlist size
passed on _dl_sort_maps later in the function. And it is then used by
_dl_sort_maps_dfs on the stack allocated working maps:
222 /* Array to hold RPO sorting results, before we copy back to maps[]. */
223 struct link_map *rpo[nmaps];
224
225 /* The 'head' position during each DFS iteration. Note that we start at
226 one past the last element due to first-decrement-then-store (see the
227 bottom of above dfs_traversal() routine). */
228 struct link_map **rpo_head = &rpo[nmaps];
However while transversing the 'l_initfini' on dfs_traversal it will
still consider the l_faked maps and thus update rpo more times than the
allocated working 'rpo', overflowing the stack object.
As suggested in bugzilla, one option would be to avoid sorting the maps
for trace mode. However I think ignoring l_faked object does make
sense (there is one less constraint to call the sorting function), it
allows a slight less stack usage for trace, and it is slight simpler
solution.
The tests does trigger the stack overflow, however I tried to make
it more generic to check different scenarios or missing objects.
Checked on x86_64-linux-gnu.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
|
|
|
|
| |
Checked on x86_64-linux-gnu.
|
| |
|
|
|
|
|
|
|
|
| |
Verify that:
1. A DT_RELR shared library without DT_NEEDED works.
2. A DT_RELR shared library without DT_VERNEED works.
3. A DT_RELR shared library without libc.so on DT_NEEDED works.
|
|
|
|
|
|
| |
With DT_RELR, there may be no relocations in DT_RELA/DT_REL and their
entry values are zero. Don't relocate DT_RELA/DT_REL and update the
combined relocation start address if their entry values are zero.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
PIE and shared objects usually have many relative relocations. In
2017/2018, SHT_RELR/DT_RELR was proposed on
https://groups.google.com/g/generic-abi/c/bX460iggiKg/m/GxjM0L-PBAAJ
("Proposal for a new section type SHT_RELR") and is a pre-standard. RELR
usually takes 3% or smaller space than R_*_RELATIVE relocations. The
virtual memory size of a mostly statically linked PIE is typically 5~10%
smaller.
---
Notes I will not include in the submitted commit:
Available on https://sourceware.org/git/?p=glibc.git;a=shortlog;h=refs/heads/maskray/relr
"pre-standard": even Solaris folks are happy with the refined generic-abi
proposal. Cary Coutant will apply the change
https://sourceware.org/pipermail/libc-alpha/2021-October/131781.html
This patch is simpler than Chrome OS's glibc patch and makes ELF_DYNAMIC_DO_RELR
available to all ports. I don't think the current glibc implementation
supports ia64 in an ELFCLASS32 container. That said, the style I used is
works with an ELFCLASS32 container for 64-bit machine if ElfW(Addr) is
64-bit.
* Chrome OS folks have carried a local patch since 2018 (latest version:
https://chromium.googlesource.com/chromiumos/overlays/chromiumos-overlay/+/refs/heads/main/sys-libs/glibc/files/local/glibc-2.32).
I.e. this feature has been battle tested.
* Android bionic supports 2018 and switched to DT_RELR==36 in 2020.
* The Linux kernel has supported CONFIG_RELR since 2019-08
(https://git.kernel.org/linus/5cf896fb6be3effd9aea455b22213e27be8bdb1d).
* A musl patch (by me) exists but is not applied:
https://www.openwall.com/lists/musl/2019/03/06/3
* rtld-elf from FreeBSD 14 will support DT_RELR.
I believe upstream glibc should support DT_RELR to benefit all Linux
distributions. I filed some feature requests to get their attention:
* Gentoo: https://bugs.gentoo.org/818376
* Arch Linux: https://bugs.archlinux.org/task/72433
* Debian https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=996598
* Fedora https://bugzilla.redhat.com/show_bug.cgi?id=2014699
As of linker support (to the best of my knowledge):
* LLD support DT_RELR.
* https://chromium.googlesource.com/chromiumos/overlays/chromiumos-overlay/+/refs/heads/main/sys-devel/binutils/files/
has a gold patch.
* GNU ld feature request https://sourceware.org/bugzilla/show_bug.cgi?id=27923
Changes from the original patch:
1. Check the linker option, -z pack-relative-relocs, which add a
GLIBC_ABI_DT_RELR symbol version dependency on the shared C library if
it provides a GLIBC_2.XX symbol version.
2. Change make variale to have-dt-relr.
3. Rename tst-relr-no-pie to tst-relr-pie for --disable-default-pie.
4. Use TEST_VERIFY in tst-relr.c.
5. Add the check-tst-relr-pie.out test to check for linker generated
libc.so version dependency on GLIBC_ABI_DT_RELR.
6. Move ELF_DYNAMIC_DO_RELR before ELF_DYNAMIC_DO_REL.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The EI_ABIVERSION field of the ELF header in executables and shared
libraries can be bumped to indicate the minimum ABI requirement on the
dynamic linker. However, EI_ABIVERSION in executables isn't checked by
the Linux kernel ELF loader nor the existing dynamic linker. Executables
will crash mysteriously if the dynamic linker doesn't support the ABI
features required by the EI_ABIVERSION field. The dynamic linker should
be changed to check EI_ABIVERSION in executables.
Add a glibc version, GLIBC_ABI_DT_RELR, to indicate DT_RELR support so
that the existing dynamic linkers will issue an error on executables with
GLIBC_ABI_DT_RELR dependency. When there is a DT_VERNEED entry with
libc.so on DT_NEEDED, issue an error if there is a DT_RELR entry without
GLIBC_ABI_DT_RELR dependency.
Support __placeholder_only_for_empty_version_map as the placeholder symbol
used only for empty version map to generate GLIBC_ABI_DT_RELR without any
symbols.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
PI_STATIC_AND_HIDDEN indicates whether accesses to internal linkage
variables and hidden visibility variables in a shared object (ld.so)
need dynamic relocations (usually R_*_RELATIVE). PI (position
independent) in the macro name is a misnomer: a code sequence using GOT
is typically position-independent as well, but using dynamic relocations
does not meet the requirement.
Not defining PI_STATIC_AND_HIDDEN is legacy and we expect that all new
ports will define PI_STATIC_AND_HIDDEN. Current ports defining
PI_STATIC_AND_HIDDEN are more than the opposite. Change the configure
default.
No functional change.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
|
|
|
|
|
| |
These failures were caught while building glibc master for Fedora
Rawhide which is built with '-mtune=generic -msse2 -mfpmath=sse'
using gcc 11.3 (gcc-11.3.1-2.fc35) on a Cascadelake Intel Xeon
processor.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When audit modules are loaded, ld.so initialization is not yet
complete, and rtld_active () returns false even though ld.so is
mostly working. Instead, the static dlopen hook is used, but that
does not work at all because this is not a static dlopen situation.
Commit 466c1ea15f461edb8e3ffaf5d86d708876343bbf ("dlfcn: Rework
static dlopen hooks") moved the hook pointer into _rtld_global_ro,
which means that separate protection is not needed anymore and the
hook pointer can be checked directly.
The guard for disabling libio vtable hardening in _IO_vtable_check
should stay for now.
Fixes commit 8e1472d2c1e25e6eabc2059170731365f6d5b3d1 ("ld.so:
Examine GLRO to detect inactive loader [BZ #20204]").
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
|
|
| |
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On non-PI_STATIC_AND_HIDDEN architectures, getting the address of
_rtld_local_ro (for GLRO (dl_final_object)) goes through a GOT entry.
The GOT load may be reordered before self relocation, leading to an
unrelocated/incorrect _rtld_local_ro address.
84e02af1ebc9988126eebe60bf19226cea835623 tickled GCC powerpc32 to
reorder the GOT load before relative relocations, leading to ld.so
crash. This is similar to the m68k jump table reordering issue fixed by
a8e9b5b8079d18116ca69c9797e77804ecf2ee7e.
Move code after self relocation into _dl_start_final to avoid the
reordering. This fixes powerpc32 and may help other architectures when
ELF_DYNAMIC_RELOCATE is simplified in the future.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If `__glibc_objsize (__o) == (size_t) -1` (i.e. `__o` is unknown size), fortify
checks should pass, and `__whatever_alias` should be called.
Previously, `__glibc_objsize (__o) == (size_t) -1` was explicitly checked, but
on commit a643f60c53876b, this was moved into `__glibc_safe_or_unknown_len`.
A comment says the -1 case should work as: "The -1 check is redundant because
since it implies that __glibc_safe_len_cond is true.". But this fails when:
* `__s > 1`
* `__osz == -1` (i.e. unknown size at compile time)
* `__l` is big enough
* `__l * __s <= __osz` can be folded to a constant
(I only found this to be true for `mbsrtowcs` and other functions in wchar2.h)
In this case `__l * __s <= __osz` is false, and `__whatever_chk_warn` will be
called by `__glibc_fortify` or `__glibc_fortify_n` and crash the program.
This commit adds the explicit `__osz == -1` check again.
moc crashes on startup due to this, see: https://bugs.archlinux.org/task/74041
Minimal test case (test.c):
#include <wchar.h>
int main (void)
{
const char *hw = "HelloWorld";
mbsrtowcs (NULL, &hw, (size_t)-1, NULL);
return 0;
}
Build with:
gcc -O2 -Wp,-D_FORTIFY_SOURCE=2 test.c -o test && ./test
Output:
*** buffer overflow detected ***: terminated
Fixes: BZ #29030
Signed-off-by: Joan Bruguera <joanbrugueram@gmail.com>
Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
|
|
|
|
|
|
|
| |
Unused since 52a01100ad011293197637e42b5be1a479a2f4ae
("elf: Remove ad-hoc restrictions on dlopen callers [BZ #22787]").
Reviewed-by: Florian Weimer <fweimer@redhat.com>
|
|
|
|
|
| |
enum.IntFlag and enum.EnumMeta._missing_ support are not part of
earlier Python versions.
|
|
|
|
|
|
|
|
|
|
|
| |
The new code unrolls the main loop slightly without adding too much
overhead and minimizes the comparisons for the search CHAR.
Geometric Mean of all benchmarks New / Old: 0.755
See email for all results.
Full xcheck passes on x86_64 with and without multiarch enabled.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
| |
The new code unrolls the main loop slightly without adding too much
overhead and minimizes the comparisons for the search CHAR.
Geometric Mean of all benchmarks New / Old: 0.832
See email for all results.
Full xcheck passes on x86_64 with and without multiarch enabled.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
| |
The new code unrolls the main loop slightly without adding too much
overhead and minimizes the comparisons for the search CHAR.
Geometric Mean of all benchmarks New / Old: 0.741
See email for all results.
Full xcheck passes on x86_64 with and without multiarch enabled.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
|
|
|
|
|
|
|
|
| |
1. Use json-lib for printing results.
2. Expose all parameters (before pos, seek_char, and max_char where
not printed).
3. Add benchmarks that test multiple occurence of seek_char in the
string.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Clear the upper 32 bits in RDX (memory size) for x32 to fix
FAIL: string/tst-size_t-memcmp
FAIL: string/tst-size_t-memcmp-2
FAIL: string/tst-size_t-memcpy
FAIL: wcsmbs/tst-size_t-wmemcmp
on x32 introduced by
8804157ad9 x86: Optimize memcmp SSE2 in memcmp.S
26b2478322 x86: Reduce code size of mem{move|pcpy|cpy}-ssse3
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
|
|
|
|
|
|
|
|
|
| |
This is necessary to place the libio vtables into the RELRO segment.
New tests elf/tst-relro-ldso and elf/tst-relro-libc are added to
verify that this is what actually happens.
The new tests fail on ia64 due to lack of (default) RELRO support
inbutils, so they are XFAILed there.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Hopefully, this will lead to tests that are easier to maintain. The
current approach of parsing readelf -W output using regular expressions
is not necessarily easier than parsing the ELF data directly.
This module is still somewhat incomplete (e.g., coverage of relocation
types and versioning information is missing), but it is sufficient to
perform basic symbol analysis or program header analysis.
The EM_* mapping for architecture-specific constant classes (e.g.,
SttX86_64) is not yet implemented. The classes are defined for the
benefit of elf/tst-glibcelf.py.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
elf_dynamic_do_Rel checks RTLD_BOOTSTRAP in several #ifdef branches.
Create an outside RTLD_BOOTSTRAP branch to simplify reasoning about the
function at the cost of a few duplicate lines.
Since dl_naudit is zero in RTLD_BOOTSTRAP code, the RTLD_BOOTSTRAP
branch can avoid _dl_audit_symbind calls to decrease code size.
Reviewed-by: Adheemrval Zanella <adhemerval.zanella@linaro.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
m68k is a non-PI_STATIC_AND_HIDDEN arch which uses a GOT relocation when
loading the address of a jump table. The GOT load may be reordered
before processing R_68K_RELATIVE relocations, leading to an
unrelocated/incorrect jump table, which will cause a crash.
The foolproof approach is to add an optimization barrier (e.g. calling
an non-inlinable function after relative relocations are resolved). That
is non-trivial given the current code structure, so just use the simple
approach to avoid the jump table: handle only the essential reloctions
for RTLD_BOOTSTRAP code.
This is based on Andreas Schwab's patch and fixed ld.so crash on m68k.
Reviewed-by: Adheemrval Zanella <adhemerval.zanella@linaro.org>
|
|
|
|
|
|
|
|
| |
The 404656009b reversion did not setup the atomic loop to set the
cancel bits correctly. The fix is essentially what pthread_cancel
did prior 26cfbb7162ad.
Checked on x86_64-linux-gnu and aarch64-linux-gnu.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 8804157ad9da39631703b92315460808eac86b0c
Author: Noah Goldstein <goldstein.w.n@gmail.com>
Date: Fri Apr 15 12:27:59 2022 -0500
x86: Optimize memcmp SSE2 in memcmp.S
Only defined wmemcmp and missed __wmemcmp. This commit fixes that by
defining __wmemcmp and setting wmemcmp as a weak alias to __wmemcmp.
Both multiarch and disable-multiarch builds succeed and full xchecks
pass.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
After 73fc4e28b9464f0e13edc719a5372839970e7ddb,
__libc_enable_secure_decided is always 0 and a statically linked
executable may overwrite __libc_enable_secure without considering
AT_SECURE.
The __libc_enable_secure has been correctly initialized in _dl_aux_init,
so just remove __libc_enable_secure_decided and __libc_init_secure.
This allows us to remove some startup_get*id functions from
22b79ed7f413cd980a7af0cf258da5bf82b6d5e5.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
|
|
|
|
|
| |
Add missing support initially added by 4e8521333bea6e89fcef1020
(which missed n32 stat).
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Old code was both inefficient and wasted code size. New code (-62
bytes) and comparable or better performance in the page cross case.
geometric_mean(N=20) of page cross cases New / Original: 0.960
size, align0, align1, ret, New Time/Old Time
1, 4095, 0, 0, 1.001
1, 4095, 0, 1, 0.999
1, 4095, 0, -1, 1.0
2, 4094, 0, 0, 1.0
2, 4094, 0, 1, 1.0
2, 4094, 0, -1, 1.0
3, 4093, 0, 0, 1.0
3, 4093, 0, 1, 1.0
3, 4093, 0, -1, 1.0
4, 4092, 0, 0, 0.987
4, 4092, 0, 1, 1.0
4, 4092, 0, -1, 1.0
5, 4091, 0, 0, 0.984
5, 4091, 0, 1, 1.002
5, 4091, 0, -1, 1.005
6, 4090, 0, 0, 0.993
6, 4090, 0, 1, 1.001
6, 4090, 0, -1, 1.003
7, 4089, 0, 0, 0.991
7, 4089, 0, 1, 1.0
7, 4089, 0, -1, 1.001
8, 4088, 0, 0, 0.875
8, 4088, 0, 1, 0.881
8, 4088, 0, -1, 0.888
9, 4087, 0, 0, 0.872
9, 4087, 0, 1, 0.879
9, 4087, 0, -1, 0.883
10, 4086, 0, 0, 0.878
10, 4086, 0, 1, 0.886
10, 4086, 0, -1, 0.873
11, 4085, 0, 0, 0.878
11, 4085, 0, 1, 0.881
11, 4085, 0, -1, 0.879
12, 4084, 0, 0, 0.873
12, 4084, 0, 1, 0.889
12, 4084, 0, -1, 0.875
13, 4083, 0, 0, 0.873
13, 4083, 0, 1, 0.863
13, 4083, 0, -1, 0.863
14, 4082, 0, 0, 0.838
14, 4082, 0, 1, 0.869
14, 4082, 0, -1, 0.877
15, 4081, 0, 0, 0.841
15, 4081, 0, 1, 0.869
15, 4081, 0, -1, 0.876
16, 4080, 0, 0, 0.988
16, 4080, 0, 1, 0.99
16, 4080, 0, -1, 0.989
17, 4079, 0, 0, 0.978
17, 4079, 0, 1, 0.981
17, 4079, 0, -1, 0.98
18, 4078, 0, 0, 0.981
18, 4078, 0, 1, 0.98
18, 4078, 0, -1, 0.985
19, 4077, 0, 0, 0.977
19, 4077, 0, 1, 0.979
19, 4077, 0, -1, 0.986
20, 4076, 0, 0, 0.977
20, 4076, 0, 1, 0.986
20, 4076, 0, -1, 0.984
21, 4075, 0, 0, 0.977
21, 4075, 0, 1, 0.983
21, 4075, 0, -1, 0.988
22, 4074, 0, 0, 0.983
22, 4074, 0, 1, 0.994
22, 4074, 0, -1, 0.993
23, 4073, 0, 0, 0.98
23, 4073, 0, 1, 0.992
23, 4073, 0, -1, 0.995
24, 4072, 0, 0, 0.989
24, 4072, 0, 1, 0.989
24, 4072, 0, -1, 0.991
25, 4071, 0, 0, 0.99
25, 4071, 0, 1, 0.999
25, 4071, 0, -1, 0.996
26, 4070, 0, 0, 0.993
26, 4070, 0, 1, 0.995
26, 4070, 0, -1, 0.998
27, 4069, 0, 0, 0.993
27, 4069, 0, 1, 0.999
27, 4069, 0, -1, 1.0
28, 4068, 0, 0, 0.997
28, 4068, 0, 1, 1.0
28, 4068, 0, -1, 0.999
29, 4067, 0, 0, 0.996
29, 4067, 0, 1, 0.999
29, 4067, 0, -1, 0.999
30, 4066, 0, 0, 0.991
30, 4066, 0, 1, 1.001
30, 4066, 0, -1, 0.999
31, 4065, 0, 0, 0.988
31, 4065, 0, 1, 0.998
31, 4065, 0, -1, 0.998
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Code didn't actually use any sse4 instructions since `ptest` was
removed in:
commit 2f9062d7171850451e6044ef78d91ff8c017b9c0
Author: Noah Goldstein <goldstein.w.n@gmail.com>
Date: Wed Nov 10 16:18:56 2021 -0600
x86: Shrink memcmp-sse4.S code size
The new memcmp-sse2 implementation is also faster.
geometric_mean(N=20) of page cross cases SSE2 / SSE4: 0.905
Note there are two regressions preferring SSE2 for Size = 1 and Size =
65.
Size = 1:
size, align0, align1, ret, New Time/Old Time
1, 1, 1, 0, 1.2
1, 1, 1, 1, 1.197
1, 1, 1, -1, 1.2
This is intentional. Size == 1 is significantly less hot based on
profiles of GCC11 and Python3 than sizes [4, 8] (which is made
hotter).
Python3 Size = 1 -> 13.64%
Python3 Size = [4, 8] -> 60.92%
GCC11 Size = 1 -> 1.29%
GCC11 Size = [4, 8] -> 33.86%
size, align0, align1, ret, New Time/Old Time
4, 4, 4, 0, 0.622
4, 4, 4, 1, 0.797
4, 4, 4, -1, 0.805
5, 5, 5, 0, 0.623
5, 5, 5, 1, 0.777
5, 5, 5, -1, 0.802
6, 6, 6, 0, 0.625
6, 6, 6, 1, 0.813
6, 6, 6, -1, 0.788
7, 7, 7, 0, 0.625
7, 7, 7, 1, 0.799
7, 7, 7, -1, 0.795
8, 8, 8, 0, 0.625
8, 8, 8, 1, 0.848
8, 8, 8, -1, 0.914
9, 9, 9, 0, 0.625
Size = 65:
size, align0, align1, ret, New Time/Old Time
65, 0, 0, 0, 1.103
65, 0, 0, 1, 1.216
65, 0, 0, -1, 1.227
65, 65, 0, 0, 1.091
65, 0, 65, 1, 1.19
65, 65, 65, -1, 1.215
This is because A) the checks in range [65, 96] are now unrolled 2x
and B) because smaller values <= 16 are now given a hotter path. By
contrast the SSE4 version has a branch for Size = 80. The unrolled
version has get better performance for returns which need both
comparisons.
size, align0, align1, ret, New Time/Old Time
128, 4, 8, 0, 0.858
128, 4, 8, 1, 0.879
128, 4, 8, -1, 0.888
As well, out of microbenchmark environments that are not full
predictable the branch will have a real-cost.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
|
|
|
|
|
|
|
| |
New code save size (-303 bytes) and has significantly better
performance.
geometric_mean(N=20) of page cross cases New / Original: 0.634
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
|
|
|
|
|
|
|
|
| |
It also handles the highly unlikely case where localtime might return
NULL, in this case only the PRI is set to hopefully instruct the relay
to get eh TIMESTAMP (as defined by the RFC).
Checked on x86_64-linux-gnu and i686-linux-gnu.
|
|
|
|
|
|
|
|
|
| |
There is no easy solution as described on first comment in bug report,
and some code (like busybox) assumes facilitynames existance when
SYSLOG_NAMES is defined (so we can't just remove it as suggested in
comment #2).
So use the easier solution and guard it with __USE_MISC.
|
|
|
|
|
|
|
|
|
|
|
| |
A fixed-sized buffer is used instead of memstream for messages up to
1024 bytes to avoid the potential BUFSIZ (8K) malloc and free for
each syslog call.
Also, since the buffer size is know, memstream is replaced with a
malloced buffer for larger messages.
Checked on x86_64-linux-gnu.
|
|
|
|
|
|
|
|
| |
Use a temporary buffer for strftime instead of using internal libio
members, simplify fprintf call on the memstream and memory allocation,
use %b instead of %h, use dprintf instead of writev for LOG_PERROR.
Checked on x86_64-linux-gnu and i686-linux-gnu.
|
|
|
|
| |
And also clenaup the headers, no semantic changes.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The test cover:
- All possible priorities and facilities through TCP and UDP.
- Same syslog tests for vsyslog.
- Some openlog/syslog/close combinations.
- openlog with LOG_CONS, LOG_PERROR, and LOG_PID.
Internally is done with a test-container where the main process mimics
the syslog server interface.
The test does not cover multithread and async-signal usage.
Checked on x86_64-linux-gnu.
|