about summary refs log tree commit diff
Commit message (Collapse)AuthorAgeFilesLines
* Add MADV_DONTNEED_LOCKED from Linux 5.18 to bits/mman-linux.hJoseph Myers2022-06-011-0/+2
| | | | | | | | Linux 5.18 adds a constant MADV_DONTNEED_LOCKED (defined in multiple header files, but with the same value on all architectures). Add this constant to bits/mman-linux.h. Tested for x86_64.
* Add HWCAP2_MTE3 from Linux 5.18 to AArch64 bits/hwcap.hJoseph Myers2022-06-011-0/+1
| | | | | | | Linux 5.18 defines a new AArch64 HWCAP value HWCAP2_MTE3; add it to glibc's sysdeps/unix/sysv/linux/aarch64/bits/hwcap.h. Tested with build-many-glibcs.py for aarch64-linux-gnu.
* i686: Use generic sincosf implementation for SSE2 versionAdhemerval Zanella2022-06-015-585/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The generic implementation shows slight better performance (gcc 11.2.1 on a Ryzen 9 5900X): * s_sincosf-sse2.S: "sincosf": { "workload-random": { "duration": 3.89961e+09, "iterations": 9.5472e+07, "reciprocal-throughput": 40.8429, "latency": 40.8483, "max-throughput": 2.4484e+07, "min-throughput": 2.44808e+07 } } * generic s_cossinf.c: "sincosf": { "workload-random": { "duration": 3.71953e+09, "iterations": 1.48512e+08, "reciprocal-throughput": 25.0515, "latency": 25.0391, "max-throughput": 3.99177e+07, "min-throughput": 3.99375e+07 } } Checked on i686-linux-gnu. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
* benchtests: Add workload name for sincosfAdhemerval Zanella2022-06-011-0/+1
| | | | | | So it can show both reciprocal-throughput and latency. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
* i686: Use generic sinf implementation for SSE2 versionAdhemerval Zanella2022-06-015-565/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Performance seems to be similar (gcc 11.2.1 on a Ryzen 9 5900X), the generic algorithm shows slight better performance for the 'workload-huge.wrf' input set. * s_sinf-sse2.S: "sinf": { "": { "duration": 3.72405e+09, "iterations": 2.38374e+08, "max": 63.973, "min": 11.211, "mean": 15.6227 }, "workload-random.wrf": { "duration": 3.76923e+09, "iterations": 8.4e+07, "reciprocal-throughput": 17.6355, "latency": 72.108, "max-throughput": 5.67037e+07, "min-throughput": 1.38681e+07 }, "workload-huge.wrf": { "duration": 3.76943e+09, "iterations": 6e+07, "reciprocal-throughput": 29.3493, "latency": 96.2985, "max-throughput": 3.40724e+07, "min-throughput": 1.03844e+07 } } * generic s_sinf.c: "sinf": { "": { "duration": 3.70989e+09, "iterations": 2.18025e+08, "max": 69.782, "min": 11.1, "mean": 17.0159 }, "workload-random.wrf": { "duration": 3.77213e+09, "iterations": 9.6e+07, "reciprocal-throughput": 17.5402, "latency": 61.0459, "max-throughput": 5.70119e+07, "min-throughput": 1.63811e+07 }, "workload-huge.wrf": { "duration": 3.81576e+09, "iterations": 5.6e+07, "reciprocal-throughput": 38.2111, "latency": 98.0659, "max-throughput": 2.61704e+07, "min-throughput": 1.01972e+07 } } Checked on i686-linux-gnu. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
* i686: Use generic cosf implementation for SSE2 versionAdhemerval Zanella2022-06-015-552/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Performance seems to be similar (gcc 11.2.1 on a Ryzen 9 5900X): * s_cosf-sse2.S: "cosf": { "workload-random": { "duration": 3.74987e+09, "iterations": 9.616e+07, "reciprocal-throughput": 15.8141, "latency": 62.1782, "max-throughput": 6.32346e+07, "min-throughput": 1.60828e+07 } } * generic s_cosf.c: "cosf": { "workload-random": { "duration": 3.87298e+09, "iterations": 1.00968e+08, "reciprocal-throughput": 18.3448, "latency": 58.3722, "max-throughput": 5.45113e+07, "min-throughput": 1.71314e+07 } } Checked on i686-linux-gnu.
* benchtests: Add workload name for cosfAdhemerval Zanella2022-06-011-1/+1
| | | | | | So it can show both reciprocal-throughput and latency. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
* x86_64: Optimize sincos where sin/cos is optimized (bug 29193)Andreas Schwab2022-06-016-3/+55
| | | | | | | The compiler may substitute calls to sin or cos with calls to sincos, thus we should have the same optimized implementations for sincos. The optimized implementations may produce results that differ, that also makes sure that the sincos call aggrees with the sin and cos calls.
* manual: fix reference to source fileAndreas Schwab2022-05-311-1/+1
|
* Add SOL_SMC from Linux 5.18 to bits/socket.hJoseph Myers2022-05-311-0/+1
| | | | | | | Linux 5.18 adds a constant SOL_SMC to the getsockopt / setsockopt levels; add this constant to bits/socket.h. Tested for x86_64.
* elf: Remove _dl_skip_argsAdhemerval Zanella2022-05-303-7/+0
| | | | | | Now that no architecture uses it anymore. Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* x86_64: Remove _dl_skip_args usageAdhemerval Zanella2022-05-302-22/+3
| | | | | | | | | | | Since ad43cac44a the generic code already shuffles the argv/envp/auxv on the stack to remove the ld.so own arguments and thus _dl_skip_args is always 0. So there is no need to adjust the argc or argv. Checked on x86_64-linux-gnu and i686-linux-gnu. Reviewed-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* sparc: Remove _dl_skip_args usageAdhemerval Zanella2022-05-302-79/+4
| | | | | | | | | | Since ad43cac44a the generic code already shuffles the argv/envp/auxv on the stack to remove the ld.so own arguments and thus _dl_skip_args is always 0. So there is no need to adjust the argc or argv. Checked on sparc64-linux-gnu and sparcv9-linux-gnu. Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* sh: Remove _dl_skip_args usageAdhemerval Zanella2022-05-301-15/+1
| | | | | | | | | | | Since ad43cac44a the generic code already shuffles the argv/envp/auxv on the stack to remove the ld.so own arguments and thus _dl_skip_args is always 0. So there is no need to adjust the argc or argv. Checked with qemu-user that arguments are correctly passed on both constructors and main program. Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* s390: Remove _dl_skip_args usageAdhemerval Zanella2022-05-302-62/+0
| | | | | | | | | | Since ad43cac44a the generic code already shuffles the argv/envp/auxv on the stack to remove the ld.so own arguments and thus _dl_skip_args is always 0. So there is no need to adjust the argc or argv. Checked on s390x-linux-gnu and s390-linux-gnu. Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* riscv: Remove _dl_skip_args usageAdhemerval Zanella2022-05-301-11/+1
| | | | | | | | | | | Since ad43cac44a the generic code already shuffles the argv/envp/auxv on the stack to remove the ld.so own arguments and thus _dl_skip_args is always 0. So there is no need to adjust the argc or argv. Checked with qemu-user that arguments are correctly passed on both constructors and main program. Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* nios2: Remove _dl_skip_args usage (BZ# 29187)Adhemerval Zanella2022-05-301-40/+10
| | | | | | | | | | | Since ad43cac44a the generic code already shuffles the argv/envp/auxv on the stack to remove the ld.so own arguments and thus _dl_skip_args is always 0. So there is no need to adjust the argc or argv. Checked with qemu-user that arguments are correctly passed on both constructors and main program. Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* mips: Remove _dl_skip_args usageAdhemerval Zanella2022-05-301-29/+2
| | | | | | | | | | | Since ad43cac44a the generic code already shuffles the argv/envp/auxv on the stack to remove the ld.so own arguments and thus _dl_skip_args is always 0. So there is no need to adjust the argc or argv. Checked with qemu-user that arguments are correctly passed on both constructors and main program. Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* microblaze: Remove _dl_skip_args usageAdhemerval Zanella2022-05-301-5/+0
| | | | | | | | | | | Since ad43cac44a the generic code already shuffles the argv/envp/auxv on the stack to remove the ld.so own arguments and thus _dl_skip_args is always 0. So there is no need to adjust the argc or argv. Checked with qemu-user that arguments are correctly passed on both constructors and main program. Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* m68k: Remove _dl_skip_args usageAdhemerval Zanella2022-05-301-7/+0
| | | | | | | | | | | Since ad43cac44a the generic code already shuffles the argv/envp/auxv on the stack to remove the ld.so own arguments and thus _dl_skip_args is always 0. So there is no need to adjust the argc or argv. Checked with qemu-user that arguments are correctly passed on both constructors and main program. Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* ia64: Remove _dl_skip_args usageAdhemerval Zanella2022-05-301-56/+14
| | | | | | | | | | | | | Since ad43cac44a the generic code already shuffles the argv/envp/auxv on the stack to remove the ld.so own arguments and thus _dl_skip_args is always 0. The startup code is changed to read the _dl_argc and _dl_argv values, and envp is calculated from argc and argv. Checked on ia64-linux-gnu. Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* i686: Remove _dl_skip_args usageAdhemerval Zanella2022-05-301-11/+2
| | | | | | | | | | Since ad43cac44a the generic code already shuffles the argv/envp/auxv on the stack to remove the ld.so own arguments and thus _dl_skip_args is always 0. So there is no need to adjust the argc or argv. Checked on i686-linux-gnu. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
* hppa: Remove _dl_skip_args usage (BZ# 29165)Adhemerval Zanella2022-05-301-22/+14
| | | | | | | | | | | | | Different than other architectures, hppa creates an unrelated stack frame where ld.so argc/argv adjustments done by ad43cac44a6860eaefc is not done on the argc/argv saved/restore by _dl_start_user. Instead load _dl_argc and _dl_argv directlty instead of adjust them using _dl_skip_args value. Checked on hppa-linux-gnu. Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* csky: Remove _dl_skip_args usageAdhemerval Zanella2022-05-301-18/+1
| | | | | | | | Since ad43cac44a the generic code already shuffles the argv/envp/auxv on the stack to remove the ld.so own arguments and thus _dl_skip_args is always 0. It makes the fixup_stack branch ununsed. Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* arc: Remove _dl_skip_args usageAdhemerval Zanella2022-05-301-15/+2
| | | | | | | | Since ad43cac44a the generic code already shuffles the argv/envp/auxv on the stack to remove the ld.so own arguments and thus _dl_skip_args is always 0. So there is no need to adjust the argc or argv. Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* arm: Remove _dl_skip_args usageAdhemerval Zanella2022-05-301-39/+0
| | | | | | | | | | | Since ad43cac44a the generic code already shuffles the argv/envp/auxv on the stack to remove the ld.so own arguments and thus _dl_skip_args is always 0. It makes the _fixup_stack branch ununsed. Checked with qemu-user that arguments are correctly passed on both constructors and main program. Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* alpha: Remove _dl_skip_args usageAdhemerval Zanella2022-05-301-41/+0
| | | | | | | | | | | Since ad43cac44a the generic code already shuffles the argv/envp/auxv on the stack to remove the ld.so own arguments and thus _dl_skip_args is always 0. It makes the fixup_stack branch ununsed. Checked with qemu-user that arguments are correctly passed on both constructors and main program. Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* benchtests: Improve benchtests for strstr, memmem, and memchrNoah Goldstein2022-05-273-92/+283
| | | | | | | | | 1. Use json_ctx for output to help standardize format across all benchtests. 2. Add some additional tests to strstr and memchr expanding alignments and adding more small values. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
* dlsym: Make RTLD_NEXT prefer default version definition [BZ #14932]Fangrui Song2022-05-275-1/+77
| | | | | | | | | | | | | When the first object providing foo defines both foo@v1 and foo@@v2, dlsym(RTLD_NEXT, "foo") returns foo@v1 while dlsym(RTLD_DEFAULT, "foo") returns foo@@v2. The issue is that RTLD_DEFAULT uses the DL_LOOKUP_RETURN_NEWEST flag while RTLD_NEXT doesn't. Fix the RTLD_NEXT branch to use DL_LOOKUP_RETURN_NEWEST. Note: the new behavior matches FreeBSD rtld. Future sanitizers will not need to add versioned interceptors like https://reviews.llvm.org/D96348 Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* x86-64: Ignore r_addend for R_X86_64_GLOB_DAT/R_X86_64_JUMP_SLOTH.J. Lu2022-05-261-2/+4
| | | | | | | | According to x86-64 psABI, r_addend should be ignored for R_X86_64_GLOB_DAT and R_X86_64_JUMP_SLOT. Since linkers always set their r_addends to 0, we can ignore their r_addends. Reviewed-by: Fangrui Song <maskray@google.com>
* x86_64: Implement evex512 version of strlen, strnlen, wcslen and wcsnlenSunil K Pandey2022-05-267-0/+346
| | | | | | | | | | | | | | | This patch implements following evex512 version of string functions. Perf gain for evex512 version is up to 50% as compared to evex, depending on length and alignment. Placeholder function, not used by any processor at the moment. - String length function using 512 bit vectors. - String N length using 512 bit vectors. - Wide string length using 512 bit vectors. - Wide string N length using 512 bit vectors. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
* Update kernel version to 5.18 in header constant testsJoseph Myers2022-05-262-2/+2
| | | | | | | | | | This patch updates the kernel version in the tests tst-mman-consts.py and tst-pidfd-consts.py to 5.18. (There are no new constants covered by these tests in 5.18, or in 5.17 in the case of tst-pidfd-consts.py that previously used version 5.16, that need any other header changes.) Tested with build-many-glibcs.py.
* String: Improve overflow test coverage for strnlenSunil K Pandey2022-05-251-0/+2
| | | | | | This patch adds more overflow test coverage for strnlen and wcsnlen. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
* Update syscall-names.list for Linux 5.18Joseph Myers2022-05-251-2/+2
| | | | | | | Linux 5.18 has no new syscalls. Update the version number in syscall-names.list to reflect that it is still current for 5.18. Tested with build-many-glibcs.py.
* Fix deadlock when pthread_atfork handler calls pthread_atfork or dlcloseArjun Shankar2022-05-258-50/+499
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In multi-threaded programs, registering via pthread_atfork, de-registering implicitly via dlclose, or running pthread_atfork handlers during fork was protected by an internal lock. This meant that a pthread_atfork handler attempting to register another handler or dlclose a dynamically loaded library would lead to a deadlock. This commit fixes the deadlock in the following way: During the execution of handlers at fork time, the atfork lock is released prior to the execution of each handler and taken again upon its return. Any handler registrations or de-registrations that occurred during the execution of the handler are accounted for before proceeding with further handler execution. If a handler that hasn't been executed yet gets de-registered by another handler during fork, it will not be executed. If a handler gets registered by another handler during fork, it will not be executed during that particular fork. The possibility that handlers may now be registered or deregistered during handler execution means that identifying the next handler to be run after a given handler may register/de-register others requires some bookkeeping. The fork_handler struct has an additional field, 'id', which is assigned sequentially during registration. Thus, handlers are executed in ascending order of 'id' during 'prepare', and descending order of 'id' during parent/child handler execution after the fork. Two tests are included: * tst-atfork3: Adhemerval Zanella <adhemerval.zanella@linaro.org> This test exercises calling dlclose from prepare, parent, and child handlers. * tst-atfork4: This test exercises calling pthread_atfork and dlclose from the prepare handler. [BZ #24595, BZ #27054] Co-authored-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* Use Linux 5.18 in build-many-glibcs.pyJoseph Myers2022-05-241-1/+1
| | | | | | | This patch makes build-many-glibcs.py use Linux 5.18. Tested with build-many-glibcs.py (host-libraries, compilers and glibcs builds).
* stdio-common: Simplify printf_unknown interface in vfprintf-internal.cFlorian Weimer2022-05-241-18/+3
| | | | | | | The called function does not use the args array, so there is no need to produce it. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* stdio-common: Move union printf_arg int <printf.h>Florian Weimer2022-05-242-23/+21
| | | | | | | The type does not depend on wide vs narrow preprocessor macros, so it does not need to be customized in stdio-common/printf-parse.h. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* stdio-common: Add printf specifier registry to <printf.h>Florian Weimer2022-05-245-14/+9
| | | | | | Add __printf_arginfo_table, __printf_function_table, __printf_va_arg_table, __register_printf_specifier to include/printf.h.
* elf/dl-reloc.c: Copyright The GNU Toolchain AuthorsFangrui Song2022-05-231-0/+1
| | | | | | | | | by following 3.5. Update copyright information on https://sourceware.org/glibc/wiki/Contribution%20checklist . The change is advised by Carlos O'Donell. Note: commit a8b11bd1f8dc68795b377138b5d94638ef75a50d missed Signed-off-by tag from Nicholas Guriev <nicholas@guriev.su>.
* benchtests: Improve bench-strnlen.cNoah Goldstein2022-05-231-25/+52
| | | | | | | 1. Output results in json format so its easier to parse 2. Increase max alignment to `getpagesize () - 1` to make it possible to test page cross cases. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
* math: Add math-use-builtins-fabs (BZ#29027)Adhemerval Zanella2022-05-2312-208/+29
| | | | | | | | | | | | | | | | | | Both float, double, and _Float128 are assumed to be supported (float and double already only uses builtins). Only long double is parametrized due GCC bug 29253 which prevents its usage on powerpc. It allows to remove i686, ia64, x86_64, powerpc, and sparc arch specific implementation. On ia64 it also fixes the sNAN handling: math/test-float64x-fabs math/test-ldouble-fabs Checked on x86_64-linux-gnu, i686-linux-gnu, powerpc-linux-gnu, powerpc64-linux-gnu, sparc64-linux-gnu, and ia64-linux-gnu.
* linux: Add CLONE_NEWTIME from Linux 5.6 to bits/sched.hAdhemerval Zanella2022-05-231-0/+4
| | | | It was added in commit 769071ac9f20b6a447410c7eaa55d1a5233ef40c.
* Revert "[ARM][BZ #17711] Fix extern protected data handling"Fangrui Song2022-05-232-28/+3
| | | | | | | | This reverts commit 3bcea719ddd6ce399d7bccb492c40af77d216e42. Similar to commit e555954e026df1b85b8ef6c101d05f97b1520d7e for aarch64. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
* Revert "[AArch64][BZ #17711] Fix extern protected data handling"Fangrui Song2022-05-232-28/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 0910702c4d2cf9e8302b35c9519548726e1ac489. Say both a.so and b.so define protected data symbol `var` and the executable copy relocates var. ELF_RTYPE_CLASS_EXTERN_PROTECTED_DATA has strange semantics: a.so accesses the copy in the executable while b.so accesses its own. This behavior requires that (a) the compiler emits GOT-generating relocations (b) the linker produces GLOB_DAT instead of RELATIVE. Without the ELF_RTYPE_CLASS_EXTERN_PROTECTED_DATA code, b.so's GLOB_DAT will bind to the executable (normal behavior). For aarch64 it makes sense to restore the original behavior and don't pay the ELF_RTYPE_CLASS_EXTERN_PROTECTED_DATA cost. The behavior is very unlikely used by anyone. * Clang code generator treats STV_PROTECTED the same way as STV_HIDDEN: no GOT-generating relocation in the first place. * gold and lld reject copy relocation on a STV_PROTECTED symbol. * Nowadays -fpie/-fpic modes are popular. GCC/Clang's codegen uses GOT-generating relocation when accessing an default visibility external symbol which avoids copy relocation. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
* elf: Rewrite long RESOLVE_MAP macro to an always_inline static functionNicholas Guriev2022-05-231-22/+34
| | | | | | | An __always_inline static function is better to find where exactly a crash happens, so one can step into the function with GDB. Reviewed-by: Fangrui Song <maskray@google.com>
* dlfcn: Move RTLD_DEFAULT/RTLD_NEXT outside __USE_GNUFangrui Song2022-05-231-12/+10
| | | | | | | | | | POSIX reserves the RTLD_ namespace, and this is already reflected in our conform tests. Note: RTLD_DEFAULT and RTLD_NEXT appear in IEEE Std 1003.1-2004. Many systems (e.g. FreeBSD, musl) just define the macros unconditionally. Reviewed-by: Florian Weimer <fweimer@redhat.com> Tested-by: Florian Weimer <fweimer@redhat.com>
* elf: Optimize _dl_new_hash in dl-new-hash.hNoah Goldstein2022-05-235-13/+144
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Unroll slightly and enforce good instruction scheduling. This improves performance on out-of-order machines. The unrolling allows for pipelined multiplies. As well, as an optional sysdep, reorder the operations and prevent reassosiation for better scheduling and higher ILP. This commit only adds the barrier for x86, although it should be either no change or a win for any architecture. Unrolling further started to induce slowdowns for sizes [0, 4] but can help the loop so if larger sizes are the target further unrolling can be beneficial. Results for _dl_new_hash Benchmarked on Tigerlake: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz Time as Geometric Mean of N=30 runs Geometric of all benchmark New / Old: 0.674 type, length, New Time, Old Time, New Time / Old Time fixed, 0, 2.865, 2.72, 1.053 fixed, 1, 3.567, 2.489, 1.433 fixed, 2, 2.577, 3.649, 0.706 fixed, 3, 3.644, 5.983, 0.609 fixed, 4, 4.211, 6.833, 0.616 fixed, 5, 4.741, 9.372, 0.506 fixed, 6, 5.415, 9.561, 0.566 fixed, 7, 6.649, 10.789, 0.616 fixed, 8, 8.081, 11.808, 0.684 fixed, 9, 8.427, 12.935, 0.651 fixed, 10, 8.673, 14.134, 0.614 fixed, 11, 10.69, 15.408, 0.694 fixed, 12, 10.789, 16.982, 0.635 fixed, 13, 12.169, 18.411, 0.661 fixed, 14, 12.659, 19.914, 0.636 fixed, 15, 13.526, 21.541, 0.628 fixed, 16, 14.211, 23.088, 0.616 fixed, 32, 29.412, 52.722, 0.558 fixed, 64, 65.41, 142.351, 0.459 fixed, 128, 138.505, 295.625, 0.469 fixed, 256, 291.707, 601.983, 0.485 random, 2, 12.698, 12.849, 0.988 random, 4, 16.065, 15.857, 1.013 random, 8, 19.564, 21.105, 0.927 random, 16, 23.919, 26.823, 0.892 random, 32, 31.987, 39.591, 0.808 random, 64, 49.282, 71.487, 0.689 random, 128, 82.23, 145.364, 0.566 random, 256, 152.209, 298.434, 0.51 Co-authored-by: Alexander Monakov <amonakov@ispras.ru> Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
* nss: Optimize nss_hash in nss_hash.cNoah Goldstein2022-05-231-37/+42
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The prior unrolling didn't really do much as it left the dependency chain between iterations. Unrolled the loop for 4 so 4x multiplies could be pipelined in out-of-order machines. Results for __nss_hash Benchmarked on Tigerlake: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz Time as Geometric Mean of N=25 runs Geometric of all benchmark New / Old: 0.845 type, length, New Time, Old Time, New Time / Old Time fixed, 0, 4.019, 3.729, 1.078 fixed, 1, 4.95, 5.707, 0.867 fixed, 2, 5.152, 5.657, 0.911 fixed, 3, 4.641, 5.721, 0.811 fixed, 4, 5.551, 5.81, 0.955 fixed, 5, 6.525, 6.552, 0.996 fixed, 6, 6.711, 6.561, 1.023 fixed, 7, 6.715, 6.767, 0.992 fixed, 8, 7.874, 7.915, 0.995 fixed, 9, 8.888, 9.767, 0.91 fixed, 10, 8.959, 9.762, 0.918 fixed, 11, 9.188, 9.987, 0.92 fixed, 12, 9.708, 10.618, 0.914 fixed, 13, 10.393, 11.14, 0.933 fixed, 14, 10.628, 12.097, 0.879 fixed, 15, 10.982, 12.965, 0.847 fixed, 16, 11.851, 14.429, 0.821 fixed, 32, 24.334, 34.414, 0.707 fixed, 64, 55.618, 86.688, 0.642 fixed, 128, 118.261, 224.36, 0.527 fixed, 256, 256.183, 538.629, 0.476 random, 2, 11.194, 11.556, 0.969 random, 4, 17.516, 17.205, 1.018 random, 8, 23.501, 20.985, 1.12 random, 16, 28.131, 29.212, 0.963 random, 32, 35.436, 38.662, 0.917 random, 64, 45.74, 58.868, 0.777 random, 128, 75.394, 121.963, 0.618 random, 256, 139.524, 260.726, 0.535 Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
* benchtests: Add benchtests for dl_elf_hash, dl_new_hash and nss_hashNoah Goldstein2022-05-237-8/+335
| | | | | | Benchtests are for throughput and include random / fixed size benchmarks. Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>