about summary refs log tree commit diff
path: root/sysdeps/i386
Commit message (Collapse)AuthorAgeFilesLines
* i386: Remove CET support bitsH.J. Lu2024-01-102-78/+2
| | | | | | | | | | | | 1. Remove _dl_runtime_resolve_shstk and _dl_runtime_profile_shstk. 2. Move CET offsets from x86 cpu-features-offsets.sym to x86-64 features-offsets.sym. 3. Rename x86 cet-control.h to x86-64 feature-control.h since it is only for x86-64 and also used for PLT rewrite. 4. Add x86-64 ldsodefs.h to include feature-control.h. 5. Change TUNABLE_CALLBACK (set_plt_rewrite) to x86-64 only. 6. Move x86 dl-procruntime.c to x86-64. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* i386: Fail if configured with --enable-cetAdhemerval Zanella2024-01-092-7/+8
| | | | | | Since it is only supported for x86_64. Checked on i686-linux-gnu.
* i386: Remove CET supportAdhemerval Zanella2024-01-0921-241/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | CET is only support for x86_64, this patch reverts: - faaee1f07ed x86: Support shadow stack pointer in setjmp/longjmp. - be9ccd27c09 i386: Add _CET_ENDBR to indirect jump targets in add_n.S/sub_n.S - c02695d7764 x86/CET: Update vfork to prevent child return - 5d844e1b725 i386: Enable CET support in ucontext functions - 124bcde683 x86: Add _CET_ENDBR to functions in crti.S - 562837c002 x86: Add _CET_ENDBR to functions in dl-tlsdesc.S - f753fa7dea x86: Support IBT and SHSTK in Intel CET [BZ #21598] - 825b58f3fb i386-mcount.S: Add _CET_ENDBR to _mcount and __fentry__ - 7e119cd582 i386: Use _CET_NOTRACK in i686/memcmp.S - 177824e232 i386: Use _CET_NOTRACK in memcmp-sse4.S - 0a899af097 i386: Use _CET_NOTRACK in memcpy-ssse3-rep.S - 7fb613361c i386: Use _CET_NOTRACK in memcpy-ssse3.S - 77a8ae0948 i386: Use _CET_NOTRACK in memset-sse2-rep.S - 00e7b76a8f i386: Use _CET_NOTRACK in memset-sse2.S - 90d15dc577 i386: Use _CET_NOTRACK in strcat-sse2.S - f1574581c7 i386: Use _CET_NOTRACK in strcpy-sse2.S - 4031d7484a i386/sub_n.S: Add a missing _CET_ENDBR to indirect jump - target - Checked on i686-linux-gnu.
* i386: Ignore --enable-cetH.J. Lu2024-01-042-0/+9
| | | | | | | | | | | | | | Since shadow stack is only supported for x86-64, ignore --enable-cet for i386. Always setting $(enable-cet) for i386 to "no" to support ifneq ($(enable-cet),no) in x86 Makefiles. We can't use ifeq ($(enable-cet),yes) since $(enable-cet) can be "yes", "no" or "permissive". Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* Update copyright dates with scripts/update-copyrightsPaul Eggert2024-01-01275-275/+275
|
* x86: Do not raises floating-point exception traps on fesetexceptflag (BZ 30990)Bruno Haible2023-12-191-21/+42
| | | | | | | | | | | | | | | | | | | According to ISO C23 (7.6.4.4), fesetexcept is supposed to set floating-point exception flags without raising a trap (unlike feraiseexcept, which is supposed to raise a trap if feenableexcept was called with the appropriate argument). The flags can be set in the 387 unit or in the SSE unit. When we need to clear a flag, we need to do so in both units, due to the way fetestexcept is implemented. When we need to set a flag, it is sufficient to do it in the SSE unit, because that is guaranteed to not trap. However, on i386 CPUs that have only a 387 unit, set the flags in the 387, as long as this cannot trap. Co-authored-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* i686: Do not raise exception traps on fesetexcept (BZ 30989)Adhemerval Zanella2023-12-192-4/+71
| | | | | | | | | | | | | | | | According to ISO C23 (7.6.4.4), fesetexcept is supposed to set floating-point exception flags without raising a trap (unlike feraiseexcept, which is supposed to raise a trap if feenableexcept was called with the appropriate argument). The flags can be set in the 387 unit or in the SSE unit. To set a flag, it is sufficient to do it in the SSE unit, because that is guaranteed to not trap. However, on i386 CPUs that have only a 387 unit, set the flags in the 387, as long as this cannot trap. Checked on i686-linux-gnu. Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* elf: Remove LD_PROFILE for static binariesAdhemerval Zanella2023-11-212-1/+3
| | | | | | | | | | | The _dl_non_dynamic_init does not parse LD_PROFILE, which does not enable profile for dlopen objects. Since dlopen is deprecated for static objects, it is better to remove the support. It also allows to trim down libc.a of profile support. Checked on x86_64-linux-gnu. Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
* x86: Use dl-symbol-redir-ifunc.h on cpu-tunablesAdhemerval Zanella2023-11-211-0/+5
| | | | | | | | | | | | | | | | The dl-symbol-redir-ifunc.h redirects compiler-generated libcalls to arch-specific memory implementations to avoid ifunc calls where it is not yet possible. The memcmp-isa-default-impl.h aims to fix the same issue by calling the specific memset implementation directly. Using the memcmp symbol directly allows the compiler to inline the memset calls (especially because _dl_tunable_set_hwcaps uses constants values), generating better code. Checked on x86_64-linux-gnu. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com> Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
* i686: Fix build with --disable-multiarchAdhemerval Zanella2023-08-106-2/+10
| | | | | | | | | | | | | | Since i686 provides the fortified wrappers for memcpy, mempcpy, memmove, and memset on the same string implementation, the static build tries to optimized it by not tying the fortified wrappers to string routine (to avoid pulling the fortify function if they are not required). Checked on i686-linux-gnu building with different option: default and --disable-multi-arch plus default, --disable-default-pie, --enable-fortify-source={2,3}, and --enable-fortify-source={2,3} with --disable-default-pie. Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
* i386: Remove memset_chk-nonshared.SAdhemerval Zanella Netto2023-07-264-30/+6
| | | | | | | | | Similar to memcpy, mempcpy, and memmove there is no need for an specific memset_chk-nonshared.S. It can be provided by memset-ia32.S itself for static library. Checked on i686-linux-gnu. Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* i386: Fix build with --enable-fortify=3Adhemerval Zanella Netto2023-07-264-65/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The i386 string routines provide multiple internal definitions for memcpy, memmove, and mempcpy chk routines: $ objdump -t libc.a | grep __memcpy_chk 00000000 g F .text 0000000e __memcpy_chk 00000000 g F .text 00000013 __memcpy_chk $ objdump -t libc.a | grep __mempcpy_chk 00000000 g F .text 0000000e __mempcpy_chk 00000000 g F .text 00000013 __mempcpy_chk $ objdump -t libc.a | grep __memmove_chk 00000000 g F .text 0000000e __memmove_chk 00000000 g F .text 00000013 __memmove_chk Although is not an issue for normal static builds, with fortify=3 glibc itself might use the fortify chk functions and thus static build might fail with multiple definitions. For instance: x86_64-glibc-linux-gnu-gcc -m32 -march=i686 -o [...]math/test-signgam-uchar-static -nostdlib -nostartfiles -static -static-pie [...] x86_64-glibc-linux-gnu/bin/ld: [...]/libc.a(mempcpy-ia32.o): in function `__mempcpy_chk': [...]/glibc-git/string/../sysdeps/i386/i686/mempcpy.S:32: multiple definition of `__mempcpy_chk'; [...]/libc.a(mempcpy_chk-nonshared.o):[...]/debug/../sysdeps/i386/mempcpy_chk.S:28: first defined here collect2: error: ld returned 1 exit status make[2]: *** [../Rules:298: There is no need for mem*-nonshared.S, the __mem*_chk routines are already provided by the assembly routines. Checked on i686-linux-gnu with gcc 13 built with fortify=1,2,3 and without fortify. Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* Update i686 libm-test-ulps (again)Andreas K. Hüttel2023-07-191-1/+1
| | | | | | | | Based on feedback by Arsen Arsenović <arsen@gentoo.org> Linux-6.1.38-gentoo-dist-hardened x86_64 AMD Ryzen 7 3800X 8-Core Processor -march=x86-64-v2 Signed-off-by: Andreas K. Hüttel <dilfridge@gentoo.org>
* Update i686 libm-test-ulpsAndreas K. Hüttel2023-07-181-0/+1
| | | | Signed-off-by: Andreas K. Hüttel <dilfridge@gentoo.org>
* configure: Use autoconf 2.71Siddhesh Poyarekar2023-07-171-14/+16
| | | | | | | | | | | | | | Bump autoconf requirement to 2.71 to allow regenerating configure on more recent distributions. autoconf 2.71 has been in Fedora since F36 and is the current version in Debian stable (bookworm). It appears to be current in Gentoo as well. All sysdeps configure and preconfigure scripts have also been regenerated; all changes are trivial transformations that do not affect functionality. Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org> Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* i386: make debug wrappers compatible with static PIEAndreas Schwab2023-07-124-8/+8
| | | | Static PIE requires the use of PLT relocation.
* sysdeps: Add missing hidden definitions for i386Frédéric Bérat2023-07-102-0/+2
| | | | | Add missing libc_hidden_builtin_def for memset_chk and MEMCPY_CHK on i386.
* string: Ensure *_chk routines have their hidden builtin definition availableFrédéric Bérat2023-07-0514-1/+26
| | | | | | | If libc_hidden_builtin_{def,proto} isn't properly set for *_chk routines, there are unwanted PLT entries in libc.so. Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
* sysdeps/{i386, x86_64}/mempcpy_chk.S: fix linknamespace for __mempcpy_chkFrederic Berat2023-06-221-1/+1
| | | | | | | | | | | | | On i386 and x86_64, for libc.a specifically, __mempcpy_chk calls mempcpy which leads POSIX routines to call non-POSIX mempcpy indirectly. This leads the linknamespace test to fail when glibc is built with __FORTIFY_SOURCE=3. Since calling mempcpy doesn't bring any benefit for libc.a, directly call __mempcpy instead. Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
* Fix misspellings in sysdeps/ -- BZ 25337Paul Pluzhnikov2023-05-306-6/+6
|
* math: Remove the error handling wrapper from fmod and fmodfAdhemerval Zanella Netto2023-04-032-0/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The error handling is moved to sysdeps/ieee754 version with no SVID support. The compatibility symbol versions still use the wrapper with SVID error handling around the new code. There is no new symbol version nor compatibility code on !LIBM_SVID_COMPAT targets (e.g. riscv). The ia64 is unchanged, since it still uses the arch specific __libm_error_region on its implementation. For both i686 and m68k, which provive arch specific implementation, wrappers are added so no new symbol are added (which would require to change the implementations). It shows an small improvement, the results for fmod: Architecture | Input | master | patch -----------------|-----------------|----------|-------- x86_64 (Ryzen 9) | subnormals | 12.5049 | 9.40992 x86_64 (Ryzen 9) | normal | 296.939 | 296.738 x86_64 (Ryzen 9) | close-exponents | 16.0244 | 13.119 aarch64 (N1) | subnormal | 6.81778 | 4.33313 aarch64 (N1) | normal | 155.620 | 152.915 aarch64 (N1) | close-exponents | 8.21306 | 5.76138 armhf (N1) | subnormal | 15.1083 | 14.5746 armhf (N1) | normal | 244.833 | 241.738 armhf (N1) | close-exponents | 21.8182 | 22.457 Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
* hurd: Move rtld-strncpy-c.c out of mach/hurd/Sergey Bugaev2023-04-031-0/+1
| | | | | | | | There's nothing Mach- or Hurd-specific about it; any port that ends up with rtld pulling in strncpy will need this. Signed-off-by: Sergey Bugaev <bugaevc@gmail.com> Message-Id: <20230319151017.531737-15-bugaevc@gmail.com>
* htl: Generalize i386 pt-machdep.h to x86Samuel Thibault2023-02-122-28/+1
|
* string: Add libc_hidden_proto for memrchrAdhemerval Zanella2023-02-082-0/+3
| | | | | | | Although static linker can optimize it to local call, it follows the internal scheme to provide hidden proto and definitions. Reviewed-by: Carlos Eduardo Seo <carlos.seo@linaro.org>
* string: Add libc_hidden_proto for strchrnulAdhemerval Zanella2023-02-081-0/+1
| | | | | | | Although static linker can optimize it to local call, it follows the internal scheme to provide hidden proto and definitions. Reviewed-by: Carlos Eduardo Seo <carlos.seo@linaro.org>
* string: Improve generic strnlen with memchrAdhemerval Zanella2023-02-061-7/+7
| | | | | | | | It also cleanups the multiple inclusion by leaving the ifunc implementation to undef the weak_alias and libc_hidden_def. Co-authored-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
* Parameterize OP_T_THRES from memcopy.hRichard Henderson2023-02-062-3/+25
| | | | | | | | | | | | It moves OP_T_THRES out of memcopy.h to its own header and adjust each architecture that redefines it. Checked with a build and check with run-built-tests=no for all major Linux ABIs. Co-authored-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> Reviewed-by: Carlos O'Donell <carlos@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
* Update copyright dates with scripts/update-copyrightsJoseph Myers2023-01-06278-278/+278
|
* i686: Regenerate ulpsAndreas K. Hüttel2023-01-021-7/+7
| | | | Reviewed-by: Florian Weimer <fweimer@redhat.com>
* i386: Avoid rely on linker optimization to avoid relocationAdhemerval Zanella Netto2022-11-211-4/+9
| | | | | | | | | | | | | | lld does not implement all the linker optimization to avoid the GOT relocation as done by binutils (bfd/elf32-i386.c:elf_i386_convert_load_reloc). The current 'movl main@GOT(%ebx), %eax' will then create a GOT relocation when building with lld, which make static-pie status to not being able to start the provided main function. The change uses a __wrap_main local symbol, which in turn calls main (similar as used by aarch64 and s390x). Checked on i686-linux-gnu with binutils and lld. Reviewed-by: Fangrui Song <maskray@google.com>
* elf: Introduce <dl-call_tls_init_tp.h> and call_tls_init_tp (bug 29249)Florian Weimer2022-11-031-2/+1
| | | | | | | | This makes it more likely that the compiler can compute the strlen argument in _startup_fatal at compile time, which is required to avoid a dependency on strlen this early during process startup. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
* Remove all assembly optimizations for htonl and htonsAdhemerval Zanella2022-10-241-35/+0
| | | | | | | | | | The builtin bswap is already used if optimziation is enabled for GCC 4.8+, so glibc symbols will be used in a very limited scenarios. Also, gcc generated code is quite similar to all but ia64 and i386 htons. Checked on alpha, i686, and ia64.
* Remove htonl.S for i386/x86_64Cristian Rodríguez2022-10-241-34/+0
| | | | | | | Generic implementation on top of __bswap_32 always expands inline to either bswap or movbe depending on -march=*. Signed-off-by: Cristian Rodríguez <crrodriguez@opensuse.org>
* Use PTR_MANGLE and PTR_DEMANGLE unconditionally in C sourcesFlorian Weimer2022-10-182-7/+1
| | | | | | | | | | | | | | | | | In the future, this will result in a compilation failure if the macros are unexpectedly undefined (due to header inclusion ordering or header inclusion missing altogether). Assembler sources are more difficult to convert. In many cases, they are hand-optimized for the mangling and no-mangling variants, which is why they are not converted. sysdeps/s390/s390-32/__longjmp.c and sysdeps/s390/s390-64/__longjmp.c are special: These are C sources, but most of the implementation is in assembler, so the PTR_DEMANGLE macro has to be undefined in some cases, to match the assembler style. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* Introduce <pointer_guard.h>, extracted from <sysdep.h>Florian Weimer2022-10-185-0/+5
| | | | | | | | | | | | | | This allows us to define a generic no-op version of PTR_MANGLE and PTR_DEMANGLE. In the future, we can use PTR_MANGLE and PTR_DEMANGLE unconditionally in C sources, avoiding an unintended loss of hardening due to missing include files or unlucky header inclusion ordering. In i386 and x86_64, we can avoid a <tls.h> dependency in the C code by using the computed constant from <tcb-offsets.h>. <sysdep.h> no longer includes these definitions, so there is no cyclic dependency anymore when computing the <tcb-offsets.h> constants. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* elf: Remove -fno-tree-loop-distribute-patterns usage on dl-supportAdhemerval Zanella2022-10-101-0/+24
| | | | | | | | | | | | | | Besides the option being gcc specific, this approach is still fragile and not future proof since we do not know if this will be the only optimization option gcc will add that transforms loops to memset (or any libcall). This patch adds a new header, dl-symbol-redir-ifunc.h, that can b used to redirect the compiler generated libcalls to port the generic memset implementation if required. Checked on x86_64-linux-gnu and aarch64-linux-gnu. Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* x86: Remove .tfloat usageAdhemerval Zanella2022-10-038-25/+45
| | | | | Some compiler does not support it (such as clang integrated assembler) neither gcc emits it.
* i386: Use cmpl instead of cmpAdhemerval Zanella2022-08-052-12/+12
| | | | Clang cannot assemble cmp in the AT&T dialect mode.
* i386: Use fldt instead of fld on e_logl.SAdhemerval Zanella2022-08-051-2/+2
| | | | Clang cannot assemble fldt in the AT&T dialect mode.
* i386: Replace movzx with movzblFangrui Song2022-08-041-18/+18
| | | | | | | | Similar to 6720d36b6623c5e48c070d86acf61198b33e144e for x86-64. Clang cannot assemble movzx in the AT&T dialect mode. Change movzx to movzbl, which follows the AT&T dialect and is used elsewhere in the file.
* i386: Remove RELA supportAdhemerval Zanella2022-08-042-197/+1
| | | | | Now that prelink is not support, there is no need to keep supporting rela for non bootstrap.
* i386: Remove -Wa,-mtune=i686Fangrui Song2022-07-121-10/+0
| | | | | | | gas -mtune= may change NOP generating patterns but -mtune=i686 has no difference from the default by inspecting .o and .os files. Note: Clang doesn't support -Wa,-mtune=i686.
* i386: Fix include paths for strspn, strcspn, and strpbrkNoah Goldstein2022-06-173-6/+6
| | | | | | | | | | | | | | commit c22eb807b0c8125101f6a274795425be2bbd0386 Author: Noah Goldstein <goldstein.w.n@gmail.com> Date: Thu Jun 16 15:07:12 2022 -0700 x86: Rename generic functions with unique postfix for clarity Changed the names of the strspn-c, strcspn-c, and strpbrk-c files in a general refactor. It didn't change the include paths for the i386 files breaking the i386 build. This commit fixes that. Reviewed-by: Carlos O'Donell <carlos@redhat.com> Tested-by: Carlos O'Donell <carlos@redhat.com>
* Remove remnant reference to ELF_RTYPE_CLASS_EXTERN_PROTECTED_DATAFangrui Song2022-06-151-4/+1
| | | | This fixes nios2 build after commit de38b2a343e6d64b95c50004943d6107a9e380d0.
* elf: Remove ELF_RTYPE_CLASS_EXTERN_PROTECTED_DATAFangrui Song2022-06-151-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | If an executable has copy relocations for extern protected data, that can only work if the library containing the definition is built with assumptions (a) the compiler emits GOT-generating relocations (b) the linker produces R_*_GLOB_DAT instead of R_*_RELATIVE. Otherwise the library uses its own definition directly and the executable accesses a stale copy. Note: the GOT relocations defeat the purpose of protected visibility as an optimization, but allow rtld to make the executable and library use the same copy when copy relocations are present, but it turns out this never worked perfectly. ELF_RTYPE_CLASS_EXTERN_PROTECTED_DATA has strange semantics when both a.so and b.so define protected var and the executable copy relocates var: b.so accesses its own copy even with GLOB_DAT. The behavior change is from commit 62da1e3b00b51383ffa7efc89d8addda0502e107 (x86) and then copied to nios2 (ae5eae7cfc9c4a8297ff82ec6b794faca1976ecc) and arc (0e7d930c4c11de896fe807f67fa1eb756c9c1e05). Without ELF_RTYPE_CLASS_EXTERN_PROTECTED_DATA, b.so accesses the copy relocated data like a.so. There is now a warning for copy relocation on protected symbol since commit 7374c02b683b7110b853a32496a619410364d70b. It's extremely unlikely anyone relies on the ELF_RTYPE_CLASS_EXTERN_PROTECTED_DATA behavior, so let's remove it: this removes a check in the symbol lookup code.
* Add bounds check to __libc_ifunc_impl_listWilco Dijkstra2022-06-101-7/+2
| | | | | | | | | | | | Add a proper bounds check to __libc_ifunc_impl_list. This makes MAX_IFUNC redundant and fixes several targets that will write outside the array. To avoid unnecessary large diffs, pass the maximum in the argument 'i' to IFUNC_IMPL_ADD - 'max' can be used in new ifunc definitions and existing ones can be updated if desired. Passes buildmanyglibc. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* grep: egrep -> grep -E, fgrep -> grep -FSam James2022-06-052-4/+4
| | | | | | | | | | | Newer versions of GNU grep (after grep 3.7, not inclusive) will warn on 'egrep' and 'fgrep' invocations. Convert usages within the tree to their expanded non-aliased counterparts to avoid irritating warnings during ./configure and the test suite. Signed-off-by: Sam James <sam@gentoo.org> Reviewed-by: Fangrui Song <maskray@google.com>
* i686: Use generic sincosf implementation for SSE2 versionAdhemerval Zanella2022-06-014-585/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The generic implementation shows slight better performance (gcc 11.2.1 on a Ryzen 9 5900X): * s_sincosf-sse2.S: "sincosf": { "workload-random": { "duration": 3.89961e+09, "iterations": 9.5472e+07, "reciprocal-throughput": 40.8429, "latency": 40.8483, "max-throughput": 2.4484e+07, "min-throughput": 2.44808e+07 } } * generic s_cossinf.c: "sincosf": { "workload-random": { "duration": 3.71953e+09, "iterations": 1.48512e+08, "reciprocal-throughput": 25.0515, "latency": 25.0391, "max-throughput": 3.99177e+07, "min-throughput": 3.99375e+07 } } Checked on i686-linux-gnu. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
* i686: Use generic sinf implementation for SSE2 versionAdhemerval Zanella2022-06-014-565/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Performance seems to be similar (gcc 11.2.1 on a Ryzen 9 5900X), the generic algorithm shows slight better performance for the 'workload-huge.wrf' input set. * s_sinf-sse2.S: "sinf": { "": { "duration": 3.72405e+09, "iterations": 2.38374e+08, "max": 63.973, "min": 11.211, "mean": 15.6227 }, "workload-random.wrf": { "duration": 3.76923e+09, "iterations": 8.4e+07, "reciprocal-throughput": 17.6355, "latency": 72.108, "max-throughput": 5.67037e+07, "min-throughput": 1.38681e+07 }, "workload-huge.wrf": { "duration": 3.76943e+09, "iterations": 6e+07, "reciprocal-throughput": 29.3493, "latency": 96.2985, "max-throughput": 3.40724e+07, "min-throughput": 1.03844e+07 } } * generic s_sinf.c: "sinf": { "": { "duration": 3.70989e+09, "iterations": 2.18025e+08, "max": 69.782, "min": 11.1, "mean": 17.0159 }, "workload-random.wrf": { "duration": 3.77213e+09, "iterations": 9.6e+07, "reciprocal-throughput": 17.5402, "latency": 61.0459, "max-throughput": 5.70119e+07, "min-throughput": 1.63811e+07 }, "workload-huge.wrf": { "duration": 3.81576e+09, "iterations": 5.6e+07, "reciprocal-throughput": 38.2111, "latency": 98.0659, "max-throughput": 2.61704e+07, "min-throughput": 1.01972e+07 } } Checked on i686-linux-gnu. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
* i686: Use generic cosf implementation for SSE2 versionAdhemerval Zanella2022-06-014-552/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Performance seems to be similar (gcc 11.2.1 on a Ryzen 9 5900X): * s_cosf-sse2.S: "cosf": { "workload-random": { "duration": 3.74987e+09, "iterations": 9.616e+07, "reciprocal-throughput": 15.8141, "latency": 62.1782, "max-throughput": 6.32346e+07, "min-throughput": 1.60828e+07 } } * generic s_cosf.c: "cosf": { "workload-random": { "duration": 3.87298e+09, "iterations": 1.00968e+08, "reciprocal-throughput": 18.3448, "latency": 58.3722, "max-throughput": 5.45113e+07, "min-throughput": 1.71314e+07 } } Checked on i686-linux-gnu.