| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
| |
|
| |
|
|
|
|
|
|
| |
This patch adds 32bit SSE4.2 string functions. It uses -16L instead of
0xfffffffffffffff0L, which works for both 32bit and 64bit long. Tested
on 32bit Core i7 and Core 2.
|
|
|
|
|
|
| |
This patch adds multiarch support when configured for i686. I modified
some x86-64 functions to support 32bit. I will contribute 32bit SSE string
and memory functions later.
|
|
|
|
| |
Use it to implement fma and fmaf, if possible.
|
|
|
|
|
|
| |
We use a callback function into libc.so to get access to the data
structure with the information and have special versions of the test
macros which automatically use this function.
|
|
|
|
|
|
|
|
|
| |
The test now takes the callgraph into account. Only code called
during runtime relocation is affected by the limitation. We now
determine the affected object files as closely as possible from
the outside. This allowed to remove some the specializations
for some of the string functions as they are only used in other
code paths.
|
|
|
|
|
|
|
|
|
|
| |
This patch introduces a test to make sure no function modifies the
xmm/ymm registers. With the exception of the auditing functions.
The test is probably too pessimistic. All code linked into ld.so
is checked. Perhaps at some point the callgraph starting from
_dl_fixup and _dl_profile_fixup is checked and we can start using
faster SSE-using functions in parts of ld.so.
|
| |
|
| |
|
|
|
|
|
| |
The file contained some code which was never used. Don't compile it
in.
|
|
|
|
|
|
|
| |
There will be more than one function which, in multiarch mode, wants
to use SSSE3. We should not test in each of them for Atoms with
slow SSSE3. Instead, disable the SSSE3 bit in the startup code for
such machines.
|
| |
|
| |
|
|
|
|
|
| |
This patch implements SSE4.2 strstr/strcasestr, using Knuth-Morris-Pratt
string searching algorithm.
|
| |
|
|
|
|
|
|
| |
Some of the new multi-arch string functions for x86-64 were
not aligned to 16 byte boundarie,s possibly creating unnecessary
cache line misses and delays.
|
| |
|
| |
|
|
|
|
|
|
| |
This patch adds SSSE3 strcpy/stpcpy. I got up to 4X speed up on Core 2
and Core i7. I disabled it on Atom since SSSE3 version is slower for
shorter (<64byte) data.
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
| |
The SSE4.2 implementation is used in the DSO only. The patch also adds
some infrastructure to be used in similar code later one.
|
|
|
|
|
| |
Now that static executables can handle IFUNC functions don't exclude
optimization for sched_cpucount for !SHARED.
|
|
|
|
|
|
|
| |
SO far Intel and AMD use exactly the same bits meaning the same
things in CPUID index 1. Simplify the code. Should an architecture
come along which doesn't use the same semantics then it must use a
different index value than COMMON_CPUID_INDEX_1.
|
|
* configure.in: Handle --enable-multi-arch.
* elf/dl-runtime.c (_dl_fixup): Handle STT_GNU_IFUNC.
(_dl_fixup_profile): Likewise.
* elf/do-lookup.c (dl_lookup_x): Likewise.
* sysdeps/x86_64/dl-machine.h: Handle STT_GNU_IFUNC.
* elf/elf.h (STT_GNU_IFUNC): Define.
* include/libc-symbols.h (libc_ifunc): Define.
* sysdeps/x86_64/cacheinfo.c: If USE_MULTIARCH is defined, use the
framework in init-arch.h to get CPUID values.
* sysdeps/x86_64/multiarch/Makefile: New file.
* sysdeps/x86_64/multiarch/init-arch.c: New file.
* sysdeps/x86_64/multiarch/init-arch.h: New file.
* sysdeps/x86_64/multiarch/sched_cpucount.c: New file.
* config.make.in (experimental-malloc): Define.
* configure.in: Handle --enable-experimental-malloc.
* malloc/Makefile: Handle experimental-malloc flag.
* malloc/malloc.c: Implement PER_THREAD and ATOMIC_FASTBINS features.
* malloc/arena.c: Likewise.
* malloc/hooks.c: Likewise.
* malloc/malloc.h: Define M_ARENA_TEST and M_ARENA_MAX.
|