path: root/sysdeps/x86_64
Commit message | Author | Age | Files | Lines
* Unroll the loop in x86-64 SSE4.2 strlen. | H.J. Lu | 2010-01-13 | 1 | -15/+45
* Optimize 32bit memset/memcpy with SSE2/SSSE3. | H.J. Lu | 2010-01-12 | 4 | -3/+42
* Define bit_SSE2 and index_SSE2. | H.J. Lu | 2009-12-13 | 1 | -0/+2
* Define bit_XXX and index_XXX. | H.J. Lu | 2009-12-13 | 9 | -17/+31
  This patch defines bit_XXX and index_XXX and uses them to check processor features in assembly code. This prevents typos in processor feature checks.
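As an illustration of the idea, here is a minimal C sketch (glibc applies the macros in assembly). It assumes a hypothetical array holding the four registers returned by CPUID leaf 1: `index_XXX` selects the register and `bit_XXX` the mask, so a check cannot pair a bit with the wrong register. The bit positions (SSE2 in EDX bit 26, SSSE3 in ECX bit 9, SSE4.2 in ECX bit 20) are the documented CPUID ones; the struct, enum and `HAS_FEATURE` macro are made up for this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical storage for the registers returned by CPUID leaf 1. */
enum { REG_EAX, REG_EBX, REG_ECX, REG_EDX, REG_COUNT };

struct cpuid_leaf1 {
  uint32_t reg[REG_COUNT];
};

/* Each feature gets a bit mask and the index of the register holding it. */
#define bit_SSE2     (1u << 26)
#define index_SSE2   REG_EDX
#define bit_SSSE3    (1u << 9)
#define index_SSSE3  REG_ECX
#define bit_SSE4_2   (1u << 20)
#define index_SSE4_2 REG_ECX

/* Token pasting keeps the bit and the register index in sync. */
#define HAS_FEATURE(leaf, name) \
  (((leaf)->reg[index_##name] & bit_##name) != 0)

static bool has_sse4_2(const struct cpuid_leaf1 *leaf)
{
  return HAS_FEATURE(leaf, SSE4_2);
}
```

With this scheme, a misspelled feature name fails to compile instead of silently testing the wrong bit.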
* Fix whitespaces. | Ulrich Drepper | 2009-10-22 | 2 | -11/+11
* Implement SSE4.2 optimized strchr and strrchr. | H.J. Lu | 2009-10-22 | 4 | -1/+506
* Clean up unnecessary libc_hidden_builtin_def fiddling in x86 multiarch definitions. | Roland McGrath | 2009-10-06 | 2 | -5/+4
* Clean up x86 multiarch HAS_FOO macros. | Roland McGrath | 2009-10-06 | 2 | -23/+10
* configure tweaks, support $libc_add_on_config_subdirs | Roland McGrath | 2009-09-15 | 1 | -32/+0
* Fix strstr/strcasestr/fma/fmaf on x86_64. | Jakub Jelinek | 2009-09-02 | 4 | -6/+8
* Fix x86_64 bits/mathinline.h for -m32 compilation. | Jakub Jelinek | 2009-09-01 | 1 | -0/+12
* Fix parse error in bits/mathinline.h with --std=c99 | Andreas Schwab | 2009-08-31 | 1 | -2/+2
* Remove ENABLE_SSSE3_ON_ATOM. | H.J. Lu | 2009-08-28 | 1 | -9/+1
  It turns out that SSSE3 isn't slow on Atom; the problem is bsf. This patch removes ENABLE_SSSE3_ON_ATOM.
* Optimize out duplicated scalbln code for x86-64. | Ulrich Drepper | 2009-08-25 | 2 | -0/+11
* Optimized signbit{,f} for x86-64. | Ulrich Drepper | 2009-08-25 | 2 | -0/+54
* Handle AVX saving on x86-64 in interrupted symbol lookups. | Ulrich Drepper | 2009-08-25 | 1 | -1/+0
  If a signal arrived during a symbol lookup and the signal handler also required a symbol lookup, the end of the lookup in the signal handler reset the flag indicating whether the AVX/SSE registers need to be restored. In this case, resetting means that the tail of the outer lookup code will try to restore the registers, and this can fail miserably. We now restore the flag to its previous value, which makes nested calls possible.
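The fix amounts to saving and restoring the flag instead of resetting it unconditionally. A minimal C sketch of that pattern, assuming a single hypothetical global `restore_vector_regs` stands in for the state ld.so actually keeps (the helper names are also hypothetical):

```c
#include <stdbool.h>

/* Hypothetical stand-in for ld.so's state: whether the AVX/SSE registers
   must be restored when the current lookup finishes. */
static bool restore_vector_regs;

/* Start a lookup: remember the value that belongs to any lookup this one
   interrupted, then install the value for the current lookup. */
static bool lookup_begin(bool need_restore)
{
  bool previous = restore_vector_regs;
  restore_vector_regs = need_restore;
  return previous;
}

/* Finish a lookup: put back the interrupted lookup's value instead of
   resetting the flag to a fixed default. */
static void lookup_end(bool previous)
{
  restore_vector_regs = previous;
}
```

Nesting a second `lookup_begin`/`lookup_end` pair inside the first (as a signal handler would) leaves the outer lookup's value intact once the inner one finishes.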
* Add ceil implementation for 64-bit machines. | Ulrich Drepper | 2009-08-24 | 1 | -0/+17
  On 64-bit machines we should not split doubles into two 32-bit integers and handle the words separately; we have wide registers. This patch implements a 64-bit version of ceil. Ideally, all other functions will be converted over time.
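To make the 64-bit approach concrete, here is a sketch of `ceil` that inspects the whole IEEE 754 double as one `uint64_t`. It is an illustration written for this note (`ceil64` is a hypothetical name), not the code from the commit; it uses the common mask-and-round technique for rounding toward positive infinity.

```c
#include <stdint.h>
#include <string.h>

/* ceil() on the full 64-bit representation of a double. */
static double ceil64(double x)
{
  uint64_t i;
  memcpy(&i, &x, sizeof i);                  /* reinterpret the bits */
  int e = (int)((i >> 52) & 0x7ff) - 1023;   /* unbiased exponent */

  if (e < 0) {
    /* |x| < 1: the result is +0.0, -0.0 or 1.0. */
    if ((i << 1) == 0)
      return x;                              /* +0.0 or -0.0 */
    return (i >> 63) ? -0.0 : 1.0;
  }
  if (e >= 52)
    return x;                                /* integral, Inf or NaN */

  uint64_t frac_mask = UINT64_C(0x000fffffffffffff) >> e;
  if ((i & frac_mask) == 0)
    return x;                                /* no fractional bits */
  if ((i >> 63) == 0)
    i += UINT64_C(1) << (52 - e);            /* positive: add one unit */
  i &= ~frac_mask;                           /* drop the fraction */
  memcpy(&x, &i, sizeof x);
  return x;
}
```

For negative inputs, clearing the fraction truncates toward zero, which is already the ceiling. A carry from the added unit into the exponent field is handled by the same masking step.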
* Optimize float construction/extraction on x86-64. | Ulrich Drepper | 2009-08-24 | 1 | -0/+20
* Optimize x86-64 signbit{,f} a bit. | Ulrich Drepper | 2009-08-24 | 1 | -5/+7
* Support mixed SSE/AVX audit and check AVX only once. | H.J. Lu | 2009-08-08 | 2 | -237/+276
  This patch fixes mixed SSE/AVX audit and checks AVX only once in _dl_runtime_profile. When an AVX or SSE register value is modified in pltenter, we have to make sure that the SSE part of the value is the same in both the lr_xmm and lr_vector fields, so that pltexit gets the correct value from either field. An AVX-enabled pltenter should update both lr_xmm and lr_vector to support stacked AVX/SSE pltenter functions.
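The requirement that both fields carry the same SSE part can be sketched in C. The struct layout and the `set_vector_reg` helper are hypothetical simplifications for illustration; glibc's actual `La_x86_64_regs` type differs in detail.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: lr_xmm holds 16-byte SSE values, lr_vector holds
   32-byte AVX values whose low 16 bytes alias the same register. */
typedef struct { uint8_t b[16]; } xmm_t;
typedef struct { uint8_t b[32]; } ymm_t;

struct regs {
  xmm_t lr_xmm[8];
  ymm_t lr_vector[8];
};

/* An AVX-aware pltenter that changes register n stores the full value in
   lr_vector and mirrors its low 16 bytes into lr_xmm, so pltexit sees the
   same SSE part whichever field it reads. */
static void set_vector_reg(struct regs *r, int n, const ymm_t *value)
{
  r->lr_vector[n] = *value;
  memcpy(r->lr_xmm[n].b, value->b, sizeof r->lr_xmm[n].b);
}
```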
* Move SSE4.2 functions together. | Ulrich Drepper | 2009-08-08 | 2 | -0/+2
* Add SSSE3-optimized implementation of str{,n}cmp for x86-64. | Ulrich Drepper | 2009-08-07 | 5 | -47/+185
* Avoid warning through fake initialization. | Ulrich Drepper | 2009-08-07 | 1 | -0/+2
* Fix whitespaces in last checkin. | Ulrich Drepper | 2009-08-07 | 1 | -1/+1
* Properly count number of logical processors on Intel CPUs. | H.J. Lu | 2009-08-07 | 1 | -4/+38
  The meaning of bits 25-14 of EAX returned by cpuid with EAX = 4 has changed from "the maximum number of threads sharing the cache" to "the maximum number of addressable IDs for logical processors sharing the cache" on processors where cpuid supports EAX = 11. We need to use the results of both EAX = 4 and EAX = 11 to get the number of threads sharing the cache. On Core i7, bits 25-14 of EAX hold 15, although the number of logical processors is 8. Here is a white paper on this: http://software.intel.com/en-us/articles/intel-64-architecture-processor-topology-enumeration/ This patch correctly counts the number of logical processors on Intel CPUs whose cpuid supports EAX = 11. Tested on Dunnington, Core i7 and Nehalem EX/EP. It also fixes the Pentium D workaround, since EBX may not hold the right value returned by cpuid with EAX = 1.
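The CPUID fields involved can be decoded with plain shifts and masks. The helpers below are a hypothetical sketch that only extracts the documented fields (leaf 4 EAX bits 25-14; leaf 11 EBX bits 15-0 and ECX bits 15-8). How glibc combines the leaf-4 and leaf-11 values into a thread count is not reproduced here.

```c
#include <stdint.h>

/* CPUID leaf 4, EAX bits 25-14: (maximum number of addressable IDs for
   logical processors sharing this cache) - 1. This is the size of an ID
   space, not a count of existing threads. */
static unsigned leaf4_sharing_ids(uint32_t eax)
{
  return ((eax >> 14) & 0xfff) + 1;
}

/* CPUID leaf 11, EBX bits 15-0: number of logical processors at the
   level described by this subleaf. */
static unsigned leaf11_logical_count(uint32_t ebx)
{
  return ebx & 0xffff;
}

/* CPUID leaf 11, ECX bits 15-8: level type (1 = SMT, 2 = core). */
static unsigned leaf11_level_type(uint32_t ecx)
{
  return (ecx >> 8) & 0xff;
}
```

Feeding in the Core i7 value quoted above (a leaf-4 field of 15) gives 16 addressable IDs, although only 8 logical processors exist. This is why the leaf-4 field alone overcounts.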
* Add x86 32-bit SSE4.2 string functions. | H.J. Lu | 2009-08-04 | 2 | -4/+4
  This patch adds 32-bit SSE4.2 string functions. It uses -16L instead of 0xfffffffffffffff0L, which works for both 32-bit and 64-bit long. Tested on 32-bit Core i7 and Core 2.
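The reason `-16L` is portable can be shown in C. Converting -16 to an unsigned type of any width yields a mask with every bit set except the low four. By contrast, `0xfffffffffffffff0L` does not fit in a 32-bit `long`, so its type and value depend on the target. A minimal sketch with a hypothetical `align_down_16` helper:

```c
#include <stdint.h>

/* Round an address down to a 16-byte boundary. (uintptr_t)-16L is
   ...fff0 at whatever width uintptr_t has (32 or 64 bits). */
static uintptr_t align_down_16(uintptr_t addr)
{
  return addr & (uintptr_t)-16L;
}
```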
* Support multiarch for i686. | H.J. Lu | 2009-07-31 | 4 | -49/+56
  This patch adds multiarch support when configured for i686. I modified some x86-64 functions to support 32-bit. I will contribute 32-bit SSE string and memory functions later.
* ____longjmp_chk is now OS-specific. | Ulrich Drepper | 2009-07-30 | 1 | -145/+1
  We use sigaltstack internally, which on some systems is a syscall and should be used as such. Move the x86-64 version to the Linux-specific directory and put in its place a file that always causes a compile error.
* Change code a bit to correct CFI. | Ulrich Drepper | 2009-07-30 | 1 | -1/+3
* Optimize ____longjmp_chk for x86-64 a bit. | Ulrich Drepper | 2009-07-30 | 1 | -5/+3
* Fix x86-64 ____longjmp_chk to handle signal stacks. | Ulrich Drepper | 2009-07-30 | 2 | -7/+106
  The simple test used previously might trigger incorrectly if longjmp jumps from the signal stack to the normal stack. We now test for this case explicitly.
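One way to tell whether the jump starts on the signal stack is to ask the kernel. This sketch queries the current alternate-stack state with `sigaltstack` and tests `SS_ONSTACK`. The helper name is hypothetical, and glibc implements its check in assembly, so this only illustrates the query, not the full comparison of stack addresses.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if the calling thread is currently running on its
   alternate signal stack. Passing NULL as the first argument only
   queries the state; nothing is changed. */
static bool on_alt_signal_stack(void)
{
  stack_t ss;
  if (sigaltstack(NULL, &ss) != 0)
    return false;             /* query failed: assume the normal stack */
  return (ss.ss_flags & SS_ONSTACK) != 0;
}
```

Called from ordinary code this returns false. Inside a handler installed with `SA_ONSTACK` and an alternate stack configured, it returns true.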
* Add support for x86-64 fma instruction. | Ulrich Drepper | 2009-07-29 | 3 | -0/+90
  Use it to implement fma and fmaf, if possible.
* Prepare use of IFUNC functions outside libc.so. | Ulrich Drepper | 2009-07-29 | 2 | -2/+30
  We use a callback function into libc.so to access the data structure holding the CPU information, and provide special versions of the test macros that use this function automatically.
* Improve CFI in x86-64 ld.so trampoline code. | Ulrich Drepper | 2009-07-29 | 1 | -1/+2
* Properly restore AVX registers on x86-64. | H.J. Lu | 2009-07-29 | 1 | -10/+10
  tst-audit4 and tst-audit5 fail under the AVX emulator because je is used instead of jne. This patch fixes them.
* Preserve SSE registers in runtime relocations on x86-64. | Ulrich Drepper | 2009-07-29 | 2 | -3/+86
  SSE registers are used for passing parameters and must be preserved in runtime relocations. Inside ld.so this is enforced by the tests in tst-xmmymm.sh. But the malloc routines used after startup come from libc.so and can be arbitrarily complex. Saving the SSE registers all the time because of that would be overkill, since these calls are rare; instead, we save them on demand. The infrastructure added in this patch makes this possible and efficient.
* Refine testing for xmm/ymm register use in x86-64 ld.so. | Ulrich Drepper | 2009-07-27 | 5 | -12/+72
  The test now takes the callgraph into account. Only code called during runtime relocation is affected by the limitation. We now determine the affected object files as closely as possible from the outside. This allowed us to remove some of the specializations of string functions that are only used in other code paths.
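Taking the callgraph into account boils down to a reachability question: which functions can be reached from the runtime-relocation entry points? A minimal C sketch over a made-up graph follows. The enum names are hypothetical labels, not the symbols or object files that tst-xmmymm.sh actually inspects.

```c
#include <stdbool.h>

/* Hypothetical call graph: calls[i][j] is true when function i calls j. */
enum { FIXUP, LOOKUP, HASH, STRCMP, DLOPEN, MEMCPY, NFUNCS };

static const bool calls[NFUNCS][NFUNCS] = {
  [FIXUP]  = { [LOOKUP] = true },
  [LOOKUP] = { [HASH] = true, [STRCMP] = true },
  [DLOPEN] = { [MEMCPY] = true, [LOOKUP] = true },
};

/* Depth-first search: mark every function reachable from root. Only
   marked functions would be subject to the "no xmm/ymm use" rule. */
static void mark_reachable(int root, bool reached[NFUNCS])
{
  if (reached[root])
    return;
  reached[root] = true;
  for (int callee = 0; callee < NFUNCS; callee++)
    if (calls[root][callee])
      mark_reachable(callee, reached);
}
```

In this toy graph MEMCPY is reached only through DLOPEN, so starting from FIXUP leaves it unmarked and free to use SSE. This mirrors how the commit could drop some string-function specializations.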
* No need for special strcmp for rtld. | Ulrich Drepper | 2009-07-27 | 1 | -28/+0
* Make sure no code in ld.so uses xmm/ymm registers on x86-64. | Ulrich Drepper | 2009-07-26 | 10 | -0/+484
  This patch introduces a test to make sure that no function except the auditing functions modifies the xmm/ymm registers. The test is probably too pessimistic: all code linked into ld.so is checked. Perhaps at some point only the callgraph starting from _dl_fixup and _dl_profile_fixup will be checked, and we can start using faster SSE-using functions in parts of ld.so.
* Add SSE2 support to str{,n}cmp for x86-64. | H.J. Lu | 2009-07-26 | 5 | -267/+2055
* Some optimizations for x86-64 strcmp. | H.J. Lu | 2009-07-25 | 1 | -9/+4
* Optimize x86-64 SSE4.2 strcmp. | Ulrich Drepper | 2009-07-25 | 1 | -0/+5
  The file contained some code which was never used. Don't compile it in.
* Avoid cpuid instructions in cache info discovery. | Ulrich Drepper | 2009-07-23 | 1 | -19/+31
  When multiarch is enabled, this information is already stored. Use it.
* Add more cache descriptors for L3 caches on x86 and x86-64. | Ulrich Drepper | 2009-07-23 | 1 | -0/+3
  The most recent AP 485 describes a few more cache descriptors for L3 caches with 24-way associativity.
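CPUID leaf-2 descriptors are usually handled with a lookup table keyed by the descriptor byte. The sketch below lists three 24-way L3 descriptors (0xEA, 0xEB and 0xEC, for 12, 18 and 24 MiB with 64-byte lines). These values are believed to match Intel's descriptor tables, but treat them as assumptions to verify against the current Intel SDM. The struct and `find_desc` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* One leaf-2 descriptor: byte code, associativity, line size, size. */
struct cache_desc {
  uint8_t  code;
  uint8_t  assoc;
  uint8_t  line_size;
  uint32_t size;      /* bytes */
};

/* Assumed values for three 24-way L3 descriptors (verify before use). */
static const struct cache_desc l3_descs[] = {
  { 0xea, 24, 64, 12u * 1024 * 1024 },
  { 0xeb, 24, 64, 18u * 1024 * 1024 },
  { 0xec, 24, 64, 24u * 1024 * 1024 },
};

/* Linear scan; returns NULL for descriptors not in the table. */
static const struct cache_desc *find_desc(uint8_t code)
{
  for (size_t i = 0; i < sizeof l3_descs / sizeof l3_descs[0]; i++)
    if (l3_descs[i].code == code)
      return &l3_descs[i];
  return NULL;
}
```

A linear scan keeps the sketch short. A larger table kept sorted by code could use `bsearch` instead.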
* Perform test for Atom x86-64 in central place and handle it. | Ulrich Drepper | 2009-07-23 | 2 | -11/+10
  There will be more than one function that, in multiarch mode, wants to use SSSE3. We should not test in each of them for Atoms with slow SSSE3. Instead, disable the SSSE3 bit in the startup code for such machines.
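Centralizing the check means the startup code edits the cached feature word once, and every multiarch selector that tests the bit then picks its fallback without a model check of its own. A hypothetical C sketch follows, assuming that family 6, model 0x1C identifies the affected Atom parts. A later commit in this log, "Remove ENABLE_SSSE3_ON_ATOM", notes that SSSE3 itself turned out not to be slow on Atom.

```c
#include <stdint.h>

#define bit_SSSE3 (1u << 9)   /* CPUID leaf 1, ECX bit 9 */

/* Hypothetical startup hook: cpuid1_ecx is the cached leaf-1 ECX word.
   Clearing the bit here affects every later feature check at once. */
static uint32_t adjust_features(uint32_t cpuid1_ecx, unsigned family,
                                unsigned model)
{
  if (family == 6 && model == 0x1c)   /* assumed Atom model number */
    cpuid1_ecx &= ~bit_SSSE3;
  return cpuid1_ecx;
}
```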
* Minor cleanups in x86-64 strstr. | Ulrich Drepper | 2009-07-21 | 1 | -78/+55
* Better check for optimization in new x86-64 strstr/strcasestr. | Ulrich Drepper | 2009-07-20 | 1 | -11/+15
* SSE4.2 strstr/strcasestr for x86-64. | H.J. Lu | 2009-07-20 | 5 | -1/+519
  This patch implements SSE4.2 strstr/strcasestr, using the Knuth-Morris-Pratt string-searching algorithm.
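The Knuth-Morris-Pratt idea itself fits in a short scalar C function. This is a byte-at-a-time sketch with a hypothetical name, `kmp_strstr`. It does not reproduce the SSE4.2 version, which works on 16-byte blocks, and it omits the case-insensitive strcasestr variant.

```c
#include <stdlib.h>
#include <string.h>

/* Scalar KMP search with strstr semantics: returns a pointer to the
   first occurrence of needle in haystack, or NULL. */
static const char *kmp_strstr(const char *haystack, const char *needle)
{
  size_t m = strlen(needle);
  if (m == 0)
    return haystack;

  /* fail[i]: length of the longest proper prefix of needle[0..i]
     that is also a suffix of it. */
  size_t *fail = malloc(m * sizeof *fail);
  if (fail == NULL)
    return NULL;        /* sketch: treat allocation failure as not found */
  fail[0] = 0;
  for (size_t i = 1, k = 0; i < m; i++) {
    while (k > 0 && needle[i] != needle[k])
      k = fail[k - 1];
    if (needle[i] == needle[k])
      k++;
    fail[i] = k;
  }

  /* Scan the haystack once; on a mismatch, fall back via the table
     instead of re-examining haystack bytes. */
  const char *result = NULL;
  for (size_t i = 0, k = 0; haystack[i] != '\0'; i++) {
    while (k > 0 && haystack[i] != needle[k])
      k = fail[k - 1];
    if (haystack[i] == needle[k])
      k++;
    if (k == m) {
      result = haystack + i - m + 1;
      break;
    }
  }
  free(fail);
  return result;
}
```

The failure table makes the scan linear in the haystack length, which avoids the quadratic worst case of a naive search.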
* Optimize restoring of ymm registers on x86-64. | Ulrich Drepper | 2009-07-16 | 1 | -43/+34
  The patch mainly reduces the code size but also avoids some jumps.
* Fix up whitespaces in new memcmp for x86-64. | Ulrich Drepper | 2009-07-16 | 1 | -42/+42