about summary refs log tree commit diff
path: root/sysdeps/aarch64/strlen.S
Commit message (Collapse)AuthorAgeFilesLines
* [aarch64] Add an ASIMD variant of strlen for falkorSiddhesh Poyarekar2018-08-151-3/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | This variant of strlen uses vector loads and operations to reduce the size of the code and also eliminate the non-ascii fallback. This works very well for falkor because of its two vector units and efficient vector ops. In the best case it reduces latency of cases in bench-strlen by 48%, with gains throughout the benchmark. strlen-walk also sees uniform gains in the 5%-15% range. Overall the routine appears to work better than the stock one for falkor regardless of the benchmark, length of string or cache state. The same cannot be said of a53 and a72 though. a53 performance was greatly reduced and for a72 it was a bit of a mixed bag, slightly on the negative side but I reckon it might be fast in some situations. * sysdeps/aarch64/strlen.S (__strlen): Rename to STRLEN. [!STRLEN](STRLEN): Set to __strlen. * sysdeps/aarch64/multiarch/strlen.c: New file. * sysdeps/aarch64/multiarch/strlen_generic.S: Likewise. * sysdeps/aarch64/multiarch/strlen_asimd.S: Likewise. * sysdeps/aarch64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Add strlen. * sysdeps/aarch64/multiarch/Makefile (sysdep_routines): Add strlen_generic and strlen_asimd. Reviewed-By: szabolcs.nagy@arm.com CC: pinskia@gmail.com
* [aarch64] Fix value of MIN_PAGE_SIZE for testingSiddhesh Poyarekar2018-08-081-1/+1
| | | | | | | | | | MIN_PAGE_SIZE is normally set to 4096 but for testing it can be set to 16 so that it exercises the page crossing code for every misaligned access. The value was set to 15, which is obviously wrong, so fixed as obvious and tested. * sysdeps/aarch64/strlen.S [TEST_PAGE_CROSS](MIN_PAGE_SIZE): Fix value.
* Update copyright dates with scripts/update-copyrights.Joseph Myers2018-01-011-1/+1
| | | | | | | * All files with FSF copyright notices: Update copyright dates using scripts/update-copyrights. * locale/programs/charmap-kw.h: Regenerated. * locale/programs/locfile-kw.h: Likewise.
* Update copyright dates with scripts/update-copyrights.Joseph Myers2017-01-011-1/+1
|
* Partial ILP32 support for aarch64.Steve Ellcey2016-11-281-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * sysdeps/aarch64/crti.S: Add include of sysdep.h. (call_weak_fn): Use PTR_REG to get correct reg name in ILP32. * sysdeps/aarch64/dl-irel.h: Add include of sysdep.h. (elf_irela): Use AARCH64_R macro to get correct relocation in ILP32. * sysdeps/aarch64/dl-machine.h: Add include of sysdep.h. (elf_machine_load_address, RTLD_START, RTLD_START_1, RTLD_START, elf_machine_type_class, ELF_MACHINE_JMP_SLOT, elf_machine_rela, elf_machine_lazy_rel): Add ifdef's for ILP32 support. * sysdeps/aarch64/dl-tlsdesc.S (_dl_tlsdesc_return, _dl_tlsdesc_return_lazy, _dl_tlsdesc_dynamic, _dl_tlsdesc_resolve_hold): Extend pointers in ILP32, use PTR_REG to get correct reg name for ILP32. * sysdeps/aarch64/dl-trampoline.S (ip01): New Macro. (RELA_SIZE): New Macro. (_dl_runtime_resolve, _dl_runtime_profile): Use new macros and PTR_REG to support ILP32. * sysdeps/aarch64/jmpbuf-unwind.h (_JMPBUF_CFA_UNWINDS_ADJ): Add cast for ILP32 mode. * sysdeps/aarch64/memcmp.S (memcmp): Extend arg pointers for ILP32 mode. * sysdeps/aarch64/memcpy.S (memmove, memcpy): Ditto. * sysdeps/aarch64/memset.S (__memset): Ditto. * sysdeps/aarch64/strchr.S (strchr): Ditto. * sysdeps/aarch64/strchrnul.S (__strchrnul): Ditto. * sysdeps/aarch64/strcmp.S (strcmp): Ditto. * sysdeps/aarch64/strcpy.S (strcpy): Ditto. * sysdeps/aarch64/strlen.S (__strlen): Ditto. * sysdeps/aarch64/strncmp.S (strncmp): Ditto. * sysdeps/aarch64/strnlen.S (strnlen): Ditto. * sysdeps/aarch64/strrchr.S (strrchr): Ditto. * sysdeps/unix/sysv/linux/aarch64/clone.S: Ditto. * sysdeps/unix/sysv/linux/aarch64/setcontext.S (__setcontext): Ditto. * sysdeps/unix/sysv/linux/aarch64/swapcontext.S (__swapcontext): Ditto. * sysdeps/aarch64/__longjmp.S (__longjmp): Extend pointers in ILP32, change PTR_MANGLE call to use register numbers instead of names. * sysdeps/unix/sysv/linux/aarch64/getcontext.S (__getcontext): Ditto. * sysdeps/aarch64/setjmp.S (__sigsetjmp): Extend arg pointers for ILP32 mode, change PTR_MANGLE calls to use register numbers. * sysdeps/aarch64/start.S (_start): Ditto. * sysdeps/aarch64/nptl/bits/pthreadtypes.h (__PTHREAD_RWLOCK_INT_FLAGS_SHARED): New define. (__SIZEOF_PTHREAD_ATTR_T, __SIZEOF_PTHREAD_MUTEX_T, __SIZEOF_PTHREAD_MUTEXATTR_T, __SIZEOF_PTHREAD_COND_T, __SIZEOF_PTHREAD_COND_COMPAT_T, __SIZEOF_PTHREAD_CONDATTR_T, __SIZEOF_PTHREAD_RWLOCK_T, __SIZEOF_PTHREAD_RWLOCKATTR_T, __SIZEOF_PTHREAD_BARRIER_T, __SIZEOF_PTHREAD_BARRIERATTR_T): Make defined values dependent on __ILP32__. * sysdeps/aarch64/nptl/bits/semaphore.h (__SIZEOF_SEM_T): Change define. (sem_t): Change __align type. * sysdeps/aarch64/sysdep.h (AARCH64_R, PTR_REG, PTR_LOG_SIZE, DELOUSE, PTR_SIZE): New Macros. (LDST_PCREL, LDST_GLOBAL) Update to use PTR_REG. * sysdeps/unix/sysv/linux/aarch64/bits/fcntl.h (O_LARGEFILE): Set when in ILP32 mode. (F_GETLK64, F_SETLK64, F_SETLKW64): Only set in LP64 mode. * sysdeps/unix/sysv/linux/aarch64/dl-cache.h (DL_CACHE_DEFAULT_ID): Set elf flags for ILP32. (add_system_dir): Set ILP32 library directories. * sysdeps/unix/sysv/linux/aarch64/init-first.c (_libc_vdso_platform_setup): Set minimum kernel version for ILP32. * sysdeps/unix/sysv/linux/aarch64/ldconfig.h (SYSDEP_KNOWN_INTERPRETER_NAMES): Add ILP32 names. * sysdeps/unix/sysv/linux/aarch64/sigcontextinfo.h (GET_PC, SET_PC): New Macros. * sysdeps/unix/sysv/linux/aarch64/sysdep.h: Handle ILP32 pointers.
* Add a simple rawmemchr implementation. Use strlen for rawmemchr(s, '\0') as itWilco Dijkstra2016-06-201-2/+3
| | | | | | | | is the fastest way to search for '\0'. Otherwise use memchr with an infinite size. This is 3x faster on benchtests for large sizes. Passes GLIBC tests. * sysdeps/aarch64/rawmemchr.S (__rawmemchr): New file. * sysdeps/aarch64/strlen.S (__strlen): Change to __strlen to avoid PLT.
* Update copyright dates with scripts/update-copyrights.Joseph Myers2016-01-041-1/+1
|
* Optimize the strlen implementation by using a page cross check and a fast checkWilco Dijkstra2015-07-131-65/+165
| | | | | | for nul bytes which reverts to separate loop when a non-ASCII char is encountered. Speedup on test-strlen is ~10%, long ASCII strings are processed ~60% faster, and on random tests it is ~80% better.
* Update copyright dates with scripts/update-copyrights.Joseph Myers2015-01-021-1/+1
|
* Relocate AArch64 from ports to libc.Marcus Shawcroft2014-02-111-0/+117
This patch moves the AArch64 port to the main sysdeps hierarchy. The move is essentially: git mv ports/sysdeps/aarch64 sysdeps/aarch64 git mv ports/sysdeps/unix/sysv/linux/aarch64 sysdeps/unix/sysv/linux/aarch64 The README is updated and I've updated ChangeLog.aarch64 along the lines of the ARM move. The AArch64 build has been tested to confirm that there were no changes in objdump -dr output or the shared objects.