about summary refs log tree commit diff
path: root/src/mman
Commit message (Collapse)AuthorAgeFilesLines
* remove LFS64 symbol aliases; replace with dynamic linker remappingRich Felker2022-10-191-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | originally the namespace-infringing "large file support" interfaces were included as part of glibc-ABI-compat, with the intent that they not be used for linking, since our off_t is and always has been unconditionally 64-bit and since we usually do not aim to support nonstandard interfaces when there is an equivalent standard interface. unfortunately, having the symbols present and available for linking caused configure scripts to detect them and attempt to use them without declarations, producing all the expected ill effects that entails. as a result, commit 2dd8d5e1b8ba1118ff1782e96545cb8a2318592c was made to prevent this, using macros to redirect the LFS64 names to the standard names, conditional on _GNU_SOURCE or _LARGEFILE64_SOURCE. however, this has turned out to be a source of further problems, especially since g++ defines _GNU_SOURCE by default. in particular, the presence of these names as macros breaks a lot of valid code. this commit removes all the LFS64 symbols and replaces them with a mechanism in the dynamic linker symbol lookup failure path to retry with the spurious "64" removed from the symbol name. in the future, if/when the rest of glibc-ABI-compat is moved out of libc, this can be removed.
* revert unwanted and inadvertent change that slipped into mmap.cRich Felker2019-12-201-1/+0
| | | | | | | commit ae388becb529428ac926da102f1d025b3c3968da accidentally introduced #define SYSCALL_NO_TLS 1 in mmap.c, which was probably a stale change left around from unrelated syscall timing measurements. reverse it.
* implement SO_TIMESTAMP[NS] fallback for kernels without time64 versionsRich Felker2019-12-171-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | the definitions of SO_TIMESTAMP* changed on 32-bit archs in commit 38143339646a4ccce8afe298c34467767c899f51 to the new versions that provide 64-bit versions of timeval/timespec structure in control message payload. socket options, being state attached to the socket rather than function calls, are not trivial to implement as fallbacks on ENOSYS, and support for them was initially omitted on the assumption that the ioctl-based polling alternatives (SIOCGSTAMP*) could be used instead by applications if setsockopt fails. unfortunately, it turns out that SO_TIMESTAMP is sufficiently old and widely supported that a number of applications assume it's available and treat errors as fatal. this patch introduces emulation of SO_TIMESTAMP[NS] on pre-time64 kernels by falling back to setting the "_OLD" (time32) versions of the options if the time64 ones are not recognized, and performing translation of the SCM_TIMESTAMP[NS] control messages in recvmsg. since recvmsg does not know whether its caller is legacy time32 code or time64, it performs translation for any SCM_TIMESTAMP[NS]_OLD control messages it sees, leaving the original time32 timestamp as-is (it can't be rewritten in-place anyway, and memmove would be mildly expensive) and appending the converted time64 control message at the end of the buffer. legacy time32 callers will see the converted one as a spurious control message of unknown type; time64 callers running on pre-time64 kernels will see the original one as a spurious control message of unknown type. a time64 caller running on a kernel with native time64 support will only see the time64 version of the control message. emulation of SO_TIMESTAMPING is not included at this time since (1) applications which use it seem to be prepared for the possibility that it's not present or working, and (2) it can also be used in sendmsg control messages, in a manner that looks complex to emulate completely, and costly even when running on a time64-supporting kernel. corresponding changes in recvmmsg are not made at this time; they will be done separately.
* support archs with no mlock syscall, only mlock2Drew DeVault2019-03-211-0/+4
|
* remove spurious inclusion of libc.h for LFS64 ABI aliasesRich Felker2018-09-121-2/+1
| | | | | | the LFS64 macro was not self-documenting and barely saved any characters. simply use weak_alias directly so that it's clear what's being done, and doesn't depend on a header to provide a strange macro.
* reduce spurious inclusion of libc.hRich Felker2018-09-123-3/+0
| | | | | | | | | | | | | | | | | | | | | libc.h was intended to be a header for access to global libc state and related interfaces, but ended up included all over the place because it was the way to get the weak_alias macro. most of the inclusions removed here are places where weak_alias was needed. a few were recently introduced for hidden. some go all the way back to when libc.h defined CANCELPT_BEGIN and _END, and all (wrongly implemented) cancellation points had to include it. remaining spurious users are mostly callers of the LOCK/UNLOCK macros and files that use the LFS64 macro to define the awful *64 aliases. in a few places, new inclusion of libc.h is added because several internal headers no longer implicitly include libc.h. declarations for __lockfile and __unlockfile are moved from libc.h to stdio_impl.h so that the latter does not need libc.h. putting them in libc.h made no sense at all, since the macros in stdio_impl.h are needed to use them correctly anyway.
* overhaul internally-public declarations using wrapper headersRich Felker2018-09-121-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commits leading up to this one have moved the vast majority of libc-internal interface declarations to appropriate internal headers, allowing them to be type-checked and setting the stage to limit their visibility. the ones that have not yet been moved are mostly namespace-protected aliases for standard/public interfaces, which exist to facilitate implementing plain C functions in terms of POSIX functionality, or C or POSIX functionality in terms of extensions that are not standardized. some don't quite fit this description, but are "internally public" interfacs between subsystems of libc. rather than create a number of newly-named headers to declare these functions, and having to add explicit include directives for them to every source file where they're needed, I have introduced a method of wrapping the corresponding public headers. parallel to the public headers in $(srcdir)/include, we now have wrappers in $(srcdir)/src/include that come earlier in the include path order. they include the public header they're wrapping, then add declarations for namespace-protected versions of the same interfaces and any "internally public" interfaces for the subsystem they correspond to. along these lines, the wrapper for features.h is now responsible for the definition of the hidden, weak, and weak_alias macros. this means source files will no longer need to include any special headers to access these features. over time, it is my expectation that the scope of what is "internally public" will expand, reducing the number of source files which need to include *_impl.h and related headers down to those which are actually implementing the corresponding subsystems, not just using them.
* work around incorrect EPERM from mmap syscallRich Felker2017-09-061-2/+7
| | | | | | | | | | | | | under some conditions, the mmap syscall wrongly fails with EPERM instead of ENOMEM when memory is exhausted; this is probably the result of the kernel trying to fit the allocation somewhere that crosses into the kernel range or below mmap_min_addr. in any case it's a conformance bug, so work around it. for now, only handle the case of anonymous mappings with no requested address; in other cases EPERM may be a legitimate error. this indirectly fixes the possibility of malloc failing with the wrong errno value.
* allow full-range file offsets to mmap on archs with 64-bit syscall argsRich Felker2017-04-211-1/+1
| | | | | | | normally 32-bit archs use the mmap2 syscall and are limited to an offset of 2^32 pages. however some 32-bit archs (mainly ILP32-on-64 ones like x32) have 64-bit syscall argument slots and thus can accept the full range. don't artifically limit them.
* fix mremap memory synchronization and use of variadic argumentRich Felker2015-11-021-4/+11
| | | | | | | | | | | | since mremap with the MREMAP_FIXED flag is an operation that unmaps existing mappings, it needs to use the vm lock mechanism to ensure that any in-progress synchronization operations using vm identities from before the call have finished. also, the variadic argument was erroneously being read even if the MREMAP_FIXED flag was not passed. in practice this didn't break anything, but it's UB and in theory LTO could turn it into a hard error.
* prevent allocs than PTRDIFF_MAX via mremapDaniel Micay2015-11-021-1/+8
| | | | It's quite feasible for this to happen via MREMAP_MAYMOVE.
* redesign and simplify vmlock systemRich Felker2015-04-102-15/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | this global lock allows certain unlock-type primitives to exclude mmap/munmap operations which could change the identity of virtual addresses while references to them still exist. the original design mistakenly assumed mmap/munmap would conversely need to exclude the same operations which exclude mmap/munmap, so the vmlock was implemented as a sort of 'symmetric recursive rwlock'. this turned out to be unnecessary. commit 25d12fc0fc51f1fae0f85b4649a6463eb805aa8f already shortened the interval during which mmap/munmap held their side of the lock, but left the inappropriate lock design and some inefficiency. the new design uses a separate function, __vm_wait, which does not hold any lock itself and only waits for lock users which were already present when it was called to release the lock. this is sufficient because of the way operations that need to be excluded are sequenced: the "unlock-type" operations using the vmlock need only block mmap/munmap operations that are precipitated by (and thus sequenced after) the atomic-unlock they perform while holding the vmlock. this allows for a spectacular lack of synchronization in the __vm_wait function itself.
* make fsync, fdatasync, and msync cancellation pointsTrutz Behn2015-01-301-1/+1
| | | | | these are mandatory cancellation points per POSIX, so their omission was a conformance bug.
* use weak symbols for the POSIX functions that will be used by C threadsJens Gustedt2014-09-061-1/+3
| | | | | | | | | | The intent of this is to avoid name space pollution of the C threads implementation. This has two sides to it. First we have to provide symbols that wouldn't pollute the name space for the C threads implementation. Second we have to clean up some internal uses of POSIX functions such that they don't implicitly drag in such symbols.
* optimize locking against vm changes for mmap/munmapRich Felker2014-08-162-8/+7
| | | | | | | | | | the whole point of this locking is to prevent munmap, or mmap with MAP_FIXED, from deallocating virtual addresses, or changing the backing a given virtual address refers to, during certain race windows involving self-synchronized unmapping or destruction of pthread synchronization objects. there is no need for exclusion in the other direction, so it suffices to take the lock momentarily and release it before making the syscall, rather than holding it across the syscall.
* add framework for mmap2 syscall unit to vary by archRich Felker2014-07-301-2/+3
|
* include cleanups: remove unused headers and add feature test macrosSzabolcs Nagy2013-12-122-2/+0
|
* support configurable page size on mips, powerpc and microblazeSzabolcs Nagy2013-09-151-1/+1
| | | | | | | | | | | | | | | | PAGE_SIZE was hardcoded to 4096, which is historically what most systems use, but on several archs it is a kernel config parameter, user space can only know it at execution time from the aux vector. PAGE_SIZE and PAGESIZE are not defined on archs where page size is a runtime parameter, applications should use sysconf(_SC_PAGE_SIZE) to query it. Internally libc code defines PAGE_SIZE to libc.page_size, which is set to aux[AT_PAGESZ] in __init_libc and early in __dynlink as well. (Note that libc.page_size can be accessed without GOT, ie. before relocations are done) Some fpathconf settings are hardcoded to 4096, these should be actually queried from the filesystem using statfs.
* fix shm_open wrongly being cancellableRich Felker2013-07-201-1/+6
|
* disallow creation of objects larger than PTRDIFF_MAX via mmapRich Felker2013-06-271-0/+5
| | | | | | | | | | | | | | internally, other parts of the library assume sizes don't overflow ssize_t and/or ptrdiff_t, and the way this assumption is made valid is by preventing creating of such large objects. malloc already does so, but the check was missing from mmap. this is also a quality of implementation issue: even if the implementation internally could handle such objects, applications could inadvertently invoke undefined behavior by subtracting pointers within an object. it is very difficult to guard against this in applications, so a good implementation should simply ensure that it does not happen.
* clean up and fix logic for making mmap fail on invalid/unsupported offsetsRich Felker2012-12-201-3/+7
| | | | | | | | | | | | | | | | | | | | the previous logic was assuming the kernel would give EINVAL when passed an invalid address, but instead with MAP_FIXED it was giving EPERM, as it considered this an attempt to map over kernel memory. instead of trying to get the kernel to do the rigth thing, the new code just handles the error in userspace. I have also cleaned up the code to use a single mask to check for invalid low bits and unsupported high bits, so it's simpler and more clearly correct. the old code was actually wrong for sizeof(long) smaller than sizeof(off_t) but not equal to 4; now it should be correct for all possibilities. for 64-bit systems, the low-bits test is new and extraneous (the kernel should catch the error anyway when the mmap2 syscall is not used), but it's cheap anyway. if this is an issue, the OFF_MASK definition could be tweaked to omit the low bits when SYS_mmap2 is not defined.
* overhaul sem_openRich Felker2012-09-301-3/+3
| | | | | | | | | | | this function was overly complicated and not even obviously correct. avoid using openat/linkat just like in shm_open, and instead expand pathname using code shared with shm_open. remove bogus (and dangerous, with priorities) use of spinlocks. this commit also heavily streamlines the code and ensures there are no failure cases that can happen after a new semaphore has been created in the filesystem, since that case is unreportable.
* clean up, bugfixes, and general improvement for shm_open/shm_unlinkRich Felker2012-09-302-30/+28
| | | | | | | 1. don't make non-cloexec file descriptors 2. cancellation safety (cleanup handlers were missing, now unneeded) 3. share name validation/mapping code between open/unlink functions 4. avoid wasteful/slow syscalls
* mincore syscall wrapperRich Felker2012-09-091-0/+8
|
* process-shared barrier support, based on discussion with bdonlanRich Felker2011-09-272-3/+21
| | | | | | | | | | | | | this implementation is rather heavy-weight, but it's the first solution i've found that's actually correct. all waiters actually wait twice at the barrier so that they can synchronize exit, and they hold a "vm lock" that prevents changes to virtual memory mappings (and blocks pthread_barrier_destroy) until all waiters are finished inspecting the barrier. thus, it is safe for any thread to destroy and/or unmap the barrier's memory as soon as pthread_barrier_wait returns, without further synchronization.
* work around linux bug in mprotectRich Felker2011-06-291-1/+5
| | | | | | | | | | | per POSIX: The mprotect() function shall change the access protections to be that specified by prot for those whole pages containing any part of the address space of the process starting at address addr and continuing for len bytes. on the other hand, linux mprotect fails with EINVAL if the base address and/or length is not page-aligned, so we have to align them before making the syscall.
* fix missing include in posix_madvise.c (compile error)Rich Felker2011-04-201-0/+1
|
* support posix_madvise (previous a stub)Rich Felker2011-04-201-1/+3
| | | | | the check against MADV_DONTNEED to because linux MADV_DONTNEED semantics conflict dangerously with the POSIX semantics
* consistency: change all remaining syscalls to use SYS_ rather than __NR_ prefixRich Felker2011-04-061-1/+1
|
* global cleanup to use the new syscall interfaceRich Felker2011-03-2010-11/+11
|
* implement POSIX shared memoryRich Felker2011-03-032-0/+42
|
* cleaning up syscalls in preparation for x86_64 portRich Felker2011-02-131-0/+4
| | | | | | | | | - hide all the legacy xxxxxx32 name cruft in syscall.h so the actual source files can be clean and uniform across all archs. - cleanup llseek/lseek and mmap2/mmap handling for 32/64 bit systems - alternate implementation for nice if the target lacks nice syscall
* initial check-in, version 0.5.0 v0.5.0Rich Felker2011-02-1211-0/+107