path: root/src/linux
Commit message | Author | Age | Files | Lines
* fix public clone function to be safe and usable by applications | Rich Felker | 2023-06-01 | 1 | -6/+50
  the clone() function has been effectively unusable since it was added, due to producing a child process with inconsistent state. in particular, the child process's thread structure still contains the tid, thread list pointers, thread count, and robust list for the parent. this will cause malfunction in interfaces that attempt to use the tid or thread list, some of which are specified to be async-signal-safe.
  this patch attempts to make clone() consistent in a _Fork-like sense. as in _Fork, when the parent process is multi-threaded, the child process inherits an async-signal context where it cannot call AS-unsafe functions, but its context is now intended to be safe for calling AS-safe functions. making clone fork-like would also be a future option, if it turns out that this is what makes sense to applications, but it's not done at this time because the changes would be more invasive.
  in the case where the CLONE_VM flag is used, clone is only vfork-like, not _Fork-like. in particular, the child will see itself as having the parent's tid, and cannot safely call any libc functions but one of the exec family or _exit.
  handling of flags and variadic arguments is also changed so that arguments are only consumed with flags that indicate their presence, and so that flags which produce an inconsistent state are disallowed (reported as EINVAL). in particular, all libc functions carry a contract that they are only callable with ABI requirements met, which includes having a valid thread pointer to a thread structure that's unique within the process, and whose contents are opaque and only able to be set up internally by the implementation. the only way for an application to use flags that violate these requirements without executing any libc code is to perform the syscall from application-provided asm.
* wait4: fix missing rusage on x32 due to wrong success condition | Alexey Izbyshev | 2023-04-11 | 1 | -1/+1
  Resource usage data is filled by the kernel only when wait4 returns a pid, i.e. a positive value. Commit 5850546e9669f793aab61dfc7c4f2c1ff35c4b29 introduced this bug, possibly because of copy-pasting from getrusage.
* remove LFS64 symbol aliases; replace with dynamic linker remapping | Rich Felker | 2022-10-19 | 4 | -10/+0
  originally the namespace-infringing "large file support" interfaces were included as part of glibc-ABI-compat, with the intent that they not be used for linking, since our off_t is and always has been unconditionally 64-bit and since we usually do not aim to support nonstandard interfaces when there is an equivalent standard interface.
  unfortunately, having the symbols present and available for linking caused configure scripts to detect them and attempt to use them without declarations, producing all the expected ill effects that entails.
  as a result, commit 2dd8d5e1b8ba1118ff1782e96545cb8a2318592c was made to prevent this, using macros to redirect the LFS64 names to the standard names, conditional on _GNU_SOURCE or _LARGEFILE64_SOURCE. however, this has turned out to be a source of further problems, especially since g++ defines _GNU_SOURCE by default. in particular, the presence of these names as macros breaks a lot of valid code.
  this commit removes all the LFS64 symbols and replaces them with a mechanism in the dynamic linker symbol lookup failure path to retry with the spurious "64" removed from the symbol name. in the future, if/when the rest of glibc-ABI-compat is moved out of libc, this can be removed.
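As a rough illustration of the retry step described in this commit, the name transformation could look like the following. This is only a sketch under stated assumptions: `strip_lfs64_suffix` is an invented helper name, not taken from the commit, and a real lookup path would presumably also restrict the retry to a known list of LFS64 names rather than accept any trailing "64".

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not the actual dynamic linker code: if `name`
 * ends in "64", copy it into `out` without that suffix and return 1;
 * otherwise return 0 and leave `out` untouched. */
static int strip_lfs64_suffix(const char *name, char *out, size_t outsz)
{
	size_t len = strlen(name);

	if (len <= 2 || strcmp(name + len - 2, "64") != 0)
		return 0;               /* no "64" suffix to remove */
	if (len - 2 >= outsz)
		return 0;               /* stripped name would not fit in out */

	memcpy(out, name, len - 2);
	out[len - 2] = '\0';
	return 1;
}
```

A caller would try the normal lookup first and only consult a helper like this on failure, so ordinary symbol resolution is unaffected.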
* epoll_create: fail with EINVAL if size is non-positive | Kristina Martsenko | 2022-08-24 | 1 | -0/+1
  This is a part of the interface contract defined in the Linux man page (official for a Linux-specific interface) and asserted by test cases in the Linux Test Project (LTP).
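The contract can be sketched as a small wrapper. The name `sketch_epoll_create` is hypothetical, and delegating to `epoll_create1(0)` is an assumption made for this illustration rather than something the commit message states.

```c
#include <errno.h>
#include <sys/epoll.h>

/* Hypothetical wrapper name; it mirrors the documented contract only. */
static int sketch_epoll_create(int size)
{
	if (size <= 0) {
		errno = EINVAL;         /* non-positive size is rejected */
		return -1;
	}
	return epoll_create1(0);    /* assumption: size is not used further */
}
```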
* use alt signal stack when present for implementation-internal signals | Rich Felker | 2022-08-20 | 1 | -1/+1
  a request for this behavior has been open for a long time. the motivation is that application code, particularly under some language runtimes designed around very-low-footprint coroutine type constructs, may be operating with extremely small stack sizes unsuitable for receiving signals, using a separate signal stack for any signals it might handle.
  progress on this was blocked at one point trying to determine whether the implementation is actually entitled to clobber the alt stack, but the phrasing "available to the implementation" in the POSIX spec for sigaltstack seems to make it clear that the application cannot rely on the contents of this memory to be preserved in the absence of signal delivery (on the abstract machine, excluding implementation-internal signals) and that we can therefore use it for delivery of signals that "don't exist" on the abstract machine.
  no change is made for SIGTIMER since it is always blocked when used, and accepted via sigwaitinfo rather than execution of the signal handler.
* make epoll_[p]wait a cancellation point | Rich Felker | 2021-04-03 | 1 | -2/+2
  this is a Linux-specific function and not covered by POSIX's requirements for which interfaces are cancellation points, but glibc makes it one and existing software relies on it being one. at some point a review for similar functions that should be made cancellation points should be done.
* fix setgroups behavior in multithreaded process | Rich Felker | 2020-10-27 | 1 | -1/+29
  this function is outside the scope of the standards, but logically should behave like the set*id functions whose effects are process-global.
* remove unused weak definition of __tl_sync in membarrier.c | Rich Felker | 2020-10-14 | 1 | -5/+0
* add gettid function | Rich Felker | 2020-08-17 | 1 | -0/+8
  this is a prerequisite for addition of other interfaces that use kernel tids, including futex and SIGEV_THREAD_ID.
  there is some ambiguity as to whether the semantic return type should be int or pid_t. either way, futex API imposes a contract that the values fit in int (excluding some upper reserved bits). glibc used pid_t, so in the interest of not having gratuitous mismatch (the underlying types are the same anyway), pid_t is used here as well.
  while conceptually this is a syscall, the copy stored in the thread structure is always valid in all contexts where it's valid to call libc functions, so it's used to avoid the syscall.
* reformat clock_adjtime with always-true condition removed | Rich Felker | 2020-06-02 | 1 | -48/+46
* always use time64 syscall first for clock_adjtime | Rich Felker | 2020-06-02 | 1 | -2/+1
  clock_adjtime always returns the current clock setting in struct timex, so it's always possible that the time64 version is needed.
* fix broken time64 clock_adjtime | Rich Felker | 2020-06-02 | 1 | -1/+1
  the 64-bit time code path used the wrong (time32) syscall. fortunately this code path is not yet taken unless attempting to set a post-Y2038 time.
* clock_adjtime: generalize time64 not to assume old struct layout match | Rich Felker | 2019-10-20 | 1 | -11/+46
  commit 2b4fd6f75b4fa66d28cddcf165ad48e8fda486d1 added time64 for this function, but did so with a hidden assumption that the new time64 version of struct timex will be layout-compatible with the old one. however, there is little benefit to doing it that way, and the cost is permanent special-casing of 32-bit archs with 64-bit time_t in the public interface definitions. instead, do a full translation of the structure going in and out.
  this commit is actually a revision to an earlier uncommitted version of the code.
* wait4, getrusage: add time64/x32 variant | Rich Felker | 2019-10-19 | 1 | -2/+32
  presently the kernel does not actually define time64 versions of these syscalls, and they're not really needed except to represent extreme cpu time usage. however, x32's versions of the syscalls already behave as time64 ones, meaning the functions were broken on x32 if the caller used any part of the rusage result other than ru_utime and ru_stime. commit 7e8171143124f7f510db555dc6f6327a965a3e84 made it possible to fix this by treating x32's syscalls as time64 versions.
  in the non-time64-syscall case, make the syscall with the rusage destination pointer adjusted so that all members but the timevals line up between the libc and kernel structures. on 64-bit archs, or present 32-bit archs with 32-bit time_t, the timevals will line up too and no further work is needed. for future 32-bit archs with 64-bit time_t, the timevals are copied into place, contingent on time_t being larger than long.
* add copy_file_range system call wrapper | Árni Dagur | 2019-08-23 | 1 | -0/+8
* clock_adjtime: add time64 support, decouple 32-bit time_t, fix x32 | Rich Felker | 2019-08-02 | 1 | -0/+110
  the 64-bit/time64 version of the syscall is not API-compatible with the userspace timex structure definition; fields specified as long have type long long. so when using the time64 syscall, we have to convert the entire structure. this was always the case for x32 as well, but went unnoticed, meaning that clock_adjtime just passed junk to the kernel on x32. it should be fixed now.
  for the fallback case, we avoid encoding any assumptions about the new location of the time member or naming of the legacy slots by accessing them through a union of the kernel type and the new userspace type. the only assumption is that the non-time members live at the same offsets as in the (non-time64, long-based) kernel timex struct. this property saves us from having to convert the whole thing, and avoids a lot of additional work in compat shims.
  the new code is statically unreachable for now except on x32, where it fixes major brokenness. it is permanently unreachable on 64-bit.
* timerfd: add time64 syscall support, decouple 32-bit time_t | Rich Felker | 2019-07-29 | 1 | -0/+42
  the changes here are semantically and structurally identical to those made to timer_settime and timer_gettime for time64 support.
* pselect, ppoll: add time64 syscall support, decouple 32-bit time_t | Rich Felker | 2019-07-28 | 1 | -1/+17
  time64 syscall is used only if it's the only one defined for the arch, or if the requested timeout length does not fit in 32 bits. on current 32-bit archs where time_t is a 32-bit type, this makes it statically unreachable. on 64-bit archs, there are only superficial changes to the code after preprocessing.
  both before and after these changes, these functions copied their timeout arguments to avoid letting the kernel clobber the caller's copies. now, the copying also serves to change the type from userspace timespec to a pair of longs, which makes a difference only in the 32-bit fallback case, not on 64-bit.
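The selection rule in the first paragraph can be illustrated with a small range check. This is a minimal sketch: the function names and the `only_time64_defined` parameter are hypothetical stand-ins for what would be compile-time conditions in real code.

```c
#include <stdint.h>

/* Returns 1 if the signed 64-bit value fits in a signed 32-bit integer. */
static int fits_in_32_bits(int64_t x)
{
	/* Adding 2^31 (as unsigned) maps [-2^31, 2^31-1] onto [0, 2^32-1];
	 * any value outside that range leaves bits set above bit 31. */
	return (((uint64_t)x + 0x80000000ULL) >> 32) == 0;
}

/* Hypothetical selector: 1 means "use the time64 syscall". */
static int need_time64(int64_t tv_sec, int only_time64_defined)
{
	return only_time64_defined || !fits_in_32_bits(tv_sec);
}
```

The unsigned addition avoids signed overflow, so the check is well-defined for every 64-bit input.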
* implement settimeofday in terms of clock_settime, not old syscall | Rich Felker | 2019-07-27 | 1 | -1/+6
  this is yet another place where special handling of time syscalls can and should be avoided by implementing legacy functions in terms of their modern replacements.
  in theory a fallback to SYS_settimeofday could be added to clock_settime, but SYS_clock_settime has been available since Linux 2.6.0 or earlier, i.e. all the way back to the minimum supported version.
* refactor adjtime function using adjtimex function instead of syscall | Rich Felker | 2019-07-20 | 1 | -1/+1
  this removes the assumption that userspace struct timex matches the syscall type and sets the stage for 64-bit time_t on 32-bit archs.
* refactor adjtimex in terms of clock_adjtime | Rich Felker | 2019-07-20 | 2 | -2/+4
  this sets the stage for having the conversion logic for 64-bit time_t all in one file, and as a bonus makes clock_adjtime for CLOCK_REALTIME work even on kernels too old to have the clock_adjtime syscall.
* cap getdents length argument to INT_MAX | Rich Felker | 2019-06-28 | 1 | -0/+2
  the linux syscall treats this argument as having type int, so passing extremely long buffer sizes would be misinterpreted by the kernel. since "short reads" are always acceptable, just cap it down.
  patch based on report and suggested change by Florian Weimer.
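The cap amounts to a one-line clamp before the length reaches the kernel. A minimal sketch, with `cap_to_int_max` as a hypothetical helper name:

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: clamp a buffer length to what an int can hold. */
static size_t cap_to_int_max(size_t len)
{
	if (len > INT_MAX)
		len = INT_MAX;      /* a shorter read is still a valid result */
	return len;
}
```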
* add riscv64 architecture support | Rich Felker | 2019-06-14 | 1 | -0/+33
  Author: Alex Suykov <alex.suykov@gmail.com>
  Author: Aric Belsito <lluixhi@gmail.com>
  Author: Drew DeVault <sir@cmpwn.com>
  Author: Michael Clark <mjc@sifive.com>
  Author: Michael Forney <mforney@mforney.org>
  Author: Stefan O'Rear <sorear2@gmail.com>
  This port has involved the work of many people over several years. I have tried to ensure that everyone with substantial contributions has been credited above; if any omissions are found they will be noted later in an update to the authors/contributors list in the COPYRIGHT file.
  The version committed here comes from the riscv/riscv-musl repo's commit 3fe7e2c75df78eef42dcdc352a55757729f451e2, with minor changes by me for issues found during final review:
  - a_ll/a_sc atomics are removed (according to the ISA spec, lr/sc are not safe to use in separate inline asm fragments)
  - a_cas[_p] is fixed to be a memory barrier
  - the call from the _start assembly into the C part of crt1/ldso is changed to allow for the possibility that the linker does not place them nearby each other.
  - DTP_OFFSET is defined correctly so that local-dynamic TLS works
  - reloc.h LDSO_ARCH logic is simplified and made explicit.
  - unused, non-functional crti/n asm files are removed.
  - an empty .sdata section is added to crt1 so that the __global_pointer reference is resolvable.
  - indentation style errors in some asm files are fixed.
* in membarrier fallback, allow for possibility that sigaction fails | Rich Felker | 2019-04-09 | 1 | -8/+9
  this is a workaround to avoid a crashing regression on qemu-user when dynamic TLS is installed at dlopen time.
  the sigaction syscall should not be able to fail, but it does fail for implementation-internal signals under qemu user-level emulation if the host libc qemu is running under reserves the same signals for implementation-internal use, since qemu makes no provision to redirect/emulate them. after sigaction fails, the subsequent tkill would terminate the process abnormally as the default action.
  no provision to account for membarrier failing is made in the dynamic linker code that installs new TLS. at the formal level, the missing barrier in this case is incorrect, and perhaps we should fail the dlopen operation, but in practice all the archs we support (and probably all real-world archs except alpha, which isn't yet supported) should give the right behavior with no barrier at all as a consequence of consume-order properties.
  in the long term, this workaround should be supplemented or replaced by something better -- a different fallback approach to ensuring memory consistency, or dynamic allocation of implementation-internal signals. the latter is appealing in that it would allow cancellation to work under qemu-user too, and would even allow many levels of nested emulation.
* add membarrier syscall wrapper, refactor dynamic tls install to use it | Rich Felker | 2019-02-22 | 1 | -0/+76
  the motivation for this change is twofold. first, it gets the fallback logic out of the dynamic linker, improving code readability and organization. second, it provides application code that wants to use the membarrier syscall, which depends on preregistration of intent before the process becomes multithreaded unless unbounded latency is acceptable, with a symbol that, when linked, ensures that this registration happens.
* wireup linux/name_to_handle_at and name_to_handle_at syscalls | Khem Raj | 2018-09-12 | 2 | -0/+18
* remove spurious inclusion of libc.h for LFS64 ABI aliases | Rich Felker | 2018-09-12 | 4 | -8/+4
  the LFS64 macro was not self-documenting and barely saved any characters. simply use weak_alias directly so that it's clear what's being done, and doesn't depend on a header to provide a strange macro.
* reduce spurious inclusion of libc.h | Rich Felker | 2018-09-12 | 6 | -5/+2
  libc.h was intended to be a header for access to global libc state and related interfaces, but ended up included all over the place because it was the way to get the weak_alias macro. most of the inclusions removed here are places where weak_alias was needed. a few were recently introduced for hidden. some go all the way back to when libc.h defined CANCELPT_BEGIN and _END, and all (wrongly implemented) cancellation points had to include it. remaining spurious users are mostly callers of the LOCK/UNLOCK macros and files that use the LFS64 macro to define the awful *64 aliases.
  in a few places, new inclusion of libc.h is added because several internal headers no longer implicitly include libc.h.
  declarations for __lockfile and __unlockfile are moved from libc.h to stdio_impl.h so that the latter does not need libc.h. putting them in libc.h made no sense at all, since the macros in stdio_impl.h are needed to use them correctly anyway.
* remove unused __getdents, rename and move file | Rich Felker | 2018-09-12 | 1 | -0/+9
  the __-prefixed filename does not make sense when the only purpose of this file is implementing a public function that's not used as a backend for implementing the standard dirent functions.
* overhaul internally-public declarations using wrapper headers | Rich Felker | 2018-09-12 | 1 | -2/+0
  commits leading up to this one have moved the vast majority of libc-internal interface declarations to appropriate internal headers, allowing them to be type-checked and setting the stage to limit their visibility.
  the ones that have not yet been moved are mostly namespace-protected aliases for standard/public interfaces, which exist to facilitate implementing plain C functions in terms of POSIX functionality, or C or POSIX functionality in terms of extensions that are not standardized. some don't quite fit this description, but are "internally public" interfaces between subsystems of libc.
  rather than create a number of newly-named headers to declare these functions, and having to add explicit include directives for them to every source file where they're needed, I have introduced a method of wrapping the corresponding public headers. parallel to the public headers in $(srcdir)/include, we now have wrappers in $(srcdir)/src/include that come earlier in the include path order. they include the public header they're wrapping, then add declarations for namespace-protected versions of the same interfaces and any "internally public" interfaces for the subsystem they correspond to.
  along these lines, the wrapper for features.h is now responsible for the definition of the hidden, weak, and weak_alias macros. this means source files will no longer need to include any special headers to access these features.
  over time, it is my expectation that the scope of what is "internally public" will expand, reducing the number of source files which need to include *_impl.h and related headers down to those which are actually implementing the corresponding subsystems, not just using them.
* fix issues from public functions defined without declaration visible | Rich Felker | 2018-09-12 | 3 | -0/+6
  policy is that all public functions which have a public declaration should be defined in a context where that public declaration is visible, to avoid preventable type mismatches.
  an audit performed using GCC's -Wmissing-declarations turned up the violations corrected here. in some cases the public header had not been included; in others, a feature test macro needed to make the declaration visible had been omitted.
  in the case of gethostent and getnetent, the omission seems to have been intentional, as a hack to admit a single stub definition for both functions. this kind of hack is no longer acceptable; it's UB and would not fly with LTO or advanced toolchains. the hack is undone to make exposure of the declarations possible.
* add memfd_create syscall wrapper | Szabolcs Nagy | 2018-06-20 | 1 | -0/+8
  memfd_create was added in linux v3.17 and glibc has api for it.
* add mlock2 linux syscall wrapper | Szabolcs Nagy | 2018-06-20 | 1 | -0/+10
  mlock2 syscall was added in linux v4.4 and glibc has api for it. It falls back to mlock in case of flags==0, so that case works even on older kernels.
  MLOCK_ONFAULT is moved under _GNU_SOURCE following glibc.
* add getrandom syscall wrapper | Hauke Mehrtens | 2018-02-22 | 1 | -0/+7
  This syscall is available since Linux 3.17 and was also implemented in glibc in version 2.25 using the same interfaces.
* fix undefined behavior in ptrace | Alexander Monakov | 2017-07-04 | 1 | -2/+6
* move x32 sysinfo impl and syscall fixup code out of arch/x32/src | Rich Felker | 2016-01-22 | 2 | -1/+50
  all such arch-specific translation units are being moved to appropriate arch dirs under the main src tree.
* fix incorrect void return type for syncfs function | Rich Felker | 2015-07-09 | 1 | -2/+2
  being nonstandard, the closest thing to a specification for this function is its man page, which documents it as returning int. it can fail with EBADF if the file descriptor passed is invalid.
* fix missing argument to syscall in fanotify_mark | Clément Vasseur | 2014-06-14 | 1 | -1/+1
* fix breakage from recent syscall commits due to missing errno macros | Rich Felker | 2014-05-30 | 3 | -0/+3
* fix for broken kernel side RLIM_INFINITY on mips | Szabolcs Nagy | 2014-05-30 | 1 | -1/+16
  On 32 bit mips the kernel uses -1UL/2 to mark RLIM_INFINITY (and this is the definition in the userspace api), but since it is in the middle of the valid range of limits and limits are often compared with relational operators, various kernel side logic is broken if larger than -1UL/2 limits are used. So we truncate the limits to -1UL/2 in get/setrlimit and prlimit.
  Even if the kernel side logic consistently treated -1UL/2 as greater than any other limit value, there wouldn't be any clean workaround that allowed using large limits:
  * using -1UL/2 as RLIM_INFINITY in userspace would mean different infinity value for get/setrlimit and prlimit (where infinity is always -1ULL) and userspace logic could break easily (just like the kernel is broken now) and more special case code would be needed for mips.
  * translating -1UL/2 kernel side value to -1ULL in userspace would mean that -1UL/2 limit cannot be set (eg. -1UL/2+1 had to be passed to the kernel instead).
* support linux kernel apis (new archs) with old syscalls removed | Rich Felker | 2014-05-29 | 5 | -8/+29
  such archs are expected to omit definitions of the SYS_* macros for syscalls their kernels lack from arch/$ARCH/bits/syscall.h. the preprocessor is then able to select an appropriate implementation for affected functions. two basic strategies are used on a case-by-case basis:
  where the old syscalls correspond to deprecated library-level functions, the deprecated functions have been converted to wrappers for the modern function, and the modern function has fallback code (omitted at the preprocessor level on new archs) to make use of the old syscalls if the new syscall fails with ENOSYS. this also improves functionality on older kernels and eliminates the incentive to program with deprecated library-level functions for the sake of compatibility with older kernels.
  in other situations where the old syscalls correspond to library-level functions which are not deprecated but merely lack some new features, such as the *at functions, the old syscalls are still used on archs which support them. this may change at some point in the future if or when fallback code is added to the new functions to make them usable (possibly with reduced functionality) on old kernels.
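The first strategy can be sketched in portable C. Everything named here is hypothetical: `HAVE_LEGACY_CALL` stands in for the presence of a legacy SYS_* macro, and the function pointers stand in for raw syscalls that return a value or a negative error code.

```c
#include <errno.h>
#include <stddef.h>

/* Stand-in for a raw syscall: returns a result or a negative errno. */
typedef long (*raw_call)(int arg);

/* Hypothetical switch: defined only where the legacy syscall exists. */
#define HAVE_LEGACY_CALL 1

static long call_with_fallback(raw_call modern, raw_call legacy, int arg)
{
	long r = modern(arg);
#ifdef HAVE_LEGACY_CALL
	/* This branch is compiled out where the legacy syscall is absent. */
	if (r == -ENOSYS && legacy)
		r = legacy(arg);
#else
	(void)legacy;
#endif
	return r;
}

/* Stand-ins used only to exercise the sketch. */
static long modern_missing(int arg) { (void)arg; return -ENOSYS; }
static long modern_present(int arg) { return arg + 1; }
static long legacy_impl(int arg)    { return arg + 100; }
```

Keeping the fallback inside the modern function means the deprecated function can be a thin wrapper with no kernel-version logic of its own.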
* add namespace-protected name for sysinfo function | Rich Felker | 2014-04-15 | 2 | -6/+5
  it will be needed to implement some things in sysconf, and the syscall can't easily be used directly because the x32 syscall uses the wrong structure layout. the l (uncreative, for "linux") prefix is used since the symbol name __sysinfo is already taken for AT_SYSINFO from the aux vector.
  the way the x32 override of this function works is also changed to be simpler and avoid the useless jump instruction.
* x32: fix sysinfo() | rofl0r | 2014-03-06 | 1 | -0/+5
  the kernel uses long longs in the struct, but the documentation says they're long. so we need to fixup the mismatch between the userspace and kernelspace structs. since the struct offers a mem_unit member, we can avoid truncation by adjusting that value.
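One way to read "avoid truncation by adjusting that value" is to trade precision for range: shift the 64-bit counters down until they fit in 32 bits and scale mem_unit up by the same factor. The sketch below is an assumption about the idea, not the committed code; `scale_to_32_bits` is a hypothetical helper, and fixed-width types are used in place of the real struct members.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: shift every 64-bit counter right until the largest
 * one fits in 32 bits, and multiply mem_unit by the same power of two so
 * that counter * mem_unit stays approximately the same (low bits drop).
 * Assumes the scaled unit itself still fits in 32 bits. */
static void scale_to_32_bits(uint64_t *vals, size_t n, uint32_t *mem_unit)
{
	uint64_t max = 0;
	unsigned shift = 0;

	for (size_t i = 0; i < n; i++)
		if (vals[i] > max)
			max = vals[i];

	while ((max >> shift) > UINT32_MAX)
		shift++;

	for (size_t i = 0; i < n; i++)
		vals[i] >>= shift;

	/* Widen before shifting so a shift count of 32 stays well-defined. */
	*mem_unit = (uint32_t)((uint64_t)*mem_unit << shift);
}
```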
* clone: make clone a wrapper around __clone | Bobby Bingham | 2014-02-09 | 1 | -0/+19
  The architecture-specific assembly versions of clone did not set errno on failure, which is inconsistent with glibc. __clone still returns the error via its return value, and clone is now a wrapper that sets errno as needed.
  The public clone has also been moved to src/linux, as it's not directly related to the pthreads API.
  __clone is called by pthread_create, which does not report errors via errno. Though not strictly necessary, it's nice to avoid clobbering errno here.
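The split between an internal function that returns the error and a public wrapper that sets errno can be sketched as below. `ret_to_errno` is a hypothetical name, and the assumption that failures are reported as small negative values (-4095..-1) follows the usual Linux syscall convention rather than anything stated in this commit message.

```c
#include <errno.h>

/* Hypothetical helper: the internal call reports failure as a negative
 * error code in the range -4095..-1; convert that into the public
 * convention of returning -1 with errno set. */
static long ret_to_errno(long r)
{
	if ((unsigned long)r > -4096UL) {
		errno = (int)-r;
		return -1;
	}
	return r;   /* success values pass through, errno is left untouched */
}
```

An internal caller that reports errors by return value can use the raw result directly and never touch errno.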
* fix const-correctness of argument to stime | Rich Felker | 2014-01-07 | 1 | -1/+1
  it's unclear what the historical signature for this function was, but semantically, the argument should be a pointer to const, and this is what glibc uses. correct programs should not be using this function anyway, so it's unlikely to matter.
* fix signedness of pgoff argument to remap_file_pages | Rich Felker | 2014-01-07 | 1 | -1/+1
  both the kernel and glibc agree that this argument is unsigned; the incorrect type ssize_t came from erroneous man pages.
* fix incorrect type for wd argument of inotify_rm_watch | Rich Felker | 2014-01-07 | 1 | -1/+1
  this was wrong since the original commit adding inotify, and I don't see any explanation for it. not even the man pages have it wrong. it was most likely a copy-and-paste error.
* add some missing LFS64 aliases for fadvise/fallocate functions | Rich Felker | 2014-01-06 | 1 | -0/+4
* fanotify.c: fix typo in header inclusion | rofl0r | 2014-01-03 | 1 | -1/+1
  the header is included only as a guard to check that the declaration and definition match, so the typo didn't cause any breakage aside from omitting this check.
* disable the brk function | Rich Felker | 2014-01-02 | 1 | -1/+2
  the reasons are the same as for sbrk. unlike sbrk, there is no safe usage because brk does not return any useful information, so it should just fail unconditionally.
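"Fail unconditionally" can be pictured as a stub that ignores its argument. In this sketch, `disabled_brk` is a hypothetical name and the choice of ENOMEM as the reported error is an assumption, since the commit message does not name one.

```c
#include <errno.h>

/* Hypothetical stand-in for a disabled brk: never moves the break. */
static int disabled_brk(void *end)
{
	(void)end;          /* the requested address is ignored */
	errno = ENOMEM;     /* assumed error code for this sketch */
	return -1;
}
```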