...
* riscv64: add vfork (Pedro Falcato, 2023-02-09; 1 file, -0/+12)
  Implement vfork() using clone(CLONE_VM | CLONE_VFORK | ...).
* fix wrong sigaction syscall ABI on mips*, or1k, microblaze, riscv64 (Rich Felker, 2023-02-09; 14 files, -50/+12)
  we wrongly defined a dummy SA_RESTORER flag on these archs, despite the kernel interface not actually having such a feature. on archs which lack SA_RESTORER, the kernel sigaction structure also lacks the restorer function pointer member, which means the signal mask appears at a different offset. the kernel was thereby interpreting the bits of the code address as part of the signal set to be masked while handling the signal.

  this patch removes the erroneous SA_RESTORER definitions from archs which do not have it, makes access to the member conditional on whether SA_RESTORER is defined for the arch, and removes the now-unused asm for the affected archs.

  because there are reportedly versions of qemu-user which also use the wrong ABI here, the old ksigaction struct size is preserved with an unused member at the end. this is harmless and mitigates the risk of such a bug turning into a buffer overflow onto the sigaction function's stack.
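The offset shift is easiest to see by comparing the two layouts side by side. This sketch uses invented struct names and a generic member order (individual archs may define their own ordering); it shows that dropping the restorer moves the mask one pointer earlier, and that a trailing unused member keeps the overall size equal to the old layout.

```c
#include <stddef.h>

/* Hypothetical layout for archs whose kernel interface has SA_RESTORER. */
struct ksa_with_restorer {
    void (*handler)(int);
    unsigned long flags;
    void (*restorer)(void);   /* exists only on these archs */
    unsigned mask[2];         /* signal set follows the restorer */
};

/* Hypothetical layout for archs without SA_RESTORER: the mask starts one
 * pointer earlier, and an unused trailing member pads to the old size. */
struct ksa_without_restorer {
    void (*handler)(int);
    unsigned long flags;
    unsigned mask[2];
    void *unused;             /* preserves the old struct size */
};

/* Number of bytes by which a bogus restorer member shifts the mask. */
static size_t mask_offset_shift(void)
{
    return offsetof(struct ksa_with_restorer, mask)
         - offsetof(struct ksa_without_restorer, mask);
}
```

With the wrong layout, the kernel reads the restorer's code address where it expects the first word of the mask, which is the misinterpretation the commit describes.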
* fix integer overflow in WIFSTOPPED macro (Rich Felker, 2023-02-08; 2 files, -2/+2)
  the result of the 0xffff mask with the exit status could have bit 15 set, in which case multiplying by 0x10001 overflows 32-bit signed int. making the multiply unsigned avoids the overflow. it also changes the sign extension behavior of the subsequent >> operation, but the affected bits are all unwanted anyway and all discarded by the cast to short.
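A minimal sketch of a WIFSTOPPED-style test with the unsigned multiply, under the hypothetical name MY_WIFSTOPPED so it does not collide with the system macro. Multiplying the low 16 bits by 0x10001 copies them into the upper half; after the shift and the truncation to short (assuming the usual wraparound conversion), the low byte of the status (0x7f for a stopped child) lands in the high byte and the signal number in the low byte. The comparison with 0x7f00 is therefore true exactly when the low byte is 0x7f and the signal number is nonzero.

```c
/* Hypothetical name; modeled on the fix described above. A stop status
 * has the form (sig << 8) | 0x7f. The U suffix makes the multiply
 * unsigned, so a status with bit 15 set cannot overflow. */
#define MY_WIFSTOPPED(s) ((short)((((s) & 0xffff) * 0x10001U) >> 8) > 0x7f00)

/* Build the wait status of a child stopped by signal sig. */
static int stopped_status(int sig)
{
    return (sig << 8) | 0x7f;
}
```

A status such as 0x807f has bit 15 set, which is exactly the input for which the old signed multiply overflowed; 0xffff (a continued child) evaluates to false.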
* fix debugger tracking of shared libraries on mips with PIE main program (Rich Felker, 2023-01-18; 5 files, -0/+11)
  mips has its own mechanisms for DT_DEBUG because it makes _DYNAMIC read-only, and the original mechanism, DT_MIPS_RLD_MAP, was PIE-incompatible. DT_MIPS_RLD_MAP_REL was added to remedy this, but we never implemented support for it. add it now using the same idioms for mips-specific ldso logic.
* expose memmem under baseline POSIX feature profile (Rich Felker, 2023-01-06; 1 file, -1/+1)
  memmem has been adopted for the next issue of POSIX (outcome of tracker item 1061). since mem* is in the reserved namespace for string.h it's already fully conforming to expose it by default, so just do so.
* use libc-internal malloc for pthread_atfork (Rich Felker, 2022-12-17; 1 file, -0/+5)
  while no lock is held here making it a lock-order issue, replacement malloc is likely to want to use pthread_atfork, possibly making the call to malloc infinitely recursive. even if not, there is no reason to prefer an application-provided malloc here.
* prevent invalid reads of nl_arg in printf_core (Markus Wichmann, 2022-12-14; 1 file, -6/+8)
  printf_core() runs twice, and during its first run, nl_arg is uninitialized and must not be read. It gets initialized at the end of the first run. Conversely, nl_type does not need to be set during the second run, as its useful life has ended at that point, since the only time it is read is during that exact same initialization. Therefore we can simply alternate the assignments.

  p and w do still need to get values assigned to them, since at least one line in the same if-statement depends on that, but they can be dummy values. arg does not need to be assigned, since in the first run, we encounter a continue statement before using the argument.
* elf.h: add ELFCOMPRESS_ZSTD (Fangrui Song, 2022-12-14; 1 file, -0/+1)
* semaphores: fix missed wakes from ABA bug in waiter count logic (Rich Felker, 2022-12-13; 4 files, -12/+19)
  because the has-waiters state in the semaphore value futex word is only representable when the value is zero (the special value -1 represents "0 with potential new waiters"), it's lost if intervening operations make the semaphore value positive again. this creates an ABA issue in sem_post, whereby the post uses a stale waiters count rather than re-evaluating it, skipping the futex wake if the stale count was zero.

  the fix here is based on a proposal by Alexey Izbyshev, with minor changes to eliminate costly new spurious wake syscalls.

  the basic idea is to replace the special value -1 with a sticky waiters bit (repurposing the sign bit) preserved under both wait and post. any post that takes place with the waiters bit set will perform a futex wake.

  to be useful, the waiters bit needs to be removable, and to remove it safely, we perform a broadcast wake instead of a normal single-task wake whenever removing the bit. this lets any un-accounted-for waiters wake and re-add the waiters bit if they still need it.

  there are multiple possible choices for when to perform this broadcast, but the optimal choice seems to be doing it whenever the observed waiters count is less than two (semantically, this means exactly one, but we might see a stale count of zero). in this case, the expected number of threads to be woken is one, with exactly the same cost as a non-broadcast wake.
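The post-side rule can be modeled as a pure function of the observed futex word and waiter count. This single-threaded sketch is not musl's sem_post: the names are invented, and there is no compare-and-swap loop or futex call. It only computes the new word and which kind of wake the rule calls for, with the count in the low 31 bits and the sign bit as the sticky waiters flag.

```c
#include <stdbool.h>

#define SEM_WAITERS_BIT 0x80000000u   /* sticky "may have waiters" flag */
#define SEM_COUNT_MASK  0x7fffffffu   /* semaphore count */

struct post_step {
    bool overflow;      /* count already at its maximum */
    unsigned new_word;  /* value a real implementation would CAS in */
    int wake;           /* 0: no wake, 1: wake one, -1: broadcast */
};

static struct post_step model_post(unsigned word, int observed_waiters)
{
    struct post_step st = { false, word, 0 };
    if ((word & SEM_COUNT_MASK) == SEM_COUNT_MASK) {
        st.overflow = true;
        return st;
    }
    st.new_word = word + 1;                /* increment keeps the bit */
    if (observed_waiters <= 1)
        st.new_word &= ~SEM_WAITERS_BIT;   /* count may be stale: drop bit */
    if (word & SEM_WAITERS_BIT)
        /* dropping the bit requires a broadcast so unaccounted-for
         * waiters can re-set it; with 2+ waiters one wake suffices */
        st.wake = observed_waiters > 1 ? 1 : -1;
    return st;
}
```

The case the commit fixes is the first test below: the bit is set but the observed count is a stale zero, and the model still broadcasts instead of skipping the wake.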
* ldso: fix invalid early references to extern-linkage libc.page_size (Rich Felker, 2022-11-30; 1 file, -1/+8)
  when PAGE_SIZE is not constant, internal/libc.h defines it to expand to libc.page_size. however, kernel_mapped_dso, reachable from stage 2 of the dynamic linker bootstrap (__dls2), needs PAGE_SIZE to interpret the relro range. at this point the libc object is both uninitialized and invalid to access according to our model for bootstrapping, which does not assume any external-linkage objects are accessible until stages 2b/3. in practice it likely worked because hidden visibility tends to behave like internal linkage, but this is not a property that the dynamic linker was designed to rely upon.

  this bug likely manifested as relro malfunction on archs with variable page size, due to incorrect mask when aligning the relro bounds to page boundaries.

  while there are certainly more direct ways to fix the known problem point here, a maximally future-proof way is to just bypass the libc.h PAGE_SIZE definition in the dynamic linker and instead have dynlink.c define its own internal-linkage object for variable page size. then, if anything else in stage 2 ever ends up referencing PAGE_SIZE, it will just automatically work right.
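Aligning the relro bounds is a mask operation, so a wrong page size silently yields a wrong range. A minimal sketch, assuming a file-scope static (internal linkage) page size that would be filled in early from the auxiliary vector (AT_PAGESZ); the names are hypothetical and the real kernel_mapped_dso logic may differ in detail.

```c
#include <stddef.h>

/* Internal-linkage page size; in a real loader this would be set early,
 * e.g. from AT_PAGESZ (assumption for this sketch). */
static size_t ldso_page_size = 4096;

struct relro_range { size_t start, end; };

static struct relro_range relro_bounds(size_t p_vaddr, size_t p_memsz)
{
    struct relro_range r;
    /* -page_size clears the offset-within-page bits (page size is a power
     * of two); both ends are rounded down to page boundaries */
    r.start = p_vaddr & -ldso_page_size;
    r.end = (p_vaddr + p_memsz) & -ldso_page_size;
    return r;
}
```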
* pthread_atfork: fix return value on malloc failure (Alexey Izbyshev, 2022-11-12; 1 file, -1/+2)
  POSIX requires pthread_atfork to report errors via its return value, not via errno. The only specified error is ENOMEM.
* fix double-processing of DT_RELR relocations in ldso relocating itself (Rich Felker, 2022-11-10; 1 file, -0/+1)
  this is analogous to skip_relative logic in do_relocs -- because relative relocations for the dynamic linker itself were already performed at entry (stage 1), they must not be applied again.
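The RELR encoding and the self-relocation guard can be sketched over a simulated image. In this sketch (hypothetical names; an array of words stands in for mapped memory and byte offsets stand in for virtual addresses), an even entry is the address of one word to relocate and also starts the window for following bitmap entries; an odd entry is a bitmap whose bits above bit 0 cover the next 8*sizeof(size_t)-1 words. The already_relocated flag plays the role of the dynamic linker recognizing its own DSO. The format details come from the generic ELF RELR scheme, not from this commit.

```c
#include <stdbool.h>
#include <stddef.h>

static void apply_relr(size_t *image, size_t base, const size_t *relr,
                       size_t n, bool already_relocated)
{
    /* relative relocs were applied at entry: applying them again would
     * add the load base twice */
    if (already_relocated) return;
    size_t *where = image;
    for (size_t k = 0; k < n; k++) {
        size_t entry = relr[k];
        if ((entry & 1) == 0) {
            /* address entry: relocate one word, window starts after it */
            where = (size_t *)((char *)image + entry);
            *where++ += base;
        } else {
            /* bitmap entry: bit i+1 set means relocate where[i] */
            size_t bitmap = entry;
            for (int i = 0; (bitmap >>= 1); i++)
                if (bitmap & 1) where[i] += base;
            where += 8 * sizeof(size_t) - 1;
        }
    }
}
```

The bitmap 0xb (binary 1011) below relocates the first and third words after the address entry and skips the second.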
* fix strverscmp comparison of digit sequence with non-digits (Rich Felker, 2022-11-07; 1 file, -3/+3)
  the rule that the longest digit sequence not beginning with a zero is greater only applies when both sequences being compared are non-degenerate. this is spelled out explicitly in the man page, which may be deemed authoritative for this nonstandard function: "If one or both of these is empty, then return what strcmp(3) would have returned..."

  we were wrongly treating any sequence of digits not beginning with a zero as greater than a non-digit in the other string.
* fix async thread cancellation stack alignment (Rich Felker, 2022-11-05; 1 file, -1/+6)
  if async cancellation is enabled and acted upon, the stack pointer is not necessarily pointing to a __syscall_cp_asm stack frame. the contents of the stack being wrong don't really matter, but if the stack pointer is not suitably aligned, the procedure call ABI is violated when calling back into C code via __cancel, and pthread_exit, cancellation cleanup handlers, TSD destructors, etc. may malfunction or crash.

  for the async cancel case, just call __cancel directly like we did prior to commit 102f6a01e249ce4495f1119ae6d963a2a4a53ce5. restore the signal mask prior to doing this since the cancellation handler runs with all signals blocked.
* fix return value of gethostby{name[2],addr} with no result but no error (Rich Felker, 2022-10-20; 2 files, -2/+2)
  commit f081d5336a80b68d3e1bed789cc373c5c3d6699b fixed gethostbyname[2]_r to treat negative results as a non-error, leaving gethostbyname[2] wrongly returning a pointer to the unfilled result buffer rather than a null pointer. since, as documented with commit fe82bb9b921be34370e6b71a1c6f062c20999ae0, the caller of gethostby{name[2],addr}_r can always rely on the result pointer being set, use that consistently rather than trying to duplicate logic about whether we have a result or not in gethostby{name[2],addr}.
* clean up dns_parse_callback (Rich Felker, 2022-10-19; 1 file, -13/+13)
  the only functional change here should be that MAXADDRS is only checked for RRs that provide address results, so that a CNAME which appears after an excessive number of address RRs does not get ignored. I'm not aware of any servers that order the RRs this way, and it may even be forbidden to do so, but I prefer having the callback logic not be order dependent.

  other than that, the motivation for this change is that the A and AAAA cases were mostly duplicate code that could be combined as a single code path.
* dns response handling: don't treat too many addresses as an error (Rich Felker, 2022-10-19; 1 file, -1/+1)
  returning -1 rather than 0 from the parse function causes __dns_parse to bail out and return an error. presently, name_from_dns does not check the return value anyway, so this does not matter, but if it ever started treating this as an error, lookups with large numbers of addresses would break. this is a consequence of adding TCP support and extending the buffer size used in name_from_dns.
* dns response handling: ignore presence of wrong-type RRs (Rich Felker, 2022-10-19; 1 file, -2/+8)
  reportedly there is nameserver software with question-rewriting "functionality" which gives A answers when AAAA is queried. since we made no effort to validate that the answer RR type actually corresponds to the question asked, it was possible (depending on flags, etc.) for these answers to leak through, which the caller might not be prepared for. indeed, our implementation of gethostbyname2_r makes an assumption that the resulting addresses are in the family requested, and will misinterpret the results if they don't.

  commit 45ca5d3fcb6f874bf5ba55d0e9651cef68515395 already noted in fixing CVE-2017-15650 that this could happen, but did nothing to validate that the RR type of the answer matches the question; it just enforced the limit on number of results to preclude overflow.

  presently, name_from_dns ignores the return value of __dns_parse, so it doesn't really matter whether we return 0 (ignoring the RR) or -1 (parse-ending error) upon encountering the mismatched RR. if that ever changes, though, ignoring irrelevant answer RRs sounds like the semantically correct thing to do, so for now let's return 0 from the callback when this happens.
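The callback rules from this and the two preceding commits (skip RRs of the wrong type, treat a full table as "ignore" rather than an error, reject malformed rdata) can be sketched as follows. This is a simplified, hypothetical version without CNAME handling; the struct and function names are invented, and only the RR type numbers (A = 1, AAAA = 28) and MAXADDRS = 48 come from DNS standards and the text.

```c
#include <stddef.h>
#include <string.h>

#define RR_A     1
#define RR_AAAA  28
#define MAXADDRS 48

struct addr_ctx {
    int rrtype;                        /* RR_A or RR_AAAA: what was asked */
    int cnt;
    unsigned char addrs[MAXADDRS][16];
};

/* Returns 0 to keep parsing, -1 for a malformed record. */
static int on_answer_rr(struct addr_ctx *ctx, int rr, const void *data, int len)
{
    if (rr != ctx->rrtype) return 0;      /* e.g. A answer to AAAA query */
    if (ctx->cnt >= MAXADDRS) return 0;   /* table full: ignore, no error */
    int want = (rr == RR_A) ? 4 : 16;
    if (len != want) return -1;           /* wrong rdata length */
    memcpy(ctx->addrs[ctx->cnt++], data, (size_t)want);
    return 0;
}
```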
* fix missing synchronization of pthread TSD keys with MT-fork (Rich Felker, 2022-10-19; 3 files, -0/+12)
  commit 167390f05564e0a4d3fcb4329377fd7743267560 seems to have overlooked the presence of a lock here, probably because it was one of the exceptions not using LOCK() but a rwlock. as such, it can't be added to the generic table of locks to take, so add an explicit atfork function for the pthread keys table. the order it is called does not particularly matter since nothing else in libc but pthread_exit interacts with keys.
* fgets: avoid arithmetic overflow when n==INT_MIN is passed (Rich Felker, 2022-10-19; 1 file, -2/+3)
  performing n-- is not a safe operation for arbitrary signed input n. only perform the decrement in the code path where the initial n is greater than 1, and adjust the condition in the n<=1 code path to compensate for it not having been decremented.
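The reordering is easiest to see in a small fgets-like function that reads from a string instead of a FILE. mem_fgets is hypothetical; it mirrors only the size handling described above: n < 1 returns NULL, n == 1 yields an empty string, and the decrement happens only once n > 1 is known.

```c
#include <limits.h>   /* INT_MIN, for exercising the edge case */
#include <stddef.h>
#include <string.h>

/* Reads at most n-1 chars (stopping after '\n') from *src into s.
 * Advances *src past what was consumed. */
static char *mem_fgets(char *s, int n, const char **src)
{
    if (n <= 1) {
        /* no decrement on this path, so n == INT_MIN is harmless */
        if (n < 1) return NULL;
        *s = '\0';
        return s;
    }
    n--;                               /* safe: n > 1; reserve room for '\0' */
    const char *p = *src;
    if (!*p) return NULL;              /* end of input */
    size_t k = 0;
    while ((int)k < n && p[k]) {
        if (p[k++] == '\n') break;     /* keep the newline, then stop */
    }
    memcpy(s, p, k);
    s[k] = '\0';
    *src = p + k;
    return s;
}
```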
* fix AS-safety of close when aio is in use and fd map is expanded (Rich Felker, 2022-10-19; 1 file, -0/+6)
  the aio operations that lead to calling __aio_get_queue with the possibility to expand the fd map are not AS-safe, but if they are interrupted by a signal handler, the signal handler may call close, which is required to be AS-safe. due to __aio_get_queue taking the write lock without blocking signals, such a call to close from a signal handler could deadlock.

  change __aio_get_queue to block signals if it needs to obtain a write lock, and restore when finished.
* fix use of uninitialized dummy_fut in aio_suspend (Alexey Izbyshev, 2022-10-19; 1 file, -1/+1)
  aio_suspend waits on a dummy futex in the corner case when the array of requests contains NULL pointers only. But the value of this futex was left uninitialized, so if it happens to be non-zero, aio_suspend degrades to spinning instead of blocking.
* fix potential deadlock between multithreaded fork and aio (Rich Felker, 2022-10-19; 3 files, -4/+21)
  as reported by Alexey Izbyshev, there is a lock order inversion deadlock between the malloc lock and aio maplock at MT-fork time: _Fork attempts to take the aio maplock while fork already has the malloc lock, but a concurrent aio operation holding the maplock may attempt to allocate memory.

  move the __aio_atfork calls in the parent from _Fork to fork, and reorder the lock before most other locks, since nothing else depends on aio(*). this leaves us with the possibility that the child will not be able to obtain the read lock, if _Fork is used directly and happens concurrent with an aio operation. however, in that case, the child context is an async signal context that cannot call any further aio functions, so all we need is to ensure that close does not attempt to perform any aio cancellation. this can be achieved just by nulling out the map pointer.

  (*) even if other functions call close, they will only need a read lock, not a write lock, and read locks being recursive ensures they can obtain it. moreover, the number of read references held is bounded by something like twice the number of live threads, meaning that the read lock count cannot saturate.
* fix potential unsynchronized access to killlock state at thread exit (Rich Felker, 2022-10-19; 1 file, -6/+10)
  as reported by Alexey Izbyshev, when the second-to-last thread exits causing a return to single-threaded (no locks needed) state, it creates a situation where the last remaining thread may obtain the killlock that's already held by the exiting thread. this means it may erroneously use the tid of the exiting thread, and may corrupt the lock state due to double-unlock.

  commit 8d81ba8c0bc6fe31136cb15c9c82ef4c24965040, which (re)introduced the switch back to single-threaded state, documents the intent that the first lock after switching back should provide the necessary synchronization. this is correct, but only works if the switch back is made after there is no further need for synchronization with locks (other than the thread list lock, which can't be bypassed) held by the exiting thread.

  in order to hit the bug, the remaining thread must first take a different lock, causing it to perform an actual lock one last time, consume the need_locks==-1 state, and transition to need_locks==0. after that, the next attempt to lock the exiting thread's killlock will bypass locking.

  fix this by reordering the unlocking of killlock at thread exit time, along with changes to the state protected by it, to occur earlier, before the switch to single-threaded state. there are really no constraints on where it's done, except that it occur after there is no longer any possibility of application code executing in the exiting thread, so do it as early as possible.
* fix potential deadlock in dlerror buffer handling at thread exit (Rich Felker, 2022-10-19; 3 files, -19/+18)
  ever since commit 8f11e6127fe93093f81a52b15bb1537edc3fc8af introduced the thread list lock, this has been wrong. initially, it was wrong via calling free from the context with the thread list lock held. commit aa5a9d15e09851f7b4a1668e9dbde0f6234abada deferred the unsafe free but added a lock, which was also unsafe. in particular, it could deadlock if code holding freebuf_queue_lock was interrupted by a signal handler that takes the thread list lock. commit 4d5aa20a94a2d3fae3e69289dc23ecafbd0c16c4 observed that there was a lock here but failed to notice that it's invalid.

  there is no easy solution to this problem with locks; any attempt at solving it while still using locks would require the lock to be an AS-safe one (blocking signals on each access to the dlerror buffer list to check if there's deferred free work to be done) which would be excessively costly, and there are also lock order considerations with respect to how the lock would be handled at fork.

  instead, just use an atomic list.
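An atomic list of this kind needs only a compare-and-swap push and an exchange-based drain. This C11 sketch uses hypothetical names and assumes pointer atomics are lock-free (which is what makes the push usable where a lock is not); each buffer's first word is reused as the "next" link, so buffers must be at least pointer-sized.

```c
#include <stdatomic.h>
#include <stdlib.h>

static _Atomic(void *) freebuf_list;

/* Push a buffer without taking any lock (e.g. from an exiting thread). */
static void defer_free(void *buf)
{
    void *head = atomic_load(&freebuf_list);
    do {
        *(void **)buf = head;          /* link in front of current head */
    } while (!atomic_compare_exchange_weak(&freebuf_list, &head, buf));
}

/* Detach the whole list in one step and free it; returns the count. */
static size_t drain_deferred(void)
{
    void *p = atomic_exchange(&freebuf_list, NULL);
    size_t n = 0;
    while (p) {
        void *next = *(void **)p;
        free(p);
        p = next;
        n++;
    }
    return n;
}
```

Because the consumer takes the entire list at once rather than popping single nodes, the classic ABA hazard of lock-free stacks does not arise here.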
* configure: disable TBAA optimization because most compilers are buggy (Rich Felker, 2022-10-19; 1 file, -0/+8)
  unlike most projects that use -fno-strict-aliasing, we aim to have all sources respect the C language rules for effective type that make type-based alias analysis optimizations possible. unfortunately, it turns out that there are deep, and likely very difficult to fix, flaws in the TBAA performed by GCC and likely other compilers, whereby this kind of optimization can transform code that follows the rules strictly in ways that will make it malfunction. see for example GCC bugs 107107 and 107115, the latter of which also affects clang.

  there are not presently any known instances of breakage due to wrong type-based aliasing optimizations in our codebase. nonetheless, since the transformations are unsound and could introduce breakage, configure CFLAGS to build with -fno-strict-aliasing.

  some casual analysis of the effects on codegen suggests that this is unlikely to affect performance except possibly in the regex engine. in general, we should probably prefer making better use of the restrict keyword over relying on types to imply non-aliasing for optimization purposes; doing so should be able to get back any performance that was lost and more, should it turn out to matter (unlikely).
* disable MADV_FREE usage in mallocng (Rich Felker, 2022-10-19; 2 files, -1/+3)
  the entire intent of using madvise/MADV_FREE on freed slots is to improve system performance by avoiding evicting cache of useful data, or swapping useless data to disk, by marking any whole pages in the freed slot as discardable by the kernel. in particular, unlike unmapping the memory or replacing it with a PROT_NONE region, use of MADV_FREE does not make any difference to memory accounting for commit charge purposes, and so does not increase the memory available to other processes in a non-overcommitted environment.

  however, various measurements have shown that inordinate amounts of time are spent performing madvise syscalls in processes which frequently allocate and free medium sized objects in the size range roughly between PAGESIZE and MMAP_THRESHOLD, to the point that the net effect is almost surely significant performance degradation. so, turn it off.

  the code, which has some nontrivial logic for efficiently determining whether there is a whole-page range to apply madvise to, is left in place so that it can easily be re-enabled if desired, or later tuned to only apply to certain sizes or to use additional heuristics.
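The core of "is there a whole-page range to apply madvise to" is rounding the slot start up and the slot end down to page boundaries. This is an illustration with a hypothetical function name, not mallocng's actual code (which is described as more involved); pagesize must be a power of two, and overflow near the top of the address space is ignored.

```c
#include <stddef.h>
#include <stdint.h>

/* Finds the largest page-aligned subrange of [start, end). Returns 1 and
 * fills the outputs if at least one whole page fits, else 0. */
static int whole_page_range(uintptr_t start, uintptr_t end, uintptr_t pagesize,
                            uintptr_t *out_start, size_t *out_len)
{
    uintptr_t a = (start + pagesize - 1) & -pagesize;  /* round up */
    uintptr_t b = end & -pagesize;                      /* round down */
    if (a >= b) return 0;
    *out_start = a;
    *out_len = b - a;
    return 1;
}
```

A caller would then pass the result to madvise(MADV_FREE); a slot smaller than about two pages often yields no whole page at all, so every medium-sized free pays for this check plus a possible syscall.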
* remove LFS64 programming interfaces (macro-only) from _GNU_SOURCE (Rich Felker, 2022-10-19; 16 files, -16/+16)
  these badly pollute the namespace with macros whenever _GNU_SOURCE is defined, which is always the case with g++, and especially tends to interfere with C++ constructs. as our implementation of these was macro-only, their removal cannot affect any existing binaries. at the source level, portable software should be prepared for them not to exist.

  for now, they are left in place with explicit _LARGEFILE64_SOURCE. this provides an easy temporary path for integrators/distributions to get packages building again right away if they break while working on a proper, upstreamable fix. the intent is that this be a very short-term measure and that the macros be removed entirely in the next release cycle.
* remove LFS64 symbol aliases; replace with dynamic linker remapping (Rich Felker, 2022-10-19; 57 files, -135/+38)
  originally the namespace-infringing "large file support" interfaces were included as part of glibc-ABI-compat, with the intent that they not be used for linking, since our off_t is and always has been unconditionally 64-bit and since we usually do not aim to support nonstandard interfaces when there is an equivalent standard interface. unfortunately, having the symbols present and available for linking caused configure scripts to detect them and attempt to use them without declarations, producing all the expected ill effects that entails.

  as a result, commit 2dd8d5e1b8ba1118ff1782e96545cb8a2318592c was made to prevent this, using macros to redirect the LFS64 names to the standard names, conditional on _GNU_SOURCE or _LARGEFILE64_SOURCE. however, this has turned out to be a source of further problems, especially since g++ defines _GNU_SOURCE by default. in particular, the presence of these names as macros breaks a lot of valid code.

  this commit removes all the LFS64 symbols and replaces them with a mechanism in the dynamic linker symbol lookup failure path to retry with the spurious "64" removed from the symbol name. in the future, if/when the rest of glibc-ABI-compat is moved out of libc, this can be removed.
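The retry step amounts to mapping a name like fopen64 to fopen. This sketch restricts the mapping to an allowlist so that unrelated names that merely end in "64" (htobe64, for instance) are not remapped; the function name and the short list are hypothetical samples, and the real list and any special cases in the dynamic linker may differ.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sample of standard names with LFS64 counterparts. */
static const char *const lfs64_bases[] = {
    "fopen", "lseek", "mmap", "open", "stat", "ftruncate", NULL
};

/* If name is "<base>64" for an allowlisted base, writes base into out
 * and returns 1; otherwise returns 0. */
static int lfs64_standard_name(const char *name, char *out, size_t outsz)
{
    size_t l = strlen(name);
    if (l < 3 || strcmp(name + l - 2, "64") != 0) return 0;
    for (const char *const *b = lfs64_bases; *b; b++) {
        if (strlen(*b) == l - 2 && !strncmp(name, *b, l - 2)) {
            if (l - 2 >= outsz) return 0;
            memcpy(out, name, l - 2);
            out[l - 2] = '\0';
            return 1;
        }
    }
    return 0;
}
```

Only if the original lookup fails would the linker retry under the returned name, so ordinary symbols never pay for this check.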
* dns query core: detect udp truncation at recv time (Rich Felker, 2022-10-19; 1 file, -4/+13)
  we already attempt to preclude this case by having res_send use a sufficiently large temporary buffer even if the caller did not provide one as large as or larger than the udp dns max of 512 bytes. however, it's possible that the caller passed a custom-crafted query packet using EDNS0, e.g. to get detailed DNSSEC results, with a larger udp size allowance.

  I have also seen claims that there are some broken nameservers in the wild that do not honor the dns udp limit of 512 and send large answers without the TC bit set, when the query was not using EDNS. we generally don't aim to support broken nameservers, but in this case both problems, if the latter is even real, have a common solution: using recvmsg instead of recvfrom so we can examine the MSG_TRUNC flag.
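With recvmsg, the kernel reports in msg_flags whether a datagram was cut short to fit the buffer, which the byte count alone cannot reveal. A minimal sketch with a hypothetical helper name; unlike a resolver it does not collect the sender address, and the test relies on Linux AF_UNIX datagram sockets setting MSG_TRUNC.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Receives one datagram into buf; *truncated is set to 1 if the kernel
 * discarded the part that did not fit. */
static ssize_t recv_checked(int fd, void *buf, size_t len, int *truncated)
{
    struct iovec iov = { .iov_base = buf, .iov_len = len };
    struct msghdr mh;
    memset(&mh, 0, sizeof mh);
    mh.msg_iov = &iov;
    mh.msg_iovlen = 1;
    ssize_t r = recvmsg(fd, &mh, 0);
    if (r >= 0) *truncated = (mh.msg_flags & MSG_TRUNC) != 0;
    return r;
}
```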
* getaddrinfo dns lookup: use larger answer buffer to handle long CNAMEs (Rich Felker, 2022-10-19; 1 file, -3/+5)
  the size of 512 is not sufficient to get at least one address in the worst case where the name is at or near max length and resolves to a CNAME at or near max length. prior to tcp fallback, there was nothing we could do about this case anyway, but now it's fixable. the new limit 768 is chosen so as to admit roughly the number of addresses with a worst-case CNAME as could fit for a worst-case name that's not a CNAME in the old 512-byte limit. outside of this worst-case, the number of addresses that might be obtained is increased.

  MAXADDRS (48) was originally chosen as an upper bound on the combined number of A and AAAA records that could fit in 512-byte packets (31 and 17, respectively). it is not increased at this time. so as to prevent a situation where the A records consume almost all of these slots (at 768 bytes, a "best-case" name can fit almost 47 A records), the order of parsing is swapped to process AAAA first. this ensures roughly half of the slots are available to each address family.
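The figures 31, 17, and "almost 47" can be reproduced with a back-of-the-envelope model. Assumptions (this sketch's, not the commit's): a 12-byte header, each answer name compressed to a 2-byte pointer, and the question section ignored. An A record is then 2+2+2+4+2+4 = 16 bytes and an AAAA record 2+2+2+4+2+16 = 28 bytes (name, type, class, TTL, rdlength, rdata).

```c
/* Hypothetical constants for the simplified size model described above. */
enum { DNS_HEADER = 12, A_RR = 16, AAAA_RR = 28 };

/* How many fixed-size answer RRs fit in a packet of the given size. */
static int rrs_that_fit(int packet_size, int rr_size)
{
    return (packet_size - DNS_HEADER) / rr_size;
}
```

This gives 500/16 = 31 A records and 500/28 = 17 AAAA records for 512 bytes (31 + 17 = 48), and 756/16 = 47 A records for 768 bytes; counting the question section brings the last figure slightly below 47, matching "almost 47".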
* arpa/nameser.h: update RR types list (Rich Felker, 2022-09-22; 1 file, -0/+71)
  our RR type list in arpa/nameser.h was badly outdated, and missing important types for DNSSEC and DANE use, among other things.
* dns: implement tcp fallback in __res_msend query core (Rich Felker, 2022-09-22; 1 file, -2/+117)
  tcp fallback was originally deemed unwanted and unnecessary, since we aim to return a bounded-size result from getaddrinfo anyway and normally plenty of address records fit in the 512-byte udp dns limit. however, this turned out to have several problems:

  - some recursive nameservers truncate by omitting all the answers, rather than sending as many as can fit.
  - a pathological worst-case CNAME for a worst-case name can fill the entire 512-byte space with just the two names, leaving no room for any addresses.
  - the res_* family of interfaces allow querying of non-address records such as TLSA (DANE), TXT, etc. which can be very large. for many of these, it's critical that the caller see the whole RRset. also, res_send/res_query are specified to return the complete, untruncated length so that the caller can retry with an appropriately-sized buffer. determining this is not possible without tcp.

  so, it's time to add tcp fallback.

  the fallback strategy implemented here uses one tcp socket per question (1 or 2 questions), initiated via tcp fastopen when possible. the connection is made to the nameserver that issued the truncated answer. right now, fallback happens unconditionally when truncation is seen. this can, and may later be, relaxed for queries made by the getaddrinfo system, since it will only use a bounded number of results anyway.

  retry is not attempted again after failure over tcp. the logic could easily be adapted to do that, but it's of questionable value, since the tcp stack automatically handles retransmission and the success answer with TC=1 over udp strongly suggests that the nameserver has the full answer ready to give. further retry is likely just "take longer to fail".
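Over TCP, each DNS message is preceded by a two-byte, big-endian length field (RFC 1035, section 4.2.2), which is also what lets the client learn the full answer length. A sketch of the framing with hypothetical helper names; it shows only the message format, not connection handling or TCP Fast Open.

```c
#include <stddef.h>
#include <string.h>

/* Writes the length-prefixed query into out. Returns the framed size, or
 * 0 if the query exceeds 65535 bytes or out is too small. */
static size_t frame_dns_tcp(const unsigned char *query, size_t qlen,
                            unsigned char *out, size_t outsz)
{
    if (qlen > 0xffff || qlen + 2 > outsz) return 0;
    out[0] = (unsigned char)(qlen >> 8);     /* high byte first */
    out[1] = (unsigned char)(qlen & 0xff);
    memcpy(out + 2, query, qlen);
    return qlen + 2;
}

/* Decodes the length prefix at the start of a message read from TCP. */
static size_t tcp_reply_length(const unsigned char *hdr)
{
    return (size_t)hdr[0] << 8 | hdr[1];
}
```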
* res_send: use a temp buffer if caller's buffer is under 512 bytes (Rich Felker, 2022-09-22; 1 file, -1/+9)
  for extremely small buffer sizes, the DNS query core in __res_msend may malfunction completely, being unable to get even the headers to determine the response code. but there is also a problem for reasonable sizes under 512 bytes: __res_msend is unable to determine if the udp answer was truncated at the recv layer, in which case it may be incomplete, and res_send is then unable to honor its contract to return the length of the full, non-truncated answer.

  at present, res_send does not honor that contract anyway when the full answer would exceed 512 bytes, since there is no tcp fallback, but this change at least makes it consistent in a context where this is the only "full answer" to be had.
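The wrapper described above can be sketched independently of the query core. Here send_fn is a hypothetical stand-in for the core: it fills ans with up to anslen bytes and returns the full answer length, or -1 on error. fake_core exists only for trying the wrapper.

```c
#include <string.h>

typedef int (*send_fn)(const unsigned char *q, int qlen,
                       unsigned char *ans, int anslen);

static int send_with_min_buffer(send_fn core, const unsigned char *q, int qlen,
                                unsigned char *ans, int anslen)
{
    if (anslen < 512) {
        unsigned char tmp[512];
        int r = core(q, qlen, tmp, (int)sizeof tmp);
        /* copy only what fits, but report the full length so the caller
         * can retry with a larger buffer */
        if (r >= 0) memcpy(ans, tmp, r < anslen ? r : anslen);
        return r;
    }
    return core(q, qlen, ans, anslen);
}

/* Stand-in core: a 300-byte answer filled with 0xab bytes. */
static int fake_core(const unsigned char *q, int qlen,
                     unsigned char *ans, int anslen)
{
    (void)q;
    (void)qlen;
    int full = 300;
    memset(ans, 0xab, full < anslen ? full : anslen);
    return full;
}
```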
* adapt res_msend DNS query core for working with multiple sockets (Rich Felker, 2022-09-21; 1 file, -6/+11)
  this is groundwork for TCP fallback support, but does not itself change behavior in any way.
* getaddrinfo: add EAI_NODATA error code to distinguish NODATA vs NxDomain (Rich Felker, 2022-09-20; 6 files, -6/+13)
  this was apparently omitted long ago out of a lack of understanding of its importance and the fact that POSIX doesn't specify it. despite not being officially standardized, however, it turns out that at least AIX, glibc, NetBSD, OpenBSD, QNX, and Solaris document and support it.

  in certain usage cases, such as implementing a DNS gateway on top of the stub resolver interfaces, it's necessary to distinguish the case where a name does not exist (NxDomain) from one where it exists but has no addresses (or other records) of the requested type (NODATA). in fact, even the legacy gethostbyname API had this distinction, which we were previously unable to support correctly because the backend lacked it.

  apart from fixing an important functionality gap, adding this distinction helps clarify to users how search domain fallback works (falling back in cases corresponding to EAI_NONAME, not in ones corresponding to EAI_NODATA), a topic that has been a source of ongoing confusion and frustration.

  as a result of this change, EAI_NONAME is no longer a valid universal error code for getaddrinfo in the case where AI_ADDRCONFIG has suppressed use of all address families. in order to return an accurate result in this case, getaddrinfo is modified to still perform at least one lookup. this will almost surely fail (with a network error, since there is no v4 or v6 network to query DNS over) unless a result comes from the hosts file or from ip literal parsing, but in case it does succeed, the result is replaced by EAI_NODATA.

  glibc has a related error code, EAI_ADDRFAMILY, that could be used for the AI_ADDRCONFIG case and certain NODATA cases, but distinguishing them properly in full generality seems to require additional DNS queries that are otherwise not useful. on glibc, it is only used for ip literals with mismatching family, not for DNS or hosts file results where the name has addresses only in the opposite family. since this seems misleading and inconsistent, and since EAI_NODATA already covers the semantic case where the "name" exists but doesn't have any addresses in the requested family, we do not adopt EAI_ADDRFAMILY at this time. this could be changed at some point if desired, but the logic for getting all the corner cases with AI_ADDRCONFIG right is slightly nontrivial.
* fix error cases in gethostbyaddr_r (Rich Felker, 2022-09-19; 1 file, -2/+3)
  EAI_MEMORY is not possible (but would not provide errno if it were) and EAI_FAIL does not provide errno. treat the latter as EBADMSG to match how it's handled in gethostbyname2_r (it indicates erroneous or failure response from the nameserver).
* remove impossible error case from gethostbyname2_r (Rich Felker, 2022-09-19; 1 file, -1/+0)
  EAI_MEMORY is not possible because the resolver backend does not allocate. if it did, it would be necessary for us to explicitly return ENOMEM as the error, since errno is not guaranteed to reflect the error cause except in the case of EAI_SYSTEM, so the existing code was not correct anyway.
* fix return value of gethostbyname[2]_r on result not found (Rich Felker, 2022-09-19; 1 file, -1/+1)
  these functions are horribly underspecified, inconsistent between historical systems, and should never have been included. however, the signatures we have match the glibc ones, and the glibc behavior is to treat NxDomain and NODATA results as a success condition, not an ENOENT error.
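The error rules from the last three commits can be collected into one mapping from a backend status to the errno-style return value of a gethostbyname2_r-like function: "not found" is success with a null result pointer, EAI_FAIL becomes EBADMSG, and errno is used only for EAI_SYSTEM. The function name is hypothetical, mapping EAI_AGAIN to EAGAIN is an assumption of this sketch, and EAI_NODATA is omitted because it is not a POSIX code.

```c
#include <errno.h>
#include <netdb.h>

/* status: 0 if addresses were found, otherwise an EAI_* code. */
static int lookup_status_to_errno(int status, int saved_errno)
{
    switch (status) {
    case 0:            /* found addresses */
    case EAI_NONAME:   /* NxDomain: success, but result pointer is NULL */
        return 0;
    case EAI_AGAIN:    /* assumption: temporary failure maps to EAGAIN */
        return EAGAIN;
    case EAI_SYSTEM:   /* the only case where errno describes the error */
        return saved_errno;
    default:           /* EAI_FAIL and anything unexpected */
        return EBADMSG;
    }
}
```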
* dns: treat names rejected by res_mkquery as nonexistent rather than error (Rich Felker, 2022-09-19; 1 file, -1/+1)
  this distinction only affects search, but allows search to continue when concatenating one of the search domains onto the requested name produces a result that's not valid. this can happen when the concatenation is too long, or one of the search list entries is itself not valid.

  as a consequence of this change, having "." in the search domains list will now be ignored/skipped rather than making the lookup abort with no results (due to producing a concatenation ending in ".."). this behavior could be changed later if needed.
* res_mkquery: error out on consecutive final dots in name (Rich Felker, 2022-09-19; 1 file, -0/+1)
  the main loop already errors out on zero-length labels within the name, but terminates before having a chance to check for an erroneous final zero-length label, instead producing a malformed query packet with a '.' byte instead of the terminating zero. rather than poke at the loop logic, simply detect this condition early and error out without doing anything.

  this also fixes behavior of getaddrinfo when "." appears in the search domain list, which produces a name ending in ".." after concatenation, at least in the sense of no longer emitting malformed packets on the network. however, due to other issues, the lookup will still fail.
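A name-to-wire-format encoder that checks for the trailing ".." up front, before the label loop, looks roughly like this. encode_qname is hypothetical and not musl's res_mkquery; it allows one final dot (an absolute name), rejects empty or over-long labels, and uses the 253-character text limit that corresponds to the 255-byte wire limit.

```c
#include <stddef.h>
#include <string.h>

/* Encodes a dotted name as length-prefixed labels ending in a zero byte.
 * Returns the encoded length, or -1 for an invalid name or short buffer. */
static int encode_qname(const char *name, unsigned char *out, size_t outsz)
{
    size_t l = strlen(name);
    if (l && name[l - 1] == '.') l--;          /* allow one final dot */
    if (l && name[l - 1] == '.') return -1;    /* a second one: ".." */
    if (l > 253 || l + 2 > outsz) return -1;
    size_t o = 0, i = 0;
    while (i < l) {
        size_t j = i;
        while (j < l && name[j] != '.') j++;
        size_t lab = j - i;
        if (lab == 0 || lab > 63) return -1;   /* empty or oversized label */
        out[o++] = (unsigned char)lab;
        memcpy(out + o, name + i, lab);
        o += lab;
        i = j + 1;                             /* skip the dot */
    }
    out[o++] = 0;                              /* terminating root label */
    return (int)o;
}
```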
* fix thread leak on timer_create(SIGEV_THREAD) failureAlexey Izbyshev2022-09-191-1/+5
| | | | | | | | | | | | After commit 5b74eed3b301e2227385f3bf26d3bb7c2d822cf8 the timer thread doesn't check whether timer_create() actually created the timer, proceeding to wait for a signal that might never arrive. We can't fix this by simply checking for a negative timer_id after pthread_barrier_wait() because we have no way to distinguish a timer creation failure from a request to delete a timer with INT_MAX id if it happens to arrive quickly (a variation of this bug existed before 5b74eed3b301e2227385f3bf26d3bb7c2d822cf8, where the timer would be leaked in this case). So (ab)use the cancel field of pthread_t instead.
* re-enable vdso clock_gettime on arm (32-bit) with workaroundRich Felker2022-09-192-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 4486c579cbf0d989080705f515d08cb48636ba88 disabled vdso clock_gettime on arm due to a Linux kernel bug that was not understood at the time, whereby the vdso function silently produced catastrophically wrong results on some systems. since then, the bug was tracked down to the way the arm kernel disabled use of vdso clock_gettime on kernels where the necessary timer was not available or was disabled. it simply patched out the symbols, but it only did this for the legacy time32 functions, and left the time64 function in place but non-operational. kernel commit 4405bdf3c57ec28d606bdf5325f1167505bfdcd4 (first present in 5.8) provided the fix. if this were a bug that impacted all users of the broken kernel versions, we could probably ignore it and assume it had been patched or replaced. however, it's very possible that these kernels appear in the wild in devices running time32 userspace (glibc, musl 1.1.x, or some other environment) where they appear to work fine, but where our new binaries would fail catastrophically if we used the time64 vdso function. since the kernel has not (yet?) given us a way to probe for the working time64 vdso function semantically, we work around the problem by refusing to use the time64 one unless the time32 one is also present. this will revert to not using vdso at all if the time32 one is ever removed, but at least that's safe against wrong results and is just a missed optimization.
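The workaround reduces to a selection rule over two symbol lookups. This sketch takes the lookup results as plain pointers, NULL when a symbol is absent. On arm these correspond to the time32 and time64 vdso clock_gettime entry points, whose exact symbol names are not spelled out here. `pick_vdso_clock` is hypothetical.

```c
#include <stddef.h>

/* Trust the time64 vdso entry point only when the time32 one is also
 * present: affected kernels patched out only the time32 symbols, leaving
 * a non-operational time64 function behind. */
void *pick_vdso_clock(void *time64_sym, void *time32_sym)
{
    if (time64_sym && time32_sym)
        return time64_sym;
    return NULL;   /* caller falls back to the clock_gettime syscall */
}
```

If the time32 symbol is ever dropped, this degrades to never using the vdso, which is the safe "missed optimization" outcome the message describes.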
* process DT_RELR relocations in ldso-startup/static-pieRich Felker2022-09-121-0/+15
| | | | | | | | | | | commit d32dadd60efb9d3b255351a3b532f8e4c3dd0db1 added DT_RELR processing for programs and shared libraries processed by the dynamic linker, but left them unsupported in the dynamic linker itself and in static pie binaries, which self-relocate via code in dlstart.c. add the equivalent processing to this code path so that there are not arbitrary restrictions on where the new packed relative relocation form can be used.
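For reference, the packed relative relocation format works as follows. An even entry gives the offset of a word to relocate. An odd entry is a bitmap whose bits 1..N mark the words that follow the last relocated address. This sketch applies that format to a simulated image held in a `size_t` array, with offsets in bytes from the image start. `apply_relr` and the simulated image are assumptions for illustration, not dlstart.c's code.

```c
#include <limits.h>
#include <stddef.h>

/* Apply `count` DT_RELR entries to `image`, adding `base` to each marked word. */
void apply_relr(size_t *image, const size_t *relr, size_t count, size_t base)
{
    size_t *where = image;
    for (size_t k = 0; k < count; k++) {
        size_t entry = relr[k];
        if ((entry & 1) == 0) {
            /* Even entry: byte offset of one word to relocate. */
            where = image + entry / sizeof(size_t);
            *where++ += base;
        } else {
            /* Odd entry: bit i (i >= 1) marks where[i-1]. */
            size_t bitmap = entry;
            for (int i = 0; (bitmap >>= 1) != 0; i++)
                if (bitmap & 1)
                    where[i] += base;
            /* Each bitmap covers the next (word bits - 1) words. */
            where += CHAR_BIT * sizeof(size_t) - 1;
        }
    }
}
```

For example, the entries `{0, 0xB}` relocate words 0, 1 and 3 and leave word 2 untouched.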
* fix fwprintf missing output to open_wmemstream FILEsRich Felker2022-09-071-1/+5
| | | | | | | | | | | | | | | open_wmemstream's write method was written assuming no buffering, since it sets the FILE up with buf_len of zero in order to avoid issues with position/seeking. however, as a consequence of commit bd57e2b43a5b56c00a82adbde0e33e5820c81164, a FILE being written to by the printf core has a temporary local buffer for the duration of the operation if it was unbuffered to begin with. since this was disregarded by the wide memstream's write method, output produced through this code path, particularly numeric fields, was missing from the output wchar buffer. copy the equivalent logic for using the buffered data from the byte-oriented open_memstream.
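The affected path is easy to exercise from application code. Formatting a numeric field with fwprintf into an open_wmemstream FILE should leave the digits in the wide buffer. This is a small usage sketch, and `format_wide_int` is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>

/* Format an integer into a newly allocated wide string via open_wmemstream.
 * Returns NULL on failure; the caller frees the result. */
wchar_t *format_wide_int(int v)
{
    wchar_t *buf = NULL;
    size_t len = 0;
    FILE *f = open_wmemstream(&buf, &len);
    if (!f)
        return NULL;
    fwprintf(f, L"value=%d", v);   /* numeric field goes through the printf core */
    if (fclose(f) != 0) {
        free(buf);
        return NULL;
    }
    return buf;
}
```

On an affected version, the numeric part of the output would be missing from `buf`, matching the symptom described above.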
* dns: fail if ipv6 is disabled and resolv.conf has only v6 nameserversRich Felker2022-08-261-0/+5
| | | | | | | | | | | | | | if resolv.conf lists no nameservers at all, the default of 127.0.0.1 is used. however, another "no nameservers" case arises where the system has ipv6 support disabled/configured-out and resolv.conf only contains v6 nameservers. this caused the resolver to repeat socket operations that will necessarily fail (sending to one or more wrong-family addresses) while waiting for a timeout. it would be contrary to configured intent to query 127.0.0.1 in this case, but the current behavior is not conducive to diagnosing the configuration problem. instead, fail immediately with EAI_SYSTEM and errno==EAFNOSUPPORT so that the configuration error is reportable.
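The new failure condition can be expressed as a check over the configured nameserver families. This is a hypothetical sketch: `count_usable_nameservers` is not a musl function, and an empty list returns 0 because, as described above, the caller substitutes 127.0.0.1 in that case.

```c
#include <errno.h>
#include <sys/socket.h>

/* Hypothetical helper: count nameservers reachable given IPv6 availability.
 * If some are configured but none is usable, fail with EAFNOSUPPORT so the
 * configuration error can be reported instead of waiting for a timeout. */
int count_usable_nameservers(const int *families, int n, int have_ipv6)
{
    int usable = 0;
    for (int i = 0; i < n; i++)
        if (families[i] == AF_INET || (families[i] == AF_INET6 && have_ipv6))
            usable++;
    if (n > 0 && usable == 0) {
        errno = EAFNOSUPPORT;
        return -1;
    }
    return usable;
}
```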
* use kernel-provided AT_MINSIGSTKSZ for sysconf(_SC_[MIN]SIGSTKSZ)Rich Felker2022-08-261-2/+12
| | | | | | | | | | | | use the legacy constant values if the kernel does not provide AT_MINSIGSTKSZ (__getauxval will return 0 in this case) and as a safety check if something is wrong and the provided value is less than the legacy constant. sysconf(_SC_SIGSTKSZ) returns SIGSTKSZ adjusted for the difference between the legacy constant MINSIGSTKSZ and the runtime value, so that the working space the application has on top of the minimum remains invariant under changes to the minimum.
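The arithmetic above can be written as two small functions. `aux_min` is the value the kernel reports for AT_MINSIGSTKSZ, or 0 when it provides none. In real code it would come from something like `getauxval(AT_MINSIGSTKSZ)` where the headers define that constant. The legacy constants here are assumed x86 values, used only for illustration.

```c
/* Assumed legacy constants (x86 values, for illustration only). */
#define LEGACY_MINSIGSTKSZ 2048L
#define LEGACY_SIGSTKSZ    8192L

/* sysconf(_SC_MINSIGSTKSZ): kernel value, never below the legacy minimum. */
long min_sigstksz(long aux_min)
{
    return aux_min < LEGACY_MINSIGSTKSZ ? LEGACY_MINSIGSTKSZ : aux_min;
}

/* sysconf(_SC_SIGSTKSZ): keep the same working space above the minimum
 * that the legacy SIGSTKSZ provided above the legacy MINSIGSTKSZ. */
long sigstksz(long aux_min)
{
    return min_sigstksz(aux_min) + (LEGACY_SIGSTKSZ - LEGACY_MINSIGSTKSZ);
}
```

With these constants, a reported minimum of 3000 gives 3000 and 9144. A reported 0 (or anything below 2048) falls back to 2048 and 8192.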
* add sysconf keys/values for signal stack sizeRich Felker2022-08-262-0/+5
| | | | | | | | | | | | | | | | as a result of ISA extensions exploding register file sizes on some archs, using a constant for minimum signal stack size no longer seems viably future-proof. add sysconf keys allowing the kernel to provide a machine-dependent minimum applications can query to ensure they allocate sufficient space for stacks. the key names and indices align with the same functionality in glibc. see commit d5a5045382315e36588ca225889baa36ed0ed38f for previous action on this subject. ultimately, the macros MINSIGSTKSZ and SIGSTKSZ probably need to be deprecated, but that is standards-amendment work outside the scope of a single implementation.
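On the application side, the new keys let code size an alternate signal stack at run time. This sketch queries sysconf(_SC_SIGSTKSZ) when the key is defined and falls back to the compile-time SIGSTKSZ otherwise. `install_altstack` is a hypothetical helper.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdlib.h>
#include <unistd.h>

/* Allocate and install an alternate signal stack sized at run time.
 * Returns the stack memory (and its size via size_out), or NULL on failure. */
void *install_altstack(size_t *size_out)
{
    long sz = -1;
#ifdef _SC_SIGSTKSZ
    sz = sysconf(_SC_SIGSTKSZ);   /* machine-dependent, kernel-informed size */
#endif
    if (sz <= 0)
        sz = SIGSTKSZ;            /* legacy compile-time fallback */

    stack_t ss;
    ss.ss_sp = malloc((size_t)sz);
    if (!ss.ss_sp)
        return NULL;
    ss.ss_size = (size_t)sz;
    ss.ss_flags = 0;
    if (sigaltstack(&ss, NULL) != 0) {
        free(ss.ss_sp);
        return NULL;
    }
    if (size_out)
        *size_out = (size_t)sz;
    return ss.ss_sp;
}
```

Before freeing the memory, the stack should be disabled with a `sigaltstack` call whose `ss_flags` is `SS_DISABLE`.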
* fix fallback when ipv6 is disabled but resolv.conf has v6 nameserversRich Felker2022-08-241-1/+2
| | | | | | | | | | | | apparently this code path was never tested, as it's not usual to have v6 nameservers listed on a system without v6 networking support. but it was always intended to work. when reverting to binding a v4 address, also revert the family in the sockaddr structure and the socklen for it. otherwise bind will just fail due to mismatched family/sockaddr size. fix dns resolver fallback when v6 nameservers are listed by
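The fix amounts to switching three things together: the socket family, the family stored in the sockaddr, and the socklen passed to bind. This standalone sketch shows that pattern for a wildcard-bound UDP socket. `open_bound_udp` and its structure are assumptions, not musl's resolver code.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Open a UDP socket bound to the wildcard address, preferring IPv6.
 * Returns the fd (and the family used via family_out), or -1 on failure. */
int open_bound_udp(int *family_out)
{
    union {
        struct sockaddr_in sin;
        struct sockaddr_in6 sin6;
    } sa;
    memset(&sa, 0, sizeof sa);           /* wildcard address, port 0 */

    int family = AF_INET6;
    socklen_t sl = sizeof sa.sin6;
    int fd = socket(family, SOCK_DGRAM, 0);
    if (fd < 0 && errno == EAFNOSUPPORT) {
        /* Revert family AND socklen; a v6-sized v4 address makes bind fail. */
        family = AF_INET;
        sl = sizeof sa.sin;
        fd = socket(family, SOCK_DGRAM, 0);
    }
    if (fd < 0)
        return -1;

    if (family == AF_INET6)
        sa.sin6.sin6_family = AF_INET6;
    else
        sa.sin.sin_family = AF_INET;

    if (bind(fd, (struct sockaddr *)&sa, sl) < 0) {
        close(fd);
        return -1;
    }
    if (family_out)
        *family_out = family;
    return fd;
}
```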
* epoll_create: fail with EINVAL if size is non-positiveKristina Martsenko2022-08-241-0/+1
| | | | | | This is a part of the interface contract defined in the Linux man page (official for a Linux-specific interface) and asserted by test cases in the Linux Test Project (LTP).
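The contract is a single up-front check. Beyond that, the size argument is only a historical hint, so the rest can be delegated to epoll_create1. This sketch uses a hypothetical wrapper name so that it does not shadow the real function.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical wrapper following the documented epoll_create() contract. */
int checked_epoll_create(int size)
{
    if (size <= 0) {
        errno = EINVAL;      /* non-positive size is rejected */
        return -1;
    }
    /* A positive size is otherwise ignored; epoll_create1(0) is equivalent. */
    return epoll_create1(0);
}
```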