about summary refs log tree commit diff
path: root/sysdeps
Commit message (Collapse)AuthorAgeFilesLines
* aarch64: Add vector implementations of exp2 routinesJoe Ramsay2023-10-2313-0/+463
| | | | Some routines reuse table from v_exp_data.c
* aarch64: Add vector implementations of tan routinesJoe Ramsay2023-10-2319-1/+1248
| | | | | This includes some utility headers for evaluating polynomials using various schemes.
* tst-spawn-cgroup.c: Fix argument order of UNSUPPORTED message.Stefan Liebler2023-10-201-3/+3
| | | | | | | The arguments for "expected" and "got" are mismatched. Furthermore this patch is dumping both values as hex. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> Reviewed-by: Florian Weimer <fweimer@redhat.com>
* s390: Fix undefined behaviour in feenableexcept, fedisableexcept [BZ #30960]Stefan Liebler2023-10-192-2/+4
| | | | | | | | | | | | | | | | | If feenableexcept or fedisableexcept gets excepts=FE_INVALID=0x80 as input, we have a signed left shift: 0x80 << 24 which is not representable as int and thus is undefined behaviour according to C standard. This patch casts excepts as unsigned int before shifting, which is defined. For me, the observed undefined behaviour is that the shift is done with "unsigned"-instructions, which is exactly what we want. Furthermore, I don't get any exception-flags. After the fix, the code is using the same instruction sequence as before.
* Revert "elf: Always call destructors in reverse constructor order (bug 30785)"Florian Weimer2023-10-181-4/+0
| | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 6985865bc3ad5b23147ee73466583dd7fdf65892. Reason for revert: The commit changes the order of ELF destructor calls too much relative to what applications expect or can handle. In particular, during process exit and _dl_fini, after the revert commit, we no longer call the destructors of the main program first; that only happens after some dlopen'ed objects have been destructed. This robs applications of an opportunity to influence destructor order by calling dlclose explicitly from the main program's ELF destructors. A couple of different approaches involving reverse constructor order were tried, and none of them worked really well. It seems we need to keep the dependency sorting in _dl_fini. There is also an ambiguity regarding nested dlopen calls from ELF constructors: Should those destructors run before or after the object that called dlopen? Commit 6985865bc3ad5b2314 used reverse order of the start of ELF constructor calls for destructors, but arguably using completion of constructors is more correct. However, that alone is not sufficient to address application compatibility issues (it does not change _dl_fini ordering at all).
* Add LE DSCP code point from RFC-8622.Bruno Victal2023-10-171-0/+5
| | | | | Signed-off-by: Bruno Victal <mirai@makinata.eu> Reviewed-by: Florian Weimer <fweimer@redhat.com>
* Add HWCAP2_MOPS from Linux 6.5 to AArch64 bits/hwcap.hJoseph Myers2023-10-171-0/+1
| | | | | | | Linux 6.5 adds a new AArch64 HWCAP2 value, HWCAP2_MOPS. Add it to glibc's bits/hwcap.h. Tested with build-many-glibcs.py for aarch64-linux-gnu.
* Add SCM_SECURITY, SCM_PIDFD to bits/socket.hJoseph Myers2023-10-161-0/+4
| | | | | | | | | | | | Linux 6.5 adds a constant SCM_PIDFD (recall that the non-uapi linux/socket.h, where this constant is added, is in fact a header providing many constants that are part of the kernel/userspace interface). This shows up that SCM_SECURITY, from the same set of definitions and added in Linux 2.6.17, is also missing from glibc, although glibc has the first two constants from this set, SCM_RIGHTS and SCM_CREDENTIALS; add both missing constants to glibc. Tested for x86_64.
* Add AT_HANDLE_FID from Linux 6.5 to bits/fcntl-linux.hJoseph Myers2023-10-161-0/+11
| | | | | | | | | Linux 6.5 adds a constant AT_HANDLE_FID; add it to glibc. Because this is a flag for the function name_to_handle_at declared in bits/fcntl-linux.h, put the flag there rather than alongside other AT_* flags in (OS-independent) fcntl.h. Tested for x86_64.
* Avoid maybe-uninitialized warning in __kernel_rem_pio2Andreas Schwab2023-10-161-9/+5
| | | | | | | | | | | | | | With GCC 14 on 32-bit x86 the compiler emits a maybe-uninitialized warning: ../sysdeps/ieee754/dbl-64/k_rem_pio2.c: In function '__kernel_rem_pio2': ../sysdeps/ieee754/dbl-64/k_rem_pio2.c:364:20: error: 'fq' may be used uninitialized [-Werror=maybe-uninitialized] 364 | y[0] = fq[0]; y[1] = fq[1]; y[2] = fw; | ~~^~~ This is similar to the warning that is suppressed in the other branch of the switch. Help the compiler knowing that the variable is always initialized, which also makes the suppression obsolete.
* x86: Prepare `strrchr-evex` and `strrchr-evex512` for AVX10Noah Goldstein2023-10-063-569/+293
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit refactors `strrchr-evex` and `strrchr-evex512` to use a common implementation: `strrchr-evex-base.S`. The motivation is `strrchr-evex` needed to be refactored to not use 64-bit masked registers in preperation for AVX10. Once vec-width masked register combining was removed, the EVEX and EVEX512 implementations can easily be implemented in the same file without any major overhead. The net result is performance improvements (measured on TGL) for both `strrchr-evex` and `strrchr-evex512`. Although, note there are some regressions in the test suite and it may be many of the cases that make the total-geomean of improvement/regression across bench-strrchr are cold. The point of the performance measurement is to show there are no major regressions, but the primary motivation is preperation for AVX10. Benchmarks where taken on TGL: https://www.intel.com/content/www/us/en/products/sku/213799/intel-core-i711850h-processor-24m-cache-up-to-4-80-ghz/specifications.html EVEX geometric_mean(N=5) of all benchmarks New / Original : 0.74 EVEX512 geometric_mean(N=5) of all benchmarks New / Original: 0.87 Full check passes on x86.
* aarch64: Optimise vecmath logsJoe Ramsay2023-10-057-215/+226
| | | | | | | | | | | * Transpose table layout for improved memory access * Use half-vector special comparisons for AdvSIMD * Improve register use near special-case branches - Due to the presence of a function call, return value would get mov-d out of x0 in order to facilitate PCS. By moving the final computation after the branch this can be avoided Also change SVE routines to use overloaded intrinsics for readability.
* aarch64: Cosmetic change in SVE exp routinesJoe Ramsay2023-10-052-47/+44
| | | | | | | Use overloaded intrinsics for readability. Codegen does not change, however while we're bringing the routines up-to-date with recent improvements to other routines in AOR it is worth copying this change over as well.
* aarch64: Optimize SVE cos & cosfJoe Ramsay2023-10-052-53/+47
| | | | | | Saves a mov by ensuring return value does not need to be moved out of the way before special-case branch. Also change to use overloaded intrinsics.
* aarch64: Improve vecmath sin routinesJoe Ramsay2023-10-053-73/+87
| | | | | | | | * Update ULP comment reflecting a new observed max in [-pi/2, pi/2] * Use the same polynomial in AdvSIMD and SVE, rather than FTRIG instructions * Improve register use near special-case branch Also use overloaded intrinsics for SVE.
* Fix FORTIFY_SOURCE false positiveVolker Weißmann2023-10-041-1/+3
| | | | | | | | | | | | | | | | When -D_FORTIFY_SOURCE=2 was given during compilation, sprintf and similar functions will check if their first argument is in read-only memory and exit with *** %n in writable segment detected *** otherwise. To check if the memory is read-only, glibc reads frpm the file "/proc/self/maps". If opening this file fails due to too many open files (EMFILE), glibc will now ignore this error. Fixes [BZ #30932] Signed-off-by: Volker Weißmann <volker.weissmann@gmx.de> Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
* Propagate GLIBC_TUNABLES in setxid binariesSiddhesh Poyarekar2023-10-021-1/+0
| | | | | | | | | | GLIBC_TUNABLES scrubbing happens earlier than envvar scrubbing and some tunables are required to propagate past setxid boundary, like their env_alias. Rely on tunable scrubbing to clean out GLIBC_TUNABLES like before, restoring behaviour in glibc 2.37 and earlier. Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org> Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* Linux: add ST_NOSYMFOLLOWKir Kolyshkin2023-10-021-1/+3
| | | | | | | | | | Linux v5.10 added a mount option MS_NOSYMFOLLOW, which was added to glibc in commit 0ca21427d950755b. Add the corresponding statfs/statvfs flag bit, ST_NOSYMFOLLOW. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* mips: dl-machine-reject-phdr: Get rid of alloca.Joe Simmons-Talbott2023-10-021-16/+10
| | | | | | | | | | Read directly into the mips_abiflags struct rather than reading the entire segment and using alloca when the passed buffer is not big enough. Checked with build-many-glibcs.py on mips-linux-gnu Tested-by: Ying Huang <ying.huang@oss.cipunited.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* x86: Add support for AVX10 preset and vec size in cpu-featuresNoah Goldstein2023-09-294-3/+71
| | | | | | | | | | | | This commit add support for the new AVX10 cpu features: https://cdrdv2-public.intel.com/784267/355989-intel-avx10-spec.pdf We add checks for: - `AVX10`: Check if AVX10 is present. - `AVX10_{X,Y,Z}MM`: Check if a given vec class has AVX10 support. `make check` passes and cpuid output was checked against GNR/DMR on an emulator.
* hurd: Drop REG_GSFS and REG_ESDS from x86_64's ucontextSamuel Thibault2023-09-284-16/+8
| | | | These are useless on x86_64, and __NGREG was actually wrong with them.
* fegetenv_and_set_rn now uses the builtins provided by GCC.Manjunath Matti2023-09-271-0/+9
| | | | | | | | | | | | | | | | | | | | On powerpc, SET_RESTORE_ROUND uses inline assembly to optimize the prologue get/save/set rounding mode operations for POWER9 and later by using 'mffscrn' where possible, this was introduced by commit f1c56cdff09f650ad721fae026eb6a3651631f3d. GCC version 14 onwards supports builtins as __builtin_set_fpscr_rn which now returns the FPSCR fields in a double. This feature is available on Power9 when the __SET_FPSCR_RN_RETURNS_FPSCR__ macro is defined. GCC commit ef3bbc69d15707e4db6e2f198c621effb636cc26 adds this feature. Changes are done to use __builtin_set_fpscr_rn instead of mffscrn or mffscrni in __fe_mffscrn(rn). Suggested-by: Carl Love <cel@us.ibm.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* io: Do not implement fstat with fstatatAdhemerval Zanella2023-09-273-12/+68
| | | | | | | | | | | | | | | | AT_EMPTY_PATH is a requirement to implement fstat over fstatat, however it does not prevent the kernel to read the path argument. It is not an issue, but on x86-64 with SMAP-capable CPUs the kernel is forced to perform expensive user memory access. After that regular lookup is performed which adds even more overhead. Instead, issue the fstat syscall directly on LFS fstat implementation (32 bit architectures will still continue to use statx, which is required to have 64 bit time_t support). it should be even a small performance gain on non x86_64, since there is no need to handle the path argument. Checked on x86_64-linux-gnu.
* AArch64: Remove -0.0 check from vector sinWilco Dijkstra2023-09-262-12/+2
| | | | | | | Remove the unnecessary extra checks for sin (-0.0) from vector sin/sinf, improving performance. Passes regress. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
* elf: Add dummy declaration of _dl_audit_objclose for !SHAREDFlorian Weimer2023-09-261-1/+8
| | | | | | This allows us to avoid some #ifdef SHARED conditionals. Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* Fix leak in getaddrinfo introduced by the fix for CVE-2023-4806 [BZ #30843]Romain Geissler2023-09-251-3/+1
| | | | | | This patch fixes a very recently added leak in getaddrinfo. Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
* Revert "LoongArch: Add glibc.cpu.hwcap support."caiyinyu2023-09-2111-329/+8
| | | | This reverts commit a53451559dc9cce765ea5bcbb92c4007e058e92b.
* Update kernel version to 6.5 in header constant testsJoseph Myers2023-09-202-2/+2
| | | | | | | | | | This patch updates the kernel version in the tests tst-mman-consts.py and tst-pidfd-consts.py to 6.5. (There are no new constants covered by these tests in 6.5 that need any other header changes; tst-mount-consts.py was updated separately along with a header constant addition.) Tested with build-many-glibcs.py.
* LoongArch: Add glibc.cpu.hwcap support.caiyinyu2023-09-1911-8/+329
| | | | | | | | | | | | | | | | | | | | | | | | | Key Points: 1. On lasx & lsx platforms, We must use _dl_runtime_{profile, resolve}_{lsx, lasx} to save vector registers. 2. Via "tunables", users can choose str/mem_{lasx,lsx,unaligned} functions with `export GLIBC_TUNABLES=glibc.cpu.hwcaps=LASX,...`. Note: glibc.cpu.hwcaps doesn't affect _dl_runtime_{profile, resolve}_{lsx, lasx} selection. Usage Notes: 1. Only valid inputs: LASX, LSX, UAL. Case-sensitive, comma-separated, no spaces. 2. Example: `export GLIBC_TUNABLES=glibc.cpu.hwcaps=LASX,UAL` turns on LASX & UAL. Unmentioned features turn off. With default ifunc: lasx > lsx > unaligned > aligned > generic, effect is: lasx > unaligned > aligned > generic; lsx off. 3. Incorrect GLIBC_TUNABLES settings will show error messages. For example: On lsx platforms, you cannot enable lasx features. If you do that, you will get error messages. 4. Valid input examples: - GLIBC_TUNABLES=glibc.cpu.hwcaps=LASX: lasx > aligned > generic. - GLIBC_TUNABLES=glibc.cpu.hwcaps=LSX,UAL: lsx > unaligned > aligned > generic. - GLIBC_TUNABLES=glibc.cpu.hwcaps=LASX,UAL,LASX,UAL,LSX,LASX,UAL: Repetitions allowed but not recommended. Results in: lasx > lsx > unaligned > aligned > generic.
* getaddrinfo: Fix use after free in getcanonname (CVE-2023-4806)Siddhesh Poyarekar2023-09-151-8/+17
| | | | | | | | | | | | | | | | | | | | | | | When an NSS plugin only implements the _gethostbyname2_r and _getcanonname_r callbacks, getaddrinfo could use memory that was freed during tmpbuf resizing, through h_name in a previous query response. The backing store for res->at->name when doing a query with gethostbyname3_r or gethostbyname2_r is tmpbuf, which is reallocated in gethosts during the query. For AF_INET6 lookup with AI_ALL | AI_V4MAPPED, gethosts gets called twice, once for a v6 lookup and second for a v4 lookup. In this case, if the first call reallocates tmpbuf enough number of times, resulting in a malloc, th->h_name (that res->at->name refers to) ends up on a heap allocated storage in tmpbuf. Now if the second call to gethosts also causes the plugin callback to return NSS_STATUS_TRYAGAIN, tmpbuf will get freed, resulting in a UAF reference in res->at->name. This then gets dereferenced in the getcanonname_r plugin call, resulting in the use after free. Fix this by copying h_name over and freeing it at the end. This resolves BZ #30843, which is assigned CVE-2023-4806. Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
* LoongArch: Change to put magic number to .rodata sectiondengjianbo2023-09-151-10/+10
| | | | | Change to put magic number to .rodata section in memmove-lsx, and use pcalau12i and %pc_lo12 with vld to get the data.
* LoongArch: Add ifunc support for strrchr{aligned, lsx, lasx}dengjianbo2023-09-157-0/+578
| | | | | | | | | | | | | | | | According to glibc strrchr microbenchmark test results, this implementation could reduce the runtime time as following: Name Percent of rutime reduced strrchr-lasx 10%-50% strrchr-lsx 0%-50% strrchr-aligned 5%-50% Generic strrchr is implemented by function strlen + memrchr, the lasx version will compare with generic strrchr implemented by strlen-lasx + memrchr-lasx, the lsx version will compare with generic strrchr implemented by strlen-lsx + memrchr-lsx, the aligned version will compare with generic strrchr implemented by strlen-aligned + memrchr-generic.
* LoongArch: Add ifunc support for strcpy, stpcpy{aligned, unaligned, lsx, lasx}dengjianbo2023-09-1512-0/+963
| | | | | | | | | | | | | | | | | | | | According to glibc strcpy and stpcpy microbenchmark test results(changed to use generic_strcpy and generic_stpcpy instead of strlen + memcpy), comparing with the generic version, this implementation could reduce the runtime as following: Name Percent of rutime reduced strcpy-aligned 8%-45% strcpy-unaligned 8%-48%, comparing with the aligned version, unaligned version takes less instructions to copy the tail of data which length is less than 8. it also has better performance in case src and dest cannot be both aligned with 8bytes strcpy-lsx 20%-80% strcpy-lasx 15%-86% stpcpy-aligned 6%-43% stpcpy-unaligned 8%-48% stpcpy-lsx 10%-80% stpcpy-lasx 10%-87%
* LoongArch: Replace deprecated $v0 with $a0 to eliminate 'as' Warnings.caiyinyu2023-09-151-1/+1
|
* LoongArch: Add lasx/lsx support for _dl_runtime_profile.caiyinyu2023-09-157-179/+331
|
* Add MOVE_MOUNT_BENEATH from Linux 6.5 to sys/mount.hJoseph Myers2023-09-142-2/+3
| | | | | | | | This patch adds the MOVE_MOUNT_BENEATH constant from Linux 6.5 to glibc's sys/mount.h and updates tst-mount-consts.py to reflect these constants being up to date with that Linux kernel version. Tested with build-many-glibcs.py.
* Update syscall lists for Linux 6.5Joseph Myers2023-09-1228-2/+31
| | | | | | | | Linux 6.5 has one new syscall, cachestat, and also enables the cacheflush syscall for hppa. Update syscall-names.list and regenerate the arch-syscall.h headers with build-many-glibcs.py update-syscalls. Tested with build-many-glibcs.py.
* ia64: Work around miscompilation and fix build on ia64's gcc-10 and laterSergei Trofimovich2023-09-111-1/+3
| | | | | | | | | | | | | Needed since gcc-10 enabled -fno-common by default. [In use in Gentoo since gcc-10, no problems observed. Also discussed with and reviewed by Jessica Clarke from Debian. Andreas] Bug: https://bugs.gentoo.org/723268 Reviewed-by: Carlos O'Donell <carlos@redhat.com> Signed-off-by: Sergei Trofimovich <slyich@gmail.com> Signed-off-by: Andreas K. Hüttel <dilfridge@gentoo.org>
* stdio: Remove __libc_message alloca usageJoe Simmons-Talbott2023-09-111-34/+13
| | | | | | | | Use a fixed size array instead. The maximum number of arguments is set by macro tricks. Co-authored-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* htl: avoid exposing the vm_region symbolSamuel Thibault2023-09-091-1/+1
|
* elf: Always call destructors in reverse constructor order (bug 30785)Florian Weimer2023-09-081-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current implementation of dlclose (and process exit) re-sorts the link maps before calling ELF destructors. Destructor order is not the reverse of the constructor order as a result: The second sort takes relocation dependencies into account, and other differences can result from ambiguous inputs, such as cycles. (The force_first handling in _dl_sort_maps is not effective for dlclose.) After the changes in this commit, there is still a required difference due to dlopen/dlclose ordering by the application, but the previous discrepancies went beyond that. A new global (namespace-spanning) list of link maps, _dl_init_called_list, is updated right before ELF constructors are called from _dl_init. In dl_close_worker, the maps variable, an on-stack variable length array, is eliminated. (VLAs are problematic, and dlclose should not call malloc because it cannot readily deal with malloc failure.) Marking still-used objects uses the namespace list directly, with next and next_idx replacing the done_index variable. After marking, _dl_init_called_list is used to call the destructors of now-unused maps in reverse destructor order. These destructors can call dlopen. Previously, new objects do not have l_map_used set. This had to change: There is no copy of the link map list anymore, so processing would cover newly opened (and unmarked) mappings, unloading them. Now, _dl_init (indirectly) sets l_map_used, too. (dlclose is handled by the existing reentrancy guard.) After _dl_init_called_list traversal, two more loops follow. The processing order changes to the original link map order in the namespace. Previously, dependency order was used. The difference should not matter because relocation dependencies could already reorder link maps in the old code. The changes to _dl_fini remove the sorting step and replace it with a traversal of _dl_init_called_list. The l_direct_opencount decrement outside the loader lock is removed because it appears incorrect: the counter manipulation could race with other dynamic loader operations. tst-audit23 needs adjustments to the changes in LA_ACT_DELETE notifications. The new approach for checking la_activity should make it clearer that la_activty calls come in pairs around namespace updates. The dependency sorting test cases need updates because the destructor order is always the opposite order of constructor order, even with relocation dependencies or cycles present. There is a future cleanup opportunity to remove the now-constant force_first and for_fini arguments from the _dl_sort_maps function. Fixes commit 1df71d32fe5f5905ffd5d100e5e9ca8ad62 ("elf: Implement force_first handling in _dl_sort_maps_dfs (bug 28937)"). Reviewed-by: DJ Delorie <dj@redhat.com>
* io: Fix record locking contants for powerpc64 with __USE_FILE_OFFSET64Aurelien Jarno2023-09-071-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 5f828ff824e3b7cd1 ("io: Fix F_GETLK, F_SETLK, and F_SETLKW for powerpc64") fixed an issue with the value of the lock constants on powerpc64 when not using __USE_FILE_OFFSET64, but it ended-up also changing the value when using __USE_FILE_OFFSET64 causing an API change. Fix that by also checking that define, restoring the pre 4d0fe291aed3a476a commit values: Default values: - F_GETLK: 5 - F_SETLK: 6 - F_SETLKW: 7 With -D_FILE_OFFSET_BITS=64: - F_GETLK: 12 - F_SETLK: 13 - F_SETLKW: 14 At the same time, it has been noticed that there was no test for io lock with __USE_FILE_OFFSET64, so just add one. Tested on x86_64-linux-gnu, i686-linux-gnu and powerpc64le-unknown-linux-gnu. Resolves: BZ #30804. Co-authored-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
* getaddrinfo: Get rid of allocaJoe Simmons-Talbott2023-09-061-15/+9
| | | | | Use a scratch_buffer rather than alloca to avoid potential stack overflow.
* riscv: Add support for XTheadBb in string-fz[a,i].hChristoph Müllner2023-09-062-2/+7
| | | | | | | | | | | | | | | | | XTheadBb has similar instructions like Zbb, which allow optimized string processing: * th.ff0: find-first zero is a CLZ instruction. * th.tstnbz: Similar like orc.b, but with a bit-inverted result. The instructions are documented here: https://github.com/T-head-Semi/thead-extension-spec/tree/master/xtheadbb These instructions can be found in the T-Head C906 and the C910. Tested with the string tests. Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* getcanonname: Fix a typoSiddhesh Poyarekar2023-09-051-1/+1
| | | | | | | | This code is generally unused in practice since there don't seem to be any NSS modules that only implement _nss_MOD_gethostbyname2_r and not _nss_MOD_gethostbyname3_r. Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
* linux: Add pidfd_getpidAdhemerval Zanella Netto2023-09-0542-0/+478
| | | | | | | | | | | | | | | | | | | | | | | | This interface allows to obtain the associated process ID from the process file descriptor. It is done by parsing the procps fdinfo information. Its prototype is: pid_t pidfd_getpid (int fd) It returns the associated pid or -1 in case of an error and sets the errno accordingly. The possible errno values are those from open, read, and close (used on procps parsing), along with: - EBADF if the FD is negative, does not have a PID associated, or if the fdinfo fields contain a value larger than pid_t. - EREMOTE if the PID is in a separate namespace. - ESRCH if the process is already terminated. Checked on x86_64-linux-gnu on Linux 4.15 (no CLONE_PIDFD or waitid support), Linux 5.4 (full support), and Linux 6.2. Reviewed-by: Florian Weimer <fweimer@redhat.com>
* posix: Add pidfd_spawn and pidfd_spawnp (BZ 30349)Adhemerval Zanella Netto2023-09-0551-3/+499
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Returning a pidfd allows a process to keep a race-free handle for a child process, otherwise, the caller will need to either use pidfd_open (which still might be subject to TOCTOU) or keep the old racy interface base on pid_t. To correct use pifd_spawn, the kernel must support not only returning the pidfd with clone/clone3 but also waitid (P_PIDFD) (added on Linux 5.4). If kernel does not support the waitid, pidfd return ENOSYS. It avoids the need to racy workarounds, such as reading the procfs fdinfo to get the pid to use along with other wait interfaces. These interfaces are similar to the posix_spawn and posix_spawnp, with the only difference being it returns a process file descriptor (int) instead of a process ID (pid_t). Their prototypes are: int pidfd_spawn (int *restrict pidfd, const char *restrict file, const posix_spawn_file_actions_t *restrict facts, const posix_spawnattr_t *restrict attrp, char *const argv[restrict], char *const envp[restrict]) int pidfd_spawnp (int *restrict pidfd, const char *restrict path, const posix_spawn_file_actions_t *restrict facts, const posix_spawnattr_t *restrict attrp, char *const argv[restrict_arr], char *const envp[restrict_arr]); A new symbol is used instead of a posix_spawn extension to avoid possible issues with language bindings that might track the return argument lifetime. Although on Linux pid_t and int are interchangeable, POSIX only states that pid_t should be a signed integer. Both symbols reuse the posix_spawn posix_spawn_file_actions_t and posix_spawnattr_t, to void rehash posix_spawn API or add a new one. It also means that both interfaces support the same attribute and file actions, and a new flag or file action on posix_spawn is also added automatically for pidfd_spawn. Also, using posix_spawn plumbing allows the reusing of most of the current testing with some changes: - waitid is used instead of waitpid since it is a more generic interface. - tst-posix_spawn-setsid.c is adapted to take into consideration that the caller can check for session id directly. The test now spawns itself and writes the session id as a file instead. - tst-spawn3.c need to know where pidfd_spawn is used so it keeps an extra file description unused. Checked on x86_64-linux-gnu on Linux 4.15 (no CLONE_PIDFD or waitid support), Linux 5.4 (full support), and Linux 6.2. Reviewed-by: Florian Weimer <fweimer@redhat.com>
* linux: Add posix_spawnattr_{get, set}cgroup_np (BZ 26371)Adhemerval Zanella Netto2023-09-0541-3/+414
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | These functions allow to posix_spawn and posix_spawnp to use CLONE_INTO_CGROUP with clone3, allowing the child process to be created in a different cgroup version 2. These are GNU extensions that are available only for Linux, and also only for the architectures that implement clone3 wrapper (HAVE_CLONE3_WRAPPER). To create a process on a different cgroupv2, one can use the: posix_spawnattr_t attr; posix_spawnattr_init (&attr); posix_spawnattr_setflags (&attr, POSIX_SPAWN_SETCGROUP); posix_spawnattr_setcgroup_np (&attr, cgroup); posix_spawn (...) Similar to other posix_spawn flags, POSIX_SPAWN_SETCGROUP control whether the cgroup file descriptor will be used or not with clone3. There is no fallback if either clone3 does not support the flag or if the architecture does not provide the clone3 wrapper, in this case posix_spawn returns EOPNOTSUPP. Checked on x86_64-linux-gnu. Reviewed-by: Florian Weimer <fweimer@redhat.com>
* linux: Define __ASSUME_CLONE3 to 0 for alpha, ia64, nios2, sh, and sparcAdhemerval Zanella Netto2023-09-055-0/+40
| | | | | | Not all architectures added clone3 syscall. Reviewed-by: Florian Weimer <fweimer@redhat.com>
* mips: Add the clone3 wrapperAdhemerval Zanella Netto2023-09-052-0/+141
| | | | | | | | | | It follows the internal signature: extern int clone3 (struct clone_args *__cl_args, size_t __size, int (*__func) (void *__arg), void *__arg); Checked on mips64el-linux-gnueabihf, mips64el-n32-linux-gnu, and mipsel-linux-gnu.