path: root/sysdeps/unix/sysv/linux/Makefile
Commit message | Author | Age | Files | Lines
* linux: Make fdopendir fail with O_PATH (BZ 30373) | Adhemerval Zanella | 2023-11-30 | 1 | -0/+1
  It is not strictly required by POSIX, since O_PATH is a Linux extension, but it is QoI to fail early instead of at readdir. The check is also free, since fdopendir already checks whether the file descriptor is opened for reading.
  Checked on x86_64-linux-gnu.
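A minimal sketch of the behavior this commit targets, assuming a Linux system; the helper name dir_readable is hypothetical and not part of glibc. With the change, an O_PATH descriptor is rejected by fdopendir itself; on older glibc the same descriptor is accepted and the failure only surfaces at readdir. Either way the helper reports that no entry could be read.

```c
#define _GNU_SOURCE             /* For O_PATH.  */
#include <dirent.h>
#include <fcntl.h>
#include <unistd.h>

/* Fallback for the case where _GNU_SOURCE was not in effect early
   enough; this is the value used on most Linux architectures.  */
#ifndef O_PATH
# define O_PATH 010000000
#endif

/* Hypothetical helper: return 1 if at least one directory entry can be
   read through a descriptor opened on "." with FLAGS, 0 otherwise.  */
int
dir_readable (int flags)
{
  int fd = open (".", flags);
  if (fd < 0)
    return 0;

  DIR *dir = fdopendir (fd);
  if (dir == NULL)
    {
      /* With this commit, an O_PATH descriptor is rejected here.  */
      close (fd);
      return 0;
    }

  /* Without the commit, the failure only shows up at readdir.  */
  int ok = readdir (dir) != NULL;
  closedir (dir);               /* Also closes FD.  */
  return ok;
}
```

Calling dir_readable (O_RDONLY | O_DIRECTORY) is expected to succeed, while dir_readable (O_PATH | O_DIRECTORY) is expected to fail on both old and new glibc; only the point of failure differs.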
* linux: Add PR_SET_VMA_ANON_NAME support | Adhemerval Zanella | 2023-11-07 | 1 | -0/+1
  Linux 5.17 added support for naming anonymous virtual memory areas through the prctl syscall. __set_vma_name is a wrapper that avoids issuing the prctl call if the kernel does not support it.
  If the kernel does not support PR_SET_VMA_ANON_NAME, prctl returns EINVAL. It also returns the same error for an invalid argument. Since it is an internal-only API, it assumes well-formatted input: an aligned START, with (START, START+LEN) being a valid memory range, and a NAME of at most 80 characters containing none of the invalid ones ("\\`$[]").
  Reviewed-by: DJ Delorie <dj@redhat.com>
* linux: Add pidfd_getpid | Adhemerval Zanella Netto | 2023-09-05 | 1 | -0/+3
  This interface obtains the associated process ID from a process file descriptor. It is done by parsing the procfs fdinfo information. Its prototype is:
    pid_t pidfd_getpid (int fd)
  It returns the associated pid, or -1 in case of an error, setting errno accordingly. The possible errno values are those from open, read, and close (used in the procfs parsing), along with:
  - EBADF if the FD is negative, does not have a PID associated, or if the fdinfo fields contain a value larger than pid_t.
  - EREMOTE if the PID is in a separate namespace.
  - ESRCH if the process is already terminated.
  Checked on x86_64-linux-gnu on Linux 4.15 (no CLONE_PIDFD or waitid support), Linux 5.4 (full support), and Linux 6.2.
  Reviewed-by: Florian Weimer <fweimer@redhat.com>
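The fdinfo parsing idea can be sketched without relying on pidfd_getpid itself. This is not glibc's implementation: the helper name pid_from_fdinfo is hypothetical, it collapses all error cases into -1 instead of setting EBADF, EREMOTE, or ESRCH, and the fallback syscall number is the one used on most architectures.

```c
#include <stdio.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Fallback for older headers; 434 on most architectures.  */
#ifndef SYS_pidfd_open
# define SYS_pidfd_open 434
#endif

/* Hypothetical helper: read the "Pid:" field from
   /proc/self/fdinfo/PIDFD.  Returns the pid, or -1 if the file cannot
   be read, has no such field, or the field is not a positive number
   (the kernel reports 0 or -1 for foreign namespaces and terminated
   processes).  */
pid_t
pid_from_fdinfo (int pidfd)
{
  char path[64];
  snprintf (path, sizeof path, "/proc/self/fdinfo/%d", pidfd);

  FILE *f = fopen (path, "r");
  if (f == NULL)
    return -1;

  char line[256];
  long pid = -1;
  while (fgets (line, sizeof line, f) != NULL)
    if (sscanf (line, "Pid: %ld", &pid) == 1)
      break;
  fclose (f);

  return pid > 0 ? (pid_t) pid : -1;
}
```

A descriptor to test with can be obtained through syscall (SYS_pidfd_open, getpid (), 0) on kernels that provide pidfd_open (Linux 5.3 or later).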
* posix: Add pidfd_spawn and pidfd_spawnp (BZ 30349) | Adhemerval Zanella Netto | 2023-09-05 | 1 | -0/+18
  Returning a pidfd allows a process to keep a race-free handle for a child process; otherwise, the caller will need to either use pidfd_open (which still might be subject to TOCTOU) or keep the old racy interface based on pid_t.
  To correctly use pidfd_spawn, the kernel must support not only returning the pidfd with clone/clone3 but also waitid (P_PIDFD) (added on Linux 5.4). If the kernel does not support waitid (P_PIDFD), pidfd_spawn returns ENOSYS. This avoids the need for racy workarounds, such as reading the procfs fdinfo to get the pid to use along with other wait interfaces.
  These interfaces are similar to posix_spawn and posix_spawnp, with the only difference being that they return a process file descriptor (int) instead of a process ID (pid_t). Their prototypes are:
    int pidfd_spawn (int *restrict pidfd,
                     const char *restrict file,
                     const posix_spawn_file_actions_t *restrict facts,
                     const posix_spawnattr_t *restrict attrp,
                     char *const argv[restrict],
                     char *const envp[restrict])
    int pidfd_spawnp (int *restrict pidfd,
                      const char *restrict path,
                      const posix_spawn_file_actions_t *restrict facts,
                      const posix_spawnattr_t *restrict attrp,
                      char *const argv[restrict_arr],
                      char *const envp[restrict_arr]);
  A new symbol is used instead of a posix_spawn extension to avoid possible issues with language bindings that might track the return argument lifetime. Although on Linux pid_t and int are interchangeable, POSIX only states that pid_t should be a signed integer.
  Both symbols reuse the posix_spawn types posix_spawn_file_actions_t and posix_spawnattr_t, to avoid rehashing the posix_spawn API or adding a new one. It also means that both interfaces support the same attributes and file actions, and a new flag or file action on posix_spawn is also added automatically for pidfd_spawn.
  Also, using the posix_spawn plumbing allows reusing most of the current testing with some changes:
  - waitid is used instead of waitpid since it is a more generic interface.
  - tst-posix_spawn-setsid.c is adapted to take into consideration that the caller can check for session id directly. The test now spawns itself and writes the session id as a file instead.
  - tst-spawn3.c needs to know where pidfd_spawn is used so it keeps an extra file descriptor unused.
  Checked on x86_64-linux-gnu on Linux 4.15 (no CLONE_PIDFD or waitid support), Linux 5.4 (full support), and Linux 6.2.
  Reviewed-by: Florian Weimer <fweimer@redhat.com>
* linux: Add posix_spawnattr_{get, set}cgroup_np (BZ 26371) | Adhemerval Zanella Netto | 2023-09-05 | 1 | -0/+5
  These functions allow posix_spawn and posix_spawnp to use CLONE_INTO_CGROUP with clone3, allowing the child process to be created in a different version 2 cgroup. These are GNU extensions that are available only for Linux, and also only for the architectures that implement the clone3 wrapper (HAVE_CLONE3_WRAPPER).
  To create a process in a different cgroupv2, one can use:
    posix_spawnattr_t attr;
    posix_spawnattr_init (&attr);
    posix_spawnattr_setflags (&attr, POSIX_SPAWN_SETCGROUP);
    posix_spawnattr_setcgroup_np (&attr, cgroup);
    posix_spawn (...)
  Similar to other posix_spawn flags, POSIX_SPAWN_SETCGROUP controls whether the cgroup file descriptor will be used or not with clone3.
  There is no fallback if either clone3 does not support the flag or the architecture does not provide the clone3 wrapper; in this case posix_spawn returns EOPNOTSUPP.
  Checked on x86_64-linux-gnu.
  Reviewed-by: Florian Weimer <fweimer@redhat.com>
* Exclude routines from fortification | Frédéric Bérat | 2023-07-05 | 1 | -0/+3
  Since the _FORTIFY_SOURCE feature uses some routines of Glibc, they need to be excluded from the fortification.
  On top of that:
  - some tests explicitly verify that some level of fortification works appropriately; we therefore shouldn't modify the level set for them.
  - some objects need to be built with optimization disabled, which prevents _FORTIFY_SOURCE from being used for them.
  Assembler files that implement architecture specific versions of the fortified routines were not excluded from _FORTIFY_SOURCE as there is no C header included that would impact their behavior.
  Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
* linux: Split tst-ttyname | Adhemerval Zanella | 2023-06-28 | 1 | -1/+2
  tst-ttyname-direct.c checks ttyname with procfs mounted in bind mode (MS_BIND|MS_REC), while tst-ttyname-namespace.c checks with procfs mounted with MS_NOSUID|MS_NOEXEC|MS_NODEV in a new namespace.
  Checked on x86_64-linux-gnu and aarch64-linux-gnu.
  Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
* linux: Reformat Makefile. | Carlos O'Donell | 2023-05-16 | 1 | -6/+7
  Reflow Makefile. Sort using scripts/sort-makefile-lines.py.
  No code generation changes observed in binary artifacts. No regressions on x86_64 and i686.
* hurd 64bit: Fix struct msqid_ds and shmid_ds fields | Samuel Thibault | 2023-05-01 | 1 | -2/+0
  The standards want msg_lspid/msg_lrpid/shm_cpid/shm_lpid to be pid_t, see BZ 23083 and 23085.
  We can leave them __rpc_pid_t on i386 for ABI compatibility, but avoid hitting the issue on 64bit.
* hurd 64bit: Fix ipc_perm fields types | Samuel Thibault | 2023-05-01 | 1 | -1/+0
  The standards want uid/cuid to be uid_t, gid/cgid to be gid_t and mode to be mode_t, see BZ 23082.
  We can leave them short ints on i386 for ABI compatibility, but avoid hitting the issue on 64bit.
  bits/ipc.h ends up being exactly the same in sysdeps/gnu/ and sysdeps/unix/sysv/linux/, so remove the latter.
* __check_pf: Add a cancellation cleanup handler [BZ #20975] | H.J. Lu | 2023-04-28 | 1 | -0/+2
  There are reports of a hang in __check_pf:
    https://github.com/JoeDog/siege/issues/4
  It is reproducible only under specific configurations:
  1. Large number of cores (>= 64) and large number of threads (> 3X of the number of cores) with long lived socket connections.
  2. Low power (frequency) mode.
  3. Power management is enabled.
  While holding the lock, __check_pf calls make_request, which calls __sendto and __recvmsg. Since __sendto and __recvmsg are cancellation points, the lock held by __check_pf won't be released and can cause a deadlock when thread cancellation happens in __sendto or __recvmsg. Add a cancellation cleanup handler for __check_pf to unlock the lock when cancelled by another thread. This fixes BZ #20975 and the siege hang issue.
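The general pattern, outside of glibc internals, can be sketched with the public pthread API. This is an illustration only: the names worker, unlock_cleanup, and cancel_while_locked are hypothetical, and a blocking read on an empty pipe stands in for the __sendto/__recvmsg cancellation points.

```c
#include <pthread.h>
#include <unistd.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int pipefd[2];

/* Cleanup handler: release the mutex passed as ARG.  */
static void
unlock_cleanup (void *arg)
{
  pthread_mutex_unlock (arg);
}

/* Take the lock, then block in a cancellation point while holding it.  */
static void *
worker (void *unused)
{
  char c;
  (void) unused;
  pthread_mutex_lock (&lock);
  pthread_cleanup_push (unlock_cleanup, &lock);
  /* Nothing is ever written to the pipe, so this blocks until the
     thread is cancelled; the handler above then runs.  */
  ssize_t n = read (pipefd[0], &c, 1);
  (void) n;
  pthread_cleanup_pop (1);      /* Normal path: unlock as well.  */
  return NULL;
}

/* Hypothetical driver: cancel the worker and report whether the lock
   is free afterwards.  Returns 0 if it is, EBUSY if it was leaked,
   -1 on setup failure.  */
int
cancel_while_locked (void)
{
  pthread_t thr;
  if (pipe (pipefd) != 0)
    return -1;
  if (pthread_create (&thr, NULL, worker, NULL) != 0)
    return -1;
  pthread_cancel (thr);
  pthread_join (thr, NULL);

  int rc = pthread_mutex_trylock (&lock);
  if (rc == 0)
    pthread_mutex_unlock (&lock);
  close (pipefd[0]);
  close (pipefd[1]);
  return rc;
}
```

Because cancellation is deferred by default, the worker always acquires the lock and registers the handler before the cancellation is acted on inside read. Without the pthread_cleanup_push/pop pair, the trylock in the driver would report EBUSY, which is the leaked-lock situation the commit describes.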
* linux: Re-flow and sort multiline Makefile definitions | Adhemerval Zanella | 2023-04-20 | 1 | -48/+158
* Remove --enable-tunables configure option | Adhemerval Zanella Netto | 2023-03-29 | 1 | -3/+1
  And make tunables always supported. The configure option was added in glibc 2.25 and some features require it (such as hwcap mask, huge pages support, and lock elision tuning). It also simplifies the build permutations.
  Changes from v1:
  * Remove glibc.rtld.dynamic_sort changes, it is orthogonal and needs more discussion.
  * Cleanup more code.
  Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
* linux: Add clone3 CLONE_CLEAR_SIGHAND optimization to posix_spawn | Adhemerval Zanella Netto | 2023-02-01 | 1 | -2/+1
  The clone3 flag resets all signal handlers of the child not set to SIG_IGN to SIG_DFL. This allows skipping most of the sigaction calls needed to set up child signal handling, where previously a posix_spawn had to issue 2 times NSIG sigaction calls (one to obtain the current disposition and another to set either SIG_DFL or SIG_IGN).
  With POSIX_SPAWN_SETSIGDEF the child will set up the signal for the case where the disposition is SIG_IGN.
  The code must handle the fallback where clone3 is not available. This is done by splitting __clone_internal_fallback from __clone_internal.
  Checked on x86_64-linux-gnu.
  Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* Linux: Remove epoll_create, inotify_init from syscalls.list | Florian Weimer | 2022-12-19 | 1 | -0/+2
  Their presence causes stub warnings to be created on architectures which do not implement them.
  Fixes commit d1d23b134244d59c4d6ef2295 ("Lninux: consolidate epoll_create implementation") and commit 842128f160a48e5545900ea3b ("Linux: consolidate inotify_init implementation").
  Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* Linux: Reflow and sort some Makefile variables | Florian Weimer | 2022-12-19 | 1 | -63/+155
  Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* linux: Fix sys/mount.h usage with kernel headers | Adhemerval Zanella | 2022-08-12 | 1 | -0/+8
  Now that the kernel exports linux/mount.h and includes it in linux/fs.h, its definitions might clash with the ones glibc exports in sys/mount.h. To avoid the need to rearrange the Linux header to always come after the glibc one, the glibc sys/mount.h is changed to:
  1. Undefine the macros also used as enum constants. This covers prior inclusion of <linux/mount.h> (for instance MS_RDONLY).
  2. Include <linux/mount.h> based on the usual __has_include check (needs to use __has_include ("linux/mount.h") to paper over GCC bugs).
  3. Define enum fsconfig_command only if FSOPEN_CLOEXEC is not defined. (FSOPEN_CLOEXEC should be a very close proxy.)
  4. Define struct mount_attr if MOUNT_ATTR_SIZE_VER0 is not defined. (Added in the same commit on the Linux side.)
  This patch also adds some tests to check that including linux/fs.h and linux/mount.h after and before sys/mount.h works.
  Checked on x86_64-linux-gnu.
  Reviewed-by: Florian Weimer <fweimer@redhat.com>
* Remove ldd libc4 support | Adhemerval Zanella | 2022-08-04 | 1 | -2/+0
  The older libc versions have been obsolete for over twenty years now.
* Linux: dirent/tst-readdir64-compat needs to use TEST_COMPAT (bug 27654) | Florian Weimer | 2022-07-25 | 1 | -6/+4
  The hppa port starts libc at GLIBC_2.2, but has earlier symbol versions in other shared objects. This means that the compat symbol for readdir64 is not actually present in libc even though have-GLIBC_2.1.3 is defined as yes at the make level.
  Fixes commit 15e50e6c966fa0f26612602a95f0129543d9f9d5 ("Linux: dirent/tst-readdir64-compat can be a regular test") by mostly reverting it.
* linux: Add tst-mount to check for Linux new mount API | Adhemerval Zanella | 2022-07-05 | 1 | -0/+1
  The new mount API was added on Linux 5.2 with six new syscalls: fsopen, fsconfig, fsmount, move_mount, fspick, and open_tree.
  The new test verifies minimal functionality along with error paths for specific arguments and their corner cases.
  Checked on x86_64-linux-gnu.
  Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* linux: Add fsopen | Adhemerval Zanella | 2022-06-24 | 1 | -0/+8
  It was added on Linux 5.2 (24dcb3d90a1f67fe08c68a004af37df059d74005) to start the process of preparing to create a superblock that will then be mountable, using an fd as a context handle.
  Tested-by: Carlos O'Donell <carlos@redhat.com>
  Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* linux: Add process_mrelease | Adhemerval Zanella | 2022-06-02 | 1 | -0/+1
  Added in Linux 5.15 (884a7e5964e06ed93c7771c0d7cf19c09a8946f1), the new syscall allows a caller to free the memory of a dying target process.
  Checked on x86_64-linux-gnu.
  Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* linux: Add process_madvise | Adhemerval Zanella | 2022-06-02 | 1 | -0/+5
  It was added on Linux 5.10 (ecb8ac8b1f146915aa6b96449b66dd48984caacc) with the same functionality as madvise but using a pidfd of the target process.
  Checked on x86_64-linux-gnu and i686-linux-gnu.
  Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* linux: Add tst-pidfd.c | Adhemerval Zanella | 2022-05-17 | 1 | -0/+1
  To check the pidfd functions pidfd_open, pidfd_getfd, pidfd_send_signal, and waitid with P_PIDFD.
  Checked on x86_64-linux-gnu and i686-linux-gnu.
  Reviewed-by: Carlos O'Donell <carlos@redhat.com>
  Tested-by: Carlos O'Donell <carlos@redhat.com>
* linux: Add pidfd_open | Adhemerval Zanella | 2022-05-17 | 1 | -1/+10
  This was added on Linux 5.3 (32fcb426ec001cb6d5a4a195091a8486ea77e2df) as a way to retrieve a pid file descriptor for a process that was not created with CLONE_PIDFD (that is, one created by the usual fork/clone).
  Reviewed-by: Carlos O'Donell <carlos@redhat.com>
  Tested-by: Carlos O'Donell <carlos@redhat.com>
* linux: Add a getauxval test [BZ #23293] | Szabolcs Nagy | 2022-05-17 | 1 | -0/+1
  This is for bug 23293 and it relies on the glibc test system running tests via explicit ld.so invocation by default.
  Reviewed-by: Florian Weimer <fweimer@redhat.com>
  Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* build: Properly generate .d dependency files [BZ #28922] | H.J. Lu | 2022-02-25 | 1 | -0/+3
  1. Also generate .d dependency files for $(tests-container) and $(tests-printers).
  2. elf: Add tst-auditmod17.os to extra-test-objs.
  3. iconv: Add tst-gconv-init-failure-mod.os to extra-test-objs.
  4. malloc: Rename extra-tests-objs to extra-test-objs.
  5. linux: Add tst-sysconf-iov_max-uapi.o to extra-test-objs.
  6. x86_64: Add tst-x86_64mod-1.o, tst-platformmod-2.o, test-libmvec.o, test-libmvec-avx.o, test-libmvec-avx2.o and test-libmvec-avx512f.o to extra-test-objs.
  Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* Linux: Only generate 64 bit timestamps for 64 bit time_t recvmsg/recvmmsg | Adhemerval Zanella | 2022-01-28 | 1 | -2/+8
  The timestamps created by __convert_scm_timestamps only make sense for 64 bit time_t programs; 32 bit time_t programs will ignore 64 bit time_t timestamps since SO_TIMESTAMP will be defined to old values (either by glibc or kernel headers).
  Worse, if the buffer is not sufficient, MSG_CTRUNC is set to indicate it (which breaks some programs [1]).
  This patch makes only the 64 bit time_t recvmsg and recvmmsg call __convert_scm_timestamps. Also, the condition under which it is called is changed from __ASSUME_TIME64_SYSCALLS to __TIMESIZE != 64, since setsockopt might be called by libraries built without __TIME_BITS=64. MSG_CTRUNC is only set for the 64 bit symbols; it should happen only if 64 bit time_t programs run on older kernels.
  Checked on x86_64-linux-gnu and i686-linux-gnu.
  [1] https://github.com/systemd/systemd/pull/20567
  Reviewed-by: Florian Weimer <fweimer@redhat.com>
* linux: Fix ancillary 64-bit time timestamp conversion (BZ #28349, BZ#28350) | Adhemerval Zanella | 2022-01-28 | 1 | -0/+3
  __convert_scm_timestamps only updates the control message last pointer for the SOL_SOCKET type, so if the message control buffer contains multiple ancillary message types, the converted timestamp one might overwrite a valid message.
  The test checks whether the extra ancillary space is correctly handled by recvmsg/recvmmsg: if there is no extra space for the 64-bit time_t converted message, the control buffer should be marked with MSG_TRUNC. It also checks whether recvmsg/recvmmsg correctly handle multiple ancillary data.
  Checked on x86_64-linux and on i686-linux-gnu on both 5.11 and 4.15 kernels.
  Co-authored-by: Fabian Vogt <fvogt@suse.de>
  Reviewed-by: Florian Weimer <fweimer@redhat.com>
* getcwd: Set errno to ERANGE for size == 1 (CVE-2021-3999) | Siddhesh Poyarekar | 2022-01-24 | 1 | -1/+6
  No valid path returned by getcwd would fit into 1 byte, so reject the size early and return NULL with errno set to ERANGE. This change is prompted by CVE-2021-3999, which describes a single byte buffer underflow and overflow when all of the following conditions are met:
  - The buffer size (i.e. the second argument of getcwd) is 1 byte
  - The current working directory is too long
  - '/' is also mounted on the current working directory
  Sequence of events:
  - In sysdeps/unix/sysv/linux/getcwd.c, the syscall returns ENAMETOOLONG because the linux kernel checks for name length before it checks buffer size
  - The code falls back to the generic getcwd in sysdeps/posix
  - In the generic func, the buf[0] is set to '\0' on line 250
  - this while loop on line 262 is bypassed:
      while (!(thisdev == rootdev && thisino == rootino))
    since the rootfs (/) is bind mounted onto the directory and the flow goes on to line 449, where it puts a '/' in the byte before the buffer.
  - Finally on line 458, it moves 2 bytes (the underflowed byte and the '\0') to the buf[0] and buf[1], resulting in a 1 byte buffer overflow.
  - buf is returned on line 469 and errno is not set.
  This resolves BZ #28769.
  Reviewed-by: Andreas Schwab <schwab@linux-m68k.org>
  Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
  Signed-off-by: Qualys Security Advisory <qsa@qualys.com>
  Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
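The caller-visible contract after this change can be sketched as below; the helper name getcwd_one_byte_errno is hypothetical. In an ordinary working directory the same result is expected on older glibc too, because the kernel reports ERANGE for a buffer that is too small; the CVE concerns the unusual mount setup listed above.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: call getcwd with a 1-byte buffer and return the
   resulting errno value, or 0 if the call unexpectedly succeeded.  */
int
getcwd_one_byte_errno (void)
{
  char buf[1];
  errno = 0;
  if (getcwd (buf, sizeof buf) != NULL)
    return 0;
  return errno;
}
```

Even the shortest possible path, "/", needs two bytes including the terminating NUL, which is why a size of 1 can be rejected before any lookup is attempted.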
* Linux: Add epoll_pwait2 (BZ #27359) | Adhemerval Zanella | 2022-01-17 | 1 | -1/+3
  It is similar to epoll_wait, with the difference that the timeout has nanosecond resolution by using struct timespec instead of int.
  Although the Linux interface only provides 64 bit time_t support, the old 32 bit interface is also provided (to keep in sync with current practice and to not force opt-in to 64 bit time_t).
  Checked on x86_64-linux-gnu and i686-linux-gnu.
  Reviewed-by: Florian Weimer <fweimer@redhat.com>
* Revert "linux: Fix ancillary 64-bit time timestamp conversion (BZ #28349, BZ #28350)" | Adhemerval Zanella | 2022-01-12 | 1 | -3/+0
  This reverts commit 21e0f45c7d73df6fe30c77ffcc9f81410e2ee369.
* linux: Fix ancillary 64-bit time timestamp conversion (BZ #28349, BZ #28350) | Adhemerval Zanella | 2022-01-12 | 1 | -0/+3
  __convert_scm_timestamps() only updates the control message last pointer for the SOL_SOCKET type, so if the message control buffer contains multiple ancillary message types, the converted timestamp one might overwrite a valid message.
  The test checks whether the extra ancillary space is correctly handled by recvmsg/recvmmsg: if there is no extra space for the 64-bit time_t converted message, the control buffer should be marked with MSG_TRUNC. It also checks whether recvmsg/recvmmsg correctly handle multiple ancillary data.
  Checked on x86_64-linux and on i686-linux-gnu on both 5.11 and 4.15 kernels.
  Co-authored-by: Fabian Vogt <fvogt@suse.de>
* nptl: Add public rseq symbols and <sys/rseq.h> | Florian Weimer | 2021-12-09 | 1 | -1/+2
  The relationship between the thread pointer and the rseq area is made explicit. The constant offset can be used by JIT compilers to optimize rseq access (e.g., for really fast sched_getcpu).
  Extensibility is provided through __rseq_size and __rseq_flags. (In the future, the kernel could request a different rseq size via the auxiliary vector.)
  Co-Authored-By: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
  Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
* nptl: Add glibc.pthread.rseq tunable to control rseq registration | Florian Weimer | 2021-12-09 | 1 | -0/+8
  This tunable allows applications to register the rseq area instead of glibc.
  Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
  Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
* nptl: Add rseq registration | Florian Weimer | 2021-12-09 | 1 | -1/+8
  The rseq area is placed directly into struct pthread. rseq registration failure is not treated as an error, so it is possible that threads run with inconsistent registration status.
  <sys/rseq.h> is not yet installed as a public header.
  Co-Authored-By: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
  Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
  Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
* linux: Implement mremap in C | Adhemerval Zanella | 2021-11-30 | 1 | -0/+1
  Variadic function calls in syscalls.list do not work for all ABIs (for instance where the arguments are passed on the stack instead of in registers) and might have underlying issues depending on the variadic type (for instance if a 64-bit argument is used).
  Checked on x86_64-linux-gnu.
* linux: Add prlimit64 C implementation | Adhemerval Zanella | 2021-11-30 | 1 | -1/+1
  The LFS prlimit64 requires an arch-specific implementation in syscalls.list. Instead add a generic one that handles the required symbol alias for __RLIM_T_MATCHES_RLIM64_T.
  HPPA is the only outlier which requires a different default symbol.
  Checked on x86_64-linux-gnu and with build for the affected ABIs.
* linux: Add fanotify_mark C implementation | Adhemerval Zanella | 2021-11-25 | 1 | -1/+2
  Passing 64-bit arguments in syscalls.list is tricky: it requires reimplementing the expected kernel ABI in each architecture. This is much better represented in C code, where we already have macros for this (SYSCALL_LL64).
  Checked on x86_64-linux-gnu.
* io: Refactor close_range and closefrom | Adhemerval Zanella | 2021-11-24 | 1 | -1/+0
  Now that Hurd implements both close_range and closefrom (f2c996597d), we can make close_range() a base ABI, and make the default closefrom() implementation on top of close_range().
  The generic closefrom() implementation based on __getdtablesize() is moved to the generic close_range(). On Linux it will be overridden by the auto-generated syscall, while on Hurd it will be a system specific implementation.
  closefrom() now calls close_range() and __closefrom_fallback(). Since on Hurd close_range() does not fail, __closefrom_fallback() is an empty static inline function set by __ASSUME_CLOSE_RANGE.
  __ASSUME_CLOSE_RANGE also allows optimizing the Linux __closefrom_fallback() implementation when --enable-kernel=5.9 or higher is used.
  Finally the Linux specific tst-close_range.c is moved to io and enabled as default. The Linuxism and CLOSE_RANGE_UNSHARE are guarded so it can be built for Hurd (I have not actually tested it).
  Checked on x86_64-linux-gnu, i686-linux-gnu, and with an i686-gnu build.
* socket: Add time64 alias for sendmmsg | Florian Weimer | 2021-07-21 | 1 | -0/+2
  Reviewed-by: Lukasz Majewski <lukma@denx.de>
  Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* Linux: Add time64 alias for prctl | Florian Weimer | 2021-07-21 | 1 | -1/+5
  Reviewed-by: Lukasz Majewski <lukma@denx.de>
  Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* Add static tests for __clone_internal | H.J. Lu | 2021-07-14 | 1 | -0/+9
  Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* Add an internal wrapper for clone, clone2 and clone3 | H.J. Lu | 2021-07-14 | 1 | -1/+2
  The clone3 system call (since Linux 5.3) provides a superset of the functionality of clone and clone2. It also provides a number of API improvements, including the ability to specify the size of the child's stack area, which can be used by the kernel to compute the shadow stack size when allocating the shadow stack. Add:
    extern int __clone_internal (struct clone_args *__cl_args,
                                 int (*__func) (void *__arg), void *__arg);
  to provide an abstract interface for clone, clone2 and clone3.
  1. Simplify stack management for thread creation by passing both stack base and size to create_thread.
  2. Consolidate clone vs clone2 differences into a single file.
  3. Call __clone3 if HAVE_CLONE3_WRAPPER is defined. If __clone3 returns -1 with ENOSYS, fall back to clone or clone2.
  4. Use only __clone_internal to clone a thread. Since the stack size argument for create_thread is now unconditional, always pass stack size to create_thread.
  5. Enable the public clone3 wrapper in the future after it has been added to all targets.
  NB: Sandbox will return ENOSYS on clone3 in both Chromium:
    The following revision refers to this bug:
    https://chromium.googlesource.com/chromium/src/+/218438259dd795456f0a48f67cbe5b4e520db88b
    commit 218438259dd795456f0a48f67cbe5b4e520db88b
    Author: Matthew Denton <mpdenton@chromium.org>
    Date: Thu Jun 03 20:06:13 2021
    Linux sandbox: return ENOSYS for clone3
    Because clone3 uses a pointer argument rather than a flags argument, we cannot examine the contents with seccomp, which is essential to preventing sandboxed processes from starting other processes. So, we won't be able to support clone3 in Chromium. This CL modifies the BPF policy to return ENOSYS for clone3 so glibc always uses the fallback to clone.
    Bug: 1213452
    Change-Id: I7c7c585a319e0264eac5b1ebee1a45be2d782303
    Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/2936184
    Reviewed-by: Robert Sesek <rsesek@chromium.org>
    Commit-Queue: Matthew Denton <mpdenton@chromium.org>
    Cr-Commit-Position: refs/heads/master@{#888980}
    [modify] https://crrev.com/218438259dd795456f0a48f67cbe5b4e520db88b/sandbox/linux/seccomp-bpf-helpers/baseline_policy.cc
  and Firefox:
    https://hg.mozilla.org/integration/autoland/rev/ecb4011a0c76
  Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* Linux: Use 32-bit vDSO for clock_gettime, gettimeofday, time (BZ# 28071) | Adhemerval Zanella | 2021-07-12 | 1 | -0/+6
  The previous approach defeats the vDSO optimization on older kernels because a failing clock_gettime64 system call is performed on every function call. It also results in a clobbered errno value, exposing an OpenJDK bug (JDK-8270244).
  This patch fixes it by open-coding the INLINE_VSYSCALL macro and replacing all INLINE_SYSCALL_CALL uses with INTERNAL_SYSCALL_CALL. Now for __clock_gettime64x, the 64-bit vDSO is used and the 32-bit vDSO is tried before falling back to 64-bit syscalls.
  The previous code preferred the 64-bit syscall for the case where the kernel provides 64-bit time_t syscalls *and* also a 32-bit vDSO (in this case the *64-bit* syscall should be preferable over the vDSO). All architectures that provide a 32-bit vDSO (i386, mips, powerpc, s390) modulo sparc; but I am not sure if some kernel versions do provide only a 32-bit vDSO while still providing the 64-bit time_t syscall. Regardless, for such cases the 64-bit time_t syscall is used if the vDSO returns an overflowed 32-bit time_t.
  Tested on i686-linux-gnu (with a time64 and non-time64 kernel), x86_64-linux-gnu. Built with build-many-glibcs.py.
  Co-authored-by: Florian Weimer <fweimer@redhat.com>
* Reduce <limits.h> pollution due to dynamic PTHREAD_STACK_MIN | Florian Weimer | 2021-07-12 | 1 | -1/+1
  <limits.h> used to be a header file with no declarations. GCC's libgomp includes it in a #pragma GCC visibility hidden block. Including <unistd.h> from <limits.h> (indirectly) declares everything in <unistd.h> with hidden visibility, resulting in linker failures.
  This commit avoids C declarations in assembler mode and only declares __sysconf in <limits.h> (and not the entire contents of <unistd.h>). The __sysconf symbol is already part of the ABI. PTHREAD_STACK_MIN is no longer defined for __USE_DYNAMIC_STACK_SIZE && __ASSEMBLER__ because there is no possible definition.
  Additionally, PTHREAD_STACK_MIN is now defined by <pthread.h> for __USE_MISC because this is what developers expect based on the macro name. It also helps to avoid libgomp linker failures in GCC because libgomp includes <pthread.h> before its visibility hacks.
  Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* Define PTHREAD_STACK_MIN to sysconf(_SC_THREAD_STACK_MIN) | H.J. Lu | 2021-07-09 | 1 | -1/+2
  The constant PTHREAD_STACK_MIN may be too small for some processors. Rename _SC_SIGSTKSZ_SOURCE to _DYNAMIC_STACK_SIZE_SOURCE. When _DYNAMIC_STACK_SIZE_SOURCE or _GNU_SOURCE are defined, define PTHREAD_STACK_MIN to sysconf(_SC_THREAD_STACK_MIN) which is changed to MIN (PTHREAD_STACK_MIN, sysconf(_SC_MINSIGSTKSZ)).
  Consolidate <bits/local_lim.h> with <bits/pthread_stack_min.h> to provide a constant target specific PTHREAD_STACK_MIN value.
  Reviewed-by: Carlos O'Donell <carlos@redhat.com>
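For callers, the practical consequence is that the minimum stack size is best treated as a run-time value. A minimal sketch, with the hypothetical helper name thread_stack_min; it works whether PTHREAD_STACK_MIN is a constant or expands to a sysconf call.

```c
#include <limits.h>
#include <unistd.h>

/* Hypothetical helper: return the minimum thread stack size as
   reported at run time, falling back to the PTHREAD_STACK_MIN macro
   if sysconf does not know the value.  */
long
thread_stack_min (void)
{
  long value = sysconf (_SC_THREAD_STACK_MIN);
#ifdef PTHREAD_STACK_MIN
  if (value < 0)
    value = PTHREAD_STACK_MIN;
#endif
  return value;
}
```

Code that previously used PTHREAD_STACK_MIN in a constant expression, for example as a static array size, would need to allocate dynamically once the macro is defined in terms of sysconf.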
* io: Add closefrom [BZ #10353] | Adhemerval Zanella | 2021-07-08 | 1 | -1/+2
  The function closes all open file descriptors greater than or equal to the input argument. Negative values are clamped to 0, i.e., it will close all file descriptors.
  As indicated by the bug report, this is a common symbol provided by different systems (Solaris, OpenBSD, NetBSD, FreeBSD) and, although it has inherent issues with not taking into consideration internal libc file descriptors (such as syslog), this is also a common feature used in multiple projects [1][2][3][4][5].
  The Linux fallback implementation iterates over /proc and closes all file descriptors sequentially. Although the question was raised whether getdents on /proc/self/fd might return disjointed entries when file descriptors are closed, that does not seem to be the case in my testing on multiple kernels (v4.18, v5.4, v5.9), and the same strategy is used in different projects [1][2][3][5].
  Also, the interface is set as fail-safe, meaning that a failure in the fallback results in a process abort.
  Checked on x86_64-linux-gnu and i686-linux-gnu on kernel 5.11 and 4.15.
  [1] https://github.com/systemd/systemd/blob/5238e9575906297608ff802a27e2ff9effa3b338/src/basic/fd-util.c#L217
  [2] https://github.com/lxc/lxc/blob/ddf4b77e11a4d08f09b7b9cd13e593f8c047edc5/src/lxc/start.c#L236
  [3] https://github.com/python/cpython/blob/9e4f2f3a6b8ee995c365e86d976937c141d867f8/Modules/_posixsubprocess.c#L220
  [4] https://github.com/rust-lang/rust/blob/5f47c0613ed4eb46fca3633c1297364c09e5e451/src/libstd/sys/unix/process2.rs#L303-L308
  [5] https://github.com/openjdk/jdk/blob/master/src/java.base/unix/native/libjava/childproc.c#L82
* linux: Add close_range | Adhemerval Zanella | 2021-07-08 | 1 | -1/+2
  It was added on Linux 5.9 (278a5fbaed89) with CLOSE_RANGE_CLOEXEC added on 5.11 (582f1fb6b721f). Although FreeBSD has added the same syscall, this only adds the symbol on Linux ports. This syscall is required to provide a fail-safe way to implement the closefrom symbol (BZ #10353).
  Checked on x86_64-linux-gnu and i686-linux-gnu on kernel 5.11 and 4.15.
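A sketch of how a caller might use the syscall with a fallback for older kernels. The helper name close_fd_range is hypothetical, the raw syscall is used so the example also builds against glibc versions that predate the wrapper, and the plain loop fallback is a simplification: it is only sensible for small ranges, whereas the closefrom fallback described in the closefrom commit iterates over /proc/self/fd.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Fallback for older headers; 436 on most architectures.  */
#ifndef SYS_close_range
# define SYS_close_range 436
#endif

/* Hypothetical helper: close every descriptor in [FIRST, LAST].
   Returns 0 on success, -1 with errno set on failure.  */
int
close_fd_range (unsigned int first, unsigned int last)
{
  if (first > last)
    {
      errno = EINVAL;
      return -1;
    }

  if (syscall (SYS_close_range, (unsigned long) first,
               (unsigned long) last, 0UL) == 0)
    return 0;

  /* Kernels before 5.9 report ENOSYS; some seccomp sandboxes report
     EPERM for syscalls they do not know.  Anything else is an error.  */
  if (errno != ENOSYS && errno != EPERM)
    return -1;

  /* Simplified fallback: close each descriptor in turn, ignoring
     EBADF for descriptors that are not open.  */
  for (unsigned int fd = first; ; fd++)
    {
      close ((int) fd);
      if (fd == last)
        break;
    }
  return 0;
}
```

For example, after duplicating two descriptors to 200 and 201, close_fd_range (200, 201) is expected to leave both closed, so that fcntl (200, F_GETFD) fails with EBADF.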
* Linux: Cleanups after librt move | Florian Weimer | 2021-06-28 | 1 | -13/+0
  librt.so is no longer installed for PTHREAD_IN_LIBC, and tests are not linked against it. $(librt) is introduced globally for shared tests that need to be linked for both PTHREAD_IN_LIBC and !PTHREAD_IN_LIBC.
  GLIBC_PRIVATE symbols that were needed during the transition are removed again.
  Reviewed-by: Carlos O'Donell <carlos@redhat.com>