author | Adhemerval Zanella <adhemerval.zanella@linaro.org> | 2017-01-31 18:01:59 -0200 |
---|---|---|
committer | Adhemerval Zanella <adhemerval.zanella@linaro.org> | 2017-06-14 17:22:35 -0300 |
commit | 0edbf1230131dfeb03d843d2859e2104456fad80 (patch) | |
tree | 308321439470d11d70f6b84464d33021cf65f575 /benchtests | |
parent | 5c3e322d3be3803636e38bcaf083fb59b3a34f0c (diff) | |
nptl: Invert the mmap/mprotect logic on allocated stacks (BZ#18988)
The current allocate_stack logic for creating stacks is to first mmap all the required memory with the desired protection and then mprotect the guard area with PROT_NONE if required. Although it works as expected, it pessimizes the allocation because it requires the kernel to actually increase the commit charge (which counts against the physical/swap memory available to the system). This patch inverts the order: mmap the whole region with PROT_NONE and then mprotect only the required area.

The only issue is how to actually check this change, since the side effects are really Linux specific, and accounting for them would require kernel-specific tests that parse system-wide information. On the kernel I checked, /proc/self/statm does not show any meaningful difference in virtual memory size or RSS before and after thread creation. I could only see really meaningful information by checking the system-wide /proc/meminfo between thread creations: MemFree, MemAvailable, and Committed_AS show a large difference without the patch. I think trying to use this kind of information in a testcase is fragile.

The BZ#18988 report shows that the committed pages are easily seen with mlockall (MCL_FUTURE) (which locks all pages that become mapped in the process); however, a more straightforward testcase shows that pthread_create could be faster using this patch:

--
#include <pthread.h>

static const int inner_count = 256;
static const int outer_count = 128;

/* Attributes shared by every thread created below.  */
static pthread_attr_t a;

static void *thread1(void *arg)
{
  return NULL;
}

/* Each outer thread creates and joins inner_count short-lived threads.  */
static void *sleeper(void *arg)
{
  pthread_t ts[inner_count];
  for (int i = 0; i < inner_count; i++)
    pthread_create (&ts[i], &a, thread1, NULL);
  for (int i = 0; i < inner_count; i++)
    pthread_join (ts[i], NULL);

  return NULL;
}

int main(void)
{
  pthread_attr_init(&a);
  pthread_attr_setguardsize(&a, 1<<20);
  pthread_attr_setstacksize(&a, 1134592);

  pthread_t ts[outer_count];
  for (int i = 0; i < outer_count; i++)
    pthread_create(&ts[i], &a, sleeper, NULL);
  for (int i = 0; i < outer_count; i++)
    pthread_join(ts[i], NULL);

  return 0;
}
--

On x86_64 (4.4.0-45-generic, gcc 5.4.0), running this small test I see:

$ time ./test

real	0m3.647s
user	0m0.080s
sys	0m11.836s

While with the patch I see:

$ time ./test

real	0m0.696s
user	0m0.040s
sys	0m1.152s

So I added a pthread_create benchtest (thread_create) which checks the thread creation latency. As with the simple test above, I saw improvements in thread creation on all architectures where I tested the change.

Checked on x86_64-linux-gnu, i686-linux-gnu, aarch64-linux-gnu, arm-linux-gnueabihf, powerpc64le-linux-gnu, sparc64-linux-gnu, and sparcv9-linux-gnu.

	[BZ #18988]
	* benchtests/thread_create-inputs: New file.
	* benchtests/thread_create-source.c: Likewise.
	* support/xpthread_attr_setguardsize.c: Likewise.
	* support/Makefile (libsupport-routines): Add xpthread_attr_setguardsize object.
	* support/xthread.h: Add xpthread_attr_setguardsize prototype.
	* benchtests/Makefile (bench-pthread): Add thread_create.
	* nptl/allocatestack.c (allocate_stack): Call mmap with PROT_NONE and then mprotect the required area.
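The inverted ordering described in the commit message can be illustrated with a minimal sketch. This is not the code in nptl/allocatestack.c: `reserve_stack` and `reserve_stack_demo` are hypothetical names, and the sketch assumes a downward-growing stack (guard at the low end of the mapping) and page-aligned sizes.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper, not glibc's allocate_stack: reserve SIZE bytes
   with no access rights, then make only the area above the guard
   readable and writable.  Assumes the guard sits at the low end of the
   mapping and that both sizes are multiples of the page size.
   Returns NULL on failure.  */
static void *
reserve_stack (size_t size, size_t guardsize)
{
  if (guardsize >= size)
    return NULL;

  /* Step 1: map everything as PROT_NONE.  */
  void *mem = mmap (NULL, size, PROT_NONE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (mem == MAP_FAILED)
    return NULL;

  /* Step 2: open up only the usable part; the guard stays PROT_NONE.  */
  if (mprotect ((char *) mem + guardsize, size - guardsize,
                PROT_READ | PROT_WRITE) != 0)
    {
      munmap (mem, size);
      return NULL;
    }
  return mem;
}

/* Hypothetical self-check: write to the first and last usable bytes
   and read them back.  Returns 0 on success, -1 on failure.  */
static int
reserve_stack_demo (void)
{
  size_t pg = (size_t) sysconf (_SC_PAGESIZE);
  size_t size = 16 * pg;
  size_t guard = pg;

  char *mem = reserve_stack (size, guard);
  if (mem == NULL)
    return -1;

  mem[guard] = 0x5a;
  mem[size - 1] = 0x5a;
  int ok = mem[guard] == 0x5a && mem[size - 1] == 0x5a;

  munmap (mem, size);
  return ok ? 0 : -1;
}
```

Touching any byte below `mem + guard` in this sketch would fault, which is the behaviour a guard area is meant to provide.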
Diffstat (limited to 'benchtests')
-rw-r--r-- | benchtests/Makefile | 2 |
-rw-r--r-- | benchtests/thread_create-inputs | 14 |
-rw-r--r-- | benchtests/thread_create-source.c | 58 |
3 files changed, 73 insertions, 1 deletions
diff --git a/benchtests/Makefile b/benchtests/Makefile
index 7f5fda5ef4..1e28e87919 100644
--- a/benchtests/Makefile
+++ b/benchtests/Makefile
@@ -25,7 +25,7 @@ bench-math := acos acosh asin asinh atan atanh cos cosh exp exp2 log log2 \
 	      modf pow rint sin sincos sinh sqrt tan tanh fmin fmax fminf \
 	      fmaxf
 
-bench-pthread := pthread_once
+bench-pthread := pthread_once thread_create
 
 bench-string := ffs ffsll
 
diff --git a/benchtests/thread_create-inputs b/benchtests/thread_create-inputs
new file mode 100644
index 0000000000..e3ca03b0da
--- /dev/null
+++ b/benchtests/thread_create-inputs
@@ -0,0 +1,14 @@
+## args: int:size_t:size_t
+## init: thread_create_init
+## includes: pthread.h
+## include-sources: thread_create-source.c
+
+## name: stack=1024,guard=1
+32, 1024, 1
+## name: stack=1024,guard=2
+32, 1024, 2
+
+## name: stack=2048,guard=1
+32, 2048, 1
+## name: stack=2048,guard=2
+32, 2048, 2
diff --git a/benchtests/thread_create-source.c b/benchtests/thread_create-source.c
new file mode 100644
index 0000000000..035894a63b
--- /dev/null
+++ b/benchtests/thread_create-source.c
@@ -0,0 +1,58 @@
+/* Measure pthread_create thread creation with different stack
+   and guard sizes.
+
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <stdio.h>
+#include <unistd.h>
+#include <support/xthread.h>
+
+static size_t pgsize;
+
+static void
+thread_create_init (void)
+{
+  pgsize = sysconf (_SC_PAGESIZE);
+}
+
+static void *
+thread_dummy (void *arg)
+{
+  return NULL;
+}
+
+static void
+thread_create (int nthreads, size_t stacksize, size_t guardsize)
+{
+  pthread_attr_t attr;
+  xpthread_attr_init (&attr);
+
+  stacksize = stacksize * pgsize;
+  guardsize = guardsize * pgsize;
+
+  xpthread_attr_setstacksize (&attr, stacksize);
+  xpthread_attr_setguardsize (&attr, guardsize);
+
+  pthread_t ts[nthreads];
+
+  for (int i = 0; i < nthreads; i++)
+    ts[i] = xpthread_create (&attr, thread_dummy, NULL);
+
+  for (int i = 0; i < nthreads; i++)
+    xpthread_join (ts[i]);
+}
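The benchmark source above depends on glibc's internal support/xthread.h wrappers and on the benchtests harness for timing. For trying the same measurement outside the glibc tree, a rough standalone sketch follows. It is an assumption-laden approximation, not part of the patch: `time_thread_create` and `MAX_THREADS` are hypothetical names, plain pthread calls replace the `x*` wrappers, and timing uses `clock_gettime` rather than the harness.

```c
#define _DEFAULT_SOURCE
#include <pthread.h>
#include <stddef.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical upper bound on threads per run, to avoid a VLA.  */
enum { MAX_THREADS = 256 };

static void *
thread_dummy (void *arg)
{
  (void) arg;
  return NULL;
}

/* Hypothetical standalone variant of the benchmark body: create and
   join NTHREADS threads whose stack and guard sizes are given in
   pages.  Returns the mean wall-clock nanoseconds per thread, or -1.0
   on any error.  */
static double
time_thread_create (int nthreads, size_t stack_pages, size_t guard_pages)
{
  pthread_t ts[MAX_THREADS];
  pthread_attr_t attr;
  struct timespec start, end;
  size_t pgsize = (size_t) sysconf (_SC_PAGESIZE);

  if (nthreads <= 0 || nthreads > MAX_THREADS)
    return -1.0;
  if (pthread_attr_init (&attr) != 0)
    return -1.0;
  if (pthread_attr_setstacksize (&attr, stack_pages * pgsize) != 0
      || pthread_attr_setguardsize (&attr, guard_pages * pgsize) != 0)
    {
      pthread_attr_destroy (&attr);
      return -1.0;
    }

  clock_gettime (CLOCK_MONOTONIC, &start);

  /* Stop creating at the first failure, but still join what exists.  */
  int created = 0;
  while (created < nthreads
         && pthread_create (&ts[created], &attr, thread_dummy, NULL) == 0)
    created++;
  for (int i = 0; i < created; i++)
    pthread_join (ts[i], NULL);

  clock_gettime (CLOCK_MONOTONIC, &end);
  pthread_attr_destroy (&attr);

  if (created != nthreads)
    return -1.0;

  double ns = (double) (end.tv_sec - start.tv_sec) * 1e9
              + (double) (end.tv_nsec - start.tv_nsec);
  return ns / nthreads;
}
```

Calling `time_thread_create (32, 1024, 1)` mirrors the first entry in thread_create-inputs. A single run is noisy, so repeated runs would be needed before comparing numbers across glibc builds.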