about summary refs log tree commit diff
path: root/elf/dl-reloc.c
diff options
context:
space:
mode:
authorSzabolcs Nagy <szabolcs.nagy@arm.com>2021-02-16 12:55:13 +0000
committerSzabolcs Nagy <szabolcs.nagy@arm.com>2023-09-01 08:21:37 +0100
commitd2123d68275acc0f061e73d5f86ca504e0d5a344 (patch)
tree76013866674b9b5d8fd3df9d1f16f9dd5f6dacdb /elf/dl-reloc.c
parent1493622f4f9048ffede3fbedb64695efa49d662a (diff)
downloadglibc-d2123d68275acc0f061e73d5f86ca504e0d5a344.tar.gz
glibc-d2123d68275acc0f061e73d5f86ca504e0d5a344.tar.xz
glibc-d2123d68275acc0f061e73d5f86ca504e0d5a344.zip
elf: Fix slow tls access after dlopen [BZ #19924]
In short: __tls_get_addr checks the global generation counter and if
the current dtv is older then _dl_update_slotinfo updates dtv up to the
generation of the accessed module. So if the global generation is newer
than generation of the module then __tls_get_addr keeps hitting the
slow dtv update path. The dtv update path includes a number of checks
to see if any update is needed and this already causes measurable tls
access slow down after dlopen.

It may be possible to detect up-to-date dtv faster.  But if there are
many modules loaded (> TLS_SLOTINFO_SURPLUS) then this requires at
least walking the slotinfo list.

This patch tries to update the dtv to the global generation instead, so
after a dlopen the tls access slow path is only hit once.  The modules
with larger generation than the accessed one were not necessarily
synchronized before, so additional synchronization is needed.

This patch uses acquire/release synchronization when accessing the
generation counter.

Note: in the x86_64 version of dl-tls.c the generation is only loaded
once, since relaxed mo is not faster than acquire mo load.

I have not benchmarked this. Tested by Adhemerval Zanella on aarch64,
powerpc, sparc, x86 who reported that it fixes the performance issue
of bug 19924.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
Diffstat (limited to 'elf/dl-reloc.c')
-rw-r--r--elf/dl-reloc.c6
1 files changed, 3 insertions, 3 deletions
diff --git a/elf/dl-reloc.c b/elf/dl-reloc.c
index 1d558c1e0c..e5c555d82c 100644
--- a/elf/dl-reloc.c
+++ b/elf/dl-reloc.c
@@ -112,11 +112,11 @@ _dl_try_allocate_static_tls (struct link_map *map, bool optional)
   if (map->l_real->l_relocated)
     {
 #ifdef SHARED
+      /* Update the DTV of the current thread.  Note: GL(dl_load_tls_lock)
+	 is held here so normal load of the generation counter is valid.  */
       if (__builtin_expect (THREAD_DTV()[0].counter != GL(dl_tls_generation),
 			    0))
-	/* Update the slot information data for at least the generation of
-	   the DSO we are allocating data for.  */
-	(void) _dl_update_slotinfo (map->l_tls_modid);
+	(void) _dl_update_slotinfo (map->l_tls_modid, GL(dl_tls_generation));
 #endif
 
       dl_init_static_tls (map);