about summary refs log tree commit diff
diff options
context:
space:
mode:
authorStefan Liebler <stli@linux.ibm.com>2020-10-26 10:51:28 +0100
committerStefan Liebler <stli@linux.ibm.com>2020-10-26 10:51:28 +0100
commit9a29f1a2ae3d4bb253ee368e0d71db0ca9494120 (patch)
tree8149a83d3d25474a3120cde25c1d0a82f4e64f62
parentdac8713629c8736a60aebec2f01657e46baa4c73 (diff)
downloadglibc-9a29f1a2ae3d4bb253ee368e0d71db0ca9494120.tar.gz
glibc-9a29f1a2ae3d4bb253ee368e0d71db0ca9494120.tar.xz
glibc-9a29f1a2ae3d4bb253ee368e0d71db0ca9494120.zip
Remove timing related checks of time/tst-cpuclock1
Starting with the commit 04deeaa9ea74b0679dfc9d9155a37b6425f19a9f
"Fix time/tst-cpuclock1 intermitent failures" (2020-07-11),
this test fails quite often on s390x/s390 with one/multiple of those:
"before - after" / "nanosleep time" / "dead - after" ourside reasonable range.

On a zVM/kvm guest the CPUs are shared between multiple guests.
And even on the lpar (kvm host) the CPUs are usually shared between multiple lpars.
The defined CPUs for a lpar/zVM-system could also have lower weights compared
to other lpars which let the steal time further grow.

Usually I build (-j$(nproc)) and test (PARALLELMFLAGS="-j$(nproc)") glibc multiple
times, e.g. with different GCCs, on various lpars or zVM guests at the same time.
During this time, I've run the test for 13500 times and obvserved the following fails:
~600x "before - after"
~60x "nanosleep time"
~70x "dead - after"

I've also observed a lot of "before - after" fails on a intel kvm-guest while
building/testing glibc on it.

The mentioned commit has tighten the limits of valid tv_nsec ranges:
"before - after" (expected: 500000000):
- 100000000 ... 600000000
+ 450000000 ... 550000000

"nanosleep time" (expected: 100000000):
- 100000000 ... 200000000
+ 090000000 ... 120000000

"dead - after" (expected: 100000000):
-           ... 200000000
+ 090000000 ... 120000000

The test itself forks a child process which chew_cpu (user- and kernel-space).
The parent process sleeps with nanosleep(0.5s) and measures the child_clock time:
diff = after - before
With much workload on the machine, the child won't make much progess
and it can fall much beyond the minimum limit.

Afterwards the parent process sleeps with clock_nanosleep (child_clock, 0.1s):
diff = afterns - after
The test currently also allows 0.9 * 0.1s which would be an error.
Depending on the workload, the maximum limit can exceed the 1.2 * 0.1s.

For "dead - after", the parent process kills the child process and waits long
enough to let the child finish dying. Then it gets the time of the child:
diff = dead - after
Note that diff also contains the time for the previous clock_nanosleep.
Thus you'll often see both fails at the same time.

After discussion on the mailing list, we've decided to keep the functional
checks for the clock* functions and remove the timing related checks as those
are prone to false positives.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
-rw-r--r--time/tst-cpuclock1.c65
1 files changed, 6 insertions, 59 deletions
diff --git a/time/tst-cpuclock1.c b/time/tst-cpuclock1.c
index 1ac611a92b..f40b590111 100644
--- a/time/tst-cpuclock1.c
+++ b/time/tst-cpuclock1.c
@@ -26,7 +26,6 @@
 #include <signal.h>
 #include <stdint.h>
 #include <sys/wait.h>
-#include <support/timespec.h>
 
 /* This function is intended to rack up both user and system time.  */
 static void
@@ -125,7 +124,7 @@ do_test (void)
 	  child, (unsigned long int) child_clock,
 	  (uintmax_t) res.tv_sec, (uintmax_t) res.tv_nsec);
 
-  struct timespec before, after;
+  struct timespec before;
   if (clock_gettime (child_clock, &before) < 0)
     {
       printf ("clock_gettime on live PID %d clock %lx => %s\n",
@@ -137,38 +136,7 @@ do_test (void)
   printf ("live PID %d before sleep => %ju.%.9ju\n",
 	  child, (uintmax_t) before.tv_sec, (uintmax_t) before.tv_nsec);
 
-  struct timespec sleeptime = { .tv_nsec = 500000000 };
-  if (nanosleep (&sleeptime, NULL) != 0)
-    {
-      perror ("nanosleep");
-      result = 1;
-      goto done;
-    }
-
-  if (clock_gettime (child_clock, &after) < 0)
-    {
-      printf ("clock_gettime on live PID %d clock %lx => %s\n",
-	      child, (unsigned long int) child_clock, strerror (errno));
-      result = 1;
-      goto done;
-    }
-  /* Should be close to 0.5.  */
-  printf ("live PID %d after sleep => %ju.%.9ju\n",
-	  child, (uintmax_t) after.tv_sec, (uintmax_t) after.tv_nsec);
-
-  /* The bound values are empirically defined by testing this code over high cpu
-     usage and different nice values. Of all the values we keep the 90th
-     percentile of values and use those values for our testing allowed range.  */
-  struct timespec diff = timespec_sub (support_timespec_normalize (after),
-				       support_timespec_normalize (before));
-  if (!support_timespec_check_in_range (sleeptime, diff, .9,  1.1))
-    {
-      printf ("before - after %ju.%.9ju outside reasonable range\n",
-	      (uintmax_t) diff.tv_sec, (uintmax_t) diff.tv_nsec);
-      result = 1;
-    }
-
-  sleeptime.tv_nsec = 100000000;
+  struct timespec sleeptime = { .tv_nsec = 100000000 };
   e = clock_nanosleep (child_clock, 0, &sleeptime, NULL);
   if (e == EINVAL || e == ENOTSUP || e == ENOSYS)
     {
@@ -191,18 +159,9 @@ do_test (void)
 	}
       else
 	{
-        /* The bound values are empirically defined by testing this code over
-           high cpu usage and different nice values. Of all the values we keep
-           the 90th percentile of values and use those values for our testing
-           allowed range.  */
-	  diff = timespec_sub (support_timespec_normalize (afterns),
-			       support_timespec_normalize (after));
-	  if (!support_timespec_check_in_range (sleeptime, diff, .9, 1.2))
-	    {
-	      printf ("nanosleep time %ju.%.9ju outside reasonable range\n",
-		      (uintmax_t) diff.tv_sec, (uintmax_t) diff.tv_nsec);
-	      result = 1;
-	    }
+	  printf ("live PID %d after sleep => %ju.%.9ju\n",
+		  child, (uintmax_t) afterns.tv_sec,
+		  (uintmax_t) afterns.tv_nsec);
 	}
     }
 
@@ -231,21 +190,9 @@ do_test (void)
       result = 1;
       goto done;
     }
-  /* Should be close to 0.6.  */
+  /* Should be close to 0.1.  */
   printf ("dead PID %d => %ju.%.9ju\n",
 	  child, (uintmax_t) dead.tv_sec, (uintmax_t) dead.tv_nsec);
-  /* The bound values are empirically defined by testing this code over high cpu
-     usage and different nice values. Of all the values we keep the 90th
-     percentile of values and use those values for our testing allowed range.  */
-  diff = timespec_sub (support_timespec_normalize (dead),
-		       support_timespec_normalize (after));
-  sleeptime.tv_nsec = 100000000;
-  if (!support_timespec_check_in_range (sleeptime, diff, .9, 1.2))
-    {
-      printf ("dead - after %ju.%.9ju outside reasonable range\n",
-	      (uintmax_t) diff.tv_sec, (uintmax_t) diff.tv_nsec);
-      result = 1;
-    }
 
   /* Now reap the child and verify that its clock is no longer valid.  */
   {