about summary refs log tree commit diff
path: root/src
diff options
context:
space:
mode:
authorRich Felker <dalias@aerifal.cx>2013-06-12 18:20:48 -0400
committerRich Felker <dalias@aerifal.cx>2013-06-12 18:20:48 -0400
commit20d01d83b5a13c77805976e7c520f566244ba3ff (patch)
tree3cfb70bf958e5b621b70c7099be59de534068b9f /src
parent4191d2447673e6b03722198871af0daf9390cb51 (diff)
downloadmusl-20d01d83b5a13c77805976e7c520f566244ba3ff.tar.gz
musl-20d01d83b5a13c77805976e7c520f566244ba3ff.tar.xz
musl-20d01d83b5a13c77805976e7c520f566244ba3ff.zip
improve the quality of output from rand_r
due to the interface requirement of having the full state contained in
a single object of type unsigned int, it is difficult to provide a
reasonable-quality implementation; most good PRNGs are immediately
ruled out because they need larger state. the old rand_r gave very
poor output (very short period) in its lower bits; normally, it's
desirable to throw away the low bits (as in rand()) when using a LCG,
but this is not possible since the state is only 32 bits and we need
31 bits of output.

glibc's rand_r uses the same LCG as musl's, but runs it for 3
iterations and only takes 10-11 bits from each iteration to construct
the output value. this partially fixes the period issue, but
introduces bias: not all outputs have the same frequency, and many do
not appear at all. with such a low period, the bias is likely to be
observable.

I tried many approaches to "fix" rand_r, and the simplest I found
which made it pass the "dieharder" tests was applying this
transformation to the output. the "temper" function is taken from
mersenne twister, where it seems to have been chosen for some rigorous
properties; here, the only formal property I'm using is that it's
one-to-one and thus avoids introducing bias.

should further deficiencies in rand_r be reported, the obvious "best"
solution is applying a 32-bit cryptographic block cipher in CTR mode.
I identified several possible ciphers that could be used directly or
adapted, but as they would be a lot slower and larger, I do not see a
justification for using them unless the current rand_r proves
deficient for some real-world use.
Diffstat (limited to 'src')
-rw-r--r--src/prng/rand_r.c11
1 files changed, 10 insertions, 1 deletions
diff --git a/src/prng/rand_r.c b/src/prng/rand_r.c
index ef26dbde..638614c8 100644
--- a/src/prng/rand_r.c
+++ b/src/prng/rand_r.c
@@ -1,6 +1,15 @@
 #include <stdlib.h>
 
+static unsigned temper(unsigned x)
+{
+	x ^= x>>11;
+	x ^= x<<7 & 0x9D2C5680;
+	x ^= x<<15 & 0xEFC60000;
+	x ^= x>>18;
+	return x;
+}
+
 int rand_r(unsigned *seed)
 {
-	return (*seed = *seed * 1103515245 + 12345)/2;
+	return temper(*seed = *seed * 1103515245 + 12345)/2;
 }