Randomness Testing Guide

Word Frequency Test

Let's say we have the following binary string.

s=0101010100110011000111010101010011001100011100001111s = \color{lightcoral}{01010101}\color{lightgreen}{00110011}\color{lightblue}{000111}\color{lightcoral}{01010101}\color{lightgreen}{00110011}\color{lightblue}{000111}\color{orchid}{00001111}

It is obviously not random because it is a concatenation of predetermined bit sequences. Therefore, we must check that the same groups of bits are not repeated too often.

We can use the frequency of "words" to determine if certain bit groups occur too often. There are four possible 2-bit words: 00, 01, 10, and 11. If we assign A, B, C, and D to these words, respectively, then the sequence above transforms into the following sequence.

s=BBBBADADABDBBBBADADABDAADDs = \color{lightcoral}{BBBB}\color{lightgreen}{ADAD}\color{lightblue}{ABD}\color{lightcoral}{BBBB}\color{lightgreen}{ADAD}\color{lightblue}{ABD}\color{orchid}{AADD}

Now we can run a frequency test on this string with four categories.

χ32=(86.5)26.5+(106.5)26.5+(06.5)26.5+(86.5)26.59.0769\chi^{2}_{3} = \frac{(8 - 6.5)^2}{6.5} + \frac{(10 - 6.5)^2}{6.5} + \frac{(0 - 6.5)^2}{6.5} + \frac{(8 - 6.5)^2}{6.5} \approx 9.0769

The p-value for a chi-squared of 9.0769 with 3 degrees of freedom is 0.0000, which is less than 0.01, indicating that this sequence of numbers was probably not randomly generated.

Expected Runs Distribution0101010100110011000111010101010011001100011100001111

This sequence isn't quite long enough to see that it's not random, but a longer version of the same sequence does fail the test.

Expected Runs Distribution01010101001100110001110101010100110011000111000011110101010100110011000111010101010011001100011100001111

This test, known as the word frequency test, is a significantly more difficult to pass than the runs test. Instead of concatenating predetermined strings, we must use a mathematical process that has seemingly random behavior. One very simple off-the-shelf method for doing this is using a linear congruential generator, which is defined by the following formula.

Xn+1=(aXn+c)modmX_{n+1} = (aX_n + c) \bmod m

In this formula, XnX_n is the current value in the sequence, aa is the multiplier, cc is the increment, and mm is the modulus. X0X_0 is a seed that can be arbitrarily chosen. The word frequency test fails to detect output from a linear congruential generator.

Linear Congruential Generator0011011100101101101110110101110100101010101101101011010010101111101111100101100010001011100111000000

Notes

The word frequency test can give very different results depending on the block size chosen. Larger block sizes don't always result in smaller p-values.

Word Frequency Test (Block Size = 2)01010101001100110001110101010100110011000111000011110101010100110011000111010101010011001100011100001111
Word Frequency Test (Block Size = 3)01010101001100110001110101010100110011000111000011110101010100110011000111010101010011001100011100001111
Word Frequency Test (Block Size = 4)01010101001100110001110101010100110011000111000011110101010100110011000111010101010011001100011100001111