Runs Test

Let's say we have the following binary string.

s = 01010101010101010101

It is obviously not random since each bit is simply the opposite of the previous one. Therefore, we must check that there is a reasonable number of transitions between zeros and ones.

If we start at the beginning of a bit string, what is the expected number of bits we will encounter before transitioning to the opposite bit? There is a 50% chance we transition in 1 step, a 25% chance we transition in 2 steps, a 12.5% chance we transition in 3 steps, etc.

This means that for a string of length $n$ , 50% of the total number of runs will have length 1, 25% of the total number of runs will have length 2, 12.5% of the total number of runs will have length 3, etc. These percentages tell us that the expected average run length is 2. We can use a $\chi^{2}$ test to determine if the observed distribution of run lengths matches this expected distribution.

For the binary string $s$ above, every single bit is different from the previous one, so there are 20 total runs in the string, each of length 1. The expected total number of runs is the length of the string divided by the expected average run length, which is 20 / 2 = 10. Therefore, we can calculate the $\chi^{2}$ statistic with three degrees of freedom as follows.

\chi^{2}_{3} = \underbrace{\frac{(20 - 5)^2}{5}}_{\text{runs of length 1}} + \underbrace{\frac{(0 - 2.5)^2}{2.5}}_{\text{runs of length 2}} + \underbrace{\frac{(0 - 1.25)^2}{1.25}}_{\text{runs of length 3}} + \underbrace{\frac{(0 - 1.25)^2}{1.25}}_{\text{runs of length ≥4}} = 50

The p-value for a chi-squared of 50 with 3 degrees of freedom is 0.0000, which is less than 0.01, indicating that this sequence of numbers was probably not randomly generated.

Alternating Zeros and Ones01010101010101010101

You might ask why we only use 4 categories here and not 20 since we should check for runs of all possible lengths. Remember that the chi-squared variable requires a large enough expected count per category (usually 5) for the central limit theorem to be applicable. In fact, for the test above, 4 categories was too many and using only 2 would have made more sense. I only used 4 categories to make the idea behind the formula more obvious.

This test, known as the runs test, is a significantly more difficult to pass than the frequency test. However, it fails on the following binary string that was obviously not randomly generated.

s = \color{lightcoral}{01010101}\color{lightgreen}{00110011}\color{lightblue}{000111}\color{lightcoral}{01010101}\color{lightgreen}{00110011}\color{lightblue}{000111}\color{orchid}{00001111}

Expected Runs Distribution0101010100110011000111010101010011001100011100001111

Notes

Each statistical test for randomness works effectively for larger binary strings due to the central limit theorem becoming more applicable with larger sample sizes. However, for smaller binary strings, the tests may exhibit higher rates of false positive and false negative errors. The runs test presented in this guide could potentially be modified for better accuracy on smaller strings. See Block Longest Run Test for more details.

← Previous

2. Block Frequency Test

4. Word Frequency Test