Runs Test
Let's say we have the following binary string.
It is obviously not random since each bit is simply the opposite of the previous one. Therefore, we must check that there is a reasonable number of transitions between zeros and ones.
If we start at the beginning of a bit string, what is the expected number of bits we will encounter before transitioning to the opposite bit? There is a 50% chance we transition in 1 step, a 25% chance we transition in 2 steps, a 12.5% chance we transition in 3 steps, etc.
This means that for a string of length , 50% of the total number of runs will have length 1, 25% of the total number of runs will have length 2, 12.5% of the total number of runs will have length 3, etc. These percentages tell us that the expected average run length is 2. We can use a test to determine if the observed distribution of run lengths matches this expected distribution.
For the binary string above, every single bit is different from the previous one, so there are 20 total runs in the string, each of length 1. The expected total number of runs is the length of the string divided by the expected average run length, which is 20 / 2 = 10. Therefore, we can calculate the statistic with three degrees of freedom as follows.
The p-value for a chi-squared of 50 with 3 degrees of freedom is 0.0000, which is less than 0.01, indicating that this sequence of numbers was probably not randomly generated.
You might ask why we only use 4 categories here and not 20 since we should check for runs of all possible lengths. Remember that the chi-squared variable requires a large enough expected count per category (usually 5) for the central limit theorem to be applicable. In fact, for the test above, 4 categories was too many and using only 2 would have made more sense. I only used 4 categories to make the idea behind the formula more obvious.
This test, known as the runs test, is a significantly more difficult to pass than the frequency test. However, it fails on the following binary string that was obviously not randomly generated.
Notes
Each statistical test for randomness works effectively for larger binary strings due to the central limit theorem becoming more applicable with larger sample sizes. However, for smaller binary strings, the tests may exhibit higher rates of false positive and false negative errors. The runs test presented in this guide could potentially be modified for better accuracy on smaller strings. See Block Longest Run Test for more details.