Random Number Generation
I want to know more about random number generation. It was used often when I was studying machine learning with Python, and I want to know more about its relevance.
References
Definitions
- Statistical Randomness
- The
- Mersenne Twister Algorithm
- The Mersenne Twister is a general-purpose pseudorandom number generator (PRNG) developed in 1997 by Makoto Matsumoto and Takuji Nishimura. Its name derives from the choice of a Mersenne prime as its period length.
- A Mersenne Prime is a prime number that is one less than a power of two. A prime number of the form for some integer . Examples: 3, 7, 31, 8191, 131071, 524287, ...
- The most commonly used version of the Mersenne Twister algorithm is based on the Mersenne prime .
- For a w-bit word length, the Mersenne Twister generates integers in the range .
/dev/random
- In Unix-like operating systems,
/dev/random
and/dev/urandom
are special files that serve as cryptographically secure pseudorandom number generators. They allow access to a CSPRNG that is seeded with entropy (a value that provides randomness) from environmental noise, collected from device drivers and other sources. - This special file was originated in Linux in 1994 and quickly adopted by other Unix-like operating systems.
- In Unix-like operating systems,
Notes
Random Number Generation is a process by which, often by means of a random number generator (RNG), a number r symbols that cannot be reasonably predicted better than random chance is generated. This means that the particular outcome sequence will contain some patterns detectable in hindsight but impossible to foresee. True random number generators can be hardware random-number generators (HNRGs), wherein each generation is a function of the current value of a physical environment's attitude that is constantly changing in a manner that is practically impossible to model. This would be in contrast to so-calledrandom number generationsdone by pseudorandom number generators (PRNGs), which generate numbers that only look random but are in fact predetermined = these generations can be reproduced simply by knowing the state of the PRNG.
- Methods for generating random data include coin flipping, rolling of dice, ...
- Several computational methods for pseudorandom number generation exist. All fall short of the goal of true randomness, although they meet, with varying success, some of the statistical tests for randomness intended to measure how unpredictable their results are. This generally makes them unusable for applications such as cryptography.
- Carefully designed crypto graphically secure pseudorandom number generators (CSPRNGs) exist with features specifically designed for use in cryptography.
Practical Applications and uses
- used in applications where producing an unpredictable result is desirable:
- Gambling
- Statistical Sampling
- Computer Simulation
- Cryptography
- Pseudorandom number generators are useful in developing Monte Carlo-method simulations, as debugging is facilitated by the ability to run the same sequence of random numbers again by starting from the same random seed
- The generation of pseudorandom numbers is an important and common task in computer programming.
- Cryptography requires a high degree of apparent randomness
- Weaker forms of randomness are used in hash algorithms
- A truly random system would have no restriction on the same item appearing two or three times in succession.
True vs Pseudo-random Numbers
- There are two principle methods used to generate random numbers.
- The first method measures some physical phenomenon that is expected to be random and then compensates for possible biases in the measurement process., e.g. measuring atmospheric noise, thermal noise, or other electromagnetic and quantum phenomena.
- Entropy, in terms of random numbers, is a measure of unpredictability or surprise of the number generation process.
- The speed at which entropy can be obtained from natural sources is dependent on the underling physical phenomena being measured.
- This type of generation is blocking.
- The second method uses computational algorithms that can produce long sequences of apparently random results, which are in fact completely determined by a shorter initial value, known as the seed value or key.
- As a result, the entire seemingly random sequence can be reproduced if the seed is known. This type of random number generator is often called a pseudorandom number generator.
- This type of generator typically does not rely on sources of naturally occurring entropy, though it may be periodically seeded by natural sources.
- This type of generation is non-blocking.
- Some systems take a hybrid approach, proving randomness harvested form natural sources when available, and falling back periodically to re-seeded software based cryptographically secure pseudo random number generators.
While a pseudorandom number generator based solely on deterministic logic can never be regarded as atruerandom number source in the purest sense of the word, in practice they are generally sufficient even for demanding security-critical applications.
Generation Methods
Physical Methods
- The earliest form of generating random numbers tend to be too slow for most applications in statistics and cryptography.
- A physical random number generator can be based on essentially random atomic or subatomic physical phenomenon whose unpredictability can be traced to the laws of quantum mechanics.
Computational Methods
- Most computer-generated random numbers use PRNGs which are algorithms that can automatically create long runs of numbers with good random properties but eventually the sequence repeats.
- The series of values generated by such algorithms is generally determined by a fixed number called a seed. One of the most common PRNG is the linear congruential generator, which uses the recurrence:
- to generate random numbers, where a, b, and m are large integers, and is the next X as a series of pseudo random numbers. The maximum number of numbers the formula can produce is the modulus, m.
- A simple pen and paper method for generating random umbers is the so-called middle-square method, suggested by von Neumann.
- The default random number generator used in many languages is based on the Mersenne Twister algorithm and is not sufficient for cryptography purposes.
- Such library functions often have poor statistical properties and will repeat patterns after only tens of thousands of trials.
- They are often initialized using a computer's real time clock as the seed.
- Much higher quality random number sources are available on most operating systems; for example
/dev/random
on various BSD flavors, Linux.
By Humans
- Most studies find that human subjects have some degree of non-randomness when attempting to produce a random sequence.
Post Processing and Statistical Checks
- Obtaining numbers which are completely unbiased takes care.
- Generated random numbers are sometimes subjected to statistical tests before use to ensure that the underlying source is still working, and then post-processed to improve their statistical properties.
- Statistical tests are also used to give confidence that the post-processed final output from a random number generator is truly unbiased, with various randomness test suites being developed.
Other Considerations
Reshaping the Distribution
Uniform Distributions
Most random number generators natively work with integers or individual bits, so an extra step is required to arrive at thecanonicaluniform distribution between 0 and 1. This implementation is not as trivial as dividing the integer by its maximum possible value:
- The integer used in the transformation must provide enough bits for the intended precision.
- The nature of floating-point math itself means there exists more precision the closer the number is to zero. This extra precision is usually not used due to the sheer number of bits required.
- Rounding error in division may bias the result. At worst, a supposedly excluded bound may be drawn contrary to expectations based on real-number math.
Whitening
- The outputs of multiple independent RNGs can be combined (for example, using a bit-wise XOR operation) to provide a combined RNG at least as good as the best RNG used. This is referred to as software whitening.
Activities and Demonstrations
- Sites that make available random number samples:
- SOCR Resource Page
- Quantum Optics Group at the ANU Generates Random Numbers Sourced from Quantum Vacuum
- Random.org makes available random numbers that are sourced from the randomness of atmospheric noise
Backdoors
- If a random number generator can be made predictable, it can be used as a backdoor by an attack to break the encryption.