The need for rand speed
Introduction
Very often when I answer questions on Stack Overflow I learn something new. Recently, when discussing random number generation in this post, I wrote an answer using a practice I knew from experience worked, but it turned out that I did not really understand why (thanks to rafak for a great comment).
Let us start with the conclusion of the discussion and then I will expand on it:
Always explicitly pass a random number generator to the rand
function in performance-critical code.
Let us first see a simple example of this rule at work and then try to understand the reason for this recommendation.
Estimating \(\pi\) using Monte Carlo simulation
Let us write a simple function that approximates \(\pi\) using Monte Carlo simulation and uses the default global pseudo-random number generator.
function pi_global(n::Int)
s = 0
for _ in 1:n
s += rand()^2 + rand()^2 < 1
end
return 4 * s / n
end
The code takes advantage of the well-known fact that if we sample a point \((x,y)\) uniformly from the \([0,1]^2\) square, the probability that \(x^2+y^2\) is less than \(1\) equals \(\pi/4\).
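To spell out the reasoning behind this fact: the points \((x,y)\) are uniform on the unit square, and the condition \(x^2+y^2<1\) selects exactly the quarter of the unit disc contained in that square, so

\[
P(x^2+y^2<1) = \frac{\text{area of the quarter disc}}{\text{area of the unit square}} = \frac{\pi/4}{1} = \frac{\pi}{4},
\]

and multiplying the observed fraction of such points by \(4\) gives an estimate of \(\pi\).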
We check the runtime of this code:
julia> @time pi_global(10^9)
6.998930 seconds (19 allocations: 20.188 KiB)
3.141615124
julia> @time pi_global(10^9)
7.002321 seconds
3.141527116
As you can see, on my laptop it takes around 7 seconds.
Now let us write a function that takes a MersenneTwister generator explicitly (this is the default pseudo-random number generator in Julia).
using Random
function pi_local(n::Int, rng::MersenneTwister)
s = 0
for _ in 1:n
s += rand(rng)^2 + rand(rng)^2 < 1
end
return 4 * s / n
end
Here is its timing:
julia> mt = MersenneTwister();
julia> @time pi_local(10^9, mt)
2.723634 seconds
3.141526412
julia> @time pi_local(10^9, mt)
2.734530 seconds
3.141671232
Wow! I would not have expected this.
Now let me reveal that I am on Julia 1.5.3. Interestingly, when I built my
habits of working with rand
it was in the Julia 1.0 days. Let us check these codes on
Julia 1.0.5 (which will soon stop being supported). Here are the results:
julia> function pi_global(n::Int)
s = 0
for _ in 1:n
s += rand()^2 + rand()^2 < 1
end
return 4 * s / n
end
pi_global (generic function with 1 method)
julia> @time pi_global(10^9)
2.939260 seconds (44.35 k allocations: 2.366 MiB)
3.141632964
julia> @time pi_global(10^9)
2.891349 seconds (6 allocations: 192 bytes)
3.14153098
julia> using Random
julia> function pi_local(n::Int, rng::MersenneTwister)
s = 0
for _ in 1:n
s += rand(rng)^2 + rand(rng)^2 < 1
end
return 4 * s / n
end
pi_local (generic function with 1 method)
julia> mt = MersenneTwister();
julia> @time pi_local(10^9, mt)
3.129134 seconds (30.73 k allocations: 1.574 MiB)
3.141618824
julia> @time pi_local(10^9, mt)
3.115317 seconds (6 allocations: 192 bytes)
3.141620408
We see that there is a huge performance regression in rand()
between
these versions of Julia. Let us understand the reason for it.
Digging into the rand()
implementation
We switch back to Julia 1.5.3 and will stick to it till the end of this post.
First we do a quick benchmark (I am using the same Julia 1.5.3 session as above):
julia> using BenchmarkTools
julia> @btime rand()
4.784 ns (0 allocations: 0 bytes)
0.40836802665975824
julia> @btime rand($mt)
2.890 ns (0 allocations: 0 bytes)
0.23541608567839556
There is a significant difference in performance indeed. So what does rand()
do that costs so much? Let us see the definition of the relevant method for rand
(it is easy to find by writing @edit rand()
):
rand(rng::AbstractRNG=default_rng(), ::Type{X}=Float64) where {X} =
rand(rng, Sampler(rng, X, Val(1)))
We can see that the only difference between rand()
and rand(mt)
is that the
former calls the default_rng()
function (it comes from the Random
module).
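To confirm this reading of the source, here is a quick sketch (using only functions shown above) checking that rand() and rand(Random.default_rng()) draw from the same underlying generator:

```julia
using Random

# Seed the default generator, draw via the zero-argument rand(),
# then re-seed and draw by passing default_rng() explicitly.
Random.seed!(1234)
x = rand()

Random.seed!(1234)
y = rand(Random.default_rng())

# Both calls consumed the same stream, so the values agree.
@assert x == y
```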
In a similar way as above we dig down to the relevant definition:
const THREAD_RNGs = MersenneTwister[]
@inline default_rng() = default_rng(Threads.threadid())
@noinline function default_rng(tid::Int)
0 < tid <= length(THREAD_RNGs) || _rng_length_assert()
if @inbounds isassigned(THREAD_RNGs, tid)
@inbounds MT = THREAD_RNGs[tid]
else
MT = MersenneTwister()
@inbounds THREAD_RNGs[tid] = MT
end
return MT
end
@noinline _rng_length_assert() = @assert false "0 < tid <= length(THREAD_RNGs)"
function __init__()
resize!(empty!(THREAD_RNGs), Threads.nthreads()) # ensures that we didn't save a bad object
end
And now we see the reason. In Julia 1.5.3 rand()
is thread safe (it was not in
Julia 1.0, and that is the reason for the difference in performance between
versions). Ensuring thread safety must cost something. In this case, even
though the code that extracts the appropriate MersenneTwister
instance
from the THREAD_RNGs
vector is simple, it has a noticeable cost (the reason is
that random number generation itself is extremely well optimized and fast).
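A consequence of this is that, if you do not want to manage a generator object at the call site, you can pay the lookup cost once per function call rather than once per rand() invocation. Here is a minimal sketch (pi_hoisted is a name I introduce here) that hoists the default_rng() call out of the hot loop:

```julia
using Random

# Same Monte Carlo estimate of pi as pi_global, but the thread-local
# default generator is looked up once, before the loop,
# instead of on every rand() call.
function pi_hoisted(n::Int)
    rng = Random.default_rng()
    s = 0
    for _ in 1:n
        s += rand(rng)^2 + rand(rng)^2 < 1
    end
    return 4 * s / n
end
```

This keeps the convenience of the default generator while avoiding the repeated THREAD_RNGs lookup inside the loop.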
Conclusion
Going back to the beginning of this post: remember not to use rand()
without an explicit generator in your
performance-critical code.
Also, I think this example shows very nicely the huge benefits we get from the fact that Julia is mostly written in Julia: it took only a few keystrokes to identify the root cause of the performance puzzle.