Introduction

Very often when I answer questions on Stack Overflow I learn something new. Recently when discussing random number generation in this post I have made an answer using a practice I knew worked from my experience, but it turned out that I did not really understand why (and thanks to rafak for a great comment).

Let us start with the conclusion from the discussion and then I will expand on it:

Always explicitly pass random number generator to the rand function in performance-critical code.

Let us first see a simple example of this rule at work and next try to understand the reason for this recommendation.

Estimating \(\pi\) using Monte Carlo simulation

Let us write a simple function that approximates \(\pi\) using Monte Carlo simulation and uses the default global pseudo-random number generator.

function pi_global(n::Int)
    s = 0
    for _ in 1:n
        s += rand()^2 + rand()^2 < 1
    end
    return 4 * s / n
end

The code takes advantage from a well known fact that if we sample a point \((x,y)\) uniformly from \([0,1,]^2\) square the probability that \(x^2+y^2\) is less than \(1\) is equal to \(\pi/4\).

We check the runtime of this code:

julia> @time pi_global(10^9)
  6.998930 seconds (19 allocations: 20.188 KiB)
3.141615124

julia> @time pi_global(10^9)
  7.002321 seconds
3.141527116

as you can see on my laptop it is around 7 seconds.

Now let us write a function that takes a MersenneTwister generator (this is the default pseudo-random number generator in Julia).

using Random

function pi_local(n::Int, rng::MersenneTwister)
    s = 0
    for _ in 1:n
        s += rand(rng)^2 + rand(rng)^2 < 1
    end
    return 4 * s / n
end

Here is its timing:

julia> mt = MersenneTwister();

julia> @time pi_local(10^9, mt)
  2.723634 seconds
3.141526412

julia> @time pi_local(10^9, mt)
  2.734530 seconds
3.141671232

Wow! I would not have expected this.

Now let me reveal that I am on Julia 1.5.3. Interestingly, when I built my habits of working with rand it was Julia 1.0 time. Let us check these codes on Julia 1.0.5 (that soon will stop being supported). Here are the results:

julia> function pi_global(n::Int)
           s = 0
           for _ in 1:n
               s += rand()^2 + rand()^2 < 1
           end
           return 4 * s / n
       end
pi_global (generic function with 1 method)

julia> @time pi_global(10^9)
  2.939260 seconds (44.35 k allocations: 2.366 MiB)
3.141632964

julia> @time pi_global(10^9)
  2.891349 seconds (6 allocations: 192 bytes)
3.14153098

julia> using Random

julia> function pi_local(n::Int, rng::MersenneTwister)
           s = 0
           for _ in 1:n
               s += rand(rng)^2 + rand(rng)^2 < 1
           end
           return 4 * s / n
       end
pi_local (generic function with 1 method)

julia> mt = MersenneTwister();

julia> @time pi_local(10^9, mt)
  3.129134 seconds (30.73 k allocations: 1.574 MiB)
3.141618824

julia> @time pi_local(10^9, mt)
  3.115317 seconds (6 allocations: 192 bytes)
3.141620408

We see that there is a huge regression in the performance of rand() between versions of Julia. Let us understand what is the reason for this.

Digging down the `rand()` implementation

We switch back to Julia 1.5.3 and will stick to it till the end of this post.

First we do a quick benchmark (I am using the same Julia 1.5.3. session as above):

julia> using BenchmarkTools

julia> @btime rand()
  4.784 ns (0 allocations: 0 bytes)
0.40836802665975824

julia> @btime rand($mt)
  2.890 ns (0 allocations: 0 bytes)
0.23541608567839556

There is a significant difference in performance indeed. So what does rand() do that costs so much? Let us see the definition of relevant method for rand (it is easy to get it by writing @edit rand()):

rand(rng::AbstractRNG=default_rng(), ::Type{X}=Float64) where {X} =
    rand(rng, Sampler(rng, X, Val(1)))

We can see that the only difference between rand() and rand(mt) is that the former calls default_rng() function (it is from the Random module).

In a similar way as above we dig down to the relevant definition:

const THREAD_RNGs = MersenneTwister[]
@inline default_rng() = default_rng(Threads.threadid())
@noinline function default_rng(tid::Int)
    0 < tid <= length(THREAD_RNGs) || _rng_length_assert()
    if @inbounds isassigned(THREAD_RNGs, tid)
        @inbounds MT = THREAD_RNGs[tid]
    else
        MT = MersenneTwister()
        @inbounds THREAD_RNGs[tid] = MT
    end
    return MT
end
@noinline _rng_length_assert() =  @assert false "0 < tid <= length(THREAD_RNGs)"

function __init__()
    resize!(empty!(THREAD_RNGs), Threads.nthreads()) # ensures that we didn't save a bad object
end

And now we see the reason. In Julia 1.5.3 rand() is thread safe (it was not in Julia 1.0, and that is the reason of the difference in performance between versions). Ensuring thread safety must cost something. In this case even although the code that extracts out the appropriate MersenneTwister instance from the THREAD_RNGs vector is simple it has a noticeable cost (the reason is that random number generation itself is extremely well optimized and fast).

Conclusion

Going back to the beginning of this post: remember not to use rand() in your performance critical code.

Also I think this example shows very nicely how huge benefits we have from the fact that Julia is mostly written in Julia — it is only a few keystrokes and we could identify the root cause of the performance puzzle.

Introduction

Estimating \(\pi\) using Monte Carlo simulation

Digging down the rand() implementation

Conclusion

Digging down the `rand()` implementation