# The need for rand speed

# Introduction

Very often when I answer questions on Stack Overflow I learn something new. Recently when discussing random number generation in this post I have made an answer using a practice I knew worked from my experience, but it turned out that I did not really understand why (and thanks to rafak for a great comment).

Let us start with the conclusion from the discussion and then I will expand on it:

Always explicitly pass random number generator to the

`rand`

function in performance-critical code.

Let us first see a simple example of this rule at work and next try to understand the reason for this recommendation.

# Estimating \(\pi\) using Monte Carlo simulation

Let us write a simple function that approximates \(\pi\) using Monte Carlo simulation and uses the default global pseudo-random number generator.

```
function pi_global(n::Int)
s = 0
for _ in 1:n
s += rand()^2 + rand()^2 < 1
end
return 4 * s / n
end
```

The code takes advantage from a well known fact that if we sample a point \((x,y)\) uniformly from \([0,1,]^2\) square the probability that \(x^2+y^2\) is less than \(1\) is equal to \(\pi/4\).

We check the runtime of this code:

```
julia> @time pi_global(10^9)
6.998930 seconds (19 allocations: 20.188 KiB)
3.141615124
julia> @time pi_global(10^9)
7.002321 seconds
3.141527116
```

as you can see on my laptop it is around 7 seconds.

Now let us write a function that takes a `MersenneTwister`

generator (this is the default pseudo-random number generator in Julia).

```
using Random
function pi_local(n::Int, rng::MersenneTwister)
s = 0
for _ in 1:n
s += rand(rng)^2 + rand(rng)^2 < 1
end
return 4 * s / n
end
```

Here is its timing:

```
julia> mt = MersenneTwister();
julia> @time pi_local(10^9, mt)
2.723634 seconds
3.141526412
julia> @time pi_local(10^9, mt)
2.734530 seconds
3.141671232
```

Wow! I would not have expected this.

Now let me reveal that I am on Julia 1.5.3. Interestingly, when I built my
habits of working with `rand`

it was Julia 1.0 time. Let us check these codes on
Julia 1.0.5 (that soon will stop being supported). Here are the results:

```
julia> function pi_global(n::Int)
s = 0
for _ in 1:n
s += rand()^2 + rand()^2 < 1
end
return 4 * s / n
end
pi_global (generic function with 1 method)
julia> @time pi_global(10^9)
2.939260 seconds (44.35 k allocations: 2.366 MiB)
3.141632964
julia> @time pi_global(10^9)
2.891349 seconds (6 allocations: 192 bytes)
3.14153098
julia> using Random
julia> function pi_local(n::Int, rng::MersenneTwister)
s = 0
for _ in 1:n
s += rand(rng)^2 + rand(rng)^2 < 1
end
return 4 * s / n
end
pi_local (generic function with 1 method)
julia> mt = MersenneTwister();
julia> @time pi_local(10^9, mt)
3.129134 seconds (30.73 k allocations: 1.574 MiB)
3.141618824
julia> @time pi_local(10^9, mt)
3.115317 seconds (6 allocations: 192 bytes)
3.141620408
```

We see that there is a huge regression in the performance of `rand()`

between
versions of Julia. Let us understand what is the reason for this.

# Digging down the `rand()`

implementation

We switch back to Julia 1.5.3 and will stick to it till the end of this post.

First we do a quick benchmark (I am using the same Julia 1.5.3. session as above):

```
julia> using BenchmarkTools
julia> @btime rand()
4.784 ns (0 allocations: 0 bytes)
0.40836802665975824
julia> @btime rand($mt)
2.890 ns (0 allocations: 0 bytes)
0.23541608567839556
```

There is a significant difference in performance indeed. So what does `rand()`

do that costs so much? Let us see the definition of relevant method for `rand`

(it is easy to get it by writing `@edit rand()`

):

```
rand(rng::AbstractRNG=default_rng(), ::Type{X}=Float64) where {X} =
rand(rng, Sampler(rng, X, Val(1)))
```

We can see that the only difference between `rand()`

and `rand(mt)`

is that the
former calls `default_rng()`

function (it is from the `Random`

module).

In a similar way as above we dig down to the relevant definition:

```
const THREAD_RNGs = MersenneTwister[]
@inline default_rng() = default_rng(Threads.threadid())
@noinline function default_rng(tid::Int)
0 < tid <= length(THREAD_RNGs) || _rng_length_assert()
if @inbounds isassigned(THREAD_RNGs, tid)
@inbounds MT = THREAD_RNGs[tid]
else
MT = MersenneTwister()
@inbounds THREAD_RNGs[tid] = MT
end
return MT
end
@noinline _rng_length_assert() = @assert false "0 < tid <= length(THREAD_RNGs)"
function __init__()
resize!(empty!(THREAD_RNGs), Threads.nthreads()) # ensures that we didn't save a bad object
end
```

And now we see the reason. In Julia 1.5.3 `rand()`

is thread safe (it was not in
Julia 1.0, and that is the reason of the difference in performance between
versions). Ensuring thread safety must cost something. In this case even
although the code that extracts out the appropriate `MersenneTwister`

instance
from the `THREAD_RNGs`

vector is simple it has a noticeable cost (the reason is
that random number generation itself is extremely well optimized and fast).

# Conclusion

Going back to the beginning of this post: remember not to use `rand()`

in your
performance critical code.

Also I think this example shows very nicely how huge benefits we have from the fact that Julia is mostly written in Julia — it is only a few keystrokes and we could identify the root cause of the performance puzzle.