Several years ago I switched to Julia because of its speed. However, every day I am positively surprised how fast it is.
Today I was writing a simulation where I needed to check if some value was
contained in some collection. Traditionally programmers are recommended to use
Set structure to perform such operation. However, since I needed to
squeeze out every last bit of performance in my code I investigated all
available options. Julia developers positively surprised me as it seems that
they managed to implement an extremely fast lookup in a standard
Therefore in my post today I decided to share some benchmark results showing
how fast lookup in
Vector container is against
The post was written under Julia 1.7.0, DataFrames.jl 1.3.2, BenchmarkTools.jl 1.3.1, and Plots.jl 1.27.1.
In order to test scalability of lookup I decided to test the performance for collections from size 10 to 40 to see if some trend can be observed. I wanted to store in them random strings and then perform a check if every value stored in the collection is indeed present in it.
To ensure fair evaluation of the results I used the
from the BenchmarkTools.jl package. Here is the test code:
using DataFrames using BenchmarkTools using Random using Plots f(x, r) = all(in(x), r) # function testing lookup speed df = DataFrame() Random.seed!(1234) for i in 10:40 v = [randstring(rand(1:1000)) for _ in 1:i] # randomly generate a Vector s = Set(v) # turn it to Set for comparison @assert length(s) == i # make sure we had no duplicates t1 = @belapsed f($s, $v) t2 = @belapsed f($v, $v) # display the intermetiate results so that I can monitor the progress @show (i, t1, t2) push!(df, (;i, t1, t2)) end plot(df.i, [df.t1 df.t2]; labels=["Set" "Vector"], legend=:topleft, xlabel="collection size", ylabel="time in seconds")
The code produces the following plot:
In the plot it seems that performance of function
f doing lookup for both
collections scales approximately linearly in the tested range. The lookup in
Set is slower than the lookup in
Vector. Additionally, which is positive,
lookup time in
Vector has a more stable timing (
Set timings are strangely
jagged). Finally, it seems that the difference in the timing increases with
collection size. Let us visualize this:
plot(df.i, df.t1-df.t2; labels="Set-Vector", legend=:topleft, xlabel="collection size", ylabel="time difference in seconds")
Indeed it seems that the timing difference increased in the tested range of collection sizes.
It is getting late today, and I do not want to delay my weekly posts. Therefore, I will do some more testing in the coming week and report you the results in my next post.
From today’s experiments we can see that in the test setting I used
Vector lookup was significantly faster than
As a side note, it was super easy to implement these benchmarks in Julia.