Introduction

It is a well known performance recommendation in the Julia language that avoiding allocations matters and using immutable objects is a good practice. In this post, in the context of creation of agent based models, I want to show two examples of a toy codes that highlight the key aspects of this issue.

All the examples are tested under Julia 1.5.2. The Setfield.jl package version used is 0.7.

Avoiding allocations

Consider the following code:

using Statistics

struct AgentI
    loc::Int
end

mutable struct AgentM
    loc::Int
end

fI(n) = mean(x -> x.loc, [AgentI(i) for i in 1:n])
fM(n) = mean(x -> x.loc, [AgentM(i) for i in 1:n])

The only difference between functions fI and fM is that they respectively work on immutable and mutable objects. I use mean for aggeration to make sure that the compiler does not optimize out too much.

Let us benchmark these codes:

julia> fI(1); fM(1); GC.gc(); # force compilation and collect garbage

julia> @time fI(10^8)
  0.385592 seconds (2 allocations: 762.940 MiB, 0.73% gc time)
5.00000005e7

julia> @time GC.gc()
  0.091111 seconds, 100.00% gc time

julia> @time fM(10^8)
  3.498990 seconds (100.00 M allocations: 2.235 GiB, 60.79% gc time)
5.00000005e7

julia> @time GC.gc()
  0.295949 seconds, 100.00% gc time

We see that working with mutable objects was ten times slower and also it lead to a higher garbage collection cost after fM terminated. In particular note that over 60% of run time of fM was spent on garbage collection.

The reason is the following. Allocating objects has three costs:

  • cost of actual allocation
  • cost of having to store and use object references instead of objects directly in the container
  • cost of cleaning-up (i.e. running garbage collector that frees unused memory)

The cost of data movement

A huge advantage of mutable objects is, well, that they are mutable. This makes it simple to update only their selected fields. One might wonder if the cost of having to create the immutable object anew each time in such cases is important. Let us investigate. First we set up our experiment:

using Statistics
using Setfield

struct Agent2I
    loc::Int
    junk::NTuple{100, Int}
end

mutable struct Agent2M
    loc::Int
    junk::NTuple{100, Int}
end

const REF_TUP = ntuple(i -> 0, 100)

function gI(n)
    x = [Agent2I(i, REF_TUP) for i in 1:n]
    for i in 1:n
        xi = x[i]
        x[i] = @set xi.loc += 1
    end
    return mean(x -> x.loc, x)
end

function gM(n)
    x = [Agent2M(i, REF_TUP) for i in 1:n]
    for i in 1:n
        x[i].loc += 1
    end
    return mean(x -> x.loc, x)
end

Note that we used REF_TUP to make the Agent2I object have a larger memory footprint. The size of agent state I used is usually more than enough in practice. As you can see in the code the Setfield.jl package makes it easy to update only selected field of an immutable object using the @set macro.

It is time to start benchmarking:

julia> gI(1); gM(1); GC.gc();

julia> @time gI(10^7)
  2.714677 seconds (2 allocations: 7.525 GiB, 0.10% gc time)
5.0000015e6

julia> @time GC.gc()
  0.277477 seconds, 100.00% gc time

julia> @time gM(10^7)
  7.287978 seconds (10.00 M allocations: 7.674 GiB, 59.53% gc time)
5.0000015e6

julia> @time GC.gc()
  0.965113 seconds, 100.00% gc time

So as you can see the time difference has narrowed down, but still the immutable code is faster.

Conclusions

In general working with mutable code is easier than working with immutable code. However, if you are working with performance critical code, avoiding using mutable objects is one of the first recommendations to consider when trying to optimize it. Also Setfield.jl (and upcoming Accessors.jl) make working with immutable objects relatively smooth.