Introduction

This week chapter 4 of my upcoming Julia for Data Analysis book has been released by Manning in MEAP. Therefore, following the plan I have announced in this post, I will discuss a topic related to the material I cover in chapter 4, but that was not included in the book.

In chapter 4 I give an introduction to working with collections in Julia. A fundamental topic that is related with this subject is understanding that values in Julia can be either mutable (like Vector or Dict) or immutable (like Tuple or NamedTuple).

In this post I want to present an example showing the relevance of this distinction.

The codes I use were tested under Julia 1.7.2 and BenchmarkTools.jl 1.3.1.

How to check if some value is mutable?

If you have some value you can check if it is mutable using the ismutable function. Let us check it on some example:

julia> x = big(1)
1

julia> typeof(x)
BigInt

julia> ismutable(x)
true

As you can see the value having BigInt type is mutable. This means that it can be changed in-place. Therefore, you now know that you must be careful when passing BigInt values to functions as you cannot assume that such functions will not change the passed value.

Let us check that this is indeed the case:

julia> x
1

julia> Base.GMP.flipsign!(x, -1)
-1

julia> x
-1

In this case the Base.GMP.flipsign! function mutated the value bound to the x variable name.

Is mutability of BigInt useful?

You might ask why BigInt values are made mutable. This is a bit surprising as other standard numeric types like Int64, Bool, or Float64 are immutable. The reason is that in certain cases it allows to perform operations on BigInt values in a faster way. Here is an example of computing a sum of elements of an array storing BigInt values:

julia> using BenchmarkTools

julia> v = collect(big(1):big(1_000_000));

julia> function sum1(v::AbstractArray{BigInt})
           s = big(0)
           for x in v
               s += x
           end
           return s
       end
sum1 (generic function with 1 method)

julia> function sum2(v::AbstractArray{BigInt})
           s = big(0)
           for x in v
               Base.GMP.MPZ.add!(s, x)
           end
           return s
       end
sum2 (generic function with 1 method)

julia> @btime sum1($v)
  97.111 ms (2000002 allocations: 45.78 MiB)
500000500000

julia> @btime sum2($v)
  27.560 ms (3 allocations: 48 bytes)
500000500000

julia> @btime sum($v)
  27.557 ms (3 allocations: 48 bytes)
500000500000

The difference between sum1 and sum2 is that the former uses + to make addition and the later uses Base.GMP.MPZ.add!, which updates its first argument in-place. As you can see sum2 allocates much less memory and is faster. Additionally I have shown you that sum has essentially the same performance. This suggests that sum has a special method for performing summation of BigInt values. Indeed this is the case, the implementation of this method can be found in base/gmp.jl file and is as follows:

sum(arr::Union{AbstractArray{BigInt}, Tuple{BigInt, Vararg{BigInt}}}) =
    foldl(MPZ.add!, arr; init=BigInt(0))

We can see that it uses an in-place add! function.

Note that both in sum2 and sum functions it was crucial to use a new BigInt value as an accumulator. The point is that since add! mutates it in-place we must not use any value stored in the source array for this purpose.

Conclusions

I hope that you will find the presented example useful. As a final comment let me highlight one convention.

In our codes the add! function has a ! suffix. This is a convention that signals that it might mutate its arguments. This is indeed what happens both in sum2 and sum functions to the accumulator value. However, sum2 and sum functions do not need a ! in their names as in their implementation the passed array is not mutated (only accumulator value that is created inside these functions is mutated).