Introduction

I have just hit 1000 answers for the [julia] tag on Stack Overflow so I felt like writing about it. In order to have a complete one thousand and one stories collection today I thought of writing about a feature of Julia that will show you how challenging the design decisions that have to be made when designing functions are. We will work with one of the most fundamental functions, the sum.

Before I go to the technical details let me go back to my recent post that I wrote about a model that Alan Edelman prepared for one of classes during his studies. Recently I had an opportunity to discuss with him about the exact context of creation of the model. Here you can read the summary which I was lacking when I was writing the original post:

In 1980, Alan Edelman was a 17 year old freshman at Yale University where he met his social science distribution requirement by taking Psychology 101 with Dr. Kenji Hakuta. There he learned about the famous Kitty Genovese murder and the concept of diffusion of responsibility. Having to write a paper for this freshman class, and being a “math person” he figured why not take the idea of diffusion literally and write a paper about that. He does not recall if he still has a copy of that paper, but it may be in a box in the attic. He thinks he got an A on that paper.

The post was written under Julia 1.6.3.

Are you sure you understand how the sum function works?

I will test your knowledge by example. The first task is to perform row-wise summation of the following matrix:

julia> mat = fill(Int8(100), 5, 10)
5×10 Matrix{Int8}:
 100  100  100  100  100  100  100  100  100  100
 100  100  100  100  100  100  100  100  100  100
 100  100  100  100  100  100  100  100  100  100
 100  100  100  100  100  100  100  100  100  100
 100  100  100  100  100  100  100  100  100  100

julia> sum(eachcol(mat))
5-element Vector{Int8}:
 -24
 -24
 -24
 -24
 -24

julia> sum.(eachrow(mat))
5-element Vector{Int64}:
 1000
 1000
 1000
 1000
 1000

If you are surprised by the result let me explain the situation. In the first case sum operates on whole vectors. In the second case sum operates on scalars. Why would this make the difference? The reason is that sum does not use + for aggregation, but it employs the Base.add_sum as the reduction operator which is defined as follows:

add_sum(x, y) = x + y
add_sum(x::SmallSigned, y::SmallSigned) = Int(x) + Int(y)
add_sum(x::SmallUnsigned, y::SmallUnsigned) = UInt(x) + UInt(y)
add_sum(x::Real, y::Real)::Real = x + y

As you can see the add_sum will promote the result to Int or UInt only if it is passed scalar integers. Therefore if we pass it vectors of integers no promotion happens.

Now for sure you know what will be the sum of the following vector:

julia> v = Integer[0x64; fill(Int8(100), 9)]
10-element Vector{Integer}:
 0x64
  100
  100
  100
  100
  100
  100
  100
  100
  100

Let us check:

julia> sum(v)
0xe8

Could you have guessed it? If yes, you probably assume that sum is using foldl and you have noticed that Base.add_sum does not perform promotion to Int or UInt when you mix signed and unsigned integers.

Unfortunately, if this was your guess, you are wrong in general. Consider this scenario:

julia> sum(Integer[0x64; fill(Int8(100), 9999)])
937536

The result we get is quite surprising. We could have expected:

julia> foldl(+, Integer[0x64; fill(Int8(100), 9999)])
0x40

as we know that Base.add_sum will always fall back to + in this case. This would be consistent with the previous result.

What is the reason of the difference? Actually sum does not use foldl but reduce, and reduce does not have a guaranteed order of summation. This means that in the latter case we must have made some summation of two Int8 (100) values using Base.add_sum which promoted the result to Int.

Maybe above you thought of calling foldl with Base.add_sum like ths?:

julia> foldl(Base.add_sum, Integer[0x64; fill(Int8(100), 9999)])
0x00000000000f4240

0x00000000000f4240 is just 1000000 (which is a correct result if we were widening types always when doing the summation), but why do we get such a weird value? The reason is that foldl differs from sum in the way it initializes the summation. It uses the first element of the collection(0x64 in our case) and promotes it to the type that woud be produced if this element were added using Base.add_sum to itself and we know UInt8 to UInt8 invokes promotion to UInt.

What else could go wrong? Try this:

julia> using Random

julia> Random.seed!(1234)
MersenneTwister(1234)

julia> x = rand(1000)
1000-element Vector{Float64}:
 0.5908446386657102
 0.7667970365022592
 0.5662374165061859
 0.4600853424625171
 0.7940257103317943
 0.8541465903790502
 0.20058603493384108
 0.2986142783434118
 ⋮
 0.5762976355934157
 0.08831200391130656
 0.8994769043886504
 0.8232831225471882
 0.37869007913520947
 0.7812366659068535
 0.4651012221417914

julia> sum(x)
496.84883432553806

julia> sum(x, init=0.0)
496.84883432553846

As you can see the results are not identical (they differ at the second least significant digit). What have messed up this time? By specifying init keyword argument, although 0.0 is a neutral in summation, we forced sum to use a different summation order again and for floats order of summation is known to affect the result.

Wait - have I said that 0.0 is a neutral in summation? I have lied. See:

julia> isequal(sum([-0.0, -0.0]), sum([-0.0, -0.0], init=0.0))
false

because:

julia> sum([-0.0, -0.0])
-0.0

julia> sum([-0.0, -0.0], init=0.0)
0.0

At this point I start asking myself why in my math classes I was taught to use real numbers and not IEEE 754 standard which seems to be at play much more often in practice (at least if you are using computers). I will have to pose this question Alan Edelman who is a professor of both mathematics and computer science the next time I have the privilege of talking with him.

Conclusions

In this blog and on Stack Overflow I try to show users that Julia is a nice language to work with (of which I am really convinced).

However, having written these 1000 stories that end good one wants to show at least one story when the dark side shows up (of course the problems I have discussed are not Julia specific, but their particular manifestation is a consequence of design decisions that Julia developers made).

Can we do anything about the problems I have shown? There is one remedy and one warning to keep in mind.

The remedy is the following: when working with collections in Julia always take care to make sure they have homogeneous type of elements and this type is chosen appropriately to the operation you want to perform. Except for very rare situations there is little sense in mixing numbers of different types in one collection so just do not do it.

The warning is the IEEE floating point arithmetic standard consequence: if you work with floats better treat the result of operations on them as only approximate.