Introduction

Last week I have written a “Learning to zip stuff in Julia” post that discussed the zip function. To my surprise, even though the topic was basic, it has received a lot of positive feedback. Therefore I thought of writing about another entry-level problem.

Often, when working with arrays in Julia we want to flatten them to a vector. Today, I want to discuss two ways how you can do it and the differences between them.

The post was written under Julia 1.9.2.

The problem

Assume you have the following three dimensional array:

julia> a3d = [(i, j, k) for i in 1:2, j in 1:3, k in 1:4]
2×3×4 Array{Tuple{Int64, Int64, Int64}, 3}:
[:, :, 1] =
 (1, 1, 1)  (1, 2, 1)  (1, 3, 1)
 (2, 1, 1)  (2, 2, 1)  (2, 3, 1)

[:, :, 2] =
 (1, 1, 2)  (1, 2, 2)  (1, 3, 2)
 (2, 1, 2)  (2, 2, 2)  (2, 3, 2)

[:, :, 3] =
 (1, 1, 3)  (1, 2, 3)  (1, 3, 3)
 (2, 1, 3)  (2, 2, 3)  (2, 3, 3)

[:, :, 4] =
 (1, 1, 4)  (1, 2, 4)  (1, 3, 4)
 (2, 1, 4)  (2, 2, 4)  (2, 3, 4)

In many situations you might want to transform it into a one dimensional vector. For example many functions explicitly require AbstractVector as their input. The question is how can you do it.

There are two fundamental ways to perform this operation. The first one creates a new independent vector from the source data, and the second one reuses the memory of the source data. Let me discuss them in more detail.

Copied vector

If you want to copy the data in a3d into a new vector you can write:

julia> a3d[:]
24-element Vector{Tuple{Int64, Int64, Int64}}:
 (1, 1, 1)
 (2, 1, 1)
 (1, 2, 1)
 (2, 2, 1)
 (1, 3, 1)
 (2, 3, 1)
 (1, 1, 2)
 (2, 1, 2)
 (1, 2, 2)
 (2, 2, 2)
 (1, 3, 2)
 (2, 3, 2)
 (1, 1, 3)
 (2, 1, 3)
 (1, 2, 3)
 (2, 2, 3)
 (1, 3, 3)
 (2, 3, 3)
 (1, 1, 4)
 (2, 1, 4)
 (1, 2, 4)
 (2, 2, 4)
 (1, 3, 4)
 (2, 3, 4)

The syntax is short and easy to read. It takes advantage of the fact that every Julia array should support linear indexing, as is explained in the Julia Manual section on Linear Indexing.

The benefit of this approach is that the new object is freshly allocated, so modifying it will not modify the source. The downside is that it requires memory allocation.

Aliased vector

If we want to avoid excessive memory allocation (which might be relevant for large objects) we can create a vector from an array without copying the data. This can be achieved using the vec function:

julia> v = vec(a3d)
24-element Vector{Tuple{Int64, Int64, Int64}}:
 (1, 1, 1)
 (2, 1, 1)
 (1, 2, 1)
 (2, 2, 1)
 (1, 3, 1)
 (2, 3, 1)
 (1, 1, 2)
 (2, 1, 2)
 (1, 2, 2)
 (2, 2, 2)
 (1, 3, 2)
 (2, 3, 2)
 (1, 1, 3)
 (2, 1, 3)
 (1, 2, 3)
 (2, 2, 3)
 (1, 3, 3)
 (2, 3, 3)
 (1, 1, 4)
 (2, 1, 4)
 (1, 2, 4)
 (2, 2, 4)
 (1, 3, 4)
 (2, 3, 4)

Now v is a vector that shares memory with a3d. We can check it using the pointer function:

julia> pointer(v)
Ptr{Tuple{Int64, Int64, Int64}} @0x000002606d121540

julia> pointer(a3d)
Ptr{Tuple{Int64, Int64, Int64}} @0x000002606d121540

or the Base.mightalias function:

julia> Base.mightalias(a3d, v)
true

The benefit is that vec(a3d) is in general much faster than a3d[:] since it makes less work. The downside is that you need to remember that mutating one of the objects will change the other. For example:

julia> a3d[1, 1, 1] = (100, 100, 100);

julia> a3d
2×3×4 Array{Tuple{Int64, Int64, Int64}, 3}:
[:, :, 1] =
 (100, 100, 100)  (1, 2, 1)  (1, 3, 1)
 (2, 1, 1)        (2, 2, 1)  (2, 3, 1)

[:, :, 2] =
 (1, 1, 2)  (1, 2, 2)  (1, 3, 2)
 (2, 1, 2)  (2, 2, 2)  (2, 3, 2)

[:, :, 3] =
 (1, 1, 3)  (1, 2, 3)  (1, 3, 3)
 (2, 1, 3)  (2, 2, 3)  (2, 3, 3)

[:, :, 4] =
 (1, 1, 4)  (1, 2, 4)  (1, 3, 4)
 (2, 1, 4)  (2, 2, 4)  (2, 3, 4)

julia> v
24-element Vector{Tuple{Int64, Int64, Int64}}:
 (100, 100, 100)
 (2, 1, 1)
 (1, 2, 1)
 (2, 2, 1)
 (1, 3, 1)
 (2, 3, 1)
 (1, 1, 2)
 (2, 1, 2)
 (1, 2, 2)
 (2, 2, 2)
 (1, 3, 2)
 (2, 3, 2)
 (1, 1, 3)
 (2, 1, 3)
 (1, 2, 3)
 (2, 2, 3)
 (1, 3, 3)
 (2, 3, 3)
 (1, 1, 4)
 (2, 1, 4)
 (1, 2, 4)
 (2, 2, 4)
 (1, 3, 4)
 (2, 3, 4)

Conclusions

In summary if in your analytical workflow you need a vector, but you have a general array a as an input you have two options:

  • write a[:], which creates a copy (safer, but slower and uses more memory);
  • write vec(a), which creates an alias (unsafe, but faster and uses less memory).

In practice I find both options useful depending on the circumstances, so I think it is worth to be aware of them.