Learning to create a vector in Julia
Introduction
Last week I have written a “Learning to zip stuff in Julia” post
that discussed the zip
function. To my surprise, even though the topic was basic,
it has received a lot of positive feedback. Therefore I thought of writing about
another entry-level problem.
Often, when working with arrays in Julia we want to flatten them to a vector. Today, I want to discuss two ways how you can do it and the differences between them.
The post was written under Julia 1.9.2.
The problem
Assume you have the following three dimensional array:
julia> a3d = [(i, j, k) for i in 1:2, j in 1:3, k in 1:4]
2×3×4 Array{Tuple{Int64, Int64, Int64}, 3}:
[:, :, 1] =
(1, 1, 1) (1, 2, 1) (1, 3, 1)
(2, 1, 1) (2, 2, 1) (2, 3, 1)
[:, :, 2] =
(1, 1, 2) (1, 2, 2) (1, 3, 2)
(2, 1, 2) (2, 2, 2) (2, 3, 2)
[:, :, 3] =
(1, 1, 3) (1, 2, 3) (1, 3, 3)
(2, 1, 3) (2, 2, 3) (2, 3, 3)
[:, :, 4] =
(1, 1, 4) (1, 2, 4) (1, 3, 4)
(2, 1, 4) (2, 2, 4) (2, 3, 4)
In many situations you might want to transform it into a one dimensional vector.
For example many functions explicitly require AbstractVector
as their input.
The question is how can you do it.
There are two fundamental ways to perform this operation. The first one creates a new independent vector from the source data, and the second one reuses the memory of the source data. Let me discuss them in more detail.
Copied vector
If you want to copy the data in a3d
into a new vector you can write:
julia> a3d[:]
24-element Vector{Tuple{Int64, Int64, Int64}}:
(1, 1, 1)
(2, 1, 1)
(1, 2, 1)
(2, 2, 1)
(1, 3, 1)
(2, 3, 1)
(1, 1, 2)
(2, 1, 2)
(1, 2, 2)
(2, 2, 2)
(1, 3, 2)
(2, 3, 2)
(1, 1, 3)
(2, 1, 3)
(1, 2, 3)
(2, 2, 3)
(1, 3, 3)
(2, 3, 3)
(1, 1, 4)
(2, 1, 4)
(1, 2, 4)
(2, 2, 4)
(1, 3, 4)
(2, 3, 4)
The syntax is short and easy to read. It takes advantage of the fact that every Julia array should support linear indexing, as is explained in the Julia Manual section on Linear Indexing.
The benefit of this approach is that the new object is freshly allocated, so modifying it will not modify the source. The downside is that it requires memory allocation.
Aliased vector
If we want to avoid excessive memory allocation (which might be relevant for large objects)
we can create a vector from an array without copying the data. This can be achieved using
the vec
function:
julia> v = vec(a3d)
24-element Vector{Tuple{Int64, Int64, Int64}}:
(1, 1, 1)
(2, 1, 1)
(1, 2, 1)
(2, 2, 1)
(1, 3, 1)
(2, 3, 1)
(1, 1, 2)
(2, 1, 2)
(1, 2, 2)
(2, 2, 2)
(1, 3, 2)
(2, 3, 2)
(1, 1, 3)
(2, 1, 3)
(1, 2, 3)
(2, 2, 3)
(1, 3, 3)
(2, 3, 3)
(1, 1, 4)
(2, 1, 4)
(1, 2, 4)
(2, 2, 4)
(1, 3, 4)
(2, 3, 4)
Now v
is a vector that shares memory with a3d
. We can check it using the pointer
function:
julia> pointer(v)
Ptr{Tuple{Int64, Int64, Int64}} @0x000002606d121540
julia> pointer(a3d)
Ptr{Tuple{Int64, Int64, Int64}} @0x000002606d121540
or the Base.mightalias
function:
julia> Base.mightalias(a3d, v)
true
The benefit is that vec(a3d)
is in general much faster than a3d[:]
since it makes less work.
The downside is that you need to remember that mutating one of the objects will change the other.
For example:
julia> a3d[1, 1, 1] = (100, 100, 100);
julia> a3d
2×3×4 Array{Tuple{Int64, Int64, Int64}, 3}:
[:, :, 1] =
(100, 100, 100) (1, 2, 1) (1, 3, 1)
(2, 1, 1) (2, 2, 1) (2, 3, 1)
[:, :, 2] =
(1, 1, 2) (1, 2, 2) (1, 3, 2)
(2, 1, 2) (2, 2, 2) (2, 3, 2)
[:, :, 3] =
(1, 1, 3) (1, 2, 3) (1, 3, 3)
(2, 1, 3) (2, 2, 3) (2, 3, 3)
[:, :, 4] =
(1, 1, 4) (1, 2, 4) (1, 3, 4)
(2, 1, 4) (2, 2, 4) (2, 3, 4)
julia> v
24-element Vector{Tuple{Int64, Int64, Int64}}:
(100, 100, 100)
(2, 1, 1)
(1, 2, 1)
(2, 2, 1)
(1, 3, 1)
(2, 3, 1)
(1, 1, 2)
(2, 1, 2)
(1, 2, 2)
(2, 2, 2)
(1, 3, 2)
(2, 3, 2)
(1, 1, 3)
(2, 1, 3)
(1, 2, 3)
(2, 2, 3)
(1, 3, 3)
(2, 3, 3)
(1, 1, 4)
(2, 1, 4)
(1, 2, 4)
(2, 2, 4)
(1, 3, 4)
(2, 3, 4)
Conclusions
In summary if in your analytical workflow you need a vector,
but you have a general array a
as an input you have two options:
- write
a[:]
, which creates a copy (safer, but slower and uses more memory); - write
vec(a)
, which creates an alias (unsafe, but faster and uses less memory).
In practice I find both options useful depending on the circumstances, so I think it is worth to be aware of them.