# Grand Julia containers test. Can you get a perfect score?

# Introduction

Containers are objects grouping multiple values together. In Julia, examples of containers are vectors or dictionaries.

To ensure that it is easy to work with containers Julia introduces four interfaces:

- iteration;
- indexing;
- array;
- broadcasting.

They are described in the Interfaces section of the Julia Manual. However, this description is a bit technical and is mostly aimed at developers creating new container types. Therefore, I decided to write a post that explains these interfaces by example, from the user's perspective.

The material presented here is organized in two parts.

Part one covers the basic knowledge you should have. I tried to collect the most relevant information in this post.

Part two is organized as a quiz to check your understanding of the discussed interfaces. There are ten questions in the quiz (plus one bonus question). Do you know the answers to all of them?

The post is tested under Julia 1.8.5 and DataFrames.jl 1.4.4.

# Container interfaces explained

**Iteration**

Iteration is the least demanding interface. It ensures that you can get elements from a container sequentially.

Most containers are iterable, including arrays, dictionaries, sets, I/O buffers, and strings.

If a container supports iteration, it can be used in `for` loops, comprehensions, and higher-order functions that rely only on iterating values, such as `foreach`.

Here is a basic example of iterating a tuple:

```
julia> t = (1, 2, 3)
(1, 2, 3)
julia> for v in t
           println(v)
       end
1
2
3
julia> [v for v in t]
3-element Vector{Int64}:
1
2
3
```

An important feature of the iteration interface is that it also works for collections that get mutated while they are iterated. A basic example is reading data from an `IOBuffer`:

```
julia> buf = IOBuffer(join('a':'d', '\n'))
IOBuffer(data=UInt8[...], readable=true, writable=false, seekable=true, append=false, size=7, maxsize=Inf, ptr=1, mark=-1)
julia> foreach(println, eachline(buf))
a
b
c
d
julia> foreach(println, eachline(buf))
julia>
```

Note that by iterating `buf` line by line we were mutating it. Therefore, the second time we iterate it we do not get any output.

When you use the iteration interface, keep in mind that you are only guaranteed to be able to read each element of a collection once.
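If you need more than one pass over such a single-pass source, one option is to materialize it first, and another is to rewind it. Here is a minimal sketch, assuming the data fits in memory (the variable names are only illustrative):

```julia
buf = IOBuffer("a\nb\nc")

# The first pass consumes the buffer; collect stores the lines in a Vector.
lines = collect(eachline(buf))

# The buffer is now exhausted, so a second pass yields nothing.
leftover = collect(eachline(buf))

# A Vector can be iterated as many times as needed.
n_chars = sum(length, lines)
shouted = [uppercase(l) for l in lines]

# Alternatively, rewind a seekable buffer and read it again.
seekstart(buf)
again = collect(eachline(buf))
```

Collecting trades memory for the ability to revisit elements, so it is best suited to inputs of moderate size.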

**Indexing**

Indexing allows you to access elements of a container using their identifier, typically called an index. Implicitly, this means that you can read the same element multiple times without a problem (as opposed to iteration).

The key feature of this interface is that it supports the convenient `x[i]` syntax for getting elements from a container. Again, let us give a simple example:

```
julia> d = Dict(v => '0'+v for v in 1:4)
Dict{Int64, Char} with 4 entries:
4 => '4'
2 => '2'
3 => '3'
1 => '1'
julia> d[3]
'3': ASCII/Unicode U+0033 (category Nd: Number, decimal digit)
```

In practice, this interface is implemented in Julia in several flavors. The most important of them are the following.

First, indexing can support only reading, or both reading and writing of data to the container. For example, entries of a `Vector` can be mutated, but ranges are read-only:

```
julia> v1 = [1, 2, 3]
3-element Vector{Int64}:
1
2
3
julia> v1[2] = 12
12
julia> v2 = 1:3
1:3
julia> v2[2] = 12
ERROR: CanonicalIndexError: setindex! not defined for UnitRange{Int64}
```

Lesson to remember: do not assume that indexable objects are always writeable.
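When you do need to modify values that come from a read-only container, a common approach is to copy them into a `Vector` first. A minimal sketch (the variable names are illustrative):

```julia
r = 1:3    # ranges are read-only

# Attempting to write into the range throws, so we record the outcome.
range_is_writable = try
    r[2] = 12
    true
catch
    false
end

v = collect(r)    # a Vector copy that supports writing
v[2] = 12         # the original range r is left unchanged
```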

The second flavor is that the indexing interface typically assumes that it is possible to order valid indices and define a first and a last index. This can be conveniently exploited by using the `begin` and `end` keywords when indexing:

```
julia> v1[begin]
1
julia> v1[end]
3
```

However, this support is not guaranteed by all objects that support indexing. A basic example is a dictionary:

```
julia> d[end]
ERROR: MethodError: no method matching lastindex(::Dict{Int64, Char})
```

So the second thing to remember is that you cannot assume that all indexable objects support the `begin` and `end` keywords when indexing.
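The reason is that inside square brackets `begin` and `end` are translated to calls to `firstindex` and `lastindex`. The sketch below checks this for a vector and shows key-based lookup with `get` as one alternative for dictionaries (variable names are illustrative):

```julia
v = [10, 20, 30]

# Inside brackets, begin and end are lowered to firstindex and lastindex.
same_first = v[begin] == v[firstindex(v)]
same_last = v[end] == v[lastindex(v)]

d = Dict(1 => '1', 2 => '2')

# Dict defines no lastindex method, which is why d[end] fails.
dict_has_lastindex = applicable(lastindex, d)

# For dictionaries, look up by key; get supplies a default for missing keys.
found = get(d, 2, '?')
not_found = get(d, 5, '?')
```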

**Array**

Arrays take the indexing syntax further by allowing multiple values to be passed in the `x[i, j, ...]` syntax and by introducing the notion of `CartesianIndex` indexing. Typically, types that support the array interface opt in to being a subtype of `AbstractArray`. An important feature of arrays is that they specify their dimensions, which you can get using the `size` function. Similarly, using the `axes` function you can get the valid indices along each dimension.

Here is an example:

```
julia> mat = [1 2 3; 4 5 6]
2×3 Matrix{Int64}:
1 2 3
4 5 6
julia> size(mat)
(2, 3)
julia> axes(mat)
(Base.OneTo(2), Base.OneTo(3))
julia> mat[1, 2]
2
julia> mat[CartesianIndex(1, 2)]
2
```

An operation that is often needed is iteration over all indices of an array. You can get an iterator of such indices using the `eachindex` function.

It is important to remember that this function only guarantees to return values that are valid indices into the array. The type of these indices is chosen based on an assessment of indexing efficiency. Here is an example:

```
julia> for idx in eachindex(mat)
           print(idx, " ")
       end
1 2 3 4 5 6
julia> for idx in eachindex(@view mat[1:2, 2:3])
           print(idx, " ")
       end
CartesianIndex(1, 1) CartesianIndex(2, 1) CartesianIndex(1, 2) CartesianIndex(2, 2)
```
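In practice this means that code written against `eachindex` should not assume any particular index type. Here is a small sketch of such generic code; `sum_by_index` is a hypothetical helper written for this illustration, not a Base function:

```julia
# Hypothetical helper: sums an array by visiting every index from eachindex.
function sum_by_index(a::AbstractArray)
    total = zero(eltype(a))
    for i in eachindex(a)    # Int or CartesianIndex, depending on the array type
        total += a[i]
    end
    return total
end

mat = [1 2 3; 4 5 6]
whole = sum_by_index(mat)                    # uses integer indices
part = sum_by_index(@view mat[1:2, 2:3])     # uses CartesianIndex values
```

The same function body works for both inputs because `a[i]` accepts whatever `eachindex(a)` produces.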

One notable feature here is that although `mat` is a matrix and supports passing indices for all dimensions when indexing, e.g. `mat[1, 2]` or `mat[CartesianIndex(1, 2)]`, we see that `eachindex(mat)` returns integer indices (because they are faster). Such indices are called *linear indices* and are supported by all arrays in Julia. Therefore you can write:

```
julia> mat[3]
2
```

This is an important feature to remember so let me stress it again: any array in Julia can be indexed using a single integer index, called a linear index.
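If you need to translate between the two kinds of indices, `LinearIndices` and `CartesianIndices` can be used. A minimal sketch using the same matrix as above:

```julia
mat = [1 2 3; 4 5 6]

# LinearIndices maps a Cartesian position to its linear index.
lin = LinearIndices(mat)[1, 2]

# CartesianIndices maps a linear index back to a Cartesian position.
cart = CartesianIndices(mat)[3]

# Both kinds of index address the same element.
same_element = mat[lin] == mat[cart]
```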

The second, somewhat surprising, feature to remember is that you can append as many trailing `1`s as you like when indexing any array, without affecting the result (such extra `1`s are ignored):

```
julia> mat[2, 2]
5
julia> mat[2, 2, 1, 1, 1]
5
```

Here you can find more explanations about this topic.

**Broadcasting**

Broadcasting is a convenient way of performing operations on arrays. In particular it allows for combining arrays of different sizes in a single operation. The rule is the following: when you operate on two arrays that do not have matching dimensions then expand dimensions of length 1 to the required length by repeating the values as needed. Here is a quick example:

```
julia> ["a" "b"] .^ [1, 2, 3]
3×2 Matrix{String}:
"a" "b"
"aa" "bb"
"aaa" "bbb"
```

We take a matrix `["a" "b"]` having one row and two columns and a vector having one column and three rows. As you can see, the single row of the matrix gets repeated three times. Similarly, the single column of the vector gets repeated two times. In this way the dimensions of both objects match and we can perform the computation. Notably, our `[1, 2, 3]` vector has only one dimension, so the *column* dimension was added to it by broadcasting and set to `1` (as you likely remember from the discussion of array indexing, you are allowed to add trailing `1`s without affecting the result).
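To make the expansion rule concrete, the sketch below compares a broadcasted addition with the same computation done by hand using `repeat`. It illustrates the rule only; broadcasting itself does not need to allocate the repeated arrays:

```julia
col = [1, 2, 3]    # 3-element vector, treated as having size 3×1
row = [10 20]      # 1×2 matrix

broadcasted = col .+ row

# The same result built by hand: repeat each operand to size 3×2, then add.
manual = repeat(col, 1, 2) + repeat(row, 3, 1)
```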

Broadcasting in Julia is performed by using a `.`; you can read more about it here. This is a very convenient syntax. For this reason people started to use it in many contexts. Because of this popularity, one issue immediately surfaced: broadcasting is also useful in cases when you are not working with arrays. Here is an example:

```
julia> ["a"] .^ [1, 2, 3]
3-element Vector{String}:
"a"
"aa"
"aaa"
```

While this works, writing `["a"]` is not very convenient. Therefore, broadcasting was extended to allow non-arrays to take part in operations. You can thus write:

```
julia> "a" .^ [1, 2, 3]
3-element Vector{String}:
"a"
"aa"
"aaa"
```

Here `"a"` *pretends* to be an array having length 1 in all dimensions that are defined by the other container. This *pretending* is implemented in cases where it is useful to think of a given value as an array. A special, and most common, case is when the value pretends to be a scalar. The easiest way to think about a scalar is that it has 0 dimensions (so effectively in all dimensions it can be treated as having length `1` and be appropriately expanded).
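One way to check how a given value will be treated is to call `Base.broadcastable`, the function the Interfaces section of the manual lists for this purpose. A minimal sketch for three kinds of values:

```julia
# Strings are wrapped in a Ref, so they act as a single value.
s = Base.broadcastable("a")

# Numbers are returned unchanged; they already behave as 0-dimensional.
n = Base.broadcastable(1)

# Arrays are also returned unchanged and take part with their own shape.
a = Base.broadcastable([1, 2, 3])
```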

Sometimes you want a value to be treated as a scalar even if it is not a scalar. In such a case you can wrap it with `Ref`. `Ref` creates a `0`-dimensional object that behaves like a scalar. Here is an example:

```
julia> x = [1, 2, 3]
3-element Vector{Int64}:
1
2
3
julia> r = Ref(x)
Base.RefValue{Vector{Int64}}([1, 2, 3])
julia> size(r)
()
julia> r[]
3-element Vector{Int64}:
1
2
3
```

Note that we could extract the value stored in `r` by indexing without passing any index (because it is 0-dimensional). Now let us have a look at when this is useful:

```
julia> [1, 2, 3] .∈ [1, 3, 4]
3-element BitVector:
1
0
0
julia> [1, 2, 3] .∈ Ref([1, 3, 4])
3-element BitVector:
1
0
1
```

The first operation was not very useful. It was equivalent to writing `[1 ∈ 1, 2 ∈ 3, 3 ∈ 4]`, probably not what we wanted. In the second operation we protected the second vector with `Ref` so that it was used as a whole for every element of the `[1, 2, 3]` vector.
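`Ref` is not the only way to shield a collection. A one-element tuple has the same effect, and the operator can also be written in function-call form. The sketch below shows three equivalent spellings (variable names are illustrative):

```julia
xs = [1, 2, 3]
pool = [1, 3, 4]

with_ref = xs .∈ Ref(pool)       # pool wrapped in a 0-dimensional Ref
with_tuple = xs .∈ (pool,)       # a 1-element tuple also shields pool
with_fn = in.(xs, Ref(pool))     # the same operation in function-call form
```

Which spelling to use is mostly a matter of taste; `Ref` is the one used throughout this post.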

Using `Ref` is also needed when an object does not have a default array-like behavior implemented for broadcasting. A common example is standard dictionaries:

```
julia> haskey.(d, [1, 2, 5])
ERROR: ArgumentError: broadcasting over dictionaries and `NamedTuple`s is reserved
julia> haskey.(Ref(d), [1, 2, 5])
3-element BitVector:
1
1
0
```

Now you should have an understanding of iteration, indexing (including array indexing), and broadcasting.

# Quiz: problems

- What is the order of iteration of arrays?
- Are tuples broadcastable?
- Are numbers iterable and indexable?
- Are atomic numbers iterable, indexable, or broadcastable?
- Is `Set` iterable, indexable, or broadcastable?
- Is `Dict` iterable, indexable, or broadcastable?
- Is named tuple iterable, indexable, or broadcastable?
- What is special about string iteration, indexing, and broadcasting?
- Is `Pair` iterable, indexable, or broadcastable?
- Is `pairs(["a", "b", "c"])` iterable, indexable, or broadcastable?
- Bonus question: is `DataFrame` iterable, indexable, or broadcastable? (it is the only type not from Base Julia in the list, but I could not resist the temptation to include it)

# Quiz: answers

*1. What is the order of iteration of arrays?*

With arrays, you need to remember that they are iterated in column major order (just as in linear indexing we discussed):

```
julia> x = [1 2; 3 4]
2×2 Matrix{Int64}:
1 2
3 4
julia> for v in x
           print(v, " ")
       end
1 3 2 4
```
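If you want to see the iteration order as a flat vector, or visit elements row by row instead, here is one possible sketch using `vec` and `permutedims`:

```julia
x = [1 2; 3 4]

# vec follows the same column-major order as iteration.
by_columns = vec(x)

# To visit elements row by row, swap the dimensions first.
by_rows = vec(permutedims(x))
```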

*2. Are tuples broadcastable?*

Yes, you can use tuples in broadcasting. They are handled in the same way as vectors. The only difference is that if you only broadcast a tuple you get a tuple as a result, while if you mix tuples with arrays of dimension at least `1` you get an array:

```
julia> 1 .+ (1, 2, 3)
(2, 3, 4)
julia> fill(1) .+ (1, 2, 3) # fill(1) produces 0-dimensional array
(2, 3, 4)
julia> [1] .+ (1, 2, 3) # here [1] is 1-dimensional array
3-element Vector{Int64}:
2
3
4
```
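The same rule applies when both operands are collections. A short sketch contrasting the two cases:

```julia
t_only = (1, 2, 3) .+ (10, 20, 30)    # tuples only: the result is a tuple
mixed = (1, 2, 3) .+ [10, 20, 30]     # tuple and vector: the result is a vector
```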

*3. Are numbers iterable and indexable?*

Yes they are. They are treated as 0-dimensional containers.

```
julia> x = 1
1
julia> for v in x
           println(v)
       end
1
julia> x[]
1
julia> x[1, 1, 1]
1
julia> size(x)
()
julia> axes(x)
()
```
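A practical consequence is that iterating a number instead of a range is not an error, which can hide a bug such as writing `for i in n` when `for i in 1:n` was intended. A small sketch that counts the visited elements in both cases:

```julia
n = 3

# Iterating the number itself visits a single element.
visits_number = count(_ -> true, n)

# Iterating the range 1:n visits n elements.
visits_range = count(_ -> true, 1:n)
```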

*4. Are atomic numbers iterable, indexable, or broadcastable?*

No. They only support indexing with no arguments passed:

```
julia> x = Threads.Atomic{Int}(1)
Base.Threads.Atomic{Int64}(1)
julia> for v in x
           println(v)
       end
ERROR: MethodError: no method matching iterate(::Base.Threads.Atomic{Int64})
julia> x[]
1
julia> x[1]
ERROR: MethodError: no method matching getindex(::Base.Threads.Atomic{Int64}, ::Int64)
julia> x .+ 1
ERROR: MethodError: no method matching length(::Base.Threads.Atomic{Int64})
```

*5. Is Set iterable, indexable, or broadcastable?*

It is iterable and broadcastable, but not indexable:

```
julia> s = Set([1, 2, 3])
Set{Int64} with 3 elements:
2
3
1
julia> for v in s
           println(v)
       end
2
3
1
julia> s .+ 1
3-element Vector{Int64}:
3
4
2
```

Note that when you iterate or broadcast a `Set`, the order of returned values is undefined. This is a common pitfall when doing broadcasting:

```
julia> s
Set{Int64} with 3 elements:
2
3
1
julia> [1, 2, 3] .∈ s # note the iteration order of s
3-element BitVector:
0
0
0
julia> [1, 2, 3] .∈ Ref(s) # this is what you really wanted
3-element BitVector:
1
1
1
```

*6. Is Dict iterable, indexable, or broadcastable?*

It is iterable, and partially indexable, but not broadcastable:

```
julia> d = Dict(v => '0'+v for v in 1:4)
Dict{Int64, Char} with 4 entries:
4 => '4'
2 => '2'
3 => '3'
1 => '1'
julia> foreach(println, d)
4 => '4'
2 => '2'
3 => '3'
1 => '1'
julia> foreach(show, keys(d))
4231
julia> foreach(show, values(d))
'4''2''3''1'
```

For dictionaries, a basic thing to remember is that iterating them returns key-value `Pair`s. If you want only keys, use `keys` to create an appropriate iterator. Similarly, to get values use the `values` function. Also, the order of iteration of a standard `Dict` is undefined.

```
julia> d[1]
'1': ASCII/Unicode U+0031 (category Nd: Number, decimal digit)
```

They are partially indexable, since they do not support the `firstindex` and `lastindex` methods.

Finally they are not broadcastable:

```
julia> d .+ 1
ERROR: ArgumentError: broadcasting over dictionaries and `NamedTuple`s is reserved
```
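Since `d .+ 1` is not available, one way to transform dictionary values is a generator over key-value pairs, and another is `map!` on `values(d)` for an in-place update. A minimal sketch with a small example dictionary (names are illustrative):

```julia
d = Dict(1 => 10, 2 => 20)

# Build a new dictionary with a generator over key-value pairs.
incremented = Dict(k => v + 1 for (k, v) in d)

# Or transform the values of the existing dictionary in place.
map!(v -> v + 1, values(d))
```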

*7. Is named tuple iterable, indexable, or broadcastable?*

Named tuples are iterable and indexable, but not broadcastable:

```
julia> nt = (a=1, b=2, c=3)
(a = 1, b = 2, c = 3)
julia> foreach(println, nt)
1
2
3
julia> foreach(println, pairs(nt))
:a => 1
:b => 2
:c => 3
julia> nt[end]
3
julia> nt .+ 1
ERROR: ArgumentError: broadcasting over dictionaries and `NamedTuple`s is reserved
```

Note that if you want to iterate key-value pairs of a named tuple you need to use the `pairs` function.
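As a stand-in for broadcasting, `map` works on named tuples and keeps the field names. The sketch below also shows `keys` and `values`, which return plain tuples (variable names are illustrative):

```julia
nt = (a = 1, b = 2, c = 3)

field_names = keys(nt)      # tuple of symbols
field_values = values(nt)   # tuple of values

# map applies the function to each value and keeps the field names.
plus_one = map(x -> x + 1, nt)
```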

*8. What is special about string iteration, indexing, and broadcasting?*

When you iterate strings you get characters. String indexing is supported, but valid indices do not have to be contiguous, because strings use byte indexing, as I have explained in detail in this post. In broadcasting strings behave like a scalar:

```
julia> str = "1₂3"
"1₂3"
julia> foreach(display, str)
'1': ASCII/Unicode U+0031 (category Nd: Number, decimal digit)
'₂': Unicode U+2082 (category No: Number, other)
'3': ASCII/Unicode U+0033 (category Nd: Number, decimal digit)
julia> collect(eachindex(str))
3-element Vector{Int64}:
1
2
5
julia> str[2]
'₂': Unicode U+2082 (category No: Number, other)
julia> str[5]
'3': ASCII/Unicode U+0033 (category Nd: Number, decimal digit)
julia> string.(["a", "b"], str)
2-element Vector{String}:
"a1₂3"
"b1₂3"
```

Notice that the character `'₂'` has index 2, but the next valid index is 5, because `'₂'` occupies three bytes.
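To avoid guessing byte offsets, you can rely on functions such as `isvalid`, `nextind`, and `eachindex`. A minimal sketch using the same string:

```julia
str = "1₂3"

n_chars = length(str)        # number of characters
n_bytes = ncodeunits(str)    # number of bytes (code units)

valid_3 = isvalid(str, 3)    # byte 3 lies inside '₂', so it is not a valid index
after_2 = nextind(str, 2)    # the next valid index after index 2

# Walk the valid indices only.
chars = [str[i] for i in eachindex(str)]
```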

*9. Is Pair iterable, indexable, or broadcastable?*

It is iterable and indexable. In broadcasting it is treated as a scalar:

```
julia> p = :a => sin
:a => sin
julia> foreach(display, p)
:a
sin (generic function with 14 methods)
julia> p[1]
:a
julia> p[end]
sin (generic function with 14 methods)
julia> tuple.(p, (1, 2))
((:a => sin, 1), (:a => sin, 2))
```
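Because a `Pair` is iterable, it can also be destructured, and `first` and `last` give the same elements as indexing. A short sketch with an illustrative pair:

```julia
p = :a => 1

k, v = p    # destructuring relies on iteration

same = (first(p), last(p)) == (p[1], p[2])

# As a scalar in broadcasting, the whole pair is combined with each element.
combined = tuple.(p, [10, 20])
```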

*10. Is pairs(["a", "b", "c"]) iterable, indexable, or broadcastable?*

It is iterable and partially indexable, but not broadcastable, although it has length, size, and axes:

```
julia> pv = pairs(["a", "b", "c"])
pairs(::Vector{String})(...):
1 => "a"
2 => "b"
3 => "c"
julia> foreach(println, pv)
1 => "a"
2 => "b"
3 => "c"
julia> pv[1]
"a"
julia> pv[3]
"c"
julia> length(pv)
3
julia> size(pv)
(3,)
julia> axes(pv)
(Base.OneTo(3),)
julia> identity.(pv)
ERROR: ArgumentError: broadcasting over dictionaries and `NamedTuple`s is reserved
```
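If you want to broadcast over these pairs, one option is to `collect` them into a vector first. A minimal sketch (variable names are illustrative):

```julia
pv = pairs(["a", "b", "c"])

# collect turns the pairs iterator into a plain Vector of Pair values,
as_vector = collect(pv)

# which broadcasts like any other vector.
indices = first.(as_vector)
uppercased = uppercase.(last.(as_vector))
```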

*11. Is DataFrame iterable, indexable, or broadcastable?*

A data frame is indexable (but always requires a two-dimensional index) and broadcastable, but not iterable. To iterate it you need to choose whether you want to iterate rows or columns:

```
julia> using DataFrames
julia> df = DataFrame(a=1:3, b=11:13)
3×2 DataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     1     11
   2 │     2     12
   3 │     3     13
julia> df[2, 1]
2
julia> df[end, end]
13
julia> df[1]
ERROR: ArgumentError: syntax df[column] is not supported use df[!, column] instead
julia> df .^ 2
3×2 DataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     1    121
   2 │     4    144
   3 │     9    169
julia> foreach(println, df)
ERROR: AbstractDataFrame is not iterable. Use eachrow(df) to get a row iterator or eachcol(df) to get a column iterator
julia> foreach(println, eachrow(df))
DataFrameRow
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     1     11
DataFrameRow
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   2 │     2     12
DataFrameRow
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   3 │     3     13
julia> foreach(println, eachcol(df))
[1, 2, 3]
[11, 12, 13]
```

# Conclusions

I hope you found the examples I included in the post useful and that after reading it your grasp of working with containers in Julia improved!