Introduction

Today my post is about subtypes of concrete types. It is mostly academic, but I hope it will be useful for readers wanting to get a better understanding of corner cases Julia’s type system.

The post was written under Julia 1.8.0 (with special thanks for all people who contributed to this release!).

What is a concrete type in Julia?

In Julia a type is concrete it can have a direct instance, that is, some type T is concrete if there exists at least one value v such that typeof(v) === T. For every type you can check whether it is concrete using the isconcretetype function.

Today I want to discuss the following sentence from the section on Types from the Julia Manual in relation to concrete types:

One particularly distinctive feature of Julia’s type system is that concrete types may not subtype each other: all concrete types are final and may only have abstract types as their supertypes.

From this sentence some readers conclude that concrete types cannot have subtypes. However, it is not the case. Concrete types in Julia can have subtypes as long as these subtypes are not concrete.

You might ask does this ever happen in practice? The answer is that it happens and here are the examples when it does.

The first one is Union{} type. This type is not concrete and has no values. However, it is a subtype of all types, including concrete types, for example:

julia> Union{} <: Int
true

julia> Union{} <: Vector{Missing}
true

The other case is Type{T} type, where T is a DataType (i.e. if T has type DataType). All common concrete types are subtypes of DataType, e.g. integers or vectors. So types like Int or Vector{Missing} have DataType type:

julia> typeof(Int)
DataType

julia> typeof(Vector{Missing})
DataType

which means that DataType must be concrete, and indeed it is:

julia> isconcretetype(DataType)
true

Although DataType is concrete, it has Type{Int} and Type{Vector{Missing}} as its subtypes (and these types must not, and are not, concrete as we discussed above):

julia> Type{Int} <: DataType
true

julia> Type{Vector{Missing}} <: DataType
true

julia> isconcretetype(Type{Int})
false

julia> isconcretetype(Type{Vector{Int}})
false

Why these subtyping considerations matter?

The most important lesson learned here is that in your code you should not assume that concrete type cannot have subtypes, as it can (although these subtypes cannot be concrete themselves). This observation is mostly relevant for package developers, who need to write generic code.

However, there are some practical situations when one can be affected by these subtyping rules. The most common is when one is working with missing values.

Assume that I generate some random matrix containing either 1 or missing:

julia> using Random

julia> Random.seed!(1234);

julia> mat = rand([1, missing], 10, 3)
10×3 Matrix{Union{Missing, Int64}}:
 1          missing  1
  missing   missing  1
 1         1          missing
  missing  1         1
 1         1          missing
 1          missing  1
  missing   missing  1
  missing   missing   missing
 1          missing   missing
  missing   missing   missing

Now, I want to compute the sums of its rows, while skipping missing values. Here is how you can do it:

julia> [sum(skipmissing(row)) for row in eachrow(mat)]
10-element Vector{Int64}:
 2
 1
 2
 2
 2
 2
 1
 0
 1
 0

However, a very similar codes that follow do not work:

julia> [sum(skipmissing(identity.(row))) for row in eachrow(mat)]
ERROR: ArgumentError: reducing with add_sum over an empty collection of element type Union{} is not allowed.

julia> [sum(skipmissing([x for x in row])) for row in eachrow(mat)]
ERROR: ArgumentError: reducing with add_sum over an empty collection of element type Union{} is not allowed.

What is the reason for the difference? In the skipmissing(row) case row is a view and retains information about element type of the whole array, which is Union{Missing, Int64}, so it is able to properly compute sum even in the case when all values in a row are missing.

On the other hand both identity.(row) and [x for x in row] materialize the row and perform type narrowing. This type narrowing means that in rows that only contain missing values the information about Int64 is lost and we get an error. Let us see it step by step:

julia> row = last(eachrow(mat))
3-element view(::Matrix{Union{Missing, Int64}}, 10, :) with eltype Union{Missing, Int64}:
 missing
 missing
 missing

julia> x = identity.(row)
3-element Vector{Missing}:
 missing
 missing
 missing

julia> eltype(skipmissing(x))
Union{}

As you can see, since skipmissing strips the Missing part from the source vector element type, we are left with Union{}.

Unfortunately, such errors happen from time to time when one works with data having missing values. For such cases in Julia many (but not all) common reduction functions support the init keyword, so you can do:

julia> [sum(skipmissing(identity.(row)), init=0) for row in eachrow(mat)]
10-element Vector{Int64}:
 2
 1
 2
 2
 2
 2
 1
 0
 1
 0

and all is good even if type inference produces Union{}.

Conclusions

The post today was less practical than usual. However, I hope you will find it useful when Julia tries to take you into a deep dark type system forest where 2+2=5, and the path leading out is only wide enough for one (Mikhail Tal).