Construction vs conversion in Julia
Introduction
In the recent 0.22.6 release of the DataFrames.jl package we have
deprecated a bunch of convert
methods for objects defined in it.
In this post I want to comment on when convert
is used and when it is
appropriate to define a convert
method for custom types.
The history
DataFrames.jl is an old package, with its 0.0.0 release made on Feb 14, 2013.
During this time the Julia language has evolved. Even in Julia 0.6 convert
method was a default fallback for construction as you can see here:
However, in some cases you could consider adding methods to
Base.convert
instead of defining a constructor, because Julia falls back to callingconvert()
if no matching constructor is found. For example, if no constructorT(args...) = ...
existsBase.convert(::Type{T}, args...) = ...
is called.
This is a past of pre-1.0 Julia design (if you are interested in some more
history you can start with this issue). The things have changed now,
but until DataFrames.jl 0.22.6 release we had some convert methods that were
inspired by this rule, and we even had constructors that fell back to convert
explicitly.
So what is the current rule for using convert
?
The present
Fortunately, as of Julia 1.6 the manual is quite clear that
convert
should be only defined in cases when it is safe to perform a
conversion even when the user does not ask for it explicitly. Before we move
forward let us see it in action:
julia> struct Example
a::Int
b::Complex{Int}
end
julia> Example(1.0, 1.0)
Example(1, 1 + 0im)
You can see that 1.0
(of type Float64
) got implicitly converted to 1
(of
type Int
). Similarly 1.0
got converted to 1 + 0im
(Complex{Int}
).
These conversions are performed implicitly to ensure programmer convenience.
However, the following fails:
julia> example(a::Int, b::Complex{Int}) = (a, b)
example (generic function with 1 method)
julia> example(1.0, 1.0)
ERROR: MethodError: no method matching example(::Float64, ::Float64)
So when does the implicit conversion happen? Again the manual explains it here:
- Assigning to an array converts to the array’s element type.
- Assigning to a field of an object converts to the declared type of the field.
- Constructing an object with
new
converts to the object’s declared field types.- Assigning to a variable with a declared type (e.g.
local x::T
) converts to that type.- A function with a declared return type converts its return value to that type.
- Passing a value to
ccall
converts it to the corresponding argument type.
So a short, simplifying, rule is that the conversion happens when you want to make an assignment to a value that has a pre-specified type.
Let us see it at work:
julia> d1 = Set{Int}()
Set{Int64}()
julia> push!(d1, 1.0)
Set{Int64} with 1 element:
1
julia> d2 = BitSet()
BitSet([])
julia> push!(d2, 1.0)
ERROR: MethodError: no method matching push!(::BitSet, ::Float64)
You can push 1.0
to Set{Int}
, because its push!
method makes no
restriction on the type of value that is pushed to the Set{Int}
. Then,
internally, Set{Int}
converts the passed value to Int
before storing it. You
can make sure that this happens by writing:
julia> push!(d1, "a")
ERROR: MethodError: Cannot `convert` an object of type String to an object of type Int64
Closest candidates are:
convert(::Type{T}, ::Ptr) where T<:Integer at pointer.jl:23
convert(::Type{T}, ::T) where T<:Number at number.jl:6
convert(::Type{T}, ::Number) where T<:Number at number.jl:7
...
Stacktrace:
[1] setindex!(h::Dict{Int64, Nothing}, v0::Nothing, key0::String)
@ Base ./dict.jl:374
[2] push!(s::Set{Int64}, x::String)
@ Base ./set.jl:57
[3] top-level scope
@ REPL[32]:1
and if you check the setindex!
implementation you will see the conversion:
function setindex!(h::Dict{K,V}, v0, key0) where V where K
key = convert(K, key0)
if !isequal(key, key0)
throw(ArgumentError("$(limitrepr(key0)) is not a valid key for type $K"))
end
setindex!(h, v0, key)
end
So why you are not allowed to push!
a float to a BitSet
? The reason is that
its push!
method is defined more restrictively like this:
@inline push!(s::BitSet, n::Integer) = _setint!(s, _check_bitset_bounds(n), true)
and since no conversion happens when passing parameters to functions an error is thrown.
Conclusion
To wrap up let us go back to the manual here:
since
convert
can be called implicitly, its methods are restricted to cases that are considered “safe” or “unsurprising”.convert
will only convert between types that represent the same basic kind of thing (e.g. different representations of numbers, or different string encodings). It is also usually lossless; converting a value to a different type and back again should result in the exact same value.
In essence this means that, unless you are developing a low level infrastructure
package (like defining new numeric type), you are not likely to need to define
convert
methods at all. Just define proper constructors for your types.
In DataFrames.jl we left only three conversions.
The first is from SubDataFrame
to DataFrame
. It is clearly a lose-less
conversion that sometimes might be useful (e.g. you have a container of
DataFrame
objects and want to be able to implicitly add SubDataFrame
to it).
The second and third conversions are from DataFrameRow
and GroupKey
to
NamedTuple
. The reason behind allowing them is that we try to make both these
types to work like NamedTuple
(except for the fact that they are views).
Again in this case the conversion is lose-less.
As a final note — I think Julia 1.6 manual is really a great resource (half-way of writing this post I considered to stop it as the manual is so clear about the rules).
Edit
After publishing this post I had a discussion on how what I describe here compares to C#, which distinguishes implicit and explicit conversions, see here. Without going into all the details (they are explained in the referenced documentation) in practice the most important rule in C# is:
Predefined C# implicit conversions always succeed and never throw an exception. User-defined implicit conversions should behave in that way as well. If a custom conversion can throw an exception or lose information, define it as an explicit conversion.
which is not the case with convert
function in Julia. On the other hand
implicit conversions also occur in method invocations in C# (again - this is not
the case in Julia).
We have discussed the method invocation case above, so let me give examples of
convert
use in Julia that would not be allowed in C# implicit conversions.
In the first one the conversion throws an error.
julia> convert(Bool, 1) # we see that the conversion method is defined
true
julia> convert(Bool, 2) # but it can throw an error
ERROR: InexactError: Bool(2)
In the second one the conversion looses information.
julia> x = eps()
2.220446049250313e-16
julia> y = convert(Float16, x) # we lost information due to rounding
Float16(0.0)
So, to avoid the ambiguity of meaning against the understanding of implicit vs
explicit conversion in other programming languages, I think it is simplest to
have the following mental model: in Julia users are allowed to define convert
methods for their types and there is a precise list of situations (given above
in my post) when the convert
function is invoked Julia Base.