My understanding of object property access in Julia
Introduction
Today I wanted to discuss a conceptual aspect of Julia programming. It is related to the question how you should query some object for its properties. The topic is especially relevant if you want to write code that is expected to be stable in the longer term, that means that it is easy to maintain as versions of its dependencies change.
The post was written under Julia 1.10.0 and DataFrames.jl 1.6.1.
The internals
A fundamental element of Julia design are composite types. This kind of object is a collection of fields, that have names. Each of such fields can hold some value.
To make things non-abstract let us have a look at a SubDataFrame
type from DataFrames.jl.
First create an instance of such object:
julia> using DataFrames
julia> df = DataFrame(x=1:3, y=11:13, z=111:113)
3×3 DataFrame
Row │ x y z
│ Int64 Int64 Int64
─────┼─────────────────────
1 │ 1 11 111
2 │ 2 12 112
3 │ 3 13 113
julia> sdf = @view df[1:2, 1:2]
2×2 SubDataFrame
Row │ x y
│ Int64 Int64
─────┼──────────────
1 │ 1 11
2 │ 2 12
To check what fields SubDataFrame
contains you can use the the fieldnames
function:
julia> fieldnames(SubDataFrame)
(:parent, :colindex, :rows)
Note that we pass a type to fieldnames
. It is important - the list of fields is fixed for every
instance of an object of a given type.
In this case we learned that SubDataFrame
has three fields. The three functions associated with
fieldnames
are: fieldcount
returning the number of fields of a type,
fieldtypes
returning their declared types, and hasfield
allowing you
to query if a specific field is present. There is an example:
julia> fieldcount(SubDataFrame)
3
julia> fieldtypes(SubDataFrame)
(AbstractDataFrame, DataFrames.AbstractIndex, AbstractVector{Int64})
julia> hasfield(SubDataFrame, :parent)
true
julia> hasfield(SubDataFrame, :parentx)
false
For a given instance of a type you can query the field with getfield
and set it with setfield!
.
For example, let us get the field :parent
of our sdf
object (a source data frame in this case):
julia> getfield(sdf, :parent)
3×3 DataFrame
Row │ x y z
│ Int64 Int64 Int64
─────┼─────────────────────
1 │ 1 11 111
2 │ 2 12 112
3 │ 3 13 113
Having learned all these methods you might ask yourself when to use it. The short answer is:
Never directly access fields of a type. They might be changed between versions of code you use without warning.
The longer answer is that you should assume that direct field access is typically considered internal. The list and fields and their types are an implementation detail and as a user of this type you should not rely on them. The use of property access is restricted to the designers of a type to allow them manipulate its inner physical representation.
So how should we work with composite types then?
The composite type interface
Julia introduces a concept of property that is a logical representation of data stored in a given object.
You can query for properties of an object with the propertynames
function. You also have the hasproperty
,
getproperty
and setproperty!
functions similar as for fields.
In case of our sdf
SubDataFrame
we have the following logical representation:
julia> propertynames(sdf)
2-element Vector{Symbol}:
:x
:y
julia> hasproperty(sdf, :x)
true
julia> getproperty(sdf, :x)
2-element view(::Vector{Int64}, 1:2) with eltype Int64:
1
2
julia> setproperty!(sdf, :x, [1001, 1002])
2-element Vector{Int64}:
1001
1002
julia> sdf
2×2 SubDataFrame
Row │ x y
│ Int64 Int64
─────┼──────────────
1 │ 1001 11
2 │ 1002 12
We immediately see a significant difference. The sdf
properties in this case are columns of our data frame.
We do not care how they are mapped to a physical representation of SubDataFrame
, this is taken care of
by designers of the DataFrames.jl package.
There are the following important aspects of properties.
The first is that property access is typically considered a public API. Designers of the type should make sure that the way you can access properties of an object should remain stable and a change in this area would be breaking, so:
You should access properties of objects in your code (not fields).
The second is that properties are bound to object, not to a type. This means that different objects of the same type may have different sets of properties. It is quite useful, e.g. each data frame can have a different set of columns.
The third, practical, information is that by default properties fall back to fields, as you can read here in the Julia Manual.
The next aspect is convenient syntax.
You do not need to call the getproperty
and setproperty!
functions explicitly.
The getproperty(a, :b)
is equivalent to a.b
, and setproperty!(a, :b, v)
is the same as a.b = v
.
Finally note that the propertynames
function optionally takes a second positional argument
that is Bool
. If it is passed and set to true
you get a list of all properties of some object.
By default the second argument is false
and you get a list of public properties of some object
(and in practice you should use the default mode).
Conclusions
Today I have a short conclusion.
Fields represent physical layout of a type. Properties represent a logical view of an object.
In your code use object properties and not their fields. Field access is considered internal and typically should be only done by developers of a package providing a given object.