Introduction

Today I want to discuss a conceptual aspect of Julia programming: how you should query an object for its properties. The topic is especially relevant if you want to write code that stays stable in the long term, that is, code that remains easy to maintain as the versions of its dependencies change.

The post was written under Julia 1.10.0 and DataFrames.jl 1.6.1.

The internals

Composite types are a fundamental element of Julia's design. An object of a composite type is a collection of named fields, each of which can hold a value.

To make things concrete, let us have a look at the SubDataFrame type from DataFrames.jl. First, create an instance of it:

julia> using DataFrames

julia> df = DataFrame(x=1:3, y=11:13, z=111:113)
3×3 DataFrame
 Row │ x      y      z
     │ Int64  Int64  Int64
─────┼─────────────────────
   1 │     1     11    111
   2 │     2     12    112
   3 │     3     13    113

julia> sdf = @view df[1:2, 1:2]
2×2 SubDataFrame
 Row │ x      y
     │ Int64  Int64
─────┼──────────────
   1 │     1     11
   2 │     2     12

To check what fields SubDataFrame contains, you can use the fieldnames function:

julia> fieldnames(SubDataFrame)
(:parent, :colindex, :rows)

Note that we pass a type to fieldnames. This is important: the list of fields is fixed by the type, so it is the same for every instance of that type.

In this case we learned that SubDataFrame has three fields. Three functions are related to fieldnames: fieldcount returns the number of fields of a type, fieldtypes returns their declared types, and hasfield lets you check whether a specific field is present. Here is an example:

julia> fieldcount(SubDataFrame)
3

julia> fieldtypes(SubDataFrame)
(AbstractDataFrame, DataFrames.AbstractIndex, AbstractVector{Int64})

julia> hasfield(SubDataFrame, :parent)
true

julia> hasfield(SubDataFrame, :parentx)
false

For a given instance of a type you can read a field with getfield and set it with setfield! (the latter requires a mutable type). For example, let us get the :parent field of our sdf object (the source data frame in this case):

julia> getfield(sdf, :parent)
3×3 DataFrame
 Row │ x      y      z
     │ Int64  Int64  Int64
─────┼─────────────────────
   1 │     1     11    111
   2 │     2     12    112
   3 │     3     13    113
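The example above only reads a field. As a minimal sketch of setfield!, here is a pair of hypothetical types (Counter and Frozen are made up for this illustration and are not part of DataFrames.jl). It shows that writing a raw field works for a mutable struct and fails for an immutable one:

```julia
# Hypothetical mutable type, used only for illustration.
mutable struct Counter
    name::String
    count::Int
end

c = Counter("clicks", 0)

getfield(c, :count)       # read the raw field
setfield!(c, :count, 5)   # overwrite the raw field; allowed because Counter is mutable

# Hypothetical immutable type: its fields cannot be reassigned.
struct Frozen
    value::Int
end

f = Frozen(1)

# setfield! throws an error for an immutable struct, so we record whether it failed.
failed = try
    setfield!(f, :value, 2)
    false
catch
    true
end
```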

Having learned all these functions, you might ask yourself when to use them. The short answer is:

Never directly access fields of a type. They might change without warning between versions of the code you use.

The longer answer is that you should assume that direct field access is typically considered internal. The list of fields and their types are an implementation detail, and as a user of a type you should not rely on them. Field access is reserved for the designers of a type, so that they remain free to change its inner physical representation.
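To make the risk tangible, here is a small sketch. It assumes an imaginary package whose maintainers changed the storage of a type between two releases. The names TemperatureV1, TemperatureV2 and celsius are hypothetical and stand in for "the same type before and after an upgrade":

```julia
# Hypothetical "version 1" of a type: stores degrees Celsius directly.
struct TemperatureV1
    celsius::Float64
end

# Hypothetical "version 2": the maintainers switched the storage to Kelvin.
struct TemperatureV2
    kelvin::Float64
end

# Accessor that the imaginary package keeps stable across both versions.
celsius(t::TemperatureV1) = getfield(t, :celsius)
celsius(t::TemperatureV2) = getfield(t, :kelvin) - 273.15

t1 = TemperatureV1(20.0)
t2 = TemperatureV2(293.15)

celsius(t1)   # works with the old layout
celsius(t2)   # works with the new layout, giving approximately the same value

# Code that reached into the field directly would break after the upgrade,
# because the :celsius field no longer exists in the new layout.
hasfield(TemperatureV1, :celsius)
hasfield(TemperatureV2, :celsius)
```

User code written against celsius(t) keeps working in both cases, while code calling getfield(t, :celsius) only works with the first layout.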

So how should we work with composite types then?

The composite type interface

Julia introduces the concept of a property, which is a logical representation of the data stored in a given object. You can query the properties of an object with the propertynames function. You also have the hasproperty, getproperty and setproperty! functions, analogous to those for fields.

In the case of our sdf SubDataFrame we have the following logical representation:

julia> propertynames(sdf)
2-element Vector{Symbol}:
 :x
 :y

julia> hasproperty(sdf, :x)
true

julia> getproperty(sdf, :x)
2-element view(::Vector{Int64}, 1:2) with eltype Int64:
 1
 2

julia> setproperty!(sdf, :x, [1001, 1002])
2-element Vector{Int64}:
 1001
 1002

julia> sdf
2×2 SubDataFrame
 Row │ x      y
     │ Int64  Int64
─────┼──────────────
   1 │  1001     11
   2 │  1002     12

We immediately see a significant difference. The properties of sdf in this case are the columns of our data frame. We do not care how they are mapped to the physical representation of SubDataFrame; this is taken care of by the designers of the DataFrames.jl package.

Properties have the following important aspects.

The first is that property access is typically considered a public API. Designers of a type should make sure that the way you access the properties of an object remains stable, and a change in this area would be breaking, so:

You should access properties of objects in your code (not fields).

The second is that properties are bound to an object, not to a type. This means that different objects of the same type may have different sets of properties. This is quite useful, e.g. each data frame can have a different set of columns.
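A minimal sketch of this idea outside of DataFrames.jl could look as follows. The Record type is hypothetical, and it is only one possible way to expose per-object properties (here by overloading Base.getproperty and Base.propertynames):

```julia
# Hypothetical type: a bag of named values stored in a single Dict field.
struct Record
    data::Dict{Symbol,Any}
end

# Expose the dictionary keys as properties.
# getfield is used inside these methods because r.data would call getproperty again.
Base.propertynames(r::Record, private::Bool=false) =
    sort!(collect(keys(getfield(r, :data))))
Base.getproperty(r::Record, name::Symbol) = getfield(r, :data)[name]

a = Record(Dict{Symbol,Any}(:x => 1, :y => 2))
b = Record(Dict{Symbol,Any}(:title => "report"))

fieldnames(Record)   # the same single field for every Record
propertynames(a)     # properties of a
propertynames(b)     # properties of b, different from those of a
a.x                  # property access goes through getproperty
```

Both objects share one type and one field, yet each of them reports its own set of properties.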

The third, practical, point is that by default properties fall back to fields, as described in the Julia Manual.
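You can check this fallback on any type that does not customize its properties. The Point type below is a hypothetical example defined just for this purpose:

```julia
# Hypothetical plain struct with no custom getproperty or propertynames methods.
struct Point
    x::Float64
    y::Float64
end

p = Point(1.0, 2.0)

# With the default methods the property names are exactly the field names...
same_names = propertynames(p) == fieldnames(Point)

# ...and reading a property returns the value stored in the field of the same name.
same_value = getproperty(p, :x) === getfield(p, :x)
```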

The next aspect is convenient syntax. You do not need to call the getproperty and setproperty! functions explicitly. Writing a.b is equivalent to getproperty(a, :b), and a.b = v is the same as setproperty!(a, :b, v).
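Because the dot syntax is just a call to these functions, a type can attach custom behavior to it. Below is a sketch with a hypothetical Circle type that stores only a radius but also answers to a computed area property. This is an illustration of the mechanism, not a recommendation on how to design such a type:

```julia
# Hypothetical type with one stored field and one computed property.
mutable struct Circle
    radius::Float64
end

function Base.getproperty(c::Circle, name::Symbol)
    if name === :area
        # :area is not a field; it is computed from the stored radius.
        return pi * getfield(c, :radius)^2
    end
    return getfield(c, name)
end

function Base.setproperty!(c::Circle, name::Symbol, value)
    if name === :area
        # Setting the area is translated into an update of the stored radius.
        return setfield!(c, :radius, sqrt(value / pi))
    end
    return setfield!(c, name, convert(fieldtype(Circle, name), value))
end

c = Circle(1.0)

dot_matches_call = c.area == getproperty(c, :area)

c.radius = 2.0            # same as setproperty!(c, :radius, 2.0)
radius_after_set = c.radius

c.area = pi               # same as setproperty!(c, :area, pi); the radius becomes 1.0
radius_after_area = c.radius
```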

Finally, note that the propertynames function optionally takes a second positional argument of type Bool. If it is passed and set to true, you get a list of all properties of the object. By default it is false and you get a list of only the public properties (in practice you should use the default mode).
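Here is a sketch of how a type author might use that second argument. The Account type and its _audit_log field are hypothetical. The method below hides one field from the default listing and reveals it only when private properties are requested:

```julia
# Hypothetical type with two public fields and one internal field.
struct Account
    owner::String
    balance::Float64
    _audit_log::Vector{String}
end

# List only the public names by default; include the internal one when private is true.
function Base.propertynames(a::Account, private::Bool=false)
    public = (:owner, :balance)
    return private ? (public..., :_audit_log) : public
end

acct = Account("Ann", 10.0, String[])

public_names = propertynames(acct)        # default mode: public properties only
all_names = propertynames(acct, true)     # all properties, including internal ones
```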

Conclusions

Today I have a short conclusion.

Fields represent the physical layout of a type. Properties represent a logical view of an object.

In your code use the properties of objects and not their fields. Field access is considered internal and typically should only be done by the developers of the package that provides a given type.