Annotating columns of a data frame with DataFramesMeta.jl
Introduction
Today I want to discuss a functionality that was recently added to DataFramesMeta.jl. These utility macros and functions make it easy to add custom labels and notes to columns of a data frame. This functionality is especially useful when working with wide data frames, as is often the case when e.g. analyzing economic data.
This post is written under Julia 1.10.1, DataFrames.jl 1.6.1, and DataFramesMeta.jl 0.15.2.
Column labels
A column label is a short description of the contents of a column. When using DataFramesMeta.jl you can use the following basic commands to work with them:
@label!
attaches a label to a column;label
allows you to retrieve column label;printlabels
presents you labels of all annotated columns in a data frame.
Here is a simple example:
julia> using DataFramesMeta
julia> df = DataFrame(year=[2000, 2001], rev=[12, 17])
2×2 DataFrame
Row │ year rev
│ Int64 Int64
─────┼──────────────
1 │ 2000 12
2 │ 2001 17
julia> @label!(df, :rev = "Revenue (USD)")
2×2 DataFrame
Row │ year rev
│ Int64 Int64
─────┼──────────────
1 │ 2000 12
2 │ 2001 17
julia> label(df, :rev)
"Revenue (USD)"
julia> printlabels(df)
┌────────┬───────────────┐
│ Column │ Label │
├────────┼───────────────┤
│ year │ year │
│ rev │ Revenue (USD) │
└────────┴───────────────┘
Note that if some column did not get an explicit label (like :year
in our example)
by default its name is its label.
Column notes
Column notes are meant to give more detailed information about a column in a data frame. You can use the following basic commands to work with them:
@note!
attaches a note to a column;note
allows you to retrieve column note;printnotes
presents you notes of all columns in a data frame.
julia> @note!(df, :rev = "Total revenue of a company in in a calendar year in nominal USD")
2×2 DataFrame
Row │ year rev
│ Int64 Int64
─────┼──────────────
1 │ 2000 12
2 │ 2001 17
julia> note(df, :rev)
"Total revenue of a company in in a calendar year in nominal USD"
julia> printnotes(df)
Column: rev
───────────
Total revenue of a company in in a calendar year in nominal USD
julia> @note!(df, :year = "Calendar year")
2×2 DataFrame
Row │ year rev
│ Int64 Int64
─────┼──────────────
1 │ 2000 12
2 │ 2001 17
julia> printnotes(df)
Column: year
────────────
Calendar year
Column: rev
───────────
Total revenue of a company in in a calendar year in nominal USD
Observe that printnotes
only prints notes that were actually added to
a column (as opposed to printlabels
which prints labels of all columns,
using the default fallback to column name).
Conclusions
Today I covered the basic functions allowing to work with column metadata of data frames. If you are interested in learning more advanced functionalities please refer to DataFrames.jl and TableMetadataTools.jl documentations.
I hope that you will find the metadata functionality provided by DataFramesMeta.jl useful in your work.