Tutorials for DataFrames.jl release 0.21. Part I
DataFrames.jl release 0.21
DataFrames.jl version 0.21 was a major release that introduced a number of significant changes to the DataFrames.jl API. The list is long, so here I briefly summarize the most significant changes in terms of functionality:
- columns can now be selected using strings (selection using Symbols is still allowed);
- a completely new design of the API for working with columns of a data frame or grouped data frame, covered by the select, select!, transform, transform!, and combine functions; it is consistent (so you learn it once and reuse it everywhere), more flexible, and faster than the old one; in particular, two wrappers, ByRow and AsTable, have been added to the API;
- major enhancements to push! and append!, which make it easy to ingest heterogeneous data (varying element types, varying column sets) into a data frame;
- GroupedDataFrame now supports fast lookup by grouping columns (so creating a GroupedDataFrame can now be seen as adding an index to a data frame);
- filter and filter! are now fast when using the cols => function syntax;
- rules for pseudo-broadcasting (spreading single observations across multiple rows) have been established and are consistently applied in all methods that allow this operation.
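To make the list above more concrete, here is a minimal sketch that touches each item once. The data frame, its column names (id, grp, x, y), and the result variable names are made up for illustration. The calls are the ones I would expect to work under DataFrames.jl 0.21, but treat this as a sketch, not a reference.

```julia
using DataFrames

# Hypothetical toy data used only for this illustration.
df = DataFrame(id = [1, 2, 3, 4], grp = ["a", "b", "a", "b"], x = [10, 20, 30, 40])

# Columns can be selected with strings as well as with Symbols.
xs = df[:, "x"]

# select with ByRow: the function is applied to each element of column x.
doubled = select(df, :id, :x => ByRow(v -> 2v) => :x2)

# transform with AsTable and ByRow: the function receives each row of the
# chosen columns as a NamedTuple.
summed = transform(df, AsTable([:id, :x]) => ByRow(r -> r.id + r.x) => :total)

# combine on a grouped data frame: one aggregated row per group.
gdf = groupby(df, :grp)
sums = combine(gdf, :x => sum => :x_sum)

# Fast lookup of a group by the value of the grouping column.
group_a = gdf[("a",)]

# append! with cols = :union accepts a data frame with a different column set;
# values that are not available are filled with missing.
combined = copy(df)
append!(combined, DataFrame(id = [5], y = [1.5]), cols = :union)

# filter with the column => function form.
big = filter(:x => v -> v > 15, df)

# Pseudo-broadcasting: the single value returned by sum is spread across all rows.
spread = transform(df, :x => sum => :x_total)
```

Note that select, transform, combine, and filter return new data frames here, while append! modifies its first argument, which is why it is applied to a copy.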
All these changes combined mean that all operations on data frames can now be
expressed via function chaining (and you have full control over whether to
make copies or perform operations in place). Many users like this
style of expressing data transformations. If you want to go this way,
then you should probably consider learning one of the packages that make
it easier to work with the |> operator. There are many excellent alternatives
in the Julia ecosystem. Let me mention two: Pipe.jl (easier) and
Underscores.jl (more powerful, but harder to master).
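As a rough illustration of the chaining style, the sketch below uses only the built-in |> operator with anonymous functions, so it needs no extra packages. The data and column names are made up. Each step returns a new data frame, so the original df is left untouched.

```julia
using DataFrames

# Hypothetical toy data used only for this illustration.
df = DataFrame(grp = ["a", "b", "a", "b"], x = [1, 2, 3, 4])

# Each anonymous function is wrapped in parentheses; without them the body of
# the first lambda would swallow the rest of the pipeline.
result = df |>
    (d -> filter(:x => v -> v > 1, d)) |>
    (d -> groupby(d, :grp)) |>
    (g -> combine(g, :x => sum => :x_sum)) |>
    (d -> sort(d, :grp))
```

The parentheses and explicit argument names are exactly the noise that packages such as Pipe.jl and Underscores.jl aim to remove.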
After the release I got several questions about how things work in practice. Therefore, in this post I list the tutorials that are currently available and have been updated to show how DataFrames.jl v0.21 works.
In the Part II post (which I plan to prepare next week) I will show some new material prepared using DataFrames.jl v0.21.
Tutorials for release 0.21
There are four sources of information about the functionality of DataFrames.jl 0.21 that you can check out (I maintain them, so they should be up to date):
- An official DataFrames.jl Manual.
- A notebook-based DataFrames.jl Tutorial.
- Video materials at JuliaAcademy.
- The recently updated materials about DataFrames.jl that I presented during the JuliaCon 2019 workshop. There you will find two notebooks with worked examples of how to process real-life data sets.
I hope these materials will be useful for exploring the latest release of DataFrames.jl!