This week I read a post titled Why Vim is better than VSCode. In it, the author spends a lot of time on the operator - text object - motion pattern in Vim and argues that it is not only efficient but also fun to learn and use.
It reminded me of the structure of the operation specification language we have in DataFrames.jl that follows the pattern:
input columns => transformation => output column names
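As a minimal sketch of this pattern (the data frame df and the output names a_sum and a_plus_b are made up for illustration), each operation below names its input columns, then the transformation, then the output column:

```julia
using DataFrames

# A small illustrative data frame (not taken from the post's examples)
df = DataFrame(a=1:2, b=3:4)

# input column => transformation => output column name
total = combine(df, :a => sum => :a_sum)

# several input columns => row-wise transformation => output column name
added = select(df, [:a, :b] => ByRow(+) => :a_plus_b)
```

Here total holds a single row with the sum of column a, and added holds one column with the row-wise sums of a and b.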
The post is written under Julia 1.7.2 and DataFrames.jl 1.3.4.
The user has some data frame and wants to drop the :col column from it, but is not sure whether this column is present in the data frame.
Let us first create two test data frames on which we will test our solutions:
    julia> using DataFrames

    julia> df1 = DataFrame(a=1:2, b=3:4)
    2×2 DataFrame
     Row │ a      b
         │ Int64  Int64
    ─────┼──────────────
       1 │     1      3
       2 │     2      4

    julia> df2 = DataFrame(a=1:2, col=["drop", "me"], b=3:4)
    2×3 DataFrame
     Row │ a      col     b
         │ Int64  String  Int64
    ─────┼──────────────────────
       1 │     1  drop        3
       2 │     2  me          4
A basic approach
A natural thing to try is using the Not selector for this task. Let us check:
    julia> select(df1, Not(:col))
    ERROR: ArgumentError: column name :col not found in the data frame

    julia> select(df2, Not(:col))
    2×2 DataFrame
     Row │ a      b
         │ Int64  Int64
    ─────┼──────────────
       1 │     1      3
       2 │     2      4
The operation worked on df2, but failed on df1.
You might ask why the Not selector is so restrictive. The reason is to avoid bugs.
You could accidentally mistype a column name and then, if such an operation worked
instead of erroring, your incorrect result would propagate.
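To make the bug-avoidance argument concrete, here is a small sketch in which the column name is deliberately misspelled as :cool (a made-up typo for illustration). Because Not is strict, the mistake surfaces immediately as an ArgumentError instead of returning an unchanged data frame:

```julia
using DataFrames

df2 = DataFrame(a=1:2, col=["drop", "me"], b=3:4)

# :cool is a deliberate typo for :col; Not refuses unknown column names
typo_caught = try
    select(df2, Not(:cool))
    false                      # reached only if no error was thrown
catch err
    err isa ArgumentError      # the typo is reported right away
end
```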
An intermediate solution
A first solution that comes to mind is to drop the column only if it is present in the data frame, so you might write something like this:
    julia> select(df1, names(df1) .!= "col")
    2×2 DataFrame
     Row │ a      b
         │ Int64  Int64
    ─────┼──────────────
       1 │     1      3
       2 │     2      4

    julia> select(df2, names(df2) .!= "col")
    2×2 DataFrame
     Row │ a      b
         │ Int64  Int64
    ─────┼──────────────
       1 │     1      3
       2 │     2      4
This works, but you need to write the name of the source data frame twice, so the solution feels a bit heavy.
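To see why this works, it may help to look at the intermediate value. In the sketch below (the variable names mask and kept are just for illustration), the broadcasted comparison produces one Boolean per column name, and select keeps the columns where the mask is true:

```julia
using DataFrames

df2 = DataFrame(a=1:2, col=["drop", "me"], b=3:4)

# names(df2) is ["a", "col", "b"]; the broadcasted .!= compares each name to "col"
mask = names(df2) .!= "col"

# A Boolean vector with one entry per column acts as a column selector
kept = select(df2, mask)
```

When the column is absent, every entry of the mask is true, so all columns are kept and no error is raised.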
The fun part
So what is the way I find nice to do this operation? Here is the approach:
    julia> select(df1, Cols(!=("col")))
    2×2 DataFrame
     Row │ a      b
         │ Int64  Int64
    ─────┼──────────────
       1 │     1      3
       2 │     2      4

    julia> select(df2, Cols(!=("col")))
    2×2 DataFrame
     Row │ a      b
         │ Int64  Int64
    ─────┼──────────────
       1 │     1      3
       2 │     2      4
We are using a combination of slightly more advanced features here.
First, !=("col") creates a function that compares its argument to "col" using !=.
It is a very nice feature of Base Julia that it allows partial function application for the != operator.
Next, Cols accepts a predicate, in our case !=("col").
Then it selects all columns of a data frame for which this predicate returns true.
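The two ingredients can also be tried separately. The following sketch (the names not_col, kept_names, and res are illustrative) first calls the partially applied function directly and then spells out the same predicate as an anonymous function passed to Cols:

```julia
using DataFrames

df2 = DataFrame(a=1:2, col=["drop", "me"], b=3:4)

# !=("col") behaves like the one-argument function x -> x != "col"
not_col = !=("col")
not_col("a")       # true
not_col("col")     # false

# The same predicate works with ordinary higher-order functions
kept_names = filter(not_col, names(df2))

# Cols calls the predicate on each column name, passed as a string
res = select(df2, Cols(x -> x != "col"))
```

The anonymous function form is longer, which is why the partially applied !=("col") reads so nicely inside Cols.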
The beauty of Julia is that it not only does the job you want done, but is also
quite fun to code with. At the same time, its design often helps you catch
common bugs in your code (like the Not behavior I have described in this post).