The power of lookup functions in Julia
Introduction
Julia provides four very useful general look-up functions in Base: argmin, argmax, findmin, and findmax.
Normally they are used with arrays, like this:
julia> a = [2, 1, 5, 0]
4-element Array{Int64,1}:
2
1
5
0
julia> findmax(a)
(5, 3)
julia> argmax(a)
3
julia> findmin(a)
(0, 4)
julia> argmin(a)
4
However, the beauty of their design is that they work with arbitrary collections of values that support comparisons. Below I give some examples.
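As a quick sanity check of how the four functions relate to each other, here is a minimal sketch using only Base. The variable names (`val`, `idx`, `t`, `ties`) are mine and used only for illustration. It shows that `findmax` bundles the results of `maximum` and `argmax`, that a tuple works just as well as a vector, and that when the largest value is repeated the first index is returned.

```julia
a = [2, 1, 5, 0]

# findmax returns the value together with its index
val, idx = findmax(a)            # (5, 3)

# the same information obtained with two separate calls
same = (maximum(a), argmax(a))   # (5, 3)

# a tuple is also an indexable collection, so the functions work on it too
t = (2, 1, 5, 0)
tmin = findmin(t)                # (0, 4)

# with repeated extreme values the first matching index is returned
ties = [3, 7, 7, 1]
first_max = argmax(ties)         # 2
```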
This post was written using Julia 1.5.2, StatsBase.jl v0.32.2 and DataFrames.jl 0.21.8.
Dictionaries
These functions are also quite useful with dictionaries, where the index that is returned is the key. Here is an example:
julia> d = Dict('a':'d' .=> a)
Dict{Char,Int64} with 4 entries:
'a' => 2
'c' => 5
'd' => 0
'b' => 1
julia> findmax(d)
(5, 'c')
julia> argmax(d)
'c': ASCII/Unicode U+0063 (category Ll: Letter, lowercase)
julia> findmin(d)
(0, 'd')
julia> argmin(d)
'd': ASCII/Unicode U+0064 (category Ll: Letter, lowercase)
Now you might ask when this is useful. Consider that we have some set of nominal values and want to find the most frequent one. Here is an easy way to do it using StatsBase.jl:
julia> using Random, StatsBase
julia> Random.seed!(1234);
julia> r = rand(1:10, 1000);
julia> m = countmap(r)
Dict{Int64,Int64} with 10 entries:
7 => 100
4 => 91
9 => 81
10 => 110
2 => 107
3 => 99
5 => 103
8 => 107
6 => 107
1 => 95
julia> findmax(m) # find the frequency and value of the most frequent item
(110, 10)
julia> findall(==(findmax(m)[1]), m) # make sure it is unique
1-element Array{Int64,1}:
10
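If you do not want to depend on StatsBase.jl, the same pattern can be sketched with Base alone. In the snippet below `count_values` is a hypothetical helper written for this illustration (it is not part of any package), and the `votes` data is made up. The idea is unchanged: build a dictionary of counts, use `findmax` to get the highest count and its key, and then check for ties.

```julia
# Hypothetical helper: count how many times each value occurs (Base only)
function count_values(xs)
    counts = Dict{eltype(xs),Int}()
    for x in xs
        counts[x] = get(counts, x, 0) + 1
    end
    return counts
end

votes = ['a', 'b', 'a', 'c', 'a', 'b']
counts = count_values(votes)

# highest count and the key that has it
freq, mode = findmax(counts)     # (3, 'a')

# all keys sharing the highest count, to check if the mode is unique
tied = [k for (k, v) in counts if v == freq]
```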
Data frames
When working with data frames we often store data that can be lexicographically compared. Let us consider the following simple data frame:
julia> using DataFrames
julia> Random.seed!(1234);
julia> df = DataFrame(c=rand('a':'d', 10), n=rand(10))
10×2 DataFrame
│ Row │ c │ n │
│ │ Char │ Float64 │
├─────┼──────┼──────────┤
│ 1 │ 'a' │ 0.372846 │
│ 2 │ 'b' │ 0.263121 │
│ 3 │ 'c' │ 0.98869 │
│ 4 │ 'a' │ 0.489858 │
│ 5 │ 'd' │ 0.425211 │
│ 6 │ 'a' │ 0.379765 │
│ 7 │ 'd' │ 0.286955 │
│ 8 │ 'c' │ 0.118085 │
│ 9 │ 'd' │ 0.739658 │
│ 10 │ 'c' │ 0.97138 │
Assume that we want to find its lexicographically largest row, that is, the row with the largest c, with ties broken by n:
julia> findmax(eachrow(df))
(DataFrameRow
│ Row │ c │ n │
│ │ Char │ Float64 │
├─────┼──────┼──────────┤
│ 9 │ 'd' │ 0.739658 │, 9)
Because eachrow(df) returns an AbstractArray object, we can also safely use the maximum function to get just the row:
julia> maximum(eachrow(df))
DataFrameRow
│ Row │ c │ n │
│ │ Char │ Float64 │
├─────┼──────┼──────────┤
│ 9 │ 'd' │ 0.739658 │
All of this works cleanly and in a fully generic way.
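The lexicographic ordering used above is easy to inspect without DataFrames.jl. Here is a minimal sketch that stands in for the rows with plain tuples. The `rows` vector is an assumption made for illustration and reuses a few values from the data frame above. Tuples are compared element by element, so the first field decides and the second one only breaks ties.

```julia
# each tuple plays the role of one (c, n) row
rows = [('a', 0.372846), ('d', 0.286955), ('d', 0.739658), ('c', 0.97138)]

# the largest tuple: 'd' wins on the first field, 0.739658 breaks the tie
best, pos = findmax(rows)        # (('d', 0.739658), 3)

# maximum returns only the element, without its position
top = maximum(rows)              # ('d', 0.739658)

# to rank by the second field alone, look it up in a projection
pos_by_n = argmax(last.(rows))   # 4
```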
Conclusions
This time the conclusion is a warning. One should clearly understand how these functions work, as in some cases their behavior might be surprising. Here is an example:
julia> d = Dict('a':'d' .=> 4:-1:1)
Dict{Char,Int64} with 4 entries:
'a' => 4
'c' => 2
'd' => 1
'b' => 3
julia> maximum(d)
'd' => 1
julia> findmax(d)
(4, 'a')
We see that maximum finds the largest key-value pair (pairs are compared by key first), while findmax locates the largest value and returns a tuple containing this value and the key corresponding to it.
Note though that e.g. for NamedTuple the behavior is different:
julia> nt = (a=4, b=3, c=2, d=1)
(a = 4, b = 3, c = 2, d = 1)
julia> maximum(nt)
4
julia> findmax(nt)
(4, :a)
and here both functions consistently pick the largest value.
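One way to avoid the surprise with dictionaries is to state explicitly what should be compared. The sketch below uses only Base, and the variable names are mine. It contrasts the maximum over values, over keys, and over whole pairs, and shows why the pair `'d' => 1` wins: pairs are ordered by their first element, and the second element matters only when the keys are equal.

```julia
d = Dict('a':'d' .=> 4:-1:1)

# be explicit about what is compared
max_value = maximum(values(d))   # 4
max_key = maximum(keys(d))       # 'd'

# iterating a Dict yields pairs, so maximum compares pairs
max_pair = maximum(d)            # 'd' => 1

# pairs are ordered by key first, so 'a' => 4 sorts before 'd' => 1
key_first = isless('a' => 4, 'd' => 1)   # true

# findmax compares values and reports the matching key
value_and_key = findmax(d)       # (4, 'a')
```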