During JuliaCon 2021 I got some requests to show some more basic functionalities of DataFrames.jl. I started writing such a post. It seems it ended up not so basic, but I hope it is useful.

What we will want to do is to create a plot of PLN/USD buy and sell exchange rates for the year 2020. The key challenge is getting the data. I fetch it from a Web API provided by NBP (Narodowy Bank Polski).

This post was tested under Julia 1.6.1, HTTP.jl 0.9.12, JSON3.jl 1.9.0, Chain.jl 0.4.7, DataFrames.jl 1.2.1, and Plots.jl 1.19.3.

Getting ready

Before writing our solution let us play a bit with the NBP Web API.

Here is a sample query that you can submit to get information about bid and ask rates of PLN/USD on 2020-01-03:

(you can copy-paste it into your browser to see the rendered result)

We have to be careful though as sometimes the prices are not available, and we get the dreaded 404 error. It happens e.g. for 2020-01-01:

Before we start let us learn how to handle such queries in Julia.

The first step is getting the query result as a string:

julia> using HTTP

julia> q = ""

julia> r = HTTP.get(q)
HTTP/1.1 200 OK
Date: Fri, 23 Jul 2021 17:04:48 GMT
Cache-Control: no-cache
Pragma: no-cache
Content-Length: 147
Content-Type: application/json; charset=utf-8
Expires: -1
ETag: "ahlzfBWI/hBaFs4mrcmnRvKkGdL1gH6hS2LIHIfppJI="
Set-Cookie: ee3la5eizeiY4Eix=ud5ahSho; path=/

{"table":"C","currency":"dolar amerykański","code":"USD","rates":[{"no":"002/C/NBP/2020","effectiveDate":"2020-01-03","bid":3.7667,"ask":3.8427}]}"""

julia> s = String(r.body)
"{\"table\":\"C\",\"currency\":\"dolar amerykański\",\"code\":\"USD\",\"rates\":[{\"no\":\"002/C/NBP/2020\",\"effectiveDate\":\"2020-01-03\",\"bid\":3.7667,\"ask\":3.8427}]}"

We can see that the result of the request is a JSON file, so let us parse it and inspect a bit:

julia> using JSON3

julia> j =
JSON3.Object{Base.CodeUnits{UInt8, String}, Vector{UInt64}} with 4 entries:
  :table    => "C"
  :currency => "dolar amerykański"
  :code     => "USD"
  :rates    => JSON3.Object[{…

julia> j.code

julia> j.rates
1-element JSON3.Array{JSON3.Object, Base.CodeUnits{UInt8, String}, SubArray{UInt64, 1, Vector{UInt64}, Tuple{UnitRange{Int64}}, true}}:
              "no": "002/C/NBP/2020",
   "effectiveDate": "2020-01-03",
             "bid": 3.7667,
             "ask": 3.8427

julia> j.rates[1].effectiveDate

julia> j.rates[1].bid

julia> j.rates[1].ask

Now we know enough about the structure of the response to write a full solution.

Before moving forward let us just check what happens if we try to fetch data in a day for which there are no quotations:

julia> q = ""

julia> r = HTTP.get(q)
ERROR: HTTP.ExceptionRequest.StatusError(404, "GET", "/api/exchangerates/rates/c/usd/2020-01-01/?format=json", HTTP.Messages.Response:
HTTP/1.1 404 Not Found
Date: Fri, 23 Jul 2021 17:09:07 GMT
Cache-Control: no-cache
Pragma: no-cache
Content-Length: 38
Content-Type: text/plain; charset=utf-8
Expires: -1
Set-Cookie: ee3la5eizeiY4Eix=Naew5Ohp; path=/

404 NotFound - Not Found - Brak danych""")

We can see that in this case we get a 404 error thrown as HTTP.ExceptionRequest.StatusError exception.

We are ready to write the solution we wanted.

Julia in action

First define a function that returns a NamedTuple with the date and bid and ask rates. If the rates are not available it should return them as missing.

Here it goes:

julia> using Chain

julia> using Dates

julia> function get_rate(date::Date)
           q = "$date/?format=json"

               return @chain q begin
                   @aside @assert _.code == "USD"
                   @aside @assert date == Date(_.effectiveDate)
                   (date=date,, ask=_.ask)
           catch e
               if e isa HTTP.ExceptionRequest.StatusError
                   return (date=date, bid=missing, ask=missing)
get_rate (generic function with 1 method)

Let us note a few things:

  1. In order to streamline processing I used @chain macro from the Chain.jl package (a lot of people are asking for more examples how it can be used in practice).
  2. In particular I used the @aside statement twice to perform validation of the result. Note that this is a useful way to inject into a chain of commands an operation whose result should not be stored but it is still useful to be performed.
  3. I had to use try-catch block to handle the 404 error. Note though, that if for some strange reason we got an exception of type other than HTTP.ExceptionRequest.StatusError I rethrow it to avoid covering such error (and many bad things can happen in the wild e.g. we could have run out of memory).

Let us test our function before moving forward:

julia> get_rate(Date("2020-01-03"))
(date = Date("2020-01-03"), bid = 3.7667, ask = 3.8427)

julia> get_rate(Date("2020-01-01"))
(date = Date("2020-01-01"), bid = missing, ask = missing)

I am sure everyone is waiting now to get to working with DataFrames.jl. Here it goes, we collect the data from the whole year to a DataFrame using the push! function:

julia> using DataFrames

julia> usd_pln = DataFrame()
0×0 DataFrame

julia> for date in Date("2020-01-01"):Day(1):Date("2020-12-31")
           push!(usd_pln, get_rate(date), promote=true)

julia> usd_pln
366×3 DataFrame
 Row │ date        bid           ask
     │ Date        Float64?      Float64?
   1 │ 2020-01-01  missing       missing
   2 │ 2020-01-02        3.7597        3.8357
   3 │ 2020-01-03        3.7667        3.8427
   4 │ 2020-01-04  missing       missing
   5 │ 2020-01-05  missing       missing
   6 │ 2020-01-06  missing       missing
   7 │ 2020-01-07        3.7679        3.8441
  ⋮  │     ⋮            ⋮             ⋮
 360 │ 2020-12-25  missing       missing
 361 │ 2020-12-26  missing       missing
 362 │ 2020-12-27  missing       missing
 363 │ 2020-12-28        3.6611        3.7351
 364 │ 2020-12-29  missing       missing
 365 │ 2020-12-30        3.6833        3.7577
 366 │ 2020-12-31        3.6714        3.7456
                              352 rows omitted

For me all worked cleanly. Note that I am using the promote=true keyword argument, as some of the rows of :bid and :ask columns contain missing values while other contain Float64 values.

We are ready to produce the plot we wanted:

julia> using Plots
julia> plot(, [usd_pln.ask], label=["ask" "bid"],
            xlabel="date", ylabel="PLN/USD", xrotation=45)

And here is the plot:

PLN/USD bid and ask rates

Now feel free to perform any analyses that you find interesting on this data.

From my side let me just show you how to compute the fraction of days in which the bid data is missing by day of the week.

julia> using Statistics

julia> @chain usd_pln begin
           transform(:date => ByRow(dayofweek))
           combine(:bid => (x -> ismissing.(x) |> mean) => :fraction_missing)
7×2 DataFrame
 Row │ date_dayofweek  fraction_missing
     │ Int64           Float64
   1 │              1         0.0769231
   2 │              2         0.0769231
   3 │              3         0.113208
   4 │              4         0.0754717
   5 │              5         0.0961538
   6 │              6         1.0
   7 │              7         1.0

As you probably might have expected on Saturdays and on Sundays we never have any information on bid and ask rates.


I hope you enjoyed this post. I have chosen the topic because, at least in my experience, these days fetching JSON data from the Internet is bread and butter of virtually every data scientist, so I think it is well worth learning how to do it.