# Introduction

My book Julia for Data Analysis will be published soon (all of its chapters are already available for free in preview).

An important part of the book is its GitHub repository, which contains all the code used in the book and ensures that the results are reproducible.

Since the book was designed to fit a one-semester course on data analysis using Julia, I am now preparing teaching materials to accompany it.

Today, together with Daniel Kaszyński, I have released the first part of these supporting materials. In the exercises folder of the book’s GitHub repository we have added 130 exercises that should help you master the material covered in the book.

The exercises are grouped by book chapter. There are 10 exercises for each chapter, and each exercise has a proposed solution. We have prepared the exercises so that they vary in difficulty. The exercises from the initial chapters should be relatively easy. However, to solve the exercises from the final chapters you might need significant knowledge of Julia’s ecosystem for data analysis.

In this post I use Julia 1.8.2 and DataFrames.jl 1.4.1.

# A sample exercise

To give a concrete example of a typical exercise, I have picked a question I liked that was asked today on Discourse. The problem is stated as follows.

Consider the following data frame:

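julia> using DataFrames

julia> df = DataFrame(country=["Poland", "Poland", "Canada", "Canada"],
                      city=["Olecko", "Ełk", "Toronto", "Mississauga"])
4×2 DataFrame
 Row │ country  city
     │ String   String
─────┼──────────────────────
   1 │ Poland   Olecko
   2 │ Poland   Ełk
   3 │ Canada   Toronto
   4 │ Canada   Mississauga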


The task is to reduce it to one row per unique value in the country column. More specifically, we want to create a new data frame with two columns. The first, country, should store the unique values of the country column of the source data frame df. The second, cities, should store, for each country, a vector of the values from the city column of df that correspond to it.
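To make the target concrete, here is a minimal sketch of the same grouping in plain Julia, without DataFrames.jl. It is only an illustration of the expected result, not the approach used in the post. It assumes the four rows pair Poland with Olecko and Ełk, and Canada with Toronto and Mississauga; the variable names countries, cities, and grouped are hypothetical.

```julia
# Assumed input: two parallel vectors standing in for the two columns of df.
countries = ["Poland", "Poland", "Canada", "Canada"]
cities = ["Olecko", "Ełk", "Toronto", "Mississauga"]

# Map each country to the vector of its cities, preserving row order.
grouped = Dict{String, Vector{String}}()
for (country, city) in zip(countries, cities)
    # get! returns the existing vector for the key or stores a new empty one.
    push!(get!(grouped, country, String[]), city)
end

grouped["Poland"]  # ["Olecko", "Ełk"]
grouped["Canada"]  # ["Toronto", "Mississauga"]
```

Each value in grouped is what one cell of the cities column should hold.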

Now let me show three ways to do it using the combine function. The key to the solution is the following rule describing how combine works (taken from the documentation):

In all of these cases, function can return either a single row or multiple rows. As a particular rule, values wrapped in a Ref or a 0-dimensional AbstractArray are unwrapped and then treated as a single row.

This means that to have a vector treated as a single row we have three options:

• wrap a vector in another vector as its single element (so we have a multi-row object but with a single row);
• wrap a vector in Ref;
• wrap a vector in a 0-dimensional AbstractArray, which can be done using the fill function.
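The three wrappers listed above can be tried in plain Julia, outside of combine. The sketch below uses a hypothetical vector x standing in for the values of one group; the names as_vector, as_ref, and as_zero are introduced here only for illustration.

```julia
# Stand-in for the values of one group that combine would pass to the function.
x = ["Olecko", "Ełk"]

as_vector = [x]      # a 1-element vector whose only element is x
as_ref    = Ref(x)   # a Ref holding x
as_zero   = fill(x)  # a 0-dimensional array holding x

length(as_vector)    # 1, so it describes exactly one row
as_vector[1] === x   # true
as_ref[] === x       # true, indexing with [] unwraps the Ref
ndims(as_zero)       # 0
as_zero[] === x      # true, indexing with [] unwraps the 0-dimensional array
```

In all three cases the original vector is stored as a single element, which is why it ends up in a single cell of the result.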

So the three solutions to our problem are:

julia> combine(groupby(df, :country, sort=true), :city => (x -> [x]) => :cities)
2×2 DataFrame
 Row │ country  cities
     │ String   SubArray…
─────┼─────────────────────────────────────
   1 │ Canada   ["Toronto", "Mississauga"]
   2 │ Poland   ["Olecko", "Ełk"]

julia> combine(groupby(df, :country, sort=true), :city => Ref => :cities)
2×2 DataFrame
 Row │ country  cities
     │ String   SubArray…
─────┼─────────────────────────────────────
   1 │ Canada   ["Toronto", "Mississauga"]
   2 │ Poland   ["Olecko", "Ełk"]

julia> combine(groupby(df, :country, sort=true), :city => fill => :cities)
2×2 DataFrame
 Row │ country  cities
     │ String   SubArray…
─────┼─────────────────────────────────────
   1 │ Canada   ["Toronto", "Mississauga"]