Introduction

Warning! This post includes (self)promotion.

Łukasz Kraiński and I have recently published the Hands-on Data Science with Julia liveProject with Manning.

In this post I want to discuss the idea behind this format and what you can expect inside.

Why the liveProject format?

If you are reading this post, you most likely know that I write a lot on Julia Slack, Julia Discourse, and the StackOverflow [julia] tag, give various tutorials, write package documentation, and post weekly updates to this blog.

So what makes the liveProject format different enough that I decided to give it a try instead of preparing, for example, a new JuliaAcademy course?

First, the content is project oriented. This means that you get a business description of the challenge and should try to finish the task yourself. If a task proves too challenging, you have three levels of support for every task: hints, a partial solution, and finally a full solution. This might not seem very significant, but I believe that trying to do something on one's own (and getting only as much help as is really required) is superior to just reading through example code.

Second, on the platform you can compare your implementation with other solutions to the same problem and discuss it with other coders or mentors.

The projects are marked as intermediate level content, since that much experience is required to do the tasks on your own. However, all this means that the material is still useful for a beginner, with the twist that you will probably have to rely more on the provided help to finish the tasks.

I will see how it works out and, after giving the liveProject format a try for some time, maybe write a post with my conclusions.

What is inside?

The Hands-on Data Science with Julia liveProject is divided into five parts. Here I will focus on the Julia packages that are featured in them (all packages are used in their latest versions as of the time of writing this post; I list them incrementally, as they are introduced in consecutive projects):

  1. data preprocessing: Arrow.jl, Chain.jl, CSV.jl, DataFrames.jl, FreqTables.jl, Plots.jl, StatsBase.jl;
  2. clustering (k-means, DBSCAN): Clustering.jl, Distances.jl;
  3. dimensionality reduction (PCA, t-SNE, UMAP): Conda.jl, MultivariateStats.jl, PyCall.jl;
  4. predictive modelling - regression problems (random forest, GLM): DecisionTree.jl, GLM.jl, HypothesisTests.jl;
  5. predictive modelling - classification problems (XGBoost): ROCAnalysis.jl, XGBoost.jl.

Conclusions

Unlike standard teaching materials, liveProjects are paid content, which is why I issued a (self)promotion warning at the beginning of this post. However, the good news is that the first project, on data preprocessing, is available for free, so it is easy to check whether you like the content that we have prepared.