# Development of the Julia language

During 20 years of my work as a researcher I have used numerous programming languages to do scientific computing, chiefly R, Python, and Java. However, when I learned Julia I immediately felt this is a to-go solution, although I started using it when version 0.3 was released and the language and its ecosystem was still immature.

Currently Julia has reached version 1.4.2 and in many fields its package ecosystem provides best-in-class functionality.

A natural question to as is who has made this happen. It is easy enough to find out on the GitHub page of the Julia project here. However, the default GitHub interface allows you to only see contributions by number of commits, additions or deletions. We can learn from this that Jeff Bezanson is a leader by far in all these categories.

However, the statistics show you the whole history of the git repository. I was always curious who is the author of the current state of the code. Essentially, what I wanted to do is blame the whole repository and count the distribution of the number of lines committed by the authors.

The problem is that by default git does not give you such an option. There are ways to achieve this, which I discuss below. The project was interesting for me, because I think it nicely shows what Julia offers you when you have a scripting task at hand.

# Before we start

In order to follow the examples below you need to have git installed. Also you should have git-extras installed. If you are on Ubuntu just write sudo apt install git-extras and it should be added.

In order to analyze the repository we need to download it to our local machine e.g. to julia_src folder. This can be done using the following command (warning! it takes some time):

~$git clone https://github.com/JuliaLang/julia.git julia_src Cloning into 'julia_src'... remote: Enumerating objects: 83, done. remote: Counting objects: 100% (83/83), done. remote: Compressing objects: 100% (78/78), done. remote: Total 325678 (delta 31), reused 16 (delta 5), pack-reused 325595 Receiving objects: 100% (325678/325678), 181.28 MiB | 1.20 MiB/s, done. Resolving deltas: 100% (244259/244259), done.  Now switch our working directory to the newly downloaded repository: ~$ cd julia_src
~/julia_src (master)$ # Using git You can get the information we want using the summary command provided by git-extras. Here is how you can do it: The whole list is quite long so I have cut it down to show only people with at least 1.0% contribution. As you can see from the distribution Jameson Nash is really close to Jeff Bezanson in the ranking. As you can see I have additionally added time in front of the command to see how long the operation took. For a such large repository as this one (note that it has almost 500,000 lines of code) it is quite time consuming. The first thing I did was search over the Internet and I have found the following proposal here: The solution finished in 4 minutes and 14 seconds, so it was two times faster (the downside is that it does not produce a nice percentage information). In general it lead me to thinking about writing a Julia script that would do the job and check its speed. In the next section you can find my take on it. # Using Julia In the solution I use FreqTables.jl, ProgressMeter.jl, and Pipe.jl in the following versions: (@v1.4) pkg> status FreqTables ProgressMeter Pipe Status ~/.julia/environments/v1.4/Project.toml [da1fdf0e] FreqTables v0.4.0 [b98c9c47] Pipe v1.2.0 [92933f4c] ProgressMeter v1.3.0  Here is the code that does the job of listing authors of all lines in the git repository: As you can see I am using Threads.@threads to use multiple threads for my computations. In variable p I keep a progress meter that helps me to visually track how the computations go. In the code a line that looks innocent but is actually quite relevant is shuffle!(files). You might wonder why do I randomly reorder files for processing. The reason is that the files most probably (and in fact also actually) do not have the same cost of processing using git blame. Therefore I do not want to have expensive files clumped together. This has two benefits: • ProgressMeter.jl is able to quickly give me a good estimate of ETA (e.g. if cheap files were clumped together at the beginning of processing the estimate would be overly optimistic); • Threads.@threads does static allocation of jobs to threads; this against means that we do prefer to shuffle jobs in order to reduce the risk that all expensive jobs go to a single thread, which would negatively affect the overall processing time. Finally note that I wrap append! to auths vector in a lock to avoid race condition (different threads potentially might try to update auths at the same time). This is not needed for next!(p) operation as ProgressMeter.jl is thread-safe. Now let us test the above code. First start Julia using four threads (you can change it of course to other number of threads) using the command: ~/julia_src (master)$ JULIA_NUM_THREADS=4 julia


(on Windows do set JULIA_NUM_THREADS=4 before running Julia)

Next load the script I have given above. You are now ready for the test. Here is the code I have run on my machine:

As you can see I am well under 2 minutes now.

In the last part of code I have used Pipe.jl which greatly facilitates using pipes in Julia (there is also a very nice package Underscores.jl which I recommend you to investigate; it has more functionality but this comes at the cost of being a bit more complex to master).

What Pipe.jl does is best described by a section of its manual, so I just reuse it here:

if after @pipe you place a underscore in the right hand of |>, it will be replaced with the left hand side. So:

@pipe a |> b(x, _) # == b(x, a)


I hope you enjoyed this little exercise (and now we know exactly whose code we run when using Julia).

# P.S. Setting up your environment

As you probably know I am obsessed with proper environment setup. In an earlier post I discussed that you should always make sure you run proper versions of the packages. What is a quick way to set up the environment for the project described in this post?

When you are in Julia REPL (e.g. started as instructed above in the julia_src directory) switch to the package manager mode by pressing ] and execute the following commands (I am showing the whole output which is a bit long but allows you to check which packages got recursively added to Manifest.toml):

(@v1.4) pkg> activate .
Activating new environment at ~/julia_src/Project.toml

(julia_src) pkg> add FreqTables@0.4.0 Pipe@1.2.0 ProgressMeter@1.3.0
Updating registry at ~/.julia/registries/General
Updating git-repo https://github.com/JuliaRegistries/General.git
Resolving package versions...
Installed Parsers ─ v1.0.5
Updating ~/julia_src/Project.toml
[da1fdf0e] + FreqTables v0.4.0
[b98c9c47] + Pipe v1.2.0
[92933f4c] + ProgressMeter v1.3.0
Updating ~/julia_src/Manifest.toml
[324d7699] + CategoricalArrays v0.8.1
[861a8166] + Combinatorics v1.0.2
[9a962f9c] + DataAPI v1.3.0
[864edb3b] + DataStructures v0.17.17
[e2d170a0] + DataValueInterfaces v1.0.0
[da1fdf0e] + FreqTables v0.4.0
[41ab1584] + InvertedIndices v1.0.0
[82899510] + IteratorInterfaceExtensions v1.0.0
[682c06a0] + JSON v0.21.0
[e1d29d7a] + Missings v0.4.3
[86f7a689] + NamedArrays v0.9.4
[bac558e1] + OrderedCollections v1.2.0
[69de0a69] + Parsers v1.0.5
[b98c9c47] + Pipe v1.2.0
[92933f4c] + ProgressMeter v1.3.0
[ae029012] + Requires v1.0.1
[3783bdb8] + TableTraits v1.0.0
[bd369af6] + Tables v1.0.4
[2a0f44e3] + Base64
[8bb1440f] + DelimitedFiles
[8ba89e20] + Distributed
[9fa8497b] + Future
[b77e0a4c] + InteractiveUtils
[8f399da3] + Libdl
[37e2e46d] + LinearAlgebra
[56ddb016] + Logging
[d6f4376e] + Markdown
[de0858da] + Printf
[9a3f8284] + Random
[ea8e919c] + SHA
[9e88b42a] + Serialization
[6462fe0b] + Sockets
[2f01184e] + SparseArrays
[10745b16] + Statistics
[8dfed614] + Test
[cf7118a7] + UUIDs
[4ec0a83e] + Unicode

(julia_src) pkg> status
Status ~/julia_src/Project.toml
[da1fdf0e] FreqTables v0.4.0
[b98c9c47] Pipe v1.2.0
[92933f4c] ProgressMeter v1.3.0


Now you are sure all will work as expected. Just press backspace to leave the package manager mode and you are ready to run the examples.