# Introduction

Often when working with data we need to get all possible combinations of some input factors in a data frame. In the field of design of experiments this is called a full factorial design. In this post I will discuss two functions that DataFrames.jl provides that can help you generate such designs when you need them.

The post was written under Julia 1.10.1 and DataFrames.jl 1.6.1.

# What is a full factorial design and how do we create it?

Assume we are a cardboard box producer and have three factors describing a box: its width, height, and depth. Each of them has a finite set of possible values (due to production process limitations). Let us create some sample data of this kind:

julia> height = [10, 12]
2-element Vector{Int64}:
10
12

julia> width = [8, 10, 15]
3-element Vector{Int64}:
8
10
15

julia> depth = [5, 6]
2-element Vector{Int64}:
5
6


Our task is to compute the volume of all possible boxes that can be created by our factory. The list of all possible cardboard box configurations is a full factorial design. You can get an iterator of these values by using the Iterators.product function:

julia> Iterators.product(height, width, depth)
Base.Iterators.ProductIterator{Tuple{Vector{Int64}, Vector{Int64}, Vector{Int64}}}(([10, 12], [8, 10, 15], [5, 6]))


This function is lazy. To see the result we need to materialize its return value using, for example, collect:

julia> collect(Iterators.product(height, width, depth))
2×3×2 Array{Tuple{Int64, Int64, Int64}, 3}:
[:, :, 1] =
(10, 8, 5)  (10, 10, 5)  (10, 15, 5)
(12, 8, 5)  (12, 10, 5)  (12, 15, 5)

[:, :, 2] =
(10, 8, 6)  (10, 10, 6)  (10, 15, 6)
(12, 8, 6)  (12, 10, 6)  (12, 15, 6)


We can see that we get an array of tuples of all possible combinations of dimensions. Let us now compute the volumes:

julia> prod.(collect(Iterators.product(height, width, depth)))
2×3×2 Array{Int64, 3}:
[:, :, 1] =
400  500  750
480  600  900

[:, :, 2] =
480  600   900
576  720  1080
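
If we only need a summary of the volumes, we do not have to materialize the array at all. The sketch below uses only Base Julia functions and consumes the lazy iterator directly. The variable names boxes, n, and largest are introduced here just for illustration:

```julia
height = [10, 12]
width = [8, 10, 15]
depth = [5, 6]

# Lazy iterator over all (height, width, depth) tuples;
# no combinations are computed at this point.
boxes = Iterators.product(height, width, depth)

# The iterator knows how many combinations it holds without being collected.
n = length(boxes)

# Reduce over the iterator directly to find the largest volume.
largest = maximum(prod, boxes)
```

Here n is the number of box configurations and largest is the biggest volume among them, both obtained without calling collect.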


This approach is concise and efficient. However, sometimes it is more convenient to have this data in a data frame.

Let us repeat the exercise using DataFrames.jl:

julia> using DataFrames

julia> df = allcombinations(DataFrame; height, width, depth)
12×3 DataFrame
 Row │ height  width  depth
     │ Int64   Int64  Int64
─────┼──────────────────────
   1 │     10      8      5
   2 │     12      8      5
   3 │     10     10      5
   4 │     12     10      5
   5 │     10     15      5
   6 │     12     15      5
   7 │     10      8      6
   8 │     12      8      6
   9 │     10     10      6
  10 │     12     10      6
  11 │     10     15      6
  12 │     12     15      6


Note that we passed height, width, and depth as keyword arguments to allcombinations. Here we took advantage of a convenient Julia feature: after the semicolon, writing just height gives the same result as writing height=height.
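
To see this shorthand in isolation, here is a minimal sketch in plain Julia. The function box_factors is a hypothetical helper defined only for this illustration; it is not part of DataFrames.jl:

```julia
# Hypothetical helper: accepts three keyword arguments and
# returns them as a named tuple.
box_factors(; height, width, depth) = (; height, width, depth)

height = [10, 12]
width = [8, 10, 15]
depth = [5, 6]

# After the semicolon, a bare variable name is shorthand for name=name.
short = box_factors(; height, width, depth)

# The explicit form passes exactly the same keyword arguments.
long = box_factors(height=height, width=width, depth=depth)

same = short == long
```

Both calls produce the same named tuple, so same is true. The shorthand works whenever the variable name matches the keyword argument name.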

Now we can add a volume column:

julia> transform!(df, All() => ByRow(*) => "volume")
12×4 DataFrame
 Row │ height  width  depth  volume
     │ Int64   Int64  Int64  Int64
─────┼──────────────────────────────
   1 │     10      8      5     400
   2 │     12      8      5     480
   3 │     10     10      5     500
   4 │     12     10      5     600
   5 │     10     15      5     750
   6 │     12     15      5     900
   7 │     10      8      6     480
   8 │     12      8      6     576
   9 │     10     10      6     600
  10 │     12     10      6     720
  11 │     10     15      6     900
  12 │     12     15      6    1080


We have added the "volume" column in place to df. Note that we used * as it can take any number of positional arguments and returns their product. The ByRow wrapper signals that we want to perform this operation row-wise.
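
To build some intuition for what the row-wise product does, here is a rough sketch in Base Julia that works on plain column vectors. It only illustrates the idea and is not how DataFrames.jl implements ByRow. The vectors h, w, and d are illustrative names holding the first three rows of the table above:

```julia
# First three rows of the design, stored as plain column vectors.
h = [10, 12, 10]
w = [8, 8, 10]
d = [5, 5, 5]

# `*` accepts any number of positional arguments.
single = *(10, 8, 5)

# Applying `*` element by element across the columns gives one value per row.
volumes = map(*, h, w, d)

# Broadcasting is another way to write the same row-wise product.
same = volumes == h .* w .* d
```

The volumes vector holds one product per row, matching the first three entries of the "volume" column.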

Compared to the array-based solution shown before, many users find this data frame presentation of a full factorial design easier to work with.

# What if we have a fractional factorial design?

Sometimes your data is incomplete, and some level combinations are missing. Let us start by creating such a data frame:

julia> df2 = df[Not(2, 5, 9), :]
9×4 DataFrame
 Row │ height  width  depth  volume
     │ Int64   Int64  Int64  Int64
─────┼──────────────────────────────
   1 │     10      8      5     400
   2 │     10     10      5     500
   3 │     12     10      5     600
   4 │     12     15      5     900
   5 │     10      8      6     480
   6 │     12      8      6     576
   7 │     12     10      6     720
   8 │     10     15      6     900
   9 │     12     15      6    1080


Now one might want to add the missing combinations back so that the design is complete again. This can be done with the fillcombinations function. Let us see it at work:

julia> fillcombinations(df2, Not("volume"))
12×4 DataFrame
 Row │ height  width  depth  volume
     │ Int64   Int64  Int64  Int64?
─────┼───────────────────────────────
   1 │     10      8      5      400
   2 │     12      8      5  missing
   3 │     10     10      5      500
   4 │     12     10      5      600
   5 │     10     15      5  missing
   6 │     12     15      5      900
   7 │     10      8      6      480
   8 │     12      8      6      576
   9 │     10     10      6  missing
  10 │     12     10      6      720
  11 │     10     15      6      900
  12 │     12     15      6     1080


Observe that calling this function created a new data frame with the missing rows added. The "volume" column is filled by default with missing for the rows that were added. The Not("volume") argument means that we want to get all combinations of values in all columns except "volume".
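
To check which level combinations were absent from the incomplete design, we can compare the full design with the combinations already present. The following is a minimal sketch in Base Julia and not the way fillcombinations is implemented. The names present, full, and absent are introduced only for this illustration, and sorting the unique values is an assumption made here to reproduce the row order shown above:

```julia
# Factor columns of the incomplete design (the nine rows of df2).
height = [10, 10, 12, 12, 10, 12, 12, 10, 12]
width  = [ 8, 10, 10, 15,  8,  8, 10, 15, 15]
depth  = [ 5,  5,  5,  5,  6,  6,  6,  6,  6]

# Combinations that are already present, one tuple per row.
present = collect(zip(height, width, depth))

# Full factorial design built from the sorted unique levels of each factor,
# flattened to a vector (the first factor changes fastest).
levels_of(x) = sort(unique(x))  # hypothetical helper for this sketch
full = vec(collect(Iterators.product(levels_of(height),
                                     levels_of(width),
                                     levels_of(depth))))

# Combinations in the full design that do not appear in the data.
absent = setdiff(full, present)
```

The absent vector contains the three (height, width, depth) tuples for which fillcombinations reported missing in the "volume" column.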

# Conclusions

Today we worked with two functions: allcombinations and fillcombinations. You will find them useful if you ever need to create all combinations of levels of some factors. This functionality seems niche, but it is needed in practice surprisingly often.