# Getting full factorial design in DataFrames.jl

# Introduction

Often when working with data we need to get all possible combinations of some input factors in a data frame. In the field of design of experiments this is called full factorial design. In this post I will discuss two functions that DataFrames.jl provides that can help you to generate such designs if you needed them.

The post was written under Julia 1.10.1 and DataFrames.jl 1.6.1.

# What is a full factorial design and how to create it?

Assume we are a cardboard box producer have three factors describing a box: its width, height, and depth. Each of them has a finite set of possible values (due to production process limitations). Let us create some sample data of this kind:

```
julia> height = [10, 12]
2-element Vector{Int64}:
10
12
julia> width = [8, 10, 15]
3-element Vector{Int64}:
8
10
15
julia> depth = [5, 6]
2-element Vector{Int64}:
5
6
```

Our task is to compute the volume of all possible boxes that can be created by our factory.
The list of all possible cardboard box configurations is a full factorial design.
You can get an iterator of these values by using the `Iterators.product`

function:

```
julia> Iterators.product(height, width, depth)
Base.Iterators.ProductIterator{Tuple{Vector{Int64}, Vector{Int64}, Vector{Int64}}}(([10, 12], [8, 10, 15], [5, 6]))
```

This function is lazy, to see the result we need to materialize its return value using e.g. `collect`

:

```
julia> collect(Iterators.product(height, width, depth))
2×3×2 Array{Tuple{Int64, Int64, Int64}, 3}:
[:, :, 1] =
(10, 8, 5) (10, 10, 5) (10, 15, 5)
(12, 8, 5) (12, 10, 5) (12, 15, 5)
[:, :, 2] =
(10, 8, 6) (10, 10, 6) (10, 15, 6)
(12, 8, 6) (12, 10, 6) (12, 15, 6)
```

We can see that we get an array of tuples of all possible combinations of dimensions. Let us now compute the volumes:

```
julia> prod.(collect(Iterators.product(height, width, depth)))
2×3×2 Array{Int64, 3}:
[:, :, 1] =
400 500 750
480 600 900
[:, :, 2] =
480 600 900
576 720 1080
```

The results are nice and efficient. However, sometimes it is more convenient to have this data in a data frame.

# Full factorial design in DataFrames.jl

Let us repeat the exercise using DataFrames.jl:

```
julia> using DataFrames
julia> df = allcombinations(DataFrame; height, width, depth)
12×3 DataFrame
Row │ height width depth
│ Int64 Int64 Int64
─────┼──────────────────────
1 │ 10 8 5
2 │ 12 8 5
3 │ 10 10 5
4 │ 12 10 5
5 │ 10 15 5
6 │ 12 15 5
7 │ 10 8 6
8 │ 12 8 6
9 │ 10 10 6
10 │ 12 10 6
11 │ 10 15 6
12 │ 12 15 6
```

Note that we passed `height`

, `width`

, `depth`

as keyword arguments to `allcombinations`

taking advantage of a nice functionality of Julia that in this case we can avoid writing `height=height`

as just writing `height`

gives us the same result.

Now we can add a volume column:

```
julia> transform!(df, All() => ByRow(*) => "volume")
12×4 DataFrame
Row │ height width depth volume
│ Int64 Int64 Int64 Int64
─────┼──────────────────────────────
1 │ 10 8 5 400
2 │ 12 8 5 480
3 │ 10 10 5 500
4 │ 12 10 5 600
5 │ 10 15 5 750
6 │ 12 15 5 900
7 │ 10 8 6 480
8 │ 12 8 6 576
9 │ 10 10 6 600
10 │ 12 10 6 720
11 │ 10 15 6 900
12 │ 12 15 6 1080
```

We have added the `"volume"`

column in place to `df`

. Note that we used `*`

as it can take any number of positional arguments and returns their product. The `ByRow`

wrapper signals that we want to perform this operation row-wise.

In comparison to the solution shown before many users find presentation of a full factorial design easier to work with.

# What if we have a fractional factorial design?

Sometimes your data is incomplete, and some level combinations are missing. Let us start by creating such a data frame:

```
julia> df2 = df[Not(2, 5, 9), :]
9×4 DataFrame
Row │ height width depth volume
│ Int64 Int64 Int64 Int64
─────┼──────────────────────────────
1 │ 10 8 5 400
2 │ 10 10 5 500
3 │ 12 10 5 600
4 │ 12 15 5 900
5 │ 10 8 6 480
6 │ 12 8 6 576
7 │ 12 10 6 720
8 │ 10 15 6 900
9 │ 12 15 6 1080
```

Now one might ask to complete this design and re-fill the design to be complete. This can be done by the `fillcombinations`

function. Let us see it at work:

```
julia> fillcombinations(df2, Not("volume"))
12×4 DataFrame
Row │ height width depth volume
│ Int64 Int64 Int64 Int64?
─────┼───────────────────────────────
1 │ 10 8 5 400
2 │ 12 8 5 missing
3 │ 10 10 5 500
4 │ 12 10 5 600
5 │ 10 15 5 missing
6 │ 12 15 5 900
7 │ 10 8 6 480
8 │ 12 8 6 576
9 │ 10 10 6 missing
10 │ 12 10 6 720
11 │ 10 15 6 900
12 │ 12 15 6 1080
```

Observe that after calling this function we have created a new data frame with the missing rows added. The `"volume"`

column is filled by default with `missing`

for rows that were added. The `Not("volume")`

argument meant that we want to get all combinations of values in all columns except `"volume"`

.

# Conclusions

Today we worked with two functions: `allcombinations`

and `fillcombinations`

. You will find them useful if in your work you will ever need to create all combinations of levels of some factors. This functionality seems niche, but it is needed in practice surprisingly often.