# Introduction

A classical problem in computing is finding what is the number comparisons that a comparison sort algorithm minimally requires in worst-case given an $$n$$-element list. Let us try to tackle this problem, as usual using DataFrames.jl to support the presentation of the results.

In what follows I use Julia 1.5.3, DataFrames.jl 0.21.8 and Pipe.jl 1.3.0.

# Theoretical lower bound

As there are $$n!$$ permutations of an $$n$$-element set and doing $$k$$ comparisons allow us to distinguish at most $$2^k$$ states so we get that $$k\geq\lceil\log_2(n!)\rceil$$.

Let us use DataFrames.jl and to quickly calculate this bound on the number of queries for $$n\in$$:

julia> using DataFrames

julia> using Pipe

julia> @pipe DataFrame(n=1:7) |>
transform(_, :n => ByRow(factorial) => :permutations) |>
transform(_, :permutations =>
ByRow(x -> ceil(Int, log2(x))) =>
:queries)
7×3 DataFrame
│ Row │ n     │ permutations │ queries │
│     │ Int64 │ Int64        │ Int64   │
├─────┼───────┼──────────────┼─────────┤
│ 1   │ 1     │ 1            │ 0       │
│ 2   │ 2     │ 2            │ 1       │
│ 3   │ 3     │ 6            │ 3       │
│ 4   │ 4     │ 24           │ 5       │
│ 5   │ 5     │ 120          │ 7       │
│ 6   │ 6     │ 720          │ 10      │
│ 7   │ 7     │ 5040         │ 13      │



Now, we switch to a core of the exercise — let us write a program that sorts up to seven elements in no more than the above calculated number of queries.

# The judge and the player

Assume we want to have a Judge that knows the order of objects and can be queried to compare the position in the order of the queried objects:

julia> mutable struct Judge
state::Vector{Int}
queries::Int

function Judge(state)
@assert isperm(state)
new(state, 0)
end
end

julia> function compare(judge::Judge, i::Int, j::Int)
judge.queries += 1
judge.state[i] < judge.state[j]
end
compare (generic function with 1 method)


As you can see the Judge will count the number of queries that it received. Also we restrict the Judge to accept only permutations for simplicity (in general it could accept any comparable objects though).

Now we create a player function that will be allowed to query the Judge and it is expected to return the ordering of objects that the Judge has.

julia> using Combinatorics

julia> function player(judge::Judge, options::Vector{Vector{Int}})
@assert !isempty(options)
n = length(options)
length(options) == 1 && return options
besti, bestj, best_split = 0, 0, typemax(Int)
starti = length(options) == factorial(n) ? n - 1 : 1
for i in starti:n, j in i+1:n
yes_count = count(opt -> opt[i] < opt[j], options)
current_split = max(yes_count, length(options) - yes_count)
if current_split < best_split
best_split = current_split
besti, bestj = i, j
end
end
query = compare(judge, besti, bestj)
filter!(opt -> (opt[besti] < opt[bestj]) == query, options)
return player(judge, options)
end
player (generic function with 2 methods)

julia> player(judge::Judge) =
player(judge, collect(permutations(1:length(judge.state))))
player (generic function with 2 methods)


Initially player allows all permutations. With each question only the permutations that are consistent with the received answers are retained. Note that the player function uses a very simple rule: it asks the question after which the number of remaining options in the worst case is minimized. As a special case the starti = length(options) == factorial(n) ? n - 1 : 1 line uses the fact that if we ask the first question it does not really matter which pair of elements we compare (but starting from the second question we consider all possible comparisons).

Let us test our player function on a few examples:

julia> j1 = Judge([1, 3, 5, 6, 4, 2])
Judge([1, 3, 5, 6, 4, 2], 0)

julia> player(j1)
6-element Array{Int64,1}:
1
3
5
6
4
2

julia> j1.queries
10

julia> j2 = Judge([7, 6, 5, 4, 3, 1, 2])
Judge([7, 6, 5, 4, 3, 1, 2], 0)

julia> player(j2)
7-element Array{Int64,1}:
7
6
5
4
3
1
2

julia> j2.queries
12


The function seems to be working correctly. Now let us test it systematically for $$n\in$$:

julia> df = DataFrame()
0×0 DataFrame

julia> for n in 1:7
queries = map(permutations(1:n)) do perm
j = Judge(perm)
if player(j) != perm
error("player gave wrong result for $perm input") end return j.queries end append!(df, DataFrame(n=n, queries=queries)) end julia> @pipe groupby(df, [:n, :queries], sort=true) |> combine(_, nrow) |> unstack(_, :queries, :n, :nrow) |> coalesce.(_, "") |> show(_, eltypes=false, summary=false) │ Row │ queries │ 1 │ 2 │ 3 │ 4 │ 5 │ 6 │ 7 │ ├─────┼─────────┼───┼───┼───┼────┼─────┼─────┼──────┤ │ 1 │ 0 │ 1 │ │ │ │ │ │ │ │ 2 │ 1 │ │ 2 │ │ │ │ │ │ │ 3 │ 2 │ │ │ 2 │ │ │ │ │ │ 4 │ 3 │ │ │ 4 │ │ │ │ │ │ 5 │ 4 │ │ │ │ 8 │ │ │ │ │ 6 │ 5 │ │ │ │ 16 │ │ │ │ │ 7 │ 6 │ │ │ │ │ 8 │ │ │ │ 8 │ 7 │ │ │ │ │ 112 │ │ │ │ 9 │ 9 │ │ │ │ │ │ 304 │ │ │ 10 │ 10 │ │ │ │ │ │ 416 │ │ │ 11 │ 11 │ │ │ │ │ │ │ 80 │ │ 12 │ 12 │ │ │ │ │ │ │ 2912 │ │ 13 │ 13 │ │ │ │ │ │ │ 2048 │  As you can see we never got an error, so our player seems to work correctly. Also it works optimally in worst-case. The bound we have calculated above was never exceeded. As an additional exercise I have used several common functions from DataFrames.jl to (hopefully) nicely format the resulting table (in DataFrames.jl 0.22 release it will be yet nicer as we are switching to PrettyTables.jl as a back-end for text/plain printing). # Take away notes Unfortunately for $$n=8$$ the simple algorithm we used does not produce the desired results any more (warning — this calculation is a bit more lengthly): julia> queries = map(permutations(1:8)) do perm j = Judge(perm) if player(j) != perm error("player gave wrong result for$perm input")
end
return j.queries
end;

julia> @pipe DataFrame(queries=queries) |>
groupby(_, :queries, sort=true) |>
combine(_, nrow)
4×2 DataFrame
│ Row │ queries │ nrow  │
│     │ Int64   │ Int64 │
├─────┼─────────┼───────┤
│ 1   │ 14      │ 1072  │
│ 2   │ 15      │ 22104 │
│ 3   │ 16      │ 16936 │
│ 4   │ 17      │ 208   │


and ceil(Int, log2(factorial(8))) is equal to 16. It is known that this bound is achievable, so we fail in 208 cases. Therefore, as a challenge, I leave you to think of an algorithm that is better. A natural approach is to find optimal splits using backtracking while aggressively pruning branches that cannot lead to an optimal solution to reduce the complexity, but maybe some simple heuristic on top of the one I considered will be enough?