# Introduction

Next week I organize WAW2024 conference. The event covers various aspects of theoretical and applied modeling of networks.

As an introduction I want to run a simulation of an example problem. Consider a random graph with a probability of an edge between two nodes equal to p. Next, assume that we pick an edge uniformly at random from this graph and then remove two nodes forming this edge from the graph as matched. The question is what is the expected fraction of nodes that are going to be matched by this process.

Today, I will investigate this problem using simulation.

The post was written under Julia 1.10.1, Graphs.jl 1.11.0, and DataFrames.jl 1.6.1.

# The simulation

Here is a simulator of our greedy matching process. In the simulation we traverse all edges of the graph in a random order. In the matched vector we keep track of which nodes have been already matched.

using Graphs
using Random
using Statistics

function run_sim(n::Integer, p::Real)
g = erdos_renyi(n, p)
matched = fill(false, n)
for e in shuffle!(collect(edges(g)))
n1, n2 = e.src, e.dst
if !(matched[n1] || matched[n2])
matched[n1] = true
matched[n2] = true
end
end
return mean(matched)
end


# The experiment

Let us now test our simulator for a graph on 10000 nodes and p varying from 0.00001 to 0.1 (on logarithmic scale).

julia> using DataFrames

julia> df = DataFrame(p=Float64[], rep=Int[], res=Float64[])
0×3 DataFrame
Row │ p        rep    res
│ Float64  Int64  Float64
─────┴─────────────────────────

julia> ps = [10.0^i for i in -5:-1]
5-element Vector{Float64}:
1.0e-5
0.0001
0.001
0.010000000000000002
0.1

julia> Random.seed!(1234);

julia> @time for p in ps, rep in 1:16
push!(df, (p, rep, run_sim(10_000, p)))
end
79.190585 seconds (438.02 M allocations: 14.196 GiB, 7.13% gc time, 0.36% compilation time)

julia> df
80×3 DataFrame
Row │ p        rep    res
│ Float64  Int64  Float64
─────┼─────────────────────────
1 │  1.0e-5      1   0.0948
2 │  1.0e-5      2   0.094
3 │  1.0e-5      3   0.097
4 │  1.0e-5      4   0.0892
5 │  1.0e-5      5   0.0848
6 │  1.0e-5      6   0.093
⋮  │    ⋮       ⋮       ⋮
76 │  0.1        12   0.9992
77 │  0.1        13   0.999
78 │  0.1        14   0.999
79 │  0.1        15   0.999
80 │  0.1        16   0.9988
69 rows omitted


The simulation took a bit over 1 minute, mainly due to the p=0.1 case which generates a lot of edges in the graph. Let us aggregate the obtained data to get the mean and standard error, and range of the results over all values of p:

julia> combine(groupby(df, "p"),
"p" => (x -> 10_000 * first(x)) => "mean_degree",
"res" => mean,
"res" => (x -> std(x) / sqrt(length(x))) => "res_se",
"res" => extrema)
5×5 DataFrame
Row │ p        mean_degree  res_mean  res_se       res_extrema
│ Float64  Float64      Float64   Float64      Tuple…
─────┼───────────────────────────────────────────────────────────────
1 │  1.0e-5          0.1  0.090975  0.000842986  (0.0848, 0.097)
2 │  0.0001          1.0  0.499888  0.00190971   (0.4848, 0.5134)
3 │  0.001          10.0  0.909425  0.000523729  (0.9062, 0.9126)
4 │  0.01          100.0  0.990162  0.000257694  (0.9888, 0.992)
5 │  0.1          1000.0  0.999     5.47723e-5   (0.9986, 0.9994)


We can see that the sharp increase of fraction of matched nodes happens around mean degree of 1 in the graph. Additionally we see that even for high p we do not match every node in the greedy matching process. Finally the obtained results are relatively well concentrated around the mean.

# Conclusions

If you want to see how this problem can be solved analytically I recommend you to read this paper. Using the formulas derived there we can compare our simulation results with the asymptotic theory:

julia> (10_000 .* ps) ./ (10_000 .* ps .+ 1)
5-element Vector{Float64}:
0.09090909090909091
0.5
0.9090909090909091
0.9900990099009901
0.999000999000999


Indeed we see that the match is quite good.

If such problems are interesting for you I invite you to join us during WAW2024 conference.