The goal of this page is to give an introduction on matching theory and short proofs of the main results. The approach is based on a lecture by András Sebő. András's inspiration came in part from Lovász' exercises on Edmonds algorithm, but mostly from his own works on matchings and $T$-joins and other related topics. I wrote the proofs from my memory, which means that they may differ from the original, probably in the wrong direction. These notes can also be taken as a list of consecutive exercises (and that was András's original intention); the proof of each claim, lemma or theorem should not be hard to solve using the previous ones.
Given a connected graph $G=(V,E)$, a matching is a subset $N \subseteq E$ of vertex-disjoint edges. We denote $\nu(G)$ the maximum cardinality of a matching of $G$. A matching with maximum cardinality is a maximum matching. Notice that a matching is by definition a subgraph with maximal degree $1$. A simple combinatorial dual object to a matching is a vertex cover: a set of vertices that intersects every edge of $G$. The size of a matching cannot exceed the size of vertex cover, as each edge of the matching must use a distinct vertex in the vertex cover. Using the notation $\tau(G)$ for the cardinality of a minimum vertex cover, this translates into:
Claim 1: $$\nu(G) \leq \tau(G)$$
We say that a vertex $v$ is covered by a matching $M$ when $M$ contains an edge incident to $v$. We define the two following disjoint sets:
Notice that $C \cup D = V$.
We say that a path $P$ is $(E_1,E_2)$-alternating if $E(P) \subseteq E_1 \cup E_2$ but no two consecutive edges of $P$ are both in $E_1$ or both in $E_2$. For a particular matching $M$, an $M$-alternating path is an $(M,E-M)$-alternating path. Alternating paths play a great role in matching theory. Notice that a matching is equivalently a subgraph of maximum degree $1$. Hence, the symmetric difference (that we denote by $\triangle$) of two matchings $M$ and $M'$ is a disjoint union of $(M,M')$-alternating paths (some of them can be closed, that is, cycles), because the degree of any vertex is at most $2$. Moreover, if $P$ is a path of $M \triangle M'$, then $M \triangle P$ is also a matching, as can be checked at each vertex locally. From these observations follows the classical result:
Claim 2: A matching $M$ is maximal iff there is no $(M,E-M)$-alternating path whose endpoints are not covered by $M$.
Unfortunately, it does not give an efficient algorithm in the general case (but it does in the bipartie case). The reason is that finding an alternating path as stated is not easy. It is a major contribution from Jack Edmonds to have demonstrated a way to do so, and we will cover his algorithm later. But for now we take a different direction to gain some insights on the structure of the problem.
Let's consider an edge $uv$ with $u$ and $v$ in $D$.
Claim 3: If $uv \in E \cap D^2$, then for any maximal matchings $M_u$ and $M_v$ of $G - u$ and $G-v$ respectively, there is an even $(M_u,M_v)$-alternating $(u,v)$-path.
Proof: Consider the path $P_u$ with endpoint $u$, and let $w$ be the opposite endpoint.
An immediate consequence is for bipartite graphs: for any edge $uv$ of a bipartite graph, at least one of $u$, $v$ is in $C$. From this we prove:
Theorem 4: (Kőnig) For any bipartite graph $G$, $\nu(G) = \tau(G)$
Proof: We only need to prove $\nu(G) \geq \tau(G)$. The proof is by induction on $V(G)$. Let $M$ be a maximal matching, and $e = uv \in M$. We may assume that $u \in C$. Consider $G-u$, this is a bipartite graph on which we can apply induction. Let $T$ a minimum vertex cover of $G-u$, of cardinality $\tau(G-u) = \nu(G-u) = \nu(G) - 1$. Then, $T+u$ is a vertex cover of $G$. Indeed, every edge incident to $u$ is covered by $u$, and every edge not incident to $u$ is in $G-u$, and thus covered by $T$. Hence $\tau(G) \leq \tau(G-u) + 1 = \nu(G)$.
Note how this theorem fails on non-bipartite graph: an odd cycle of length $2n+1$ has $\nu(C_{2n+1}) = n$ but $\tau(C_{2n+1}) = n+1$. The point is that it fails precisely because there are edges whose both endpoints are in $D$. To get a result for general graphs, we need to treat this case in a specific way. For that, and this is the great idea of Edmonds, we introduce the contraction of a graph $G$ by an edge set $X$, denoted $G / X$, as the graph obtained by identifying the endpoints of edges of $X$, and removing loops and multiple edges.
Claim 5: Let $P$ be the alternating path of Claim 3 , and $2k$ be its length. Then, $\nu(G / P) = \nu(G) - k$ (notice that we are thus contracting $2k+1$ vertices).
Proof:
Theorem 6: (Berge , Tutte) For a graph $G$ and a vertex set $X \subset V(G)$, denote $q(X)$ (or $q_G(X)$ when we need the precision) the number of odd components of $G - X$. Then:
$$ |V(G)| = 2\nu(G) + \max_{X}~q(X) - |X| $$
That is, the minimum number of vertices not covered by a matching is equal to the maximum of $q(X) - |X|$ over all vertex sets $X$.
Proof: One direction is obvious: given a set $X$ with $q(X) \geq |X|$ and a matching $M$, each odd component of $G - X$ has either a vertex not covered by $M$, or a vertex matched by $M$ to a vertex of $X$. But at most $|X|$ odd components may be in the latter case. Hence $|V(G)| - 2\nu(G) \geq q(X) - |X|$ for every possible $X$.
We prove the equality by induction on $|V|$. Consider a matching $M$ of $G$. Let $uv \in M$.
A set $X$ reaching the maximum in Theorem 6 is called a Tutte set. Tutte sets are natural certificates of optimality for the matching problem. As with any min-max theorem, we have some complementary slackness criterion for the optimality of a matching $M$ and a Tutte set $X$ (that we actually used in the induction):
Note also that each odd component of $G - X$ either has a vertex not covered by $M$, or (exclusively) has a vertex matched with a vertex of $X$ in $M$.
We will refine this criterion and show that there is actually a canonical Tutte set latter, but first we use the contraction idea to get a polynomial-time algorithm for the maximum cardinality matching.
The goal of this section is to exhibit and prove an algorithm for finding a maximum perfect matching. We do not want to tackle the details of implementation and how to get the fastest possible algorithm, instead we just want to show that is is possible to compute efficiently a maximum matching, and learn more about the structure of matchings. Hence, our algorithm will actually do only the following:
Claim 2 ensures that exactly one of the two possible outputs is always possible. By iterating the algorithm, starting from $M = \emptyset$, until the algorithm outputs a Tutte set, we get a maximal matching from any graph $G$. Edmonds' complete algorithm would actually use ad-hoc data structures to encode $M$, that would be updated at each step, to get a better asymptotic complexity, but our approach only keeps the heart of the procedure and is best suited for our purpose.
Given a graph $G$ and a matching $M$, we try to find an augmenting path, that is, an $M$-alternating $(u,v)$-path where $u$ and $v$ are not covered by $M$. Let's try a naive idea: start from an uncovered vertex $u$, and look at its neighbours. If one of them is also uncovered, say $v$, we are done, $uv$ is a path as desired. Else, for any neighour $x$ of $u$, $x$ is matched to some vertex $y$ by $M$. Then if $y$ has an uncovered neighbour $v$, $u,x,y,v$ is an $M$-alternating path as desired, and otherwise, we check the vertices matched to neighbours of $v$, and so on. If we are (very) lucky, we will find an augmenting path. But usually at some point we would visit a vertex already considered, which leads either to exponential complexity (if we go on from this vertex) or to unsound behaviour (if we ignore it).
An $M$-alternating $u$-tree (or alternating tree when there is no ambiguity) is a tree rooted at an uncovered vertex $u$, such that any path between $u$ and a leaf is an $M$-alternating path of even length.
What we described is a (too simple) procedure to grow an alternating tree. An $M$-forest is a forest $F$ of $M$-alternating trees whose roots are exactly the uncovered vertices of $M$. Given a forest $F$, we partition the set of vertices of $F$ in:
We are motivated by the next claim.
Claim 7: If $F$ is a maximal $M$-alternating forest such that $G$ does not contain an outer-outer edge, then $M$ is maximal and the set of inner vertices is a Tutte set.
Proof: Let $I$ and $O$ be the sets of inner and outer vertices respectively. Then each outer vertex induces a distinct odd component in $G - I$ by maximality of $F$. Also, the vertices not covered by $F$ induces even components, as they are paired together by $M$. We check the complementary slackness conditions:
Hence $M$ is maximal and $I$ is a Tutte set.
We face two challenges: dealing with outer-outer edges, and growing the tree until it is maximal. Let's answer the latter problem first. Suppose that $u$ is an outer vertex of $F$, and $v$ is adjacent to $u$ but not in $F$. Then $v$ is covered (otherwise it would be a root in $F$), let $vw$ be the incident edge in $M$. As $w$ is not in $F$, $F \cup \{uv,vw\}$ is a bigger $M$-alterning forest. Now we go back to the former problem.
Given a matching $M$, a contractible cycle is a cycle of length $2k+1$ containing $k$ edges in $M$, and such that there is an even $M$-alternating path from the vertex $w$ of $C$ not covered by $M \cap E(C)$ to an uncovered vertex of $M$. This definition derives naturally from Claim 5: if $M$ is a maximum matching, then the condition that there is an alternating path from $w$ to an uncovered vertex implies that $V(C) \subseteq D$. Hence if $M$ is maximal, it induces a maximal matching when contracting $C$
Claim 8: If an $M$-alternating forest $F$ contains an outer-outer edge $uv$, then
Proof: First case, if $u$ and $v$ are in the same tree $T$, then there is an $(u,v)$-path $P$ in $T$, and let $w$ be the unique vertex in $P$ closest to the root of $T$. then $P$ can be decomposed into two $M$-alternating even paths: $P_u$ from $u$ to $w$ and $P_v$ from $w$ to $v$. Then $C := P_v,uv,P_u$ is an $M$-alternating odd cycle (where $w$ is the vertex not covered by the restriction of $M$ to $C$), that is $C$ is contractible.
Second case, if $u$ and $v$ are in distinct trees $T_u$ and $T_v$. Let $P_u$ be the path from the root $r_u$ of $T_u$ to $u$, and $P_v$ be the path from $v$ to the root $r_v$ of $T_v$. Then $P:= P_u,uv,P_v$ is an $M$-alternating odd $(r_u,r_v)$-paths, and $r_u$ and $r_v$ are not covered.
Reformulation Claim 5, we have:
Claim 9: If $M$ is a maximum matching and $C$ a contractible cycle for $M$, then $M /C$ is maximal in $G / C$, and a Tutte set $T$ of $G / C$ is also a Tutte set in $G$.
Proof: It remains to prove that a Tutte set $T$ of $G / C$ extends to a Tutte set in $G$. Let $w$ be the contraction of $C$ in $G / C$. Then $w \notin T$ because by definition of contractibility $w$ is not covered by at least one matching of cardinality $|M|$, i.e. $w \notin C$. It follows that $T$ is a Tutte set of $G$ as $q_g(T) = q_{G / C}(T)$ (because $C$ has an odd nomber of vertices).
It remains one case in the algorithm: when $M$ is not maximal and there is a contractible cycle. Then $G / C$ is not maximal and hence it has an augmenting path.
Claim 10: Let $M$ be a matching in $G$, $C$ a contractible cycle for $M$, $w$ the contraction of $C$. Suppose that there is an $M / C$-augmenting path $P'$, then there is an $M$-augmenting path $P$ in $G$ with $E(P) \subseteq E(P') \cup E(C)$.
Proof: If $P'$ does not contain $w$, it is an alternating path in $G \setminus C$, thus in $G$ too.
Suppose $w$ is in $V(P')$. Let $u$ and $v$ be the endpoints of $P'$. Then there are only two possible $(u,v)$-paths with edge sets contained in $E(P') \cup E(C)$, inducing respectively a subpath $Q_1$ and $Q_2$ in $C$, and one of their extremity is the vertex not covered by $M \cap E(C)$. Assume that $Q_2$ is even. Then $Q_2$ is alternating and $E(P') \cup E(Q_1)$ induces an $M$-alternating $(u,v)$-path.
We summarize our discussion by the following algorithm. Given a graph $G$, a matching $M$, and an $M$-alternating forest $F$:
We just proved:
Theorem 11: (Edmonds) Maximum Cardinality Matching is solvable in polynomial-time.
Remark that, even without considering details of implementation, our algorithm is inefficient when it finds a contractible cycle: a better algorithm would not start with an empty forest, but instead contract the current forest for $G$ into a forest for $G / C$. But this algorithm is possibly the simplest to understand and prove for this problem.
Let's apply this and write an real algorithm in a real programming language. But first, a disclaimer:
Remark 12: The following implementation is not optimal. It is not optimal in length. It is not optimal in asymptotic complexity. It is not optimal in practical complexity. It is not optimal in clarity. It is probably not even Pareto-optimal for any combination of these criterions. It is just a literal implementation of the proof given above.
The implementation is given in OCaml, and is part of my own graph library. The interface for graphs is given here. I also use a module for paths, that implements a double-ended queue with fold and concatenation functions, complete interface here. The advantage is that when something is supposed to be a path, it is indeed a path for sure (I suppose that the implementation for path is correct, which is reasonable as it is straightforward, and it can be checked independantly). Notice that a path is defined over a fixed graph, hence a sequence of edges that is a path in two distinct graphs needs two representations.
Similarly, I define a module for matching. Again, a matching is defined for one fixed graph, and a matching is really what we meant it to be. The definition of a matching is given by:
phi is simply a partial function that gives for every vertex covered the vertex to which and the edge by which it is matched.
The interface is here. There is only one non-obvious function here:
This function takes a matching and a path representing the cycle of a blossom, and contracts the blossom (hence returns a matching on a smaller graph). Here is the implementation that I give:
check_blossom checks that the operation is legal. Actually, my implementation is not correct because I don't check that the graphs on which m and path are defined are the same. It would have been cleaner to actually define a module just for alternating paths, defined for a given matching, that would have almost the same interface than a normal path (and it would have provided a nice example of recursive modules). Another possible choice would be not to include that function in the matching interface. The implementation is straightforward here: we define w the vertex at the top of the blossom, and compute the contracted graph (line 7): we identify the vertices of the blossom with w and then remove loops and multiple edges around w. vset is the set of vertices in the blossom, that we use to define the contracted matching. Notice that w is still covered by the same edge after the contraction.
Now we have all that we can desire to write our implementation of Edmonds' algorithm. The interface will be:
We keep the structure of the proof: we give Edmonds' algorithm as an algorithm taking a matching and computing a certificate of optimality or an augmenting path. The algorithm is further decomposed into an algorithm that given a forest, computes an augmenting path, a blossom, or a larger forest. For completeness, we give a function maximum_matching that computes directly a maximum matching and a Tutte set from any matching, even empty.
The type forest is coded like this:
links classifies the vertices as inner, outer or away from the forest. Notice that an outer vertex or an inner vertex points to the edge toward its father in the forest. We also keep a set of edges, unscanned (implemented as a set, but could have been a list for better performance), having one outer endpoint but not being yet in the forest. We must check all these unscanned edges for augmenting paths, blossoms or for a way to grow the forest.
The heart of the algorithm is the function augment_forest, implemented like this:
This implementation emphasizes the four different cases happening when one wants to add an edge to te tree. If the two endpoints $u$ and $v$ are outer, there is a blossom or an augmenting paths. To distinguish the two cases we use the function get_path to get the paths from $u$ and $v$ to the roots of their respective trees. If the roots are equal we have a blossom, in which case we must remove the common edges of the two paths (using remove_common_suffix, otherwise we get an augmenting path by a simple concatenation.
If one endpoint is outer, and the other one is away, the forest grows. We factorize the code for growing in the function grow. In every other case nothing happens: outer--inner does nothing, and the other cases cannot happen as one of the endpoints must be outer.
The code for growing a forest is:
We simply add two edges to the forest, and add all the edges incident to the new outer vertex to unscanned, except the edge being scanned.
Now we can code the one-step algorithm, that augments by one the size of a matching or computes a Tutte set. This is the function edmonds
The implementation follows the proof: if all the edges incident to an outer node are scanned, the set of inner vertices is a Tutte set certifying the optimality. Otherwise, augment_forest finds either an augmenting path (we are done), or the forest grows (we simply recurse), or it finds a blossom (we recurse on the contracted graph). In the last case, if we find a Tutte set in the contracted graph, we proved that it is a Tutte set in the original graph. If we find an augmenting path, then we need to uncontract that path, this is the most technical part of the algorithm:
This is ugly. We need to cut path into two subpaths, and insert an even-length subpath of the blossom in-between, which is done by also cutting the blossom into two pieces (at vertices $w$ and $w'$). And because path lives in the contracted graph, we must rewrite it as a path in the original graph (lines 6 and 8).
Finally, the function to compute a maximum matching:
The complete implementation is available here
1. | Paths, trees, and flowers. Canadian Journal of mathematics, 17, 449 -- 467. (1965). | :
2. | Matching theory. AMS Chelsea Publishing, (1986). | , :