Network Solutions
Suppose you are interested in the political leanings of members of a social network, such as Facebook. You happen to know of some members who are die-hard conservatives, and some others who are liberals. You don’t know anything about the rest of the members, but you figure that there is a general tendency for the political views of “friends” in the network to be correlated. Given that each person typically has many friends, who may have quite different political views from each other, is there a way to make a reasonable guess for the political positions of the members you don’t know about?
Or, imagine that you have a network in which the connectors between nodes are rubber bands (that is, ideal rubber bands that can stretch as long as you like and will contract down to a point if allowed to). If you were to nail down some of the nodes on a flat surface, to what positions would the rubber bands pull the other nodes?
Or, suppose that you have an electrical network of resistors, and you attach the positive terminal of a battery to one node and the negative terminal to another. How will current flow through the network?
While these three questions may seem quite different at first glance, they each depend in a similar way on the structure of the underlying network (or graph, as mathematicians put it—not to be confused with a bar graph or the graph of a function). For example, if you are a Facebook member who belongs to a cluster of friends who all know each other, but you also have some far-flung friends in very different walks of life, your political views will likely reflect some complex interplay between these varied influences. Likewise, a node in the rubber-band network will feel the tug of its local cluster, if it belongs to one, but also its connections to other realms of the network. And an electrical current will instantly make the best possible use of both a multiplicity of local wires and the shortcuts to other parts of the circuit that crop up here and there.
As we’ll see, under the hood these three questions are essentially the same question. They each amount to solving a system of linear equations whose matrix of coefficients, called the graph’s Laplacian, distills the essence of the graph’s characteristics—whether the graph naturally breaks down into clusters, for example, and how important particular edges are to its global structure. These systems of linear equations, called Laplacian linear systems, are ubiquitous in many areas of computational science, including engineering simulations and image processing.
A graph’s Laplacian linear system is simple to write down, and, in theory at least, is easy to solve—after all, students learn as early as high school algebra how to solve systems of linear equations. But when it comes to solving Laplacian linear systems, the gulf between theory and practice looms large.
The basic technique students learn, called Gaussian elimination, is not exactly a quick process. For many of the networks researchers would most like to analyze today—such as Facebook, which has more than 800 million members—Gaussian elimination is much too slow to be practical.
By being willing to accept approximate solutions to linear systems instead of exact solutions, computer scientists have succeeded somewhat in improving the running time of their Laplacian-solving algorithms. But until recently, most of this improvement had come from designing customized equation solvers that work well only for particular special cases, such as graphs that can be drawn on a flat plane without any edges crossing. No one had come up with a practical algorithm that could solve every kind of Laplacian system you might care to throw at it.
Now, a body of work 20 years in the making has changed all that. Building on a flash of insight by a young professor two decades ago—an idea at first overlooked by most researchers and nearly lost to academia—mathematicians and computer scientists in the past few years have come up with fast algorithms to find good approximate solutions to any Laplacian linear system. The algorithms open up the prospect that a desktop PC will soon be able to (approximately) solve Laplacian systems with billions of variables in a matter of minutes.
The researchers “have ignited what appears to be an incipient revolution in the theory of graph algorithms,” write Michel Goemans and Jonathan Kelner of the Massachusetts Institute of Technology in the October 2011 Notices of the American Mathematical Society.
Already, computer scientists are applying the latest (and fastest) generation of these algorithms to tackle medical imaging problems that have resisted solution by other methods. Researchers have begun exploring the technique’s potential for improving online recommendation systems, web spam detectors, and other machine learning applications. And they have used the Laplacian solver to generate the first speed-up in more than 10 years in the solution to one of the most basic problems in logistics.
“By developing and exploiting deep connections between graph theory and computational linear algebra, [the researchers] have introduced a new set of tools that have the potential to transform both fields,” Goemans and Kelner write.
The Laplacian
To get a sense of how the Laplacian creates a bridge between linear algebra and the structure of graphs, let’s dive into a brief tutorial, focusing on one example in detail: the case in which we are nailing down a collection of nodes in a network of rubber bands. For simplicity, let’s assume for now that all the rubber bands have the same degree of stretchiness (or “spring constant”), and that we are restricted to nailing down nodes along the x-axis (instead of the entire plane). All the non-nailed nodes will instantly spring to the positions along the x-axis that minimize the total potential energy of the graph as a whole.
The potential energy of any one of the rubber bands is proportional to the square of its length. So if the nodes of the network are located at positions \(x_{1}, x_{2}, \ldots, x_{n}\) along the \(x\)-axis, then a rubber band that connects nodes \(i\) and \(j\) has potential energy proportional to \((x_{i} - x_{j})^{2}\). Thus, the total potential energy of the network is proportional to $$ \textstyle\sum (x_{i} - x_{j})^{2}, $$ where the sum is taken over all the rubber bands in the network.
For readers who remember how matrix multiplication works, it’s not too hard to check that there is a particular coefficient matrix \(L\)—called the graph’s Laplacian—for which the above sum can also be written as the matrix product \(\boldsymbol{x}^{T}\boldsymbol{Lx}\), where
$$ \boldsymbol{x}=\begin{bmatrix} x_1\\ \vdots\\ x_n \end{bmatrix}. $$
(The Laplacian matrix is named after the late-18th-century French mathematician Pierre-Simon de Laplace, who applied a continuous version of this system to the study of celestial mechanics.)
For example, suppose we take the simple graph in which node 1 is connected by rubber bands to nodes 2 and 3 (and nodes 2 and 3 are not connected to each other).
If the three nodes are located at positions \(x_{1}\), \(x_{2}\) and \(x_{3}\) along the \(x\)-axis, the total potential energy of the system is $$ \begin{aligned} (x_1-x_2)^2 + (x_1-x_3)^2 &= 2x^2_1+x^2_2+x^2_3-2x_1x_2-2x_1x_3\\ &=\begin{bmatrix} x_1 &x_2 &x_3 \end{bmatrix} \begin{bmatrix} ~2&-1&-1\\ -1&~1&~0\\ -1&~0&~1 \end{bmatrix} \begin{bmatrix} x_1\\ x_2\\ x_3 \end{bmatrix}. \end{aligned} $$
Examining the entries of this Laplacian matrix, you might notice certain relationships to the original graph. Namely:
- Each diagonal entry \(a_{ii}\) in the matrix is equal to the number of edges emanating from node \(i\) in the graph.
- Each off-diagonal entry \(a_{ij}\) is equal to \(-1\) if nodes \(i\) and \(j\) are connected by a rubber band, and 0 otherwise.
It’s not too hard to show that these properties are always true of a graph’s Laplacian; thus, they provide a simple recipe for writing down the Laplacian of any graph. For example, for the four-node graph in which node 1 is connected to nodes 2 and 3, and node 2 is also connected to node 4,
the Laplacian is
$$ \begin{bmatrix} ~2&-1&-1&~0\\ -1&~2&~0&-1\\ -1&~0&~1&~0\\ ~0&-1&~0&~1 \end{bmatrix}. $$
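This recipe is easy to carry out mechanically. The short sketch below (a minimal illustration, assuming Python with numpy is available; the helper function is just for this example) builds the Laplacian of the four-node graph above from its list of rubber bands and checks that \(\boldsymbol{x}^{T}\boldsymbol{Lx}\) really is the sum of the squared differences across the bands.

```python
import numpy as np

def laplacian(n, edges):
    """Build the Laplacian of an n-node graph from a list of edges (i, j)."""
    L = np.zeros((n, n))
    for i, j in edges:
        L[i, i] += 1          # diagonal entries count the edges at each node
        L[j, j] += 1
        L[i, j] -= 1          # off-diagonal entries are -1 for each rubber band
        L[j, i] -= 1
    return L

# The four-node graph above, with nodes numbered from 0:
# node 0 joined to nodes 1 and 2, and node 1 also joined to node 3.
edges = [(0, 1), (0, 2), (1, 3)]
L = laplacian(4, edges)
print(L)

# Check that x^T L x equals the sum of (x_i - x_j)^2 over the rubber bands.
x = np.array([1.0, 4.0, -2.0, 0.5])
print(x @ L @ x, sum((x[i] - x[j]) ** 2 for i, j in edges))  # the two numbers agree
```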
Returning to our graph of rubber bands, to figure out the positions of the nodes along the \(x\)-axis, we must find the values of \(x_{1},\ldots, x_{n}\) that will minimize the potential energy, \(\boldsymbol{x}^{T}\boldsymbol{Lx}\), while keeping the nailed-down nodes in their fixed positions. The gradient of \(\boldsymbol{x}^{T}\boldsymbol{Lx}\) is simply \(2\boldsymbol{Lx}\), so minimizing \(\boldsymbol{x}^{T}\boldsymbol{Lx}\) boils down to solving a linear equation $$ \boldsymbol{L}^{\ast }\boldsymbol{x=b}, $$ in which \(\boldsymbol{L}^{\ast }\) is identical to \(\boldsymbol{L}\) except for some small tweaks in the rows corresponding to the nailed-down nodes, and \(\boldsymbol{b}\) is a vector whose entries are all zero except for the ones that correspond to nailed-down nodes. (If you’d like to see exactly how \(\boldsymbol{L}^{\ast }\) is constructed, see the sidebar, Nailing Down Nodes: the Not-Too-Gory Details.) The above equation is called a Laplacian linear system.
(For a graph of rubber bands that do not all have the same spring constant, the analysis is the same, except that the terms in the potential energy formula are weighted by the corresponding spring constants. Likewise, the simple recipe for the Laplacian matrix is adjusted to incorporate these spring constants, and the resulting matrix is called a weighted Laplacian.)
We’ll look into the other examples we’ve discussed—the politics of Facebook members, and the behavior of an electrical network—in more detail later on. As we will see, they are governed by the same types of equations as the network of rubber bands. Solving Laplacian systems allows us to solve the Facebook political problem, the rubber band problem, and the electrical network problem in one fell swoop. And that’s just the beginning of what the Laplacian has to offer: a host of other problems, some of which we’ll encounter later, have a Laplacian linear system at their core.
The linear algebra of a Laplacian matrix offers a rich array of insights into the structure of the corresponding graph, and vice versa. For example, let’s look at the linear system \(\boldsymbol{Lx=0}\). It’s easy to see that in addition to the zero vector, any constant vector (that is, a vector whose entries are all the same) is a solution to this equation; this simply corresponds to the fact that if we allow all the nodes of the network to contract to the same spot on the \(x\)-axis, the total potential energy will be zero.
What’s more, if the graph is not connected then there will be other families of vectors that satisfy \(\boldsymbol{Lx=0}\), since each connected component of the graph could be in a different spot on the x-axis and the total energy would still be zero. It’s not too hard to see that the reverse is also true: If there are non-constant vectors that satisfy \(\boldsymbol{Lx=0}\), then the graph must be disconnected.
This relationship between connectivity of the graph and solutions to Laplacian systems can be taken even further. If a graph is almost disconnected—that is, if the graph could be disconnected into two separate clusters by cutting just a few rubber bands—then nailing down each cluster in one spot will result in just a few rubber bands getting stretched, and the total potential energy \(\boldsymbol{x}^{T}\boldsymbol{Lx}\) of the system will be fairly low. Mathematicians have shown that this case corresponds roughly to there being a (non-constant) vector \(\boldsymbol{x}\) for which \(\boldsymbol{Lx}\) is a very small multiple \(\lambda \) of \(\boldsymbol{x}\). In fact, knowing the values of \(\boldsymbol{x}\) and \(\lambda \) gives mathematicians the means to estimate just how many rubber bands they would have to cut to disconnect the graph into significant clusters, and even helps to target the right rubber bands to cut. As we’ll see, this technique plays a key role in image processing.
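Here is a small numerical illustration of that idea (a toy sketch assuming numpy, not one of the fast algorithms discussed below): for a graph made of two tight clusters joined by a single rubber band, the eigenvector of the Laplacian with the second-smallest eigenvalue gives such a vector \(\boldsymbol{x}\); its eigenvalue \(\lambda\) is small when the graph nearly splits, and the signs of its entries reveal the two clusters.

```python
import numpy as np

def laplacian(n, edges):
    L = np.zeros((n, n))
    for i, j in edges:
        L[i, i] += 1; L[j, j] += 1
        L[i, j] -= 1; L[j, i] -= 1
    return L

# Two triangles (nodes 0-2 and nodes 3-5) joined by a single "bridge" edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
L = laplacian(6, edges)

vals, vecs = np.linalg.eigh(L)   # eigenvalues in increasing order
lam, x = vals[1], vecs[:, 1]     # second-smallest eigenvalue and its (non-constant) vector

print("small multiple lambda:", round(lam, 3))
print("one cluster: ", [i for i in range(6) if x[i] < 0])
print("other cluster:", [i for i in range(6) if x[i] >= 0])
```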
Solving a graph’s Laplacian system allows researchers to tap into a wealth of information about the graph’s structure. But, as we’ve discussed, when the graph contains millions or billions of nodes, solving its Laplacian may be no easy matter.
The Art of Preconditioning
In theory, as we’ve discussed, the problem of solving Laplacian systems is already taken care of by Gaussian elimination, the algorithm for solving systems of linear equations that students learn in high school. Gaussian elimination involves adding an appropriate multiple of one equation to another equation in order to eliminate one variable, and then continuing to eliminate variables until the resulting equations take on a form that is particularly easy to solve.
The trouble is that for a graph with \(n\) nodes, Gaussian elimination can take as many as \(n^{3}\) steps. If \(n\) is on the order of hundreds of millions or billions, as it can easily be for, say, a three-dimensional medical image or a social network, then Gaussian elimination is not going to cut it, and neither are many much faster algorithms.
“In light of the explosive growth in the amount of data and the diversity of computing applications, efficient algorithms are now needed more than ever,” writes Shang-Hua Teng, a computer scientist at the University of Southern California, in Los Angeles. “What used to be considered an efficient algorithm … may no longer be adequate for solving problems of these scales.”
For a graph with \(m\) edges, the number of non-zero entries in the Laplacian matrix is roughly proportional to \(m\), so just reading off the non-zero coefficients of the linear equations will take some multiple of \(m\) steps—what computer scientists call linear time. Clearly, then, no Laplacian-solving algorithm can be faster than linear time. But that would be plenty fast enough for today’s applications. For decades, computer scientists have been wondering: Is there a Laplacian solver out there that runs in linear time?
As yet, no one knows the answer to that question. But as we’ll see, in the past few years, researchers have gotten pretty close. They have succeeded in designing algorithms that come up with highly precise approximate solutions to Laplacian systems in nearly linear time (a concept we’ll define later on).
The new algorithms use an approximation technique known as preconditioning, which has become standard procedure in computer science and numerical analysis for tackling large linear systems of equations. Preconditioning entails replacing the system you want to solve with a new, simpler system of linear equations that is easy to solve, but is similar enough to the original system that its solutions are a good approximation of the original system’s solutions. Preconditioning can only ever give an approximate solution, but if the “preconditioner” is chosen wisely, that approximation can be quite good.
To get a sense of how preconditioning works, let’s use it to try to come up with a good approximate solution to the linear equation \(24x=101\), using a rather rudimentary computer: the conscious human brain. Our computer is very slow when it comes to long division, so instead of using that technique to get an exact answer, we could instead start by noticing that the equation \(25x=100\) is much easier to solve, and we could use that observation to make an initial guess that \(x\) is approximately 4.
To check how good an approximation this is, we could calculate that \(24\times 4=96\); so we’ve undershot the right-hand side of the equation by 5. The real answer is 4 plus the solution to \(24y=5\). We don’t want to do long division on that equation either, so we can again precondition and solve \(25y=5\) instead. This tells us that \(y\) is approximately 0.2, which makes our new guess for the original equation into \(x=4.2\). Now, \(24\times4.2=100.8\), so we’re getting pretty close to the correct answer. If we wanted to get even closer, we could do another round of preconditioning. (The exact answer is \(x=4.208333\dots\).) Notice that in each round, we are solving an easy equation and then using our original equation just to check how good the approximation is—a much easier matter than actually solving the original equation.
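Written as a loop, this back-of-the-envelope procedure looks like the following (a minimal Python sketch; the numbers are just the example above). Each round solves the easy preconditioner equation for a correction and uses the original equation only to measure the leftover error.

```python
# Solve 24*x = 101 approximately, preconditioning with the easy-to-solve coefficient 25.
a, m, b = 24.0, 25.0, 101.0

x = 0.0
for k in range(6):
    residual = b - a * x     # how far off the current guess is (uses the original equation)
    x = x + residual / m     # correct the guess by solving the easy equation 25*y = residual
    print(k, x)

# The successive guesses 4.04, 4.2016, 4.2081, ... rapidly approach
# the exact answer 101/24 = 4.208333...
```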
In general (and in this example), there are two attributes that go into a successful preconditioner:
- The preconditioner must be easy to solve.
- The preconditioner must be similar, in some key way, to the original system. After all, if we had tried to precondition \(24x=101\) with the equation \(10x=100\) instead of \(25x=100\), then solving the preconditioned equation would be easy, but the solution, \(x=10\), wouldn’t provide us with very useful information about our original equation. We could try, as we did in the example above, to improve our estimate by doing more rounds of preconditioning, but it might require many rounds for our estimate to converge to the right answer (or it might never converge at all).
There’s a trade-off inherent in choosing a preconditioner. If the preconditioner is too much like the original system, it will probably be too hard to solve. But if the preconditioner is not enough like the original system, it will provide poor estimates and it will take too long (or perhaps infinitely long) for the estimates to converge with the desired accuracy. Striking the right balance between these competing requirements is something of an art, and it depends very much on the attributes of the particular system we are trying to solve—no one has ever figured out a one-size-fits-all approach to preconditioning.
“Finding good preconditioners is one of the most important problems in applied mathematics,” says Erik Boman, of Sandia National Laboratories in Albuquerque.
Preconditioning often involves replacing the least important coefficients in the system of equations with zeros to make a new system that is easier to solve. For example, suppose we’d like to solve the system: $$ \begin{aligned} 100x + .02y &= 200\\ .01x + \phantom{.}50y &= 100 \end{aligned} $$
The \(.02y\) and \(.01x\) terms are practically negligible, so a good preconditioner would be the (very easy!) system $$ \begin{aligned} 100x &= 200\\ 50y &= 100 \end{aligned} $$
In matrix terms, we are preconditioning the coefficient matrix $$ \begin{bmatrix} 100& .02\\ .01& 50 \end{bmatrix} $$ with the matrix $$ \begin{bmatrix} 100& 0\\ 0& 50 \end{bmatrix}, $$ a process known as diagonal, or Jacobi, preconditioning.
Or, if we are trying to solve $$ \begin{aligned} 100x + 100y &= 300\\ .01x + \phantom{0}50y &= 100 \end{aligned} $$ we can precondition it with the system $$ \begin{aligned} 100x + 100y &= 300\\ 50y &= 100 \end{aligned} $$ This kind of preconditioning, which approximates the matrix $$ \begin{bmatrix} 100&100\\ .01&50 \end{bmatrix} $$ by the matrix $$ \begin{bmatrix} 100& 100\\ 0& 50 \end{bmatrix}, $$ is called upper-triangular, or Gauss–Seidel, preconditioning.
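The same correction loop as in the scalar sketch above handles these matrix examples. Here it is applied to the first system with its Jacobi (diagonal) preconditioner (again a minimal numpy sketch, not production code):

```python
import numpy as np

A = np.array([[100.0, 0.02],
              [0.01, 50.0]])
b = np.array([200.0, 100.0])

M = np.diag(np.diag(A))   # Jacobi preconditioner: keep only the diagonal of A

x = np.zeros(2)
for k in range(5):
    r = b - A @ x                    # residual of the original system at the current guess
    x = x + np.linalg.solve(M, r)    # correction from the easy diagonal system
print(x)   # very close to the exact solution, which is roughly (2.0, 2.0)
```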
When it comes to a Laplacian matrix, neither Jacobi nor Gauss–Seidel preconditioning does the trick. Recall that the diagonal entries of the Laplacian encode the number of connections leaving each node, while the off-diagonal entries encode the details of which nodes are connected to which. Jacobi preconditioning, which keeps only the diagonal entries, throws away so much information about how the graph is connected that it is not able to produce good approximate solutions. Nor does Gauss–Seidel preconditioning make sense for a Laplacian matrix: since a Laplacian is symmetric, discarding the lower-triangular entries would significantly distort its structure, resulting in poor estimates.
For decades, it was unclear what approach to preconditioning might work for Laplacian matrices. Most researchers focused on preconditioning particular subtypes of Laplacian matrices that arose in applications, by exploiting the specifics of the scenario under study.
“The whole area of preconditioning Laplacian matrices was a very heuristic, seat-of-the-pants kind of area,” says John Gilbert, of the University of California, Santa Barbara.
Twenty years ago, however, the seeds of change were planted when a young professor at the University of Illinois at Urbana-Champaign named Pravin Vaidya realized that the key to a good Laplacian preconditioner lies hidden in the structure of the underlying graph.
“Vaidya had this crazy insight that all this numerical linear algebra stuff could be abstracted into the language of graph theory,” says Bruce Hendrickson, of Sandia National Laboratories. “It was one of those moments of genius where you don’t know where it came from.”
Preconditioning a Graph
Recall that the Laplacian matrix captures (among other things) the elastic properties of a graph of rubber bands. Vaidya’s insight, though not originally expressed in the language of rubber bands, loosely boils down to the following idea: To precondition the Laplacian \(\boldsymbol{L}_{G}\) of a graph \(G\), use the Laplacian \(\boldsymbol{L}_{H}\) of a subgraph \(H\) that keeps those rubber bands that play the most important role in \(G\)’s elastic structure, and discards most of the rest.
For this to work, \(\boldsymbol{L}_{H}\) has to be both similar to \(\boldsymbol{L}_{G}\) and easy to solve. Roughly speaking, the solutions to \(\boldsymbol{L}_{H}\boldsymbol{x=b}\) and \(\boldsymbol{L}_{G}\boldsymbol{x=b}\) are the positions on the \(x\)-axis that minimize the total potential energy of \(H\) and \(G\), respectively, so in this context, “similar” means rubber-band similarity: for any choice of positions for the nodes of \(G\), the total potential energy of \(H\) should be close to the total potential energy of \(G\). In other words, for all vectors \(\boldsymbol{x}\), the ratio \((\boldsymbol{x}^{T}\boldsymbol{L}_{G}\boldsymbol{x})/(\boldsymbol{x}^{T}\boldsymbol{L}_{H}\boldsymbol{x})\) should be close to 1.
The other half of the picture is that \(\boldsymbol{L}_{H}\boldsymbol{x=b}\) must be easy to solve. Any time we remove an edge from \(G\), the corresponding entries in the Laplacian matrix become zero, and as a rough rule of thumb, the more entries we zero out, the easier the system will be to solve. Thus, the preconditioning balancing act is to drop as many edges as we can from the network while maintaining its potential energy as close as we can to that of the original graph. In other words, we want to snip as many rubber bands as we can without making any snips that cause the graph to spring into a drastically different shape. (Researchers after Vaidya added the refinement of adjusting the weights of the edges in the subgraph: when we cut a rubber band, tightening up the spring constants of neighboring rubber bands helps to compensate for the loss, so that \(\boldsymbol{x}^{T}\boldsymbol{L}_{H}\boldsymbol{x}\) will remain close to \(\boldsymbol{x}^{T}\boldsymbol{L}_{G}\boldsymbol{x}\).)
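This notion of “rubber-band similarity” is easy to experiment with. The sketch below (an illustrative numpy experiment, not how the real algorithms measure similarity) builds a small graph \(G\), removes one edge inside a cluster to get a subgraph \(H\), and compares the energies \(\boldsymbol{x}^{T}\boldsymbol{L}_{G}\boldsymbol{x}\) and \(\boldsymbol{x}^{T}\boldsymbol{L}_{H}\boldsymbol{x}\) for a few random placements of the nodes.

```python
import numpy as np

def laplacian(n, edges):
    L = np.zeros((n, n))
    for i, j in edges:
        L[i, i] += 1; L[j, j] += 1
        L[i, j] -= 1; L[j, i] -= 1
    return L

G_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]  # two triangles plus a bridge
H_edges = [e for e in G_edges if e != (0, 2)]                       # snip one edge inside a cluster

L_G, L_H = laplacian(6, G_edges), laplacian(6, H_edges)

rng = np.random.default_rng(0)
for _ in range(3):
    x = rng.standard_normal(6)
    x -= x.mean()                        # ignore the "slide the whole graph over" direction
    print(round((x @ L_G @ x) / (x @ L_H @ x), 3))
# Each ratio lies somewhere between 1 and 3 for this particular snip:
# after losing one cluster edge, H still imitates G's energies reasonably well.
```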
Vaidya actually approached the problem from the other direction: instead of snipping rubber bands until the resulting Laplacian was easy to solve, he started with a subgraph that was easy to solve, then added edges back in until the resulting subgraph was similar enough to the full graph to be a good preconditioner.
Specifically, Vaidya started with a subgraph called a spanning tree of \(G\). A tree is a graph that contains no closed loops; in other words, there’s exactly one path connecting any two nodes in the tree. A spanning tree of \(G\) is simply a tree subgraph that contains every node of \(G\)—in effect, it’s like a spine of \(G\).
The Laplacian system of a tree is quite easy to solve. To get a sense for why this is true, consider the “leaves” of a tree: the nodes that are connected to only one other node. When you nail down some of the nodes of a tree, the leaves will instantly contract to the positions of their neighbors, since there is nothing to pull them away (except if a leaf happens to be one of the nailed-down nodes). Many of the leaves’ neighbors will also contract to their neighbors; in fact, the only nodes that will feel a tug in more than one direction are the nodes that lie on a direct path between nailed-down nodes, resulting in a much simpler calculation than in your typical Laplacian system.
Thus, a spanning tree of \(G\) satisfies one half of what it takes to be a good preconditioner: it’s easy to solve. However, its elastic properties are too different from those of \(G\) to satisfy the other half. When Vaidya tried to use spanning trees as preconditioners, his Laplacian-solving algorithm was no faster than a classical algorithm called Conjugate Gradient that dates back to the 1950s.
To improve on his preconditioners, Vaidya tried to add back to the spanning tree some of the most important edges, from an elasticity point of view, while introducing as few closed loops as he could get away with (so that the Laplacian would still be easy to solve). Using a somewhat complex criterion to determine each edge’s importance, Vaidya was able to bring the running time of his algorithm down to about \(m^{1.75}\) (where \(m\) is the number of edges); for many of the graphs researchers are interested in, this is an improvement on Conjugate Gradient, which sometimes has a running time more like \(m^{2}\).
“It was not, honestly speaking, a huge improvement, but it was a darn good start,” says Daniel Spielman, of Yale University. “It was the first nontrivial idea I had seen in this business since Conjugate Gradient.”
Vaidya’s approach also offered something that was lacking in heuristic Laplacian-solvers: a way to prove rigorous bounds on running times.
“It’s a very disciplined, careful way to throw out non-zero entries in the matrix,” Hendrickson says. “What’s different about this regime is that we can prove theorems about how well it works.”
It was clear that there was plenty of room to push Vaidya’s method further to get faster Laplacian solvers. Yet despite the potential of Vaidya’s approach, at first his work was largely ignored by the research community, perhaps because his algorithm offered such a modest improvement over Conjugate Gradient. Vaidya’s paper on this work was never accepted for publication, and Vaidya himself left academia in order to form a software company to sell linear systems solvers.
“I think people didn’t really understand how stunning and creative it was,” Hendrickson says. “The idea was almost lost to the scientific community.”
Fortunately, before he left academia Vaidya gave a handful of talks on his work, and a few audience members pricked up their ears. In the years that followed, they began circulating the draft of Vaidya’s paper, and wrote a series of papers explicating and extending Vaidya’s approach.
Gradually, says one of these researchers, Gary Miller of Carnegie Mellon University in Pittsburgh, a “Laplacian mafia” formed—a tight circle of mathematicians and computer scientists who were convinced that Vaidya’s idea was a diamond in the rough. Then in 2003, two members of this “mafia”—long-time collaborators Spielman and Teng—developed Vaidya’s approach into an algorithm that got everyone’s attention.
A Tour de Force
A key component of Vaidya’s method is determining which edges are important enough to add back onto the spanning tree. It’s clear that certain edges in the graph are more important than others. For example, if you cut a rubber band that connects two distant realms of the graph, then the graph is likely to spring into a very different shape. By contrast, if you cut a rubber band that belongs to a large local cluster of rubber bands, the other rubber bands in the cluster will probably hold the graph in roughly the same configuration as before.
Thus, edges that connect one cluster to another are more important than edges that live inside a cluster. At the same time, if we blithely cut through every rubber band in a cluster, we’ll probably make a huge change in the graph’s configuration. Spielman and Teng realized that what’s called for is a subgraph that includes most of the important edges, as well as enough of the unimportant edges that the clusters still feel clustery.
To accomplish this, Spielman and Teng developed a fast algorithm to identify (roughly) the clusters in a graph. Drawing on research in random walks, they showed that if you walk randomly around the graph, then once you hit a cluster, you are likely to continue to explore the nodes in that cluster, rather than happening on the few edges that will lead you out of the cluster and into the graph at large.
Using their cluster-identifying algorithm, Spielman and Teng were able to create preconditioners that could solve Laplacian systems dramatically faster than Conjugate Gradient. Their algorithm runs in an amount of time that is nearly linear in the number of edges \(m\) of \(G\), meaning that the running time is proportional to \(m\) times some power of \(\log(m)\). For large values of \(m\), \(\log(m)\) is so much smaller than \(m\) that this running time is faster than \(m^{2}\), or \(m^{1.75}\), or in fact \(m\) raised to any power bigger than 1. A nearly-linear running time is the next best thing to a linear running time, which, as we’ve seen, is the fastest a Laplacian solver could possibly be.
“It astounded everyone that they could do this,” Gilbert says. “Until Spielman and Teng, no one thought Vaidya’s ideas would go to a nearly linear algorithm.”
The Spielman–Teng algorithm was “a technical tour de force,” Goemans and Kelner write.
Yet while the Spielman–Teng algorithm was a quantum leap forward from a theoretical point of view, the algorithm was not fast by any real-world standard. Spielman and Teng had proved that their algorithm’s running time was on the order of about \(m(\log(m))^{100}\), and that running time is, indeed, faster than \(m\) raised to any exponent bigger than 1, for big enough values of \(m\). But here, “big enough” means very big indeed. For the applications people care about today, \(m\) is seldom more than about a billion. If \(m\) is a billion, then \(\log(m)\) is about 20, and so \(m(\log(m))^{100}\) is much, much bigger than, say, \(m^{2}\). For Vaidya’s graph preconditioners to be useful in real-world problems, researchers were going to have to figure out how to reduce that pesky exponent on the \(\log(m)\) term.
Despite its impracticality, though, the Spielman–Teng algorithm was an important proof of principle.
“I think our paper was the one that convinced people that we should be able to get an algorithm that would get down to a very small power of \(\log(m)\),” Spielman says.
Fast in Practice
Five years later, Spielman and his student Nikhil Srivastava, now at the Mathematical Sciences Research Institute in Berkeley, paved the way toward creating Laplacian solvers that are fast in practice as well as in theory. They realized that when a graph is viewed as an electrical circuit, the question of which edges are most important has a simple answer.
If \(G\) is a network of wires, each with a given resistance, then attaching the positive and negative terminals of a battery to two of the nodes—the electrical equivalent of nailing down two nodes of a rubber-band graph—will generate a current and induce voltages on all the other nodes in the network. Just as the nodes in a rubber-band graph immediately spring into the positions that minimize the graph’s potential energy, the nodes of the circuit instantly assume the voltages that minimize the total energy lost in the form of heat. This energy is calculated by the same kind of formula as the potential energy of a rubber-band graph, so calculating these voltages once again boils down to solving a Laplacian linear system.
Intuitively, an edge in an electrical circuit is important if there aren’t any other easy ways for current to run between its two endpoints. Deleting an edge that provides a shortcut between two remote parts of the circuit, for example, will probably alter current flow significantly; by contrast, if you delete an edge within a cluster, the current can simply flow through nearby wires instead.
This intuition can be made exact by the notion of an edge’s “effective resistance”, which is calculated by attaching the terminals of a battery to the two endpoints of the edge, and measuring the total current that flows through the network. The ratio of the amount of current flowing through the edge itself to the amount of current flowing through the whole graph gives a precise measurement of how much of the burden of transmitting current that edge is shouldering, and how much it is leaving to alternate routes.
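For a small graph this measurement can be simulated directly. The sketch below (an illustration using numpy and a dense pseudoinverse; a real solver would never do it this way at scale) injects one unit of current between the endpoints of an edge, solves for the node voltages, and reports what share of the current the edge itself carries.

```python
import numpy as np

def laplacian(n, edges):
    L = np.zeros((n, n))
    for i, j in edges:
        L[i, i] += 1; L[j, j] += 1
        L[i, j] -= 1; L[j, i] -= 1
    return L

# Two triangles joined by a single bridge edge (2, 3); every wire has resistance 1.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
L_pinv = np.linalg.pinv(laplacian(6, edges))   # stands in for "solve the Laplacian system"

def current_share(i, j):
    """Push 1 unit of current from node i to node j; return the share carried by edge (i, j)."""
    injection = np.zeros(6)
    injection[i], injection[j] = 1.0, -1.0
    v = L_pinv @ injection          # node voltages
    return v[i] - v[j]              # current on a unit-resistance wire = voltage drop across it

print(current_share(2, 3))   # the bridge carries all of the current: 1.0
print(current_share(0, 1))   # an edge inside a triangle carries about 2/3; nearby wires share the load
```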
It’s possible to plug this measure of an edge’s importance into the framework created by Spielman and Teng, to build a preconditioner by using a biased coin toss to decide which edges to add back to the spanning tree; for any given edge, its coin is weighted according to its effective resistance ratio. Thus, the most important edges are the most likely ones to get added in to the subgraph, but some small proportion of the unimportant edges is likely to be included too.
“It’s beautiful work,” Miller says. “The cleverness lies in what the bias of the coin should be.”
Spielman and Srivastava’s work offers an elegant way to streamline the choice of edges for a preconditioner, but unfortunately it has a chicken-and-egg problem: calculating effective resistance involves solving a Laplacian system. So a Laplacian solver that relies on calculating effective resistance can’t offer any speed-up compared to the algorithms already in existence.
What their approach did offer, however, was a new strategy for how to tackle the speed-up challenge: find some attribute of a graph that is somehow akin to effective resistance but easier to calculate.
In 2010, Miller and two colleagues—graduate student Richard Peng, and Ioannis Koutis, now at the University of Puerto Rico, in Rio Piedras—did just that, approximating effective resistance by the notion of “stretch”. Given a spanning tree, the stretch of any edge in the graph is simply the length of the route in the tree connecting the edge’s endpoints. (For a weighted graph, the definition is a bit more involved, but similar.) An edge with high stretch is a significant shortcut with respect to the tree, so stretch gives an estimate of an edge’s importance.
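Computing stretch really is as simple as it sounds. The sketch below (plain Python, with a toy graph and spanning tree chosen just for illustration) measures, for each edge left out of the tree, how long a detour the tree forces between that edge’s endpoints.

```python
from collections import deque

def tree_distance(tree_adj, s, t):
    """Length of the unique path between s and t in a tree, found by breadth-first search."""
    dist, queue = {s: 0}, deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return dist[u]
        for w in tree_adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    raise ValueError("t is not reachable, so this is not a spanning tree")

# A six-node cycle 0-1-2-3-4-5-0 plus a chord (0, 3).
graph_edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 3)]
tree_edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]   # spanning tree: the path 0-1-2-3-4-5

tree_adj = {v: [] for v in range(6)}
for i, j in tree_edges:
    tree_adj[i].append(j)
    tree_adj[j].append(i)

for i, j in graph_edges:
    if (i, j) not in tree_edges:
        print("edge", (i, j), "has stretch", tree_distance(tree_adj, i, j))
# The edge (5, 0) has stretch 5 and the chord (0, 3) has stretch 3:
# both are significant shortcuts with respect to this tree.
```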
Stretch is only a crude proxy for effective resistance, since in an electrical network, current will run through all the alternate routes, not just the one in the spanning tree. But precisely because of its crudeness, stretch is fast and easy to calculate. When Koutis, Miller and Peng used coin tosses biased by stretch instead of effective resistance (together with some technical modifications to the graph), they were able to bring the algorithm’s running time down to about \(m\log(m)\)—nearly linear from a practical standpoint as well as a theoretical one. As we’ve seen, when \(m\) is a million or a billion, \(\log(m)\) is so small that it can pretty much be disregarded. If \(m\) is one billion, for example, an \(m\log(m)\) running time is about 50 million times faster than an \(m^{2}\) running time.
“Maybe the latest \(m\log(m)\) version has finally cracked the barrier to where it would win for Joe software designer,” Gilbert says.
Nevertheless, it may be years before these methods are used widely in scientific computing applications.
“It takes a long time for theory actually to make it into the guts of computational science,” Gilbert says. “There’s a whole layer of figuring out how you get these things to work in practice, what engineering compromises you have to make, where you have to approximate, and so on. That work is yet to be done.”
Some practitioners may be reluctant to put in the work entailed in digging their current Laplacian solvers out of their software and figuring out how to integrate the new solver, especially since many of the heuristic solvers seem to perform quite well on the particular problems they are designed for.
“For many of the Laplacian systems that come up in practice, there are other very good solvers,” says Sivan Toledo, of Tel Aviv University. “It could be that in two to five years they’ll all switch to the new solver, but it’s hard to know ahead of time whether it will perform better.”
Yet the heuristic solvers aren’t able to solve every problem out there. In the past few years, the “Laplacian mafia” has started turning its attention to practical questions for which the new paradigm has the potential to offer real answers.
An Atypical Success
Miller and Koutis, together with David Tolliver—currently at ObjectVideo, an image processing company in Reston, Virginia—and researchers at the University of Pittsburgh Medical Center, have started turning the power of their Laplacian solvers on the challenge of medical image analysis. Automated analysis of medical images has the potential to quickly make a wealth of clinical data available to doctors to aid their diagnoses. Yet medical imaging produces enormous data sets—a three-dimensional scan can easily generate hundreds of millions of pixels. Processing these data sets quickly and reliably has proved a challenge.
Many medical imaging problems involve image segmentation—division of an image into its natural components. This problem is, in fact, a Laplacian problem in disguise: we can turn the image into a graph by connecting each pixel to its neighbors, and assigning higher weight to an edge if its endpoint pixels are similar to each other. Image segmentation, then, boils down to finding a way to break up the graph into different clumps so that the total weight of the edges connecting the clumps is small. As we’ve discussed, the solution to the graph’s Laplacian system gives a road map for how to achieve this.
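Here is what that graph looks like in miniature (a toy sketch with made-up pixel values, assuming numpy; real segmentation pipelines are considerably more elaborate). Similar neighboring pixels get heavy rubber bands, dissimilar ones get flimsy bands, and the result is a weighted Laplacian of the kind described earlier.

```python
import numpy as np

# A tiny 3x3 grayscale "image": a bright region on the left, a dark strip on the right.
image = np.array([[0.9, 0.8, 0.1],
                  [0.9, 0.8, 0.2],
                  [0.8, 0.9, 0.1]])

rows, cols = image.shape
n = rows * cols
node = lambda r, c: r * cols + c    # give every pixel a node number

def weight(a, b, sigma=0.1):
    """Heavy edge for similar intensities, nearly zero for very different ones."""
    return np.exp(-((a - b) ** 2) / sigma ** 2)

L = np.zeros((n, n))
for r in range(rows):
    for c in range(cols):
        for dr, dc in [(0, 1), (1, 0)]:          # link each pixel to its right and lower neighbors
            rr, cc = r + dr, c + dc
            if rr < rows and cc < cols:
                i, j, w = node(r, c), node(rr, cc), weight(image[r, c], image[rr, cc])
                L[i, i] += w; L[j, j] += w       # weighted Laplacian: weighted degrees on the diagonal
                L[i, j] -= w; L[j, i] -= w       # and minus the weight off the diagonal

print(np.round(L, 2))
# The only light edges are the ones joining bright pixels to dark pixels, so cutting them
# splits the graph (and hence the image) into its two natural regions.
```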
Many medical imaging programs already have Laplacian solvers built into their software. The trouble is that heuristic Laplacian solvers are usually based on an intuition for what the “typical” case will look like, and such solvers often perform poorly on atypical cases. But in medical applications, it is usually those atypical cases—in which the patient has a disease—for which it is most crucial that the solver work well. The new Laplacian solver, unlike other solvers, is guaranteed to solve all cases in nearly-linear time.
Koutis, Miller and Tolliver have been applying the new solver to the problem of dividing retinal scans into layers.
“The machines that do these scans try to use prior information about what a typical eye looks like, but that works only in cases where things are kind of standard,” Koutis says. “The algorithms sometimes fail miserably when the patient has an eye disease.”
By contrast, the researchers have found that so far, the image segmentations the Laplacian solver comes up with are close in quality to those of human experts—the gold standard for machine image processing.
Graph Guesswork
Another area in which Spielman predicts that the Laplacian solvers will prove useful is the nascent field of “learning on graphs”. Suppose you know something about some of the nodes of a graph—for example, the political leanings of some Facebook members. Can you make a reasonable guess about the other nodes? If political leanings occur on a spectrum from 0 to 1, you can imagine nailing down all the nodes you know, and then seeing where the other nodes settle—in other words, solve the Laplacian.
This approach has the potential to be useful, for example, for online recommendation systems, such as the one Netflix uses to suggest movies to users based on what it already knows about their likes and dislikes. This problem, unlike the Facebook politics problem, does not come equipped with an underlying “friendship” graph; however, it’s possible to build a graph by linking users who have enjoyed the same movies, and weighting the edges according to how much their tastes match. Then, if some of the users have rated a particular movie from one to five stars, you can predict how many stars the movie would get from other users by (virtually) nailing down the nodes you already know about at positions on the \(x\)-axis corresponding to their ratings, and then calculating the positions that the other nodes will spring to.
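A bare-bones version of this “nail down what you know and let the rest settle” computation is sketched below (a minimal numpy illustration with an invented five-user similarity graph; a real recommender would build its graph and weights from actual viewing data). It fixes the known ratings and solves the Laplacian system restricted to the unknown nodes, just as in the sidebar.

```python
import numpy as np

# Made-up "taste similarity" graph on five users: (user, user, weight).
n = 5
weighted_edges = [(0, 1, 2.0), (1, 2, 1.0), (2, 3, 2.0), (3, 4, 1.0), (0, 4, 0.5)]

L = np.zeros((n, n))
for i, j, w in weighted_edges:
    L[i, i] += w; L[j, j] += w
    L[i, j] -= w; L[j, i] -= w

known = {0: 5.0, 3: 2.0}                         # users 0 and 3 have already rated the movie
unknown = [i for i in range(n) if i not in known]

# Minimizing the energy with the known nodes nailed down means that, at every unknown node,
# the weighted pulls balance out, which gives a reduced Laplacian system for the unknown ratings.
L_uu = L[np.ix_(unknown, unknown)]
rhs = -L[np.ix_(unknown, list(known))] @ np.array(list(known.values()))
predicted = np.linalg.solve(L_uu, rhs)

for user, stars in zip(unknown, predicted):
    print("predicted rating for user", user, ":", round(float(stars), 2))
```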
Spielman test-drove this approach, in fact, in the early days of the Netflix Prize, a competition run by Netflix a few years ago that offered a million-dollar prize to anyone who could improve its online recommendation system by at least 10%.
“I was on the leader board very early on,” Spielman recalls. “Within a couple of months, though, other teams took over who were devoting substantial time to the problem.”
None of those recommendation-system heavyweights appear to have picked up on Spielman’s Laplacian approach, so the method’s full power remains to be ascertained.
Researchers are also starting to explore the use of Laplacian solvers for “semi-supervised” learning tasks, in which human experts label some nodes of a graph, and then the goal is to make a good guess for the values of the other nodes. One such example in which Laplacian solvers have been proving useful is in speech recognition: given a speech sample, humans label a small proportion of the words, and then the computer guesses the rest. Researchers are also testing this approach for web spam detection.
“It’s easy to get a giant graph of websites that all link to each other, and to get people to label some of the websites as spam or not spam,” Spielman says. “But there are many more websites than you can pay people to look at, so semi-supervised learning is the way to go.”
It’s too early to say yet whether the fast Laplacian solvers could be a “killer app” for recommendation systems or semi-supervised learning, Spielman says. “Learning on graphs is a young area,” he explains.
Going with the Flow
Vaidya’s paradigm turned the power of graph theory on the problem of solving linear systems of equations. Now, the fast Laplacian solvers that emerged from his approach are returning the favor.
“The Laplacian solvers are a powerful new tool to probe the structure of a graph,” Kelner says.
In the past few years, researchers have begun using this new tool, most notably bringing it to bear on the “Max Flow” problem, one of the oldest problems in the study of algorithms. The Max Flow problem concerns the logistics of sending material through a network of, say, roads or cables or airplane routes. Max Flow asks: given a graph in which each edge has a fixed capacity, how much material can flow through the graph from a ‘source’ node to a ‘sink’ node in one unit of time? Max Flow was first studied during the planning for the Berlin airlift in the late 1940s, and it has applications in airline scheduling, internet data flow, transport of goods, and a host of other industrial processes.
The traditional approach to solving Max Flow is a very combinatorial, piece-by-piece process. It starts by finding one path from the source node to the sink node and sending as much material along that path as possible. Then the algorithm picks another path, and if it has unused capacity, flows material through that path. The difficulty with this approach is that since different paths may share some edges, the degree to which each path is utilized may depend on the order in which the paths are chosen. Sometimes, to improve the flow, it’s necessary to undo the flow on a previous path to allow a new path to be exploited fully.
For decades, the Max Flow problem was an active and fruitful area of research, as computer scientists came up with one clever trick after another for how to choose the order of the paths to speed up the algorithm. Eventually, however, this approach reached a plateau: No one could push the running time faster than about \((n+m)^{3/2}\), where \(n\) is the number of nodes and \(m\) the number of edges. For more than 10 years, research into the Max Flow problem stagnated.
Then in 2010, a team from the “Laplacian mafia”—Kelner, Spielman, and Teng, together with Aleksander Madry, currently at Microsoft Research New England in Cambridge, Mass., and Paul Christiano, an undergraduate at MIT—used the nearly-linear Laplacian solvers to create an entirely new and significantly faster algorithm for Max Flow.
They approached the problem by viewing the graph as an electrical network of resistors—whose resistance is initially set to the reciprocal of the edge’s capacity—and calculating how current will flow through the network if the terminals of a battery are connected to the source and sink nodes. The flow generated in this way will probably exceed the capacity of certain edges, while leaving others underutilized. So the algorithm next increases resistance on the edges that are above capacity, and decreases resistance on the edges that are below capacity. It then computes the current for this new network, which will be an improvement over the previous one. Next, the algorithm tweaks the resistances again and recomputes the current flow.
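In very rough outline, the loop reads like the sketch below (a heavily simplified numpy illustration on a toy network, with an invented reweighting rule; the real algorithm’s update scheme and its analysis are far more delicate). Each pass solves a Laplacian system to get an electrical flow, then nudges resistances up on overloaded edges and down on underused ones, and finally scales the flow so that it respects every capacity.

```python
import numpy as np

# Toy network: edges as (u, v, capacity); we want flow from node 0 (source) to node 3 (sink).
edges = [(0, 1, 3.0), (0, 2, 1.0), (1, 2, 1.0), (1, 3, 1.0), (2, 3, 3.0)]
n, source, sink = 4, 0, 3
capacity = np.array([c for (_, _, c) in edges])
resistance = 1.0 / capacity            # starting point: resistance = 1 / capacity

for step in range(20):
    # Weighted Laplacian with conductance 1 / resistance on each edge.
    L = np.zeros((n, n))
    for e, (u, v, _) in enumerate(edges):
        g = 1.0 / resistance[e]
        L[u, u] += g; L[v, v] += g
        L[u, v] -= g; L[v, u] -= g

    # Voltages when one unit of current is pushed from the source to the sink.
    injection = np.zeros(n)
    injection[source], injection[sink] = 1.0, -1.0
    volts = np.linalg.pinv(L) @ injection

    # Electrical flow on each edge, and how congested each edge is relative to its capacity.
    flow = np.array([(volts[u] - volts[v]) / resistance[e] for e, (u, v, _) in enumerate(edges)])
    congestion = np.abs(flow) / capacity

    # Invented update: raise resistance where an edge is over-worked, lower it where it is idle.
    resistance *= 1.0 + 0.1 * (congestion - 1.0)

scale = 1.0 / congestion.max()   # rescale the unit flow until every edge fits within its capacity
print("feasible flow of value about", round(scale, 3))
print("edge flows:", np.round(scale * flow, 3))
```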
“We showed that we don’t have to do it too many times to get a good answer,” Kelner says.
The new algorithm has a running time proportional to \((n+m)^{4/3}\). While it remains to be seen how quickly the algorithm solves practical examples, in theoretical terms this running time represents a huge speed-up.
In contrast to the combinatorial approach to Max Flow, the electrical approach almost magically solves for the flow on the entire graph at the same time. This advantage is likely to carry over to many other graph problems, Kelner says.
“Instead of looking at one path at a time, you are covering all the paths at the same time and finding a global minimum,” he says. “That’s a powerful thing to leverage in algorithms.”
Already, computer scientists have used the Laplacian solvers to speed up the analysis of a variety of graph attributes and structures, including calculating effective resistance and generating random spanning trees. These examples are just the beginning, Teng predicts.
“Algorithm design is like building a software library,” he says. “Once we have algorithms that can solve a problem in linear or nearly linear time, we can add them to our library of efficient algorithms and use them as a subroutine in designing the next wave of algorithms.”
Whatever emerges from the new paradigm, linear algebra and graph theory are now interlinked in ways that no one guessed before Vaidya’s insight and its development by the “Laplacian mafia.”
“Mathematical advances often come from realizing that two different fields are related,” Hendrickson says. “Suddenly, tools are at your disposal that weren’t available before.
“Vaidya’s insight is a prime example of this phenomenon.”
▀ ▀ ▀
Sidebar: Nailing Down Nodes: the Not-Too-Gory Details
As we’ve seen in the main article, if we nail down some of the nodes of a graph of rubber bands along the x-axis, the other nodes will instantly spring to the positions \(x_i\) that minimize the system’s total potential energy, \(E = \sum (x_i - x_j)^2\), where the sum is taken over all the pairs of nodes that are connected by a rubber band.
We saw that \(E\) can also be written as the product \(\boldsymbol{x}^TL\boldsymbol{x}\), where the “Laplacian” matrix \(L\) satisfies the following:
- Each diagonal entry \(a_{ii}\) in the matrix is equal to the number of edges emanating from node \(i\) in the graph.
- Each off-diagonal entry \(a_{ij}\) is equal to \(-1\) if nodes \(i\) and \(j\) are connected by a rubber band, and \(0\) otherwise.
For simplicity’s sake, let’s suppose that we’ve nailed down just two of the nodes, say, node 1 at position \(a\) and node 2 at position \(b\) (although the following argument would go through in the same way no matter how many nodes we had nailed down).
If the remaining nodes \(3, \dots, n\) are at the positions \(x_3, \dots, x_n\) that minimize the potential energy \(E\), we must have $$ \frac{\partial E}{\partial x_i} = 0 \quad \text{for } i = 3, \dots, n. $$
So to figure out the positions of the nodes of the network, we must solve the system of equations $$ \begin{aligned} x_1&=a\\[2pt] x_2&=b\\[2pt] \frac{\partial E}{\partial x_3}&=0\\ &\vdots\\ \frac{\partial E}{\partial x_n}&=0 \end{aligned} $$
Now, $$ \frac{\partial E}{\partial x_i}=\textstyle\sum 2 (x_i-x_j), $$ where the sum is taken over all the nodes \(j\) that are connected to node \(i\).
So the system we are trying to solve becomes $$ \boldsymbol{L}^*\boldsymbol{x} = \begin{bmatrix} a\\b\\0\\\vdots\\0 \end{bmatrix}, $$ where
- the first row of \(\boldsymbol{L}^*\) is \([1 \ 0\ 0\ 0 \ \dots \ 0]\);
- the second row of \(\boldsymbol{L}^*\) is \([0 \ 1\ 0\ 0 \ \dots \ 0]\);
- the other rows of \(\boldsymbol{L}^*\) are the same as the corresponding rows of the Laplacian matrix \(\boldsymbol{L}\). (Exercise: convince yourself that this is true.)
For example, consider the five-node graph in which nodes 1 through 5 are joined in a cycle (1 to 2, 2 to 3, 3 to 4, 4 to 5, and 5 back to 1), with one extra rubber band connecting nodes 3 and 5,
whose Laplacian matrix is $$ \begin{bmatrix} ~2&-1&~0&~0&-1\\ -1&~2&-1&~0&~0\\ ~0&-1&~3&-1&-1\\ ~0&~0&-1&~2&-1\\ -1&~0&-1&-1&~3 \end{bmatrix}. $$
If we nail down node 1 at, say, position 4 on the \(x\)-axis, and node 2 at position 8, then figuring out the positions of the other nodes amounts to solving the system $$ \begin{bmatrix} ~1&~0&~0&~0&~0\\ ~0&~1&~0&~0&~0\\ ~0&-1&~3&-1&-1\\ ~0&~0&-1&~2&-1\\ -1&~0&-1&-1&~3 \end{bmatrix} \begin{bmatrix} x_1\\ x_2\\ x_3\\ x_4\\ x_5 \end{bmatrix} = \begin{bmatrix} 4\\ 8\\ 0\\ 0\\ 0 \end{bmatrix}. $$ We can use the first two rows to eliminate \(x_1\) and \(x_2\) from the other equations, to get $$ \begin{bmatrix} 1&~0&~0&~0&~0\\ 0&~1&~0&~0&~0\\ 0&~0&~3&-1&-1\\ 0&~0&-1&~2&-1\\ 0&~0&-1&-1&~3 \end{bmatrix} \begin{bmatrix} x_1\\ x_2\\ x_3\\ x_4\\ x_5 \end{bmatrix} = \begin{bmatrix} 4\\ 8\\ 8\\ 0\\ 4 \end{bmatrix}, $$ which reduces to solving $$ \begin{bmatrix} ~3&-1&-1\\ -1&~2&-1\\ -1&-1&~3 \end{bmatrix} \begin{bmatrix} x_3\\ x_4\\ x_5 \end{bmatrix} = \begin{bmatrix} 8\\ 0\\ 4 \end{bmatrix}. $$
The above \(3\times 3\) matrix is simply the restriction of the Laplacian matrix to the non-nailed-down nodes; all the techniques described in the article for preconditioning Laplacian matrices apply equally well to such Laplacian submatrices.
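As a quick numerical check (a minimal sketch assuming numpy is available), we can build \(\boldsymbol{L}^*\) for this example and let the computer solve the system directly:

```python
import numpy as np

L = np.array([[ 2, -1,  0,  0, -1],
              [-1,  2, -1,  0,  0],
              [ 0, -1,  3, -1, -1],
              [ 0,  0, -1,  2, -1],
              [-1,  0, -1, -1,  3]], dtype=float)

L_star = L.copy()
L_star[0] = [1, 0, 0, 0, 0]    # row for nailed-down node 1
L_star[1] = [0, 1, 0, 0, 0]    # row for nailed-down node 2
b = np.array([4.0, 8.0, 0.0, 0.0, 0.0])

print(np.linalg.solve(L_star, b))   # the positions of all five nodes
```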
The solution to this particular system turns out to be $$ \begin{aligned} x_1 &= 4\\ x_2 &= 8\\ x_3 &= 6.5\\ x_4 &= 6\\ x_5 &= 5.5 \end{aligned} $$ so the three free nodes settle into positions on the \(x\)-axis between the two nailed-down nodes.