#### Mathematics is helping medical students and hospitals solve the two-body problem

The mass entry of women into the American workforce in the 20th century has, proverbially, turned family life into a juggling act, with couples coordinating work schedules, childcare and chores in days that never seem quite long enough.

Imagine, then, how difficult it is for professional couples—such as medical students and academics—who have to relocate just to find jobs in the first place. For these couples, solving the “two-body problem” can be far from trivial. All too often, one member of a couple has to decide about a job offer before the other has lined up any prospects. Do they grab the job and trust that a second job in the same city will turn up? Or do they hold out for a pair of jobs, hopefully in their dream city?

When the pair are medical students, they entrust their fate to a computer algorithm that matches them to hospital residency positions. Prior to Match Day, medical students submit preference lists of acceptable positions, and hospitals rank the students they are willing to hire. Couples are allowed to submit their preferences in pairs, to make sure they don’t end up on opposite ends of the country. Then the computer comes up with a matching of hospitals and students that all the participants have committed to accept.

It would be nice if the computer could give every student, couple and hospital its top choice, but in most cases this is not possible. In the absence of a match that will perfectly satisfy everyone, what constitutes an acceptable match? When the first resident matching algorithm was devised in the early 1950s, there was no consensus about what properties such a match should have. Then, in 1962, in a simple but ground-breaking paper, mathematicians David Gale and Lloyd Shapley came up with the idea of a “stable” matching. In the context of the market for medical residents, a matching is stable if no student and hospital (or couple and pair of hospitals) prefer each other over their computer-assigned matches. Such a matching may not result in everyone being overjoyed with their match, but it is stable in the sense that those participants who are disappointed with their match can’t persuade any of their preferred choices to switch to them.

In markets that have no couples, it’s always possible to find a stable matching, mathematicians have proved. Throw couples into the mix, however, and a stable matching may not exist. In fact, mathematicians have shown that in some cases, even computing whether a stable matching exists is so hard that it is, for all practical purposes, impossible (in technical terms, it is NP-complete).

Yet in the real world, the algorithm has been surprisingly successful at finding stable matchings in markets with couples, and economists have been wondering why. In the last decade, in several dozen annual matching markets—for medical students, psychologists, and a host of medical specialties—only a handful of markets have failed to find a stable matching.

“There’s a sense in which we’ve been more successful than we had a right to be, based on the existing theorems,” says Alvin Roth, an economist at Harvard University who was behind the design of one of the most widely-used matching algorithms today.

Now, Roth and two other economists have proposed an explanation for why the matching markets seem to do so well. In a working paper, Roth—together with Fuhito Kojima of Stanford University and Parag Pathak of MIT—proves that in a large market, provided that there aren’t “too many” couples (in a precise sense), the probability that a stable matching exists is very high. What’s more, there is a simple algorithm that will successfully find a stable match with high probability.

The researchers also show that in these large couples markets, participants have strong incentives to tell the truth about their preferences. As we’ll see, it’s possible to construct examples of markets in which participants could successfully “game the system” by lying about their preferences. But as the size of the market grows, the researchers show, participants have less and less to gain from such chicanery.

“This is a first attempt to explain why the couples problem hasn’t been such a bear,” Roth says.

## State of residency

The first resident matching algorithm was brought on the scene in the early 1950s, in an attempt to clean up a market that had been messy and chaotic for decades.

Because medical residents—aka, cheap labor—were a valuable commodity, in the early 20th century hospitals would try to beat each other to the punch, locking in desirable medical students before other hospitals had made them offers. Gradually, offers to medical students crept earlier and earlier, until by the mid-1940s, jobs were being agreed upon almost two years before the students graduated.

Realizing that this trend was untenable, the medical schools decided jointly not to issue any student transcripts or reference letters before a fixed date in the senior year. This effort helped control the timing of the job market, but created a new problem for hospitals: often, if the first offer a hospital made got rejected, the student to whom it wanted to make its next offer had already accepted another job. Hospitals responded to this difficulty by issuing offers of residency positions at 12:01 am on the earliest possible date, and giving students just a few hours to decide.

By the early 1950s, hospitals and students alike were frustrated enough to agree to a centralized market with jobs assigned by computer. After some trial and error, the National Resident Matching Program (NRMP) was born. Over the next few decades, the system spread to markets for dozens of medical specialties and related fields.

While the matching algorithm was originally designed in an ad hoc manner, over the next few decades mathematicians placed it on a solid theoretical footing. First, in 1962—motivated by a problem related to college admissions, rather than by the resident matching algorithm—Gale and Shapley published their seminal paper about the so-called “stable marriage” problem. Imagine, they wrote, a marriage market with *n* boys and *n* girls. Each boy ranks all the girls in order of preference, and each girl ranks all the boys. Is there any way to pair off the boys and girls, they asked, so that no boy and girl will want to ditch their assigned spouses and elope together? Gale and Shapley defined a match to be “stable” if it was elopement-proof.

Gale and Shapley designed a matching algorithm, called the deferred acceptance algorithm, and proved that it always produces a stable matching. In the “boy proposes” version of the algorithm, the boys all start by proposing to their favorite girls. Each girl looks through her offers (if she has any), holds on to the best one, and rejects the others. In the next round, each rejected suitor proposes to his number two girl; each girl holds her new best offer, possibly rejecting the suitor she held in round one. The algorithm proceeds in this way, with each rejected suitor in a given round proposing to the next girl on his list. (The algorithm could instead be run with the girls doing the proposing.)

To see that the algorithm eventually ends, instead of getting stuck in an infinite loop of proposals and rejections, note that the boys are working their way down their preference lists, so the proposals can’t go on forever. When the algorithm ends, everyone is matched: it’s impossible to have a leftover boy and girl, since the boy would eventually propose to that girl if no one else will have him, and she would accept him if she has no better offers.

And the matching the algorithm produces is stable. To see this, imagine that it has matched Alice to Bob and Carol to Dave, but Carol prefers Bob. Then Bob must never have proposed to Carol, since if he had, she would have accepted him and rejected Dave. Therefore, Alice must be higher than Carol on Bob’s preference list, since he proposed to her before getting to Carol. So Bob will choose not to elope with Carol. A similar argument can be made when it’s the boy who wants out of his match.

In 1984, Roth showed that the resident matching algorithm was equivalent to Gale and Shapley’s deferred acceptance algorithm. (The hospitals and medical students in the NRMP don’t actually go about proposing to and accepting or rejecting each other; the computer simply implements all the proposals and responses, based on the participants’ preference lists.)

The resident matching market does have some important differences from Gale and Shapley’s marriage scenario: Some hospitals may have more than one position, the number of positions may not match the number of applicants, and hospitals and students are not required to rank every single available option.

Despite these differences, Roth used arguments similar to Gale and Shapley’s to show that the algorithm will always find a stable matching.

In contrast with the stable marriage scenario, the match will not necessarily produce a placement for every participant. The fact that there are (typically) more residents than available jobs pretty much guarantees this, but that’s not all that is going on. The “leftovers” when the algorithm terminates will not necessarily be just the bottom-of-the-barrel medical students that no one wanted. There might also be students with glowing resumes, and there are typically also some empty hospital positions.

The reason that many participants get left out of the match stems from what happens before the algorithm does its work. To develop their preference lists, medical students send out resumes and get called for interviews. A given student can travel only to so many interviews, and a hospital can interview only so many applicants, before it’s time to make preference lists. So, even though medical students could theoretically rank several thousand different programs, in practice they typically rank only 7 to 9. Thus, a talented student could get left out of the match if she foolishly listed only a few top programs at which she was outranked by other students. Similarly, a hospital could omit to include any “safe” students on its list, and miss out on a match. Or, a student or hospital might produce a well-balanced list but fail to get a match out of sheer bad luck.

In 2009, for example, about 12% of residency programs had unfilled seats after the official match. These positions got filled in an unstructured process affectionately dubbed the “Scramble,” in which unmatched students and hospitals would work the phones until all the positions were filled.

The algorithm’s inability to match up a substantial chunk of the participants seems at first glance like an unmitigated drawback. But there’s a silver lining.

In their new work, Kojima, Pathak and Roth show that this seeming failure is in fact the reason why the algorithm is successful in markets with couples. As we’ll see, the large number of open positions provides just the wiggle room needed to integrate couples into the match. What’s more, this wiggle room makes it close to impossible for participants in a couples market to successfully manipulate the market by lying about their preferences.

## The two-body problem

In the 1950s, when the centralized residents match was devised, virtually all medical students were men, and the idea of accommodating couples wasn’t even on the radar. By the 1970s, however, enough women were graduating from medical school that couples were not unheard of, and today typically more than a thousand medical students look for positions as couples (in a market with about 25,000 to 30,000 participants).

In the 1970s, the NRMP made a first attempt at an algorithm that could accommodate couples. Couples had to be certified by their dean, and they had to specify one member of the couple as the “leading member.” Each member of the couple made a separate preference list. The leading member went through the match as if single, and then the other member’s list got edited to remove positions in a different community from the leading member’s position.

This algorithm had some rather glaring drawbacks. For one thing, as Roth puts it, the algorithm violated an iron law of marriage: you can’t be happier than your spouse. Furthermore, since the algorithm ignored many of the realities of couples’ preferences, couples were often able to arrange better positions outside of the official match than the algorithm found for them. Gradually, over the 1970s and early 1980s, more and more couples started opting out of the match entirely.

It seems like the obvious solution is to allow couples to submit a single preference list consisting of pairs of positions—Alice at Massachusetts General Hospital and Bob at Boston Medical Center, for example. But doing so opens up a can of worms. In 1984, Roth showed that in such markets, a stable matching may not exist. Problems tend to arise, for example, when a couple gets placed in a pair of positions, and one hospital is happy but the other is not. (For a simple example of a couples market with no stable matchings, see Vicious circle.)

Despite the potential for problems, in 1983 the NRMP adopted an algorithm that allowed couples to rank positions in pairs. Since 1998 it has used an algorithm designed by Roth and Elliott Peranson, of National Matching Services. This algorithm has now been adopted by more than 40 different labor matching markets.

The Roth-Peranson algorithm is based on the “sequential couples algorithm,” which is in turn based on Gale and Shapley’s deferred acceptance algorithm. The sequential couples algorithm starts by running Gale and Shapley’s algorithm on the hospitals and single students, saving couples for later. Next, working one couple at a time, the algorithm places couples into positions (provisionally) by allowing them to apply to pairs of positions, working their way down their preference list until they find a pair of hospitals willing to accept them (potentially displacing two students who had previously been accepted by those two hospitals). Once the algorithm has finished placing all the couples, the displaced students are brought back into the market by the computer and allowed to re-apply to hospitals, working their way down their preference lists.

The sequential couples algorithm can run into difficulties in a variety of ways. For example, once a couple has been placed (say, at hospitals A and B), suppose an individual student applies to hospital A and is preferred over the couple member. Then both members of the couple will be displaced, but only hospital A has improved its lot, not hospital B. Hospital B might now regret the students it had previously rejected in favor of the couple member. What’s more, the couple will have to be placed in a new pair of positions, potentially displacing two new students and setting off an infinite cascade of placements, rejections, and dissatisfied hospitals.

Thus, the sequential couples algorithm is considered to succeed only if, once a couple gets placed, no one else applies to those two positions. If the algorithm succeeds, it’s not hard to show that the match it produces is stable.

The Roth-Peranson algorithm starts with a process similar to the sequential couples algorithm, but if the algorithm falls into a hole along the way, it doesn’t just declare failure, but tries to climb back out. In the above example—in which an individual student displaces two couple members, leaving a dissatisfied hospital—that hospital is put on an “unresolved” pile. After any displaced students have proposed down their lists, the unresolved hospital looks at the student currently assigned to it, and then all the students with whom it might be tempted to “elope” are thrown back into the mix and allowed to propose to hospitals anew, starting at the tops of their lists. In this way, the algorithm attempts to forestall each possible instability, one at a time.

The algorithm is also equipped with “loop-detectors” to keep it from cycling in infinite loops. If, for example, the algorithm detects two couples that keep displacing each other from the same pair of hospitals, it forces one of the couples to propose to a pair of hospitals lower down on its preference list.

The Roth-Peranson algorithm is guaranteed to terminate, but the matching it finds is by no means guaranteed to be stable, even if a stable matching does exist in the given market. And when an algorithm produces an unstable matching, it’s not simply a matter for hand-wringing in the ivory tower. In two papers in the early 1990s, Roth studied a collection of different algorithms used by regions of the British National Health Service in the 1960s. He found that for the most part, the algorithms that produced stable matchings had succeeded and survived, while the algorithms that produced unstable matchings had gradually lost participants and support, eventually falling by the wayside.

## Surprising success

Yet so far, this has not been the fate of the Roth-Peranson algorithm. Somehow, it almost always seems to find a stable matching.

To understand what underlies this success, Kojima, Pathak, and Roth examined data from the first nine years that the clinical psychologists market used the Roth-Peranson algorithm. In each year, the algorithm had successfully found a stable matching.

The market for clinical psychologists had several distinctive features, the researchers found. For one thing, couples were just a small fraction of participants, averaging about 1%. Also, as discussed above, participants typically ranked only a small number of the open positions available to them. The researchers also found that each year, about 17% of programs had unfilled positions after the match, necessitating an aftermarket like the NRMP’s Scramble.

The researchers suspected that the large size of medical markets—usually numbering many thousands of participants—might be behind the Roth-Peranson algorithm’s success, since large markets tend to work better than small ones in many respects. To test this theory, they decided to see what happens as the size of the market approaches infinity.

The researchers considered a sequence *M _{n}* of random markets with

*n*hospitals, and made the following three assumptions (as well as some other regularity conditions):

- The number of couples is small compared to the size of the market. (More precisely, the number of couples grows more slowly than a constant times the square root of
*n*.)

- The number of entries on the preference list of any student is bounded by a constant that does not depend on
*n*.

- The popularity of different hospitals on students’ preference lists doesn’t vary beyond a fixed ratio. In other words, not everyone is putting, say, the same few top-tier hospitals on their lists. (It might seem at first glance as if everyone would want to put the top hospitals on their list; but the students who didn’t even get interviews at those hospitals would know that they don’t stand a chance there, and focus on more attainable positions.)

Vicious circle

To see one way that a market with couples could fail to have any stable matching, consider a market with just two hospitals, Harvard Medical Center and Stanford Medical Center, and three students, Adam, Eve, and Bob. Suppose the preference lists are as follows:Bob: Harvard, Stanford Adam and Eve: Adam at Stanford, Eve at Harvard (they aren’t interested in any other placement)Harvard: Eve, Bob (not Adam) Stanford: Bob, Adam (not Eve)If the match places Adam at Stanford and Eve at Harvard, then Bob and Stanford will choose to “elope,” since Stanford prefers Bob and Bob is available and willing to go to Stanford. So the match isn’t stable.If the match leaves Adam and Eve unmatched (the only other allowable option), then there are two possibilities:If Bob gets Harvard, then Harvard and Eve, and Stanford and Adam, will choose to elope, since Harvard prefers Eve to Bob and Stanford prefers Adam to no one.If Bob gets Stanford (or nothing), then Bob and Harvard will choose to elope.Thus, there is no stable matching.

The researchers found that as the size of the market approaches infinity, the probability that a stable matching exists tends to 1. What’s more, with probability also tending to 1, the sequential couples algorithm will find a stable matching, without even having to call upon the Roth-Peranson algorithm’s repair mechanisms.

The reason for this success lies in the fact—which the researchers prove—that in these large markets, at every stage of the matching algorithm’s progress, a substantial number of programs have unfilled positions. This gives the algorithm a crucial degree of wiggle room for placing couples.

Recall that the sequential couples algorithm fails only if, after it places a couple, another participant wants to apply to one of the couples’ two positions. In the researchers’ model, they show that this is unlikely to happen: Participants are much more likely to apply to one of the many unfilled positions than one of the relatively few positions occupied by couples. And as long as no one ever displaces a couple, the algorithm is guaranteed to find a stable matching.

## Honesty is the best policy

The three researchers also show that as the size of the market approaches infinity, it becomes more and more difficult for any participants to successfully game the system by lying about their preferences.

In a 1982 paper, Roth proved that in a market with no couples, the side doing the proposing has nothing to gain from lying. But for individuals on the receiving end of the proposals, truth-telling may not be the best strategy. Roth later proved that in a market with couples, both sides may potentially benefit from lying.

To see how it might sometimes benefit market participants to lie, consider a market with two hospitals, Harvard Medical Center and Stanford Medical Center, and two students, Alice and Barbara. Suppose the algorithm is set up with the students doing the proposing, and the preference lists are as follows:

If everyone tells the truth, then Alice will propose to Harvard and Barbara to Stanford; both proposals will be accepted, and the algorithm will end, with a stable match that is great for Alice and Barbara, but not so hot from the hospitals’ point of view.

Harvard, for one, could do better for itself by lying and saying that it will accept only Barbara. In that case, the first round of the algorithm will end with Barbara temporarily accepted by Stanford, and Alice rejected by Harvard. A second round will occur, in which Alice proposes to her second choice, Stanford, and is accepted, displacing Barbara. Then Barbara proposes to Harvard in round three and is accepted, resulting in a stable match more to Harvard’s liking than if it had been truthful.

In this example, Harvard’s lie sets off a chain reaction of rejections and proposals that eventually results in Harvard receiving a proposal from a preferred student. But in a large market, the researchers show, it’s unlikely that such a chain reaction would end back at Harvard, instead of landing on one of the many empty positions.

“In a large market, the opportunities to manipulate become increasingly rare,” Roth says.

## Design for matching

The new work on couple matchings is the kind of thing market design is all about, says Paul Milgrom, an economist at Stanford University. “You have a genuine issue, like couples participating in a market, and the market design has to solve it somehow, even if it doesn’t neatly institute the classical theory of Gale and Shapley

While the group’s result—that large markets with couples will usually have a stable matching—is a good start, ultimately economists will want to know just how large is large enough, Milgrom and Roth agree.

“Eventually we would like much better theorems, that say what the probability of finding a stable matching is for a market of a certain size, with a certain number of couples,” Roth says. “I think it’s definitely doable.”

Market design is an appealing area of economics, Roth says, because mathematical theory informs the design of real markets, and the functioning of those markets in turn motivates the development of new mathematics.

“There’s a nice back and forth between mathematics and the real world,” Roth says. “The theory is turning out to be a powerful guide to what’s going on in these markets, which is very gratifying.”