Financial institutions, governments and internet companies all take great pains to encrypt sensitive information. But how can we know whether a cryptographic scheme is really secure? How can we know no one can hack into these systems?
According to Gilad Asharov, an emeritus Simons Society junior fellow who recently joined J.P. Morgan AI Research, commonly used encryption methods are not perfectly secure, but they still rest on solid foundations. “We base our encryption schemes on long-standing assumptions about what modern computers can and cannot accomplish,” he says. “Maybe tomorrow someone will break these assumptions, but we have good reasons to believe this is very unlikely.”
This built-in uncertainty fascinates Asharov, whose studies focus on the conditions under which such cryptographic assumptions hold up or unexpectedly break down in classical computers (and less so in quantum ones, where the rules can sometimes be different). In an age where data is being collected and exchanged in increasingly complex ways, and its security seems constantly besieged by large-scale hacks and breaches, defining the theoretical limits of encryption matters more than ever, Asharov says.
The following is an edited and condensed version of my interview with Asharov, conducted earlier this year while he was a postdoctoral researcher at Cornell Tech.
No encryption scheme is perfect, but it has to be hard enough to crack that people will trust it. What’s an example of this kind of “hardness”?
Theoretical computer science deals with what problems a computer can solve efficiently. What we do in cryptography is to prove mathematically that certain cryptographic schemes cannot be broken, under those assumptions of what a computer can efficiently accomplish.
For example, let’s say I pick two very large prime numbers, multiply them together, and give you the answer without telling you what factors I multiplied together. We don’t know how to find those factors efficiently; there is no algorithm that knows how to do it fast enough. So if we can prove that an encryption scheme is mathematically equivalent to that factoring problem, it means there’s no algorithm for cracking the encryption either. We pick the two prime numbers to be large enough such that factoring the multiplication will take hundreds of years using the best algorithm we are aware of, and this will be the amount of time you will have to spend to break the encryption. Which means it’s very secure.
But not perfectly secure?
Right. We’ve been thinking about this prime factorization problem for 2,000 years, and we still don’t have a fast algorithm for it that runs on classical computers. But no one has proven that it doesn’t exist. So it is an assumption that any encryption scheme that is equivalent to this problem will be too hard for computers to crack. We have many other “hard” problems, but this is one example.
What would happen if someone did find a way to break an encryption scheme that was equivalent to one of these “hard” problems for classical computers?
We say, “Science wins either way.” Once you have an attacker that breaks a certain encryption scheme, I can use this attacker’s method to break, or to solve, the equivalent computational problem that currently we do not know how to solve. So if tomorrow someone comes with an algorithm to break an encryption that is equivalent to the prime factorization problem, then — wow! — we finally have a solution to this “hard” problem.
It seems like we hear about a giant data breach every other week. If cryptography is so well studied, why does this happen?
With many of these breaches, it’s not a matter of cryptography being broken; the data was not appropriately protected. Sometimes they don’t even try to protect the data. We saw that with the Facebook–Cambridge Analytica case. No cryptography was broken here; it was a bad design for what information was made accessible to an external party. Now they understand much better that they have to protect this data, so they put in much more effort.
In many cases, security breaches occur because cryptographic schemes were not implemented correctly, in terms of the engineering itself. Many attacks just use bugs in the software.
Is encryption and privacy a binary distinction: Either you’re fully protected, or you’re totally exposed? Or is it more complicated?
In general, encryption is all-or-nothing — either you get full access to the underlying message by having the secret key, or you do not learn anything about the underlying message if you do not have the secret key. But in practice, in many cases it is not clear what data is supposed to be protected and what is not, and then things become more complicated. Facebook wanted to give away access to some of the data, not all of it. But how do you measure what is a good amount of access? I might think I gave you some limited access, but in the end this limited access could give you so much more information than what I thought.
Another example is Netflix. About 10 years ago, the company set up a challenge where you could send in an algorithm to recommend movies to Netflix users, and if it was better than theirs, the winner would get a million dollars. To facilitate this contest, Netflix published a database with a lot of user information and just removed the usernames and some other identifying information from the database. And the company thought this was fine.
But when you take that database and intersect its information with other auxiliary information in other databases out there, you start to identify the users. Just deleting the columns of usernames was not enough to block access of that sensitive information. So this was a more complicated problem than Netflix had thought. In many realistic cases, it’s not that I have something that I want to encrypt so no one else can read it; I do want to give some information out. But I want to ensure that this is the only information you can actually access, and nothing else.
You study a solution to this problem called “secure computation.” Can you explain the concept?
Secure computation is when you and I want to compute something together based on inputs we want to keep private from each other. I don’t trust you and you don’t trust me, but we still want to cooperate and learn something.
The classical example is Yao’s Millionaires’ problem. You and I are both wealthy, and we want to find out which of us has more money in the bank. But I don’t want to reveal how much money I have, and neither do you; it’s private information. So how can we compute this together even though we don’t necessarily trust each other, while maintaining our privacy?
A more real-world example of this would be two hospitals which each have sensitive information about their patients, and you want to do some joint research on the patients, but you cannot reveal the information due to privacy regulations. Secure computation enables this cooperation between distinct parties, even though the data itself is not revealed.
But how can any computation happen on encrypted data?
A very simple example is computing the average of salaries of three parties who don’t wish to reveal their individual salaries to each other. The first party can encrypt its salary by adding a very large random number to it. Then it can send it privately to the second party, which will add its own salary value and send it privately to the third party, which will add its salary and then send it back to the first party. Each of the salary amounts is now encrypted, but in a way that allows the first party to do some computation on the underlying message. That’s because the first party knows the random number it added, so it can just subtract the number, divide the result by the number of parties and broadcast the computed average back to the others.
The science of cryptography is in understanding how to design these kinds of protocols for more complicated scenarios. How many parties can be involved, how many rounds of information exchanging do you need, how can privacy be maintained even if one or more of the parties misbehaves and does not follow the protocol? This is our goal: to discover and construct protocols that can answer these questions.
As you can see, cryptography is much more than just encryption schemes. Today we want to do computations on encrypted data. We want to allow you to outsource your information to a storage provider and search on your data while it is encrypted. We want to allow you to obfuscate a software and allow others to use it while making it robust to reverse engineering.
What’s one big problem in cryptography you’d like to see solved?
To actually understand data leakage. To have some theory of what you can reveal and what you cannot reveal. There is no theory behind ; it’s just case-by-case analysis. I don’t know if it’s even possible, but this is something that we have to understand more in our community.