The Computational Quest to Make Communication Error-Free

Simons Junior Fellow Jarosław “Jarek” Błasiok, a theoretical computer scientist, seeks to make computers more efficient and accurate — even in the face of increasingly large datasets.


Whenever we send a text message to a friend or colleague, we can be confident that the message will be received exactly as we sent it. But the physical tools underlying that effortless communication are actually error-prone. So service providers have built in redundancies (essentially extra bits of data) to make sure the message comes through correctly. Without this step, our texts would end up garbled and wrong.

Jarosław “Jarek” Błasiok, a Junior Fellow with the Simons Society of Fellows, works to make computer programs like these as seamless and error-free as possible. As a theoretical computer scientist, Błasiok sorts out which kinds of problems computers can solve and which ones they cannot. He also seeks to make computers more efficient, however they are used.

Błasiok will soon complete his postdoctoral fellowship in Columbia University’s Theory of Computation group under the mentorship of Alexandr Andoni. Prior to joining the group, Błasiok earned a doctorate in theoretical computer science at Harvard University and bachelor’s and master’s degrees in computer science from the University of Warsaw.

Błasiok and I recently discussed the everyday implications of his research efforts, and why it is OK for computers to make approximations just as people do. Our conversation has been edited for clarity.

 

What does theoretical computer science encompass?

Theoretical computer science is really a branch of mathematics applied to computer science. Our goal is to discern the types of problems that computers can and cannot solve. For those problems that computers can solve, we also work to improve efficiency, minimizing the time, memory and communication a computation requires. As larger and larger quantities of data become available to both researchers and everyday computer users, the need for more efficient algorithms to process these data is more urgent than ever.

 

I thought that with enough creativity you could program computers to do anything. It sounds like this isn’t true. What’s something a computer cannot do?

One of the most famous examples of this is known as the halting problem, which Alan Turing proposed in 1936. Let me take a few moments to explain.

We are all familiar with this frustrating experience: Every now and then we encounter faulty computer software which, instead of quickly finishing the job we want it to do, ‘freezes’ and stops responding to any commands. In reality the program is often still running; it just won’t ever accomplish the given task. We all remember having to restart our computer to get out of this freeze. Wouldn’t it be great if developers had some automatic way, perhaps another computer program, to detect these faulty behaviors in advance? Surely that would improve our everyday experience with computers.

As it turns out, though, Turing mathematically proved that it’s impossible to write such a program: one that checks whether another given program, run on a particular input, freezes and runs forever or finishes processing (‘halts’) correctly. This simply cannot be done.

So, no matter how efficiently we build computers, and how ingenious the methods we come up with, we will never be able to write a program that accomplishes this specific task. This is an example of theoretical computer science, and it’s kind of the foundation of our field.
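The argument itself is short enough to sketch in code. Below is a minimal version of Turing’s reasoning in Python; the function names are illustrative, and `halts` stands for the perfect freeze-detector we are assuming, for the sake of contradiction, could exist:

```python
def halts(program, argument) -> bool:
    """Hypothetical perfect freeze-detector: returns True if
    program(argument) eventually finishes, False if it runs forever.
    Turing's theorem says no correct implementation can exist."""
    raise NotImplementedError

def paradox(program):
    """Does the opposite of whatever halts() predicts about it."""
    if halts(program, program):  # predicted to finish?
        while True:              # ...then freeze forever instead
            pass
    else:
        return                   # predicted to freeze? finish at once

# Now consider paradox(paradox). If halts(paradox, paradox) returns True,
# paradox loops forever, so the prediction was wrong. If it returns False,
# paradox finishes immediately, and the prediction was wrong again. Either
# way the "perfect" checker fails, so no such checker can exist.
```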

There will always be a need to understand what computers cannot do; that’s the theoretical side. On the pragmatic side, we also seek to optimize all that they can do.

 

Speaking of optimization, I noticed that you work on ‘streaming algorithms.’ How does this relate to improving efficiency?

Let’s say you run a website and want to know how many different people visit each day. To be accurate, you want to make sure you don’t count the same person twice. If your site doesn’t get much traffic, it’s not too hard to keep a list of everyone who has visited so far and count each new person as they arrive. A simple program can track that.

But now let’s imagine you have one of the busiest sites in the world, with billions of daily views. In theory, you could keep a list of all website visitors. But this would require significant resources, taking up a lot of space, which in turn slows down your service. Suddenly what used to be an easy task is now a huge chore. In many cases it’s completely infeasible.

Enter the streaming algorithm, an elegant and nontaxing bit of code that reduces the resource requirements of this counting process by orders of magnitude. With this clever code, you don’t need to store a list of all the previous visitors to have a sense of how many people visit each day, because it calculates the approximate number of visitors for you.

One cost of this approach is that it is less precise than exactly counting each and every daily visitor. But the upside more than outweighs that cost. You have a workable sense of how many visitors you have, and your website doesn’t crash.
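One classic way to do this, in the spirit of the Flajolet-Martin and LogLog family of sketches that streaming algorithms build on, can be written in a few lines of Python. This is a toy illustration under simplifying assumptions, not a production implementation; real systems use refinements like HyperLogLog:

```python
import hashlib

class DistinctCounter:
    """Toy LogLog-style sketch: estimates the number of distinct items
    seen, using a few dozen small registers instead of a full list."""

    def __init__(self, num_registers: int = 64):
        self.registers = [0] * num_registers  # max observed "rank" per bucket

    def add(self, item: str) -> None:
        # Hash the item to a large pseudorandom integer.
        h = int.from_bytes(hashlib.sha256(item.encode()).digest()[:8], "big")
        bucket = h % len(self.registers)  # which register this item updates
        rest = h // len(self.registers)
        # Rank = position of the lowest set bit (geometrically distributed).
        rank = 1
        while rest % 2 == 0 and rank < 64:
            rest //= 2
            rank += 1
        self.registers[bucket] = max(self.registers[bucket], rank)

    def estimate(self) -> float:
        m = len(self.registers)
        avg_rank = sum(self.registers) / m
        # 0.39701 is the standard LogLog bias-correction constant for large m.
        return 0.39701 * m * 2 ** avg_rank

counter = DistinctCounter()
for i in range(1_000_000):
    counter.add(f"visitor-{i % 100_000}")  # 100,000 distinct visitors, many repeats
print(round(counter.estimate()))  # roughly 100,000, typically within ~15%
```

Note what the sketch stores: 64 small integers, no matter how many billions of visits arrive. Duplicates are handled for free, because hashing the same visitor always updates the same register with the same rank.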

 

Well, we also use estimates in everyday life. I don’t tell people, “I’ll see you in exactly eight minutes and 23 seconds.” Instead, I say, “See you in 10 minutes.” I guess my own high-level computing also relies on approximation.

That’s true, although I’d indulge you if you were that precise with me! Yes, we are fine with estimates in everyday life but for some reason think, mistakenly in my opinion, that computers must be exact. The truth is that there is a lot of approximation going on under the hood.

 

I also gather that there’s a lot of computational effort that is invisible to us, especially with communication tools like text messages.

Whenever we send a text message, or any kind of electronic communication, we rely on a combination of physical and digital infrastructure. Getting a message intact from your phone to mine, for example, is harder than it seems, because the physical channel that carries it inherently introduces noise. To correct the errors that noise causes, we need what are called error-correcting codes. Without these codes, messages wouldn’t get through successfully. The scientist Claude Shannon was the first to study this phenomenon rigorously in the 1940s.

Shannon proposed a solution, which was to add additional bits of information to each message, so that even if parts of the transmission were scrambled in the process, the receiver could still recover the original message. This became known as an error-correcting code. The ‘code’ as used here is not computer code that runs a software program, but rather a collection of signals and symbols used while communicating. Shannon even mathematically calculated how much longer a message had to be in any given communication channel for the message to come through intact!
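The crudest illustration of this idea is the repetition code, far simpler than Shannon’s construction or anything used in practice, but it shows how extra bits buy error correction. A minimal sketch in Python:

```python
def encode(bits, r=3):
    """Repeat each bit r times: the extra copies are the redundancy."""
    return [b for bit in bits for b in [bit] * r]

def decode(received, r=3):
    """Majority vote within each group of r copies recovers a bit as
    long as fewer than half of its copies were flipped by noise."""
    return [int(sum(received[i:i + r]) > r // 2)
            for i in range(0, len(received), r)]

message = [1, 0, 1, 1]
sent = encode(message)           # [1,1,1, 0,0,0, 1,1,1, 1,1,1]
sent[4] = 1                      # noise flips one bit in transit
assert decode(sent) == message   # majority vote corrects the error
```

The repetition code triples the length of every message just to survive occasional single-bit flips; Shannon’s insight was that far less redundancy suffices in principle, which is what makes the search for efficient codes worthwhile.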

This was a great innovation, but unfortunately Shannon’s solution turned out to be impractical. Recovering the initial message from the scrambled one, although mathematically possible, would require galactic computational effort, even with computers much faster than we have today.

Since Shannon’s time, computer scientists like myself have developed more efficient error-correcting codes. These add as little extra data as possible during the transmission of a message while still enabling efficient recovery of the intended message. The main goal is to achieve both without consuming too much computing power. This is an ongoing challenge for our field, and I enjoy working to solve these kinds of challenges.

 

How has the Simons Foundation assisted your work?

Obviously, the support of the foundation to pursue independent research at one of the most prestigious universities in the world is extremely beneficial. But a no less important benefit of the fellowship has been the weekly dinners, where we interacted with experts in every field (pre-pandemic, of course). Those were extremely enjoyable and full of intellectually stimulating conversations. Being part of this entire community is absolutely invaluable, and it’s one of the most distinctive aspects of the Simons fellowship.

I’m also looking forward to getting back to the lab this fall — and being able to solve problems with my colleagues in front of the whiteboard. For me there is something irreplaceable about in-person communication, even if in principle you can do the same work online. No matter how my work ends up addressing the many computing challenges of today — and in the future — it’s extremely gratifying to be part of it.