The Dimension Question: How High Does It Go?
Scientists can routinely track the activity of hundreds — sometimes even thousands — of neurons in the brain of awake animals. But how many cells do they need to monitor to truly understand how the brain functions? That question has become a hot topic among researchers doing large-scale neural recordings. The answer could shape the future of the field, influencing how scientists design experiments and new technologies.
“The question is whether we can infer something about what the larger network is doing from sampling a subset of its neurons,” says Byron Yu, a neuroscientist at Carnegie Mellon University in Pittsburgh and an investigator with the Simons Collaboration on the Global Brain (SCGB). “What can we learn from populations of neurons that would be different from what we learn from one neuron at a time?”
“Now that we can record from 100 or 1,000 neurons at the same time, is this giving us enough of a view of what the brain is doing?” asks John Cunningham, a statistician and computational neuroscientist at Columbia University and an SCGB investigator. “Can we look at 100 neurons and say, ‘I think this is a good summary of brain activity’?”
Research presented and published over the past year suggests that under certain circumstances, the answer is yes — models and experimental data show that activity from 100 cells provides roughly the same information as from larger numbers of cells, at least for relatively simple tasks. But new experiments tracking thousands of neurons during complex tasks hint that the brain may be capable of much more complex responses. Understanding those activity patterns will require recording from more cells or for longer time periods.
As tools for recording from large numbers of neurons have improved over the past 10 to 15 years, neuroscientists have had to develop new methods for analyzing the data these experiments produce. One general approach is known as dimensionality reduction, which seeks to describe the most salient features of a dataset using as few variables as possible. This technique is based on the premise that most neurons are interconnected and therefore not acting independently. “If two neurons increase and decrease activity together, you only need one variable to describe them,” Yu says.
Take a hypothetical experiment in which scientists record from 100 neurons at the same time. If each neuron fired independently of the others, that dataset would have 100 dimensions. But many cells have coordinated activity. Researchers use computational techniques, such as principal component analysis or factor analysis, to reduce the dimensionality of the data to a few essential components. The activity of those 100 neurons might be accurately represented by just 20 dimensions.
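The hypothetical experiment above can be sketched in a few lines of Python. This is a minimal illustration, not any lab's actual pipeline: the data are simulated, the function names are invented for this example, and the 95 percent variance threshold is an assumed cutoff. It builds 100 "neurons" whose activity is driven by 20 hidden variables plus a little independent noise, then uses principal component analysis to ask how many dimensions are needed to capture most of the variance.

```python
import numpy as np

def explained_variance_ratio(activity):
    """Fraction of total variance captured by each principal component.

    activity: array of shape (n_samples, n_neurons), e.g. binned spike counts.
    Returned fractions are sorted from largest to smallest.
    """
    centered = activity - activity.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    # eigvalsh returns eigenvalues of a symmetric matrix in ascending order.
    eigvals = np.clip(np.linalg.eigvalsh(cov)[::-1], 0.0, None)
    return eigvals / eigvals.sum()

def n_components_for_variance(activity, threshold=0.95):
    """Smallest number of components whose cumulative variance reaches threshold."""
    cumulative = np.cumsum(explained_variance_ratio(activity))
    return int(np.searchsorted(cumulative, threshold) + 1)

# Simulated (hypothetical) data: 100 neurons driven by 20 shared latent variables.
rng = np.random.default_rng(0)
n_samples, n_neurons, n_latents = 2000, 100, 20
latents = rng.standard_normal((n_samples, n_latents))
loadings = rng.standard_normal((n_latents, n_neurons))
noise = 0.1 * rng.standard_normal((n_samples, n_neurons))
activity = latents @ loadings + noise

n_dims = n_components_for_variance(activity, threshold=0.95)
print(f"{n_dims} of {n_neurons} dimensions capture 95% of the variance")
```

Because the simulated neurons share only 20 underlying variables, the count printed is no more than 20, even though 100 neurons were "recorded." Real recordings are noisier, and the estimate there depends on the threshold chosen.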
If neural activity from that hypothetical experiment can indeed be accurately represented by 20 dimensions, then recording from 80 or 100 or 200 neurons will give a good picture of what the brain is doing. “There is early evidence that is the case,” Cunningham says. “But there is a big concern: What if that’s only happening because you’re looking at a small set of neurons?” Would an experiment tracking 10,000 neurons still be accurately captured with 20 dimensions? “Maybe we are fooling ourselves into thinking that this 20 dimensions of stuff is actually what’s going on,” Cunningham says.
In a paper published in PLOS Computational Biology in December 2016, Yu and his collaborators attempted to address that question by examining how the number of neurons and the number of trials in an experiment influence the number of dimensions needed to explain the data and how prominent those dimensions are. “Given that we have limited recording time and can’t record from all the neurons in circuit, how does our understanding of the circuit depend on these practical limitations?” Yu says. (Yu’s study used a dimensionality reduction technique called factor analysis, which calculates the dimensionality of fluctuations that are shared across all neurons. Prominence refers to how much of the variance in neural activity is shared versus how much is restricted to individual neurons. If a large percentage of variance is shared, the dimensions that explain it are highly prominent.)
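To make the two quantities in the parenthetical concrete, here is a rough sketch of how they could be computed once a factor analysis model has been fit. Factor analysis describes the covariance of the population as a shared part (the loading matrix times its transpose) plus a private variance for each neuron. The sketch skips the fitting step and starts from made-up parameters; the function names, the example values and the 95 percent cutoff are all assumptions for illustration, not details taken from the study.

```python
import numpy as np

def percent_shared_variance(loadings, private_var):
    """Per-neuron fraction of variance that is shared with the population.

    loadings: (n_neurons, n_factors) factor analysis loading matrix L.
    private_var: (n_neurons,) variance private to each neuron.
    """
    shared = np.sum(loadings ** 2, axis=1)  # diagonal of L @ L.T
    return shared / (shared + private_var)

def shared_dimensionality(loadings, threshold=0.95):
    """Number of dimensions needed to reach `threshold` of the shared variance."""
    # Nonzero eigenvalues of L @ L.T equal the squared singular values of L.
    eigvals = np.linalg.svd(loadings, compute_uv=False) ** 2
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    return int(np.searchsorted(cumulative, threshold) + 1)

# Hypothetical fitted parameters for four neurons and two factors.
example_loadings = np.array([[1.0, 0.0],
                             [1.0, 0.0],
                             [0.0, 1.0],
                             [0.0, 1.0]])
example_private = np.array([1.0, 1.0, 3.0, 3.0])

print(percent_shared_variance(example_loadings, example_private))
print(shared_dimensionality(example_loadings))
```

In this toy case the first two neurons share half of their variance with the population and the last two share a quarter, so the shared dimensions are more prominent in the first pair.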
The researchers analyzed data recorded from 80 neurons in the visual cortex of monkeys over 1,200 trials. They found that results calculated from the whole dataset and results using just a subset of the data identified similar dimensions. The researchers then extrapolated these results using simulated data from spiking network models, a type of neural network used to model biological systems. By scaling up the size of the model network, “we can ‘record’ from as many neurons as we’d like and for as long as we’d like,” Yu says.
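One way to picture the subset comparison is sketched below, with the caveat that it is a stand-in rather than the study's method: it uses principal component analysis instead of factor analysis, simulated data instead of recordings, and principal angles as the measure of similarity. The 80 neurons and 1,200 trials echo the dataset described above, while the five latent variables, the noise level and the 300-trial subset are arbitrary choices for illustration.

```python
import numpy as np

def top_subspace(activity, k):
    """Orthonormal basis, shape (n_neurons, k), for the top-k principal components."""
    centered = activity - activity.mean(axis=0)
    # Rows of vt are principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k].T

def principal_angle_cosines(basis_a, basis_b):
    """Cosines of the principal angles between two subspaces (1.0 means aligned)."""
    cosines = np.linalg.svd(basis_a.T @ basis_b, compute_uv=False)
    return np.clip(cosines, 0.0, 1.0)

# Simulated (hypothetical) population: 80 neurons, 1,200 trials, 5 shared latents.
rng = np.random.default_rng(1)
n_trials, n_neurons, n_latents = 1200, 80, 5
latents = rng.standard_normal((n_trials, n_latents))
loadings = rng.standard_normal((n_latents, n_neurons))
activity = latents @ loadings + 0.5 * rng.standard_normal((n_trials, n_neurons))

# Dimensions found with all trials versus a random quarter of the trials.
full_basis = top_subspace(activity, n_latents)
subset_trials = rng.choice(n_trials, size=300, replace=False)
subset_basis = top_subspace(activity[subset_trials], n_latents)

cosines = principal_angle_cosines(full_basis, subset_basis)
print("smallest cosine between subspaces:", cosines.min())
```

Cosines close to 1 mean the subset recovers nearly the same dimensions as the full dataset. The same comparison could be run on subsets of neurons rather than trials, after restricting both bases to the shared neurons.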
The researchers compared real data with simulated data from two types of networks. In one, neurons are randomly connected and only weakly correlated with one another. In the other, neurons are clustered into groups, with each cluster connected to others, a model developed by SCGB investigator Brent Doiron, a theoretician at the University of Pittsburgh, and Ashok Litwin-Kumar, a postdoctoral researcher at Columbia University. They found that the clustered model more closely resembles the real data, and they used that model to extrapolate to larger numbers of neurons and trials.
The results suggest that the 80 neurons and 1,200 trials in the real dataset are sufficient to identify the most prominent dimensions in spontaneous population activity. “Even if we only have 80 neurons, we can still learn what’s going on,” says Cunningham, who was not involved in the study.
Yu cautions that because researchers haven’t actually recorded from 10,000 neurons, they can’t definitively prove that dimensionality will peter out with a certain number of cells. In fact, Yu, SCGB investigator Adam Kohn, and their collaborators published a second paper in the same issue of PLOS Computational Biology showing that the specific results can depend on the complexity of the task. “If the task is more complex, then we would expect neural activity to be more complex, and we may need more neurons and more time to be able to fully see what this population of neurons is doing,” Yu says.
Surya Ganguli, a theoretician at Stanford University in California and an SCGB investigator, has also been thinking deeply about this question. He has developed a formula to calculate the number of cells, number of trials, and measure of trial-to-trial variability needed to recover the dimensionality and orientation of neural activity. “Dimensionality reflects both the complexity of the task and how the brain encodes the task,” Ganguli says. “Once you know or estimate these two things, you can predict how many neurons you’ll need to record from to build accurate decoders and do accurate dimensionality reduction and to recover the state-space dynamics of the circuit.”
Ganguli and his collaborators tested the theory on simulated data, varying the number of neurons and number of trials they used. They also applied the formula to real data from Krishna Shenoy’s lab at Stanford — activity recorded from 109 neurons collected over 147 trials of a monkey reaching for a single target. As was the case in Yu’s study, results calculated using subsets of the data resembled the result calculated using all data. What’s more, the accuracy of the data analysis matched that predicted by the theory.
Ganguli and Yu’s work suggests that the simple tasks used in most of today’s experiments, such as a reach or viewing a moving grating, will generate relatively low-dimensional data, and that more complex tasks, such as viewing natural scenes, generate more complex neural activity. “It’s one of those things that’s obvious in retrospect, but no one had said it,” says Kenneth Harris, a neuroscientist from University College London and an SCGB investigator.
“Most experiments have been using low-dimensional stimuli and therefore explored only the low-dimensional array of patterns that neurons can produce,” says Marius Pachitariu, a postdoctoral researcher in Harris’ lab who will soon start a lab at the Howard Hughes Medical Institute’s Janelia Research Campus in Ashburn, Virginia.
Ganguli initially presented his theory at the first SCGB annual meeting in 2015. After the talk, Harris began to consider what kinds of experiments it would take to figure out if cortical activity is truly low dimensional. “Ganguli made it clear that we would have to record from a lot of neurons for a long time and lots of stimuli if we want to get at this question — whether cortical activity is high dimensional,” says Pachitariu.
Inspired by that question, Harris, Pachitariu, Matteo Carandini, who is also an SCGB investigator, and their collaborators decided to run an experiment tracking many more neurons during a more complex visual task. They monitored 10,000 neurons in the mouse visual cortex using two-photon calcium imaging while the mice looked at 3,000 different natural visual scenes. That variety of visual stimuli differs from the standard approach of repeating the same stimulus over and over, which helps distinguish signal from noise in neural activity. But Pachitariu and his collaborators wanted to more closely represent the complexity of the real world, where animals have to solve lots of different problems, such as finding food and mates and avoiding predators. “That stimulus space is much higher dimensional,” Pachitariu says. Showing animals many different images “maximizes how differentially you can drive neurons, which helps to get at the complexity of the circuit.”
However, analyzing thousands of neurons responding to lots of different stimuli presents technical challenges. To accurately track so many cells, Pachitariu developed software to detect sparsely active neurons. “Not all algorithms out there can do that,” Harris says. The software, called Suite2P, is freely available. Because each image was shown only twice, Pachitariu had to develop new ways to distinguish signal from noise. “We cannot really say which stimuli each neuron preferred due to noise, but we can estimate the average signal-and-noise variance in each neuron from just two repeats,” Pachitariu says. “And then we can approximate the population firing with a low-dimensional reconstruction, and ask how much signal variance is kept by that dimensionality reduction.”
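The idea of estimating signal and noise variance from two repeats can be illustrated with a common textbook estimator, sketched here on simulated data. It is not necessarily the procedure Pachitariu used. The assumption is that each response is a stimulus-driven signal plus noise that is independent across the two repeats, so the covariance between repeats estimates the signal variance and the remainder of the total variance is noise. The function name and the simulated variances are invented for this example.

```python
import numpy as np

def signal_noise_variance(repeat1, repeat2):
    """Estimate per-neuron signal and noise variance from two repeats.

    repeat1, repeat2: arrays of shape (n_stimuli, n_neurons) holding responses
    to the same stimuli, in the same order, on the first and second showing.
    Assumes the noise is independent between the two repeats.
    """
    r1 = repeat1 - repeat1.mean(axis=0)
    r2 = repeat2 - repeat2.mean(axis=0)
    n_stimuli = r1.shape[0]
    # Independent noise averages out of the cross-repeat covariance.
    signal_var = np.sum(r1 * r2, axis=0) / (n_stimuli - 1)
    total_var = 0.5 * (r1.var(axis=0, ddof=1) + r2.var(axis=0, ddof=1))
    return signal_var, total_var - signal_var

# Simulated (hypothetical) data: signal variance 1, noise variance 4 per neuron.
rng = np.random.default_rng(2)
n_stimuli, n_neurons = 3000, 200
signal = rng.standard_normal((n_stimuli, n_neurons))
repeat1 = signal + 2.0 * rng.standard_normal((n_stimuli, n_neurons))
repeat2 = signal + 2.0 * rng.standard_normal((n_stimuli, n_neurons))

signal_var, noise_var = signal_noise_variance(repeat1, repeat2)
print("mean signal variance:", signal_var.mean())
print("mean noise variance:", noise_var.mean())
```

Estimates for any single neuron are noisy with only two repeats, which is why the sketch reports averages across the population.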
In research presented at Cosyne in Salt Lake City, Utah, in February 2017, the researchers estimated that roughly 1,000 dimensions can account for 95 percent of stimulus-driven variance. That result deviates from a standard model of simple- and complex-cell receptive fields, suggesting that the standard model is incomplete, Pachitariu says. Simple tuning properties of the standard model, such as orientation, account for about 30 dimensions. “We need a model that computes more complicated features of the images,” he says. “Going up to 1,000 dimensions would require other, increasingly more specific features to be computed, including functions of the input we haven’t really thought of yet.”
By contrast, spontaneous activity, which occurs irrespective of outside stimuli, has much lower dimensionality. Just 50 dimensions can explain 95 percent of the variance in spontaneous activity. (See sidebar ‘Spontaneous activity linked to behavioral state.’)
The findings are still preliminary, and researchers say it’s too early to predict the implications. But the study hints at the complexity that might arise as neuroscientists expand recording technologies. “They are pushing in an exciting direction,” Yu says.
Determining the appropriate parameters for understanding a circuit — be it recording time, number of neurons or task complexity — will be important for designing future experiments. “If we just start recording more and more neurons, we might not get richer datasets because the dimensionality might be constrained by the task,” Ganguli says. To get around that limitation, neuroscientists should “leverage our ability to record from many neurons to do more complicated tasks,” he says.
The finding may also help focus the development of new recording technologies. According to Ganguli, scientists in the field should think carefully about what they can gain from improving experimental resources. “We’re in kind of terra incognita, where our ability to create technology is driving its development,” Ganguli says. “We need a better theory of the design tradeoffs with different types of experimental resources — temporal resolution, spatial resolution, recording stability, and signal-to-noise ratio.” For example, it may prove more useful to increase the duration of recordings rather than the number of cells recorded.
Yu’s team is developing methods to create larger datasets from existing technologies. Currently, researchers with multiday recordings analyze data from each day individually. But Yu and his collaborators are stitching together all those data and analyzing them as one coherent block, research that Yu’s postdoc, William Bishop, presented at Cosyne in February. “That effectively allows you to increase the number of neurons and recording time,” Yu says.
Recording from thousands of neurons will also require new mathematics. “You can’t do this analysis with 10,000 neurons without having a sophisticated mathematical view on it,” Harris says. He hopes the field will develop a mathematical language to describe populations of neurons, akin to how the receptive field describes the function of individual neurons, or how synergistic and redundant coding can describe ensembles of neurons. “We’re going to come to a point where it’s harder and harder to explain without using the math,” he says.