Neuroscience, particularly neurophysiology, lags behind fields such as astronomy and genomics in one key area: data sharing and standardization. Astronomy data is typically gathered in centralized facilities, leaving scientists to focus on data analysis. Neurophysiology data, in contrast, is generally gathered in individual labs using custom-built systems. There it remains marooned, rendered inaccessible by idiosyncratic formatting and a lack of tools for sharing. It’s difficult to compare results from lab to lab, and perhaps more importantly, most of the wealth of information in these datasets remains untapped.
“The papers we publish are pale reflections of the complex, rich data we produce in the lab,” says Karel Svoboda, a neuroscientist at Janelia Research Campus in Ashburn, Virginia, and an investigator with the Simons Collaboration on the Global Brain.
A growing effort called Neurodata Without Borders (NWB) aims to change that. In 2014, Svoboda, SCGB investigators Markus Meister and Loren Frank, and a handful of other neuroscientists hatched a plan to break down some of the obstacles to data sharing. They were inspired by the launch of several large-scale neuroscience projects: Europe’s Human Brain Project, the Allen Institute’s MindScope, and the Obama administration’s BRAIN (Brain Research through Advancing Innovative Neurotechnologies) Initiative. “We decided it was critical to develop data standards,” Svoboda says.
In a year-long pilot project, the group worked with programmers from the Redwood Center at the University of California, Berkeley and the Allen Institute in Seattle to develop a novel data-sharing platform. NWB 1.0 included a data model, which provides a structure for how different types of data should be organized, and an application programming interface, or API, so that software tools can read and write the data based on the data model.
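To make the distinction between a data model and an API concrete, here is a minimal Python sketch. It is not NWB’s actual schema or API: the `Recording` class, its fields, and the `write_recording` and `read_recording` functions are hypothetical names invented for illustration, and JSON merely stands in for whatever file format a real standard would use.

```python
import json
from dataclasses import dataclass, asdict
from typing import List


# Hypothetical data model: states which fields every recording must carry.
@dataclass
class Recording:
    subject_id: str
    sampling_rate_hz: float
    spike_times_s: List[float]


# Hypothetical API: the only code that needs to know how the model is stored.
def write_recording(rec: Recording) -> str:
    """Serialize a recording to a JSON string."""
    return json.dumps(asdict(rec))


def read_recording(text: str) -> Recording:
    """Rebuild a recording from JSON; raises TypeError if fields are missing or unexpected."""
    return Recording(**json.loads(text))


# Illustrative values only, not real data.
original = Recording("mouse_01", 30000.0, [0.012, 0.350, 1.204])
restored = read_recording(write_recording(original))
print(restored == original)
```

Because analysis tools would call only the read and write functions, the storage details could change without breaking those tools, as long as the data model stays the same.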
Over the last year, the NWB team has been working with programmers from Lawrence Berkeley National Laboratory to expand the platform, making it more flexible and modular. They aim to have the new version ready by the Society for Neuroscience meeting in November. “We think it’s even more urgent now than it was two to three years ago,” Svoboda says. “It’s become clear in the last few years that we need to build on that effort and make it useful for a much larger cohort of people.”
The number of large-scale neuroscience projects is growing, including the SCGB-funded International Brain Lab, a large collaborative project involving 20 investigators from the United States and Europe. Because data is shared among many groups, these types of projects require a standardized way to store information. “It’s a problem that lots of people recognize and have written about, but now we need to do something about it,” Meister says.
“The primary output from these projects will not be research papers but data, collected under standardized conditions and distributed in a standardized format in public repositories,” Svoboda says.
A Big Payoff
The payoff for a broadly accepted data standard in neurophysiology could be enormous. For example, a graduate student working on a project that requires data from multiple sources might spend months writing specialized code to extract that data. If the data were stored in a standardized format, she would be able to access it in a matter of days.
Frank, who is at the University of California, San Francisco, faced the challenge of data sharing in his SCGB collaboration with Uri Eden, who is at Boston University. “We needed to share data, but it wasn’t easy to transfer the tools we built over to them and explain what they do,” Frank says. His new SCGB project, a collaboration of seven investigators that explores remapping in the brain, will use NWB to streamline that process.
Even individual labs could benefit from a standard data format. Currently, individuals in the same lab develop their own methods for collecting and storing their data, which can make it difficult for the principal investigator and other lab members to keep working with that data, particularly after the person who collected it has left the lab. “The most important person I want to share data with is myself. I want to share data across time,” Meister says.
Beyond simply sharing data, researchers hope that a standardized data format will encourage the development of broadly applicable analysis tools. “The more people that use the same data standard, the more people will write software that feeds on that standard,” Meister says.
Meister hopes that NWB will follow in the footsteps of open-source programming languages, such as Python, which have developed communities of users that continually improve them. “My vision is that eventually NWB will become a similar community-supported resource where a lot of people devote energy to making it better,” Meister says.
SCGB already encourages data sharing among its investigators. SCGB director David Tank hopes that NWB will make that easier. “We urge people to explore using Neurodata Without Borders in their labs,” Tank says. “The more people who use the platform, the more likely it will develop into a useful standard that can expand and accommodate diverse types of experiments.”
One of the reasons neurophysiology data is challenging to standardize is its diversity. Scientists record the precise timing of neural activity, as well as where the activity is recorded, the genotype of the animal, multiple behavioral variables and other factors. Different experiments require different variables, or metadata, all of which have to be precisely mapped to each other. “It’s hard to convey to people who haven’t done systems neuroscience how complex the data is,” Svoboda says.
One of the goals for NWB 2.0 is to make it easier to extend the platform to different types of experiments. The original version was focused on rodents. But the new version can accommodate other species, such as fruit flies and primates, which have different metadata. In fruit flies, for example, scientists define the recording site by cell types rather than by coordinate location.
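One way to picture this kind of extensibility is a shared core type that species-specific extensions build on. The Python sketch below is only an illustration under that assumption, not the NWB 2.0 design: the class names, fields, coordinate values and the cell-type label are all hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Tuple


# Hypothetical core type: metadata every recording site shares.
@dataclass
class RecordingSite:
    species: str


# Hypothetical extension: site defined by coordinate location.
@dataclass
class CoordinateSite(RecordingSite):
    coordinates_mm: Tuple[float, float, float]


# Hypothetical extension: site defined by cell type, as for fruit flies.
@dataclass
class CellTypeSite(RecordingSite):
    cell_type: str


def describe(site: RecordingSite) -> str:
    """Uses only core fields, so it accepts any extension of RecordingSite."""
    return f"{site.species} recording ({type(site).__name__})"


# Placeholder values for illustration only.
sites = [
    CoordinateSite("mouse", (1.5, -2.0, 0.8)),
    CellTypeSite("fruit fly", "example_cell_type"),
]
for site in sites:
    print(describe(site))
```

A tool written against the core type keeps working when a new community adds its own extension, which is the property that would let separate working groups define their own metadata.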
Programmers will refine the NWB 2.0 platform over the next few months, with the goal of launching both MATLAB and Python versions in November. The pilot project, NWB 1.0, was funded by General Electric, the Allen Institute for Brain Science, the Howard Hughes Medical Institute, the Kavli Foundation and the International Neuroinformatics Coordinating Facility. The expansion effort, NWB 2.0, is funded by the Simons Foundation and the Kavli Foundation. Existing tools are freely available on GitHub.
The next hurdle will be convincing investigators to use the platform, because organizing data this way takes effort on their part. “The real challenge is adoption,” Svoboda says. The group plans to do outreach to help different communities create their own extensions. For example, “a marmoset working group could develop a format that is critical for marmoset neurophysiology,” Svoboda says. They will also develop tutorials and other resources to guide novice users.
Much to his delight, Svoboda says other scientists have already used data from his lab to publish new papers. “One dataset we released became a Ph.D. thesis at Champalimaud,” he says. “Another dataset turned into a publication from University of Maryland.”
Svoboda predicts that eventually, neuroscience will follow the path of astrophysics, centralizing data collection so that individual labs can work on analysis. “I think it has to be this way. Experiments are becoming increasingly complex, and data are becoming increasingly complex,” he says. “The process of extracting meaning from data is becoming a full-time pursuit.”