All of Us, an ambitious health and genetics study aiming to enroll 1 million volunteers who represent the United States’ diversity, has reached a major milestone: the first release of nearly 100,000 whole genomes. The DNA sequences are tied to anonymized health records from the participants, allowing the study of how gene variants influence health.
The data made available today will be a bonanza for exploring the interplay among DNA, the environment and diseases, particularly in people who identify as Black and Hispanic, who are missing from most genomic studies, researchers say. “This data access provides a huge leap for genetics research,” says Cristen Willer of the University of Michigan, who studies the genetics of cardiovascular and metabolic diseases. She’s disappointed, however, that at least for now, only scientists at U.S. institutions can use the data.
All of Us is modeled after similar studies in other countries, such as the UK Biobank, which holds genome and health data on 500,000 people of mostly European ancestry. The UK project has tied hundreds of DNA markers that vary among people to traits and illnesses, from arthritis to heart disease. Started in 2018, the All of Us program, run by the National Institutes of Health (NIH), has enrolled about 330,000 participants so far.
The data released today include whole-genome sequencing data for more than 98,600 people, much of it linked to electronic health records, measurements from brief clinical exams, and survey responses. (All personal identifiers have been removed.) Half the participants are from racial or ethnic groups underrepresented in research, including people identifying as Black or African American (22%), Hispanic or Latino (17%), and Asian (3%).
All of Us is also releasing data from cruder DNA marker scans of an overlapping set of 165,000 participants that can reveal common genetic variants and their links to disease. But the whole genomes will allow researchers to look for rare variants that sharply raise a person’s disease risk and help reveal the underlying biology of the condition. Those rare variants are poorly understood in non-European populations, says geneticist Josh Denny, CEO of All of Us. For example, a genome study in Uganda found variants related to blood traits and glucose levels that had not been seen in people of European ancestry.
U.S. researchers who have been approved to use the data and have completed a brief online ethics training will be able to work with the information via a cloud-based platform. Already some 1500 researchers at 300 institutions are signed up, Denny says. But unlike with the UK Biobank and other NIH genomics datasets, non-U.S. researchers aren’t eligible.
That policy “is unfortunate and will slow research and global equity,” says Willer. An All of Us spokesperson says the program is “eager” to broaden access but is “still working through policies to support secure data sharing” with scientists abroad.
Two other U.S. biobanks boast similarly large numbers and racial diversity but have limitations, Denny notes. For example, an NIH program called TOPMed includes data from dozens of studies that can’t always be merged because they collected health data in different ways. And to use data from the Million Veteran Program, outside scientists have had to collaborate with researchers at the U.S. Department of Veterans Affairs.
“We each have our different strengths,” says Denny. All of Us is also unusual because participants can choose to see their genetic data.
The All of Us project, which has cost more than $2 billion so far, had to halt enrollment early in the COVID-19 pandemic but is again steadily recruiting. Denny hopes to reach 1 million participants by the end of 2026.