Researchers conducting the largest study yet of genetic differences among humans are finding that, because of rapid population growth, rare mutations are more widespread than expected. The 1000 Genomes Project is examining human genetic variation with an unprecedented combination of breadth and depth, so far describing the genomes of 1,092 people from 14 populations worldwide and cataloguing many rare genetic variants within them.
Hundreds of researchers, including a handful of New York scientists, participate in the project consortium, which reported on its first phase in the Nov. 1 issue of the journal Nature. The results are expected to bolster research on disease, as well as on our past. The rare genetic variations so far catalogued offer a window onto both.
People from all populations included carried greater-than-expected numbers of rare variations, a phenomenon studied by two contributors from Cornell University, Andrew Clark and Alon Keinan. “The human population has grown so explosively, just out of sight from any standard theory in population genetics on genetic variation,” says Clark, who serves on the project’s steering committee and in the analysis group.
Population growth in recent millennia has allowed the introduction of new mutations to outpace the forces that would otherwise remove them. This has implications for disease. “Rare variation entering a population, it looks more like just mutations rather than typical SNPs (single-nucleotide polymorphisms),” Clark says. While natural selection has had time to work on typical SNPs and so remove problematic alleles, “the rare variant is more likely to be deleterious.”
Researchers suspect rare variants contribute to the risk of complex disease, but making connections to specific diseases has proved to be challenging, more so than pinpointing the rare variants behind Mendelian diseases, according to Clark.
As recent arrivals in the gene pool, rare variants tend to be isolated; the rarer a variant was among the genomes sequenced, the more likely it was to show up in only one of the 14 populations.
The breadth of populations sequenced for the project holds special relevance to Juan Rodriguez-Flores, a contributor and post-doctoral researcher at Weill Cornell Medical College. “I am Puerto Rican, so the fact they included over 50 Puerto Ricans in the study was for me a great thing,” he says. (The project is ultimately expected to include sequences from 90 Puerto Ricans.) The work might reveal characteristic variation that Puerto Ricans do not share with typical reference populations from, say, Europe or Asia.
Casting the widest possible net for populations to sample will ultimately help improve the health of people all over the world and avoid an increase in global health disparities, according to Rodriguez-Flores. “If I were the head of the consortium, that would be my goal, to continue until we have sampled genetic variation truly all over the world, especially in developing countries that are under-studied,” he says.
The project discovered and genotyped about 38 million SNPs, 1.4 million indels (insertions or deletions as long as 50 bases), and 14,000 large deletions.
Seungtai Chris Yoon, a research assistant professor at Cold Spring Harbor Laboratory, identified large deletions, as well as duplications. Only the large deletions were included in the Nature paper; duplications and other structural variations will be reported in the next paper, according to Yoon.
“The large proportion of the variation was expected,” he says of the large deletions. “But the great thing is to nail down where they are, what they are and what is the frequency.”
Though the 1000 Genomes Project is not without critics—some questioned its scope and lamented the failure to include participants' phenotypic information—Clark says that so far the process has, arguably, been almost as important as the discoveries. “Practically every aspect of the analysis has been updated and accelerated,” he wrote in an email, pointing to a massive acceleration of the process leading to the accurate identification of variants, as well as the re-invention of data representation needed to control file size.