August 14, 2022

Two decades after the draft sequence of the human genome was unveiled to great fanfare, a team of 99 scientists has finally deciphered the whole thing. They have filled in large gaps and corrected a long list of errors in previous versions, giving us a fresh look at our DNA.

The consortium has posted six papers online in the past few weeks describing the full genome. These hard-won data, now under review by scientific journals, will give scientists a deeper understanding of how DNA affects disease risk, the scientists say, and of how cells keep it in neatly organized chromosomes rather than molecular tangles.

For example, researchers have discovered more than 100 new genes that may be functional and identified millions of genetic variations between people. Some of these differences are likely to play a role in disease.

For Nicolas Altemose, a postdoctoral fellow at the University of California, Berkeley, who worked on the team, seeing the entire human genome felt like looking at the close-up images of Pluto taken by the New Horizons spacecraft.

“You could see every crater, you could see every color, of something that we had only the slightest understanding of before,” he said. “It was an absolute dream come true.”

Experts not involved in the project said it will allow scientists to study the human genome in much more detail. Large parts of the genome that were simply blank have now been deciphered so clearly that scientists can study them in earnest.

“The results of these sequencing efforts are amazing,” said Yukiko Yamashita, a developmental biologist at the Whitehead Institute for Biomedical Research at the Massachusetts Institute of Technology.

While scientists have known for decades that genes are spread across 23 pairs of chromosomes, these strange, worm-like microscopic structures have largely remained a mystery.

By the late 1970s, scientists had the ability to locate a few individual human genes and decipher their sequence. But their tools were so crude that hunting down a single gene could take a career.

Towards the end of the 20th century, an international network of geneticists decided to try to sequence all of the DNA in our chromosomes. The Human Genome Project was a bold undertaking considering how much there was to be sequenced. Scientists knew that the twin strands of DNA in our cells contain roughly three billion pairs of letters – text long enough to fill hundreds of books.

When this team began its work, the best available technology could sequence only pieces of DNA that were a few dozen letters, or bases, in length. The researchers had to put them together like pieces of a giant puzzle. To piece the puzzle together, they looked for fragments with identical ends, meaning they came from overlapping parts of the genome. It took years for them to gradually assemble the sequenced fragments into larger swaths.
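The overlap idea described above can be illustrated with a toy sketch. This is not the Human Genome Project's actual software; the function names `overlap` and `greedy_assemble`, the three example fragments, and the minimum overlap of three bases are all assumptions made for illustration. The sketch repeatedly joins the two fragments whose ends match over the longest stretch.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    max_len = min(len(a), len(b))
    for n in range(max_len, min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of fragments with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            # No remaining pair shares a long enough end: stop merging.
            return contigs
        # Join the two fragments, writing the shared bases only once.
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


# Hypothetical fragments cut from the made-up sequence ATGGCGTACCTGA.
reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]
print(greedy_assemble(reads))  # ['ATGGCGTACCTGA']
```

Real fragments number in the millions and contain sequencing errors, so a simple greedy merge like this one would be far too slow and too fragile for an actual genome.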

The White House announced in 2000 that scientists had completed the first draft of the human genome, and details of the project were released the following year. But much of the genome remained unknown as scientists struggled to figure out where millions of other bases belonged.

It turned out that the genome was a very difficult puzzle to put together from small pieces. Many of our genes exist as multiple copies that are almost identical. Sometimes the different copies perform different tasks. Other copies – known as pseudogenes – are deactivated by mutations. A short fragment of DNA from one gene could fit just as easily into any of the others.
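A small sketch shows why near-identical copies make short fragments ambiguous. The sequence below is invented for illustration, and `find_all` is a hypothetical helper, not part of any sequencing toolkit. A short fragment drawn from inside a repeated stretch matches two places, while a longer fragment that reaches into the unique sequence beside the repeat matches only one.

```python
def find_all(genome: str, read: str) -> list[int]:
    """Return every position where `read` occurs in `genome`."""
    positions = []
    start = genome.find(read)
    while start != -1:
        positions.append(start)
        start = genome.find(read, start + 1)
    return positions


# Made-up genome containing two identical copies of the same stretch.
repeat = "ACGTTGCA"
genome = "GGA" + repeat + "CTC" + repeat + "TTG"

short_read = "GTTGC"     # lies entirely inside the repeated stretch
long_read = "TTGCACTC"   # end of the first copy plus its unique neighbor

print(find_all(genome, short_read))  # [5, 16] -> two possible homes
print(find_all(genome, long_read))   # [6]     -> one unambiguous home
```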

And genes only make up a small part of the genome. The rest can be even more confusing. Much of the genome is made up of virus-like pieces of DNA, most of which only exist to make new copies of themselves that are reinserted into the genome.

In the early 2000s, scientists got a little better at putting the genome puzzle together from its tiny pieces. They made more fragments, read them more closely, and developed new computer programs to put them together into larger pieces of the genome.

From time to time, researchers unveiled the latest and best draft of the human genome – known as the reference genome. Scientists used the reference genome as a guide for their own sequencing efforts. For example, clinical geneticists would catalog disease-causing mutations by comparing genes from patients with the reference genome.

The latest reference genome came out in 2013. It was much better than the first draft, but far from finished. Eight percent of it was simply blank.

“Basically, an entire human chromosome has been lost,” said Michael Schatz, a computational biologist at Johns Hopkins University.

In 2019, two scientists – Adam Phillippy, a computational biologist at the National Human Genome Research Institute, and Karen Miga, a geneticist at the University of California, Santa Cruz – formed the Telomere-to-Telomere Consortium to help complete the genome.

Dr. Phillippy admitted that part of his motivation for such a bold project was that he was annoyed by the gaps. “They really pissed me off,” he said. “You take a beautiful landscape puzzle, pull out a hundred pieces and look at it – that’s very annoying for a perfectionist.”

Dr. Phillippy and Dr. Miga called on scientists to join them in solving the puzzle. In the end, 99 scientists worked directly on sequencing the human genome, and dozens more pitched in to make sense of the data. The researchers worked remotely through the pandemic and coordinated their efforts through Slack, a messaging app.

“It was a surprisingly beautiful colony of ants,” said Dr. Miga.

The consortium used new machines that can read sections of DNA tens of thousands of bases in length. The researchers also invented techniques to find out where particularly mysterious repetitive sequences belonged in a genome.

In total, the scientists added or fixed more than 200 million base pairs in the reference genome. They can now say with confidence that the human genome is 3.05 billion base pairs long.

Within these new DNA sequences, the scientists discovered more than 2,000 new genes. Most appear to be disabled by mutations, but 115 of them look like they could make proteins – the function of which scientists may need years to figure out. The consortium now estimates that the human genome contains 19,969 protein-coding genes.

With a complete genome finally assembled, researchers were able to better study how DNA varies from person to person. They discovered more than two million new places in the genome where people differ. Using the new genome also helped them avoid identifying disease-related mutations that don’t actually exist.

“This is a big step forward in this area,” said Dr. Midhat Farooqi, director of molecular oncology at Children’s Mercy, a hospital in Kansas City, Missouri, who was not involved in the project.

Dr. Farooqi has started using the genome in his research on rare childhood diseases, comparing his patients’ DNA against the newly filled gaps to look for mutations.

However, moving to the new genome can be challenging for many clinical laboratories. They need to move all of their information about the links between genes and diseases to a new map of the genome. “There will be a lot of effort, but it will take a few years,” said Dr. Sharon Plon, a medical geneticist at Baylor College of Medicine in Houston.

Dr. Altemose plans to use the entire genome to explore a particularly mysterious region in each chromosome known as the centromere. Instead of storing genes, centromeres anchor proteins that move chromosomes around a cell as it divides. The centromere region contains thousands of repeated segments of DNA.

At first glance, Dr. Altemose and his colleagues were struck by how different the centromere regions can be from person to person. This observation suggests that centromeres have evolved rapidly, as mutations insert new pieces of repeating DNA into the regions or cut out other pieces.

While some of this repeating DNA might play a role in pulling the chromosomes apart, the researchers have also found new segments – some millions of bases long – that do not appear to be involved. “We don’t know what they’re doing,” said Dr. Altemose.

But now that the empty zones of the genome are filled in, Dr. Altemose and his colleagues can examine them up close. “I’m really looking forward to seeing all the things we can discover,” he said.