New technique for more accurate genome reconstruction
US and Australian researchers have developed a new technique that will aid in a more accurate reconstruction of human genomes by determining the sections of the genome that come from each parent.
Published in the journal Nature Biotechnology, the technique will also allow researchers to identify further complexity within any type of genome — from plants to animals — and provide more precise reference genomes in researcher databases than are currently available.
Genome assembly computationally reconstructs a genome — the complete set of genes or genetic material present in a cell or organism — from the much smaller pieces of DNA that sequencing machines are able to read, much like putting together pieces of a jigsaw puzzle.
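To make the jigsaw analogy concrete, here is a toy greedy overlap assembler in Python. It is a minimal sketch that assumes error-free reads taken from one strand of a short, repeat-free sequence, and its function names are hypothetical. Real assemblers typically work on overlap or de Bruijn graphs and must also cope with sequencing errors and repeats. The sketch repeatedly merges the pair of reads with the longest suffix-prefix overlap until no overlaps remain.

```python
# Toy greedy overlap assembler illustrating the "jigsaw puzzle" idea.
# Assumptions: error-free reads from a single strand of a short, repeat-free
# sequence; all function names are hypothetical.


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that matches a prefix of b (0 if shorter than min_len)."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)  # candidate start of the overlap in a
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of reads with the longest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads that lie entirely inside another read.
    reads = [r for r in reads if not any(r != o and r in o for o in reads)]
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            return reads  # no overlaps left: the remaining pieces are contigs
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)


if __name__ == "__main__":
    reads = ["ATTAGACCTG", "AGACCTGCCG", "CCTGCCGGAA", "GCCGGAATAC"]
    print(greedy_assemble(reads))  # ['ATTAGACCTGCCGGAATAC']
```

With these four overlapping reads, the pieces fit back together into the original 19-base sequence. The greedy choice keeps the sketch short, but it is exactly what breaks down on repetitive real genomes, where several merges look equally good.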
“In the current way of doing things, though, we’re missing something,” said Dr Adam Phillippy, Head of the Genome Informatics Section at the US National Human Genome Research Institute (NHGRI) and co-senior author on the study. “This is because we actually have two genomes in each one of our cells, one from our mum and one from our dad, which are known as haplotypes.”
Within a human genome, the two parental copies differ at relatively few positions. This makes it difficult to tell the two parental haplotypes apart, so they are often mixed together into a single assembly. Some animal genomes have the opposite problem: their two haplotypes contain so many differences that they are hard to assemble. To avoid this, scientists assembling animal reference genomes have used inbred animals, whose genomes are less diverse. Neither approach is ideal, because both miss the natural variation that exists in most genomes.
“Each individual possesses two copies of each chromosome. Previous techniques have yielded genome sequences, even the human genome sequence, that are a hybrid of each chromosome pair mixed together and do not accurately capture the actual sequence of a genome,” said Professor Stefan Hiendleder, based at the University of Adelaide and a co-author of the paper.
The researchers’ goal was to design and test a better way to reconstruct the haplotypes and, by so doing, give a more accurate assembly of genomes overall. The ‘trio binning’ method, developed by NHGRI researchers Dr Sergey Koren and Dr Arang Rhie, was tested on a cross between two cattle breeds that not only looked very different but were also genetically distinct.
“This new technique … gives, for the first time, a true genome sequence of each chromosome in an individual as well as the highest quality genomes of the two cattle subspecies available to date,” said Professor Hiendleder.
The Brahman and Angus cattle subspecies were domesticated separately thousands of years ago and have been subjected to very different selection pressures since then: pest and drought environments in the case of Brahman cattle, and beef production in the case of Angus cattle. These pressures have produced distinct differences between the breeds that are reflected in their genomes. For example, the Angus breed evolved to produce a very high-quality beef product, while the Brahman breed, descended from Indian cattle, evolved to be tick and drought resistant and has a characteristic hump. These differences make the two breeds ideal test subjects.
“In the old way of doing genome assembly, you wanted to use inbred animals,” said Dr Tim Smith, a research chemist at the US Department of Agriculture (USDA) and co-senior author on the study. “Trio binning has completely turned that on its head, and for this method, it’s better to use a cross of the most different genomes that you can find.”
Trio binning takes advantage of the newest generation of sequencing technology, which can ‘read’ much longer regions of the genome, 20,000 bases or more at a time, compared with a few hundred bases for previous technology. The parents’ genomes are first sequenced using high-accuracy short reads to determine which parts of their genomes are unique to each parent. The offspring’s genome is then sequenced using much longer reads, and these long reads are sorted by parent of origin using the shorter marker sequences that are unique to each parent.
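The first step, finding the sequences that are unique to each parent, can be sketched as a comparison of fixed-length substrings (k-mers) drawn from each parent's short reads. This is an illustrative sketch rather than the published implementation: it assumes the markers are k-mers and the reads are error-free, and the function names are hypothetical. Each k-mer is stored in a canonical form (the lesser of the k-mer and its reverse complement) so that reads from either DNA strand are treated alike. A real pipeline would also need to discard k-mers created by sequencing errors, for example by ignoring low-count k-mers.

```python
# Illustrative sketch: derive parent-specific marker k-mers from parental reads.
# Assumptions (not from the source): markers are k-mers, reads are error-free,
# and all names here are hypothetical.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def kmer_set(reads: list[str], k: int) -> set[str]:
    """Collect every canonical k-mer that occurs in any of the reads."""
    kmers = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmers.add(canonical(read[i:i + k]))
    return kmers


def parent_specific_markers(
    maternal_reads: list[str], paternal_reads: list[str], k: int
) -> tuple[set[str], set[str], set[str]]:
    """Split k-mers into maternal-only, paternal-only and shared sets."""
    maternal = kmer_set(maternal_reads, k)
    paternal = kmer_set(paternal_reads, k)
    return maternal - paternal, paternal - maternal, maternal & paternal


if __name__ == "__main__":
    mat_only, pat_only, shared = parent_specific_markers(["AAAAC"], ["AAAAG"], k=4)
    print(mat_only, pat_only, shared)  # {'AAAC'} {'AAAG'} {'AAAA'}
```

Only the maternal-only and paternal-only sets are useful as markers; k-mers found in both parents say nothing about where an offspring read came from.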
“For these cattle, about 92% of the marker sequences are shared by both parents,” said Dr Phillippy. “The remaining 8% are unique to one parent or the other, so any time you see one of those markers you know which parent it’s coming from. Knowing this, you can sort the offspring’s reads by which parent they are from and then assemble both parental haplotypes separately.”
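The sorting step Dr Phillippy describes might look like the following sketch, which counts how many maternal-only and paternal-only markers occur in each of the offspring's long reads and assigns the read to the parent with more hits. The example marker sets, the function names and the tie rule (reads with no hits, or with equal hits, go to an ‘unassigned’ bin) are assumptions made for illustration. The maternal and paternal bins would then be assembled separately.

```python
# Illustrative sketch: sort an offspring's long reads by parent of origin.
# Assumptions (not from the source): markers are k-mers, reads are error-free,
# ties and reads without marker hits are left unassigned, and all names are
# hypothetical.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(COMPLEMENT)[::-1])


def assign_read(read: str, maternal_markers: set[str],
                paternal_markers: set[str], k: int) -> str:
    """Label a read 'maternal', 'paternal' or 'unassigned' by counting marker hits."""
    maternal_hits = paternal_hits = 0
    for i in range(len(read) - k + 1):
        kmer = canonical(read[i:i + k])
        if kmer in maternal_markers:
            maternal_hits += 1
        elif kmer in paternal_markers:
            paternal_hits += 1
    if maternal_hits > paternal_hits:
        return "maternal"
    if paternal_hits > maternal_hits:
        return "paternal"
    return "unassigned"  # no markers seen, or equal evidence for both parents


def bin_reads(reads: list[str], maternal_markers: set[str],
              paternal_markers: set[str], k: int) -> dict[str, list[str]]:
    """Group reads into maternal, paternal and unassigned bins."""
    bins: dict[str, list[str]] = {"maternal": [], "paternal": [], "unassigned": []}
    for read in reads:
        bins[assign_read(read, maternal_markers, paternal_markers, k)].append(read)
    return bins


if __name__ == "__main__":
    bins = bin_reads(["TAAAACT", "GAAAAGC", "CTTTT", "GGGG"], {"AAAC"}, {"AAAG"}, k=4)
    print(bins)
    # {'maternal': ['TAAAACT'], 'paternal': ['GAAAAGC', 'CTTTT'], 'unassigned': ['GGGG']}
```

Counting raw hits keeps the sketch simple. Because one parent may contribute more unique markers than the other, a more careful version might normalize each count by the size of that parent's marker set before comparing.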
Dr Smith said trio binning will be useful for further studies in cattle, noting, “Cattle are leading the way in terms of using genomes to better understand which agricultural traits, like higher milk production, are better passed down to new generations. It’s pretty transformational work.”
The technique can also contribute to the goal of using a person’s unique DNA sequence in their clinical care, otherwise known as precision medicine. Typically, if a clinician is treating a patient with a suspected genetic disease, the clinician will order DNA sequencing for their patient to identify where in the genome a disease-causing variant may lie. However, current methods might miss the causative variant altogether if it exists on only one of the patient’s haplotypes.
“If you’re looking for a disease variant and the patient’s genome has had their haplotypes blurred together, you might miss it,” said Dr Phillippy. “This new method will also help us build a more inclusive representation of human genome variation. By assembling more of these high-quality human haplotypes, we’ll get a much clearer picture of what’s missing in the reference databases. This will improve the accuracy of genetic tests.”