Feature: Untying the haplomics knot
Thursday, 17 September, 2009
This feature appeared in the July/August 2009 issue of Australian Life Scientist.
In Greek legend, Gordias, a simple peasant, is chosen by fate to become the king of Phrygia when he enters the Phrygian capital driving an ox cart. Gordias’ heir, King Midas, later honours his father by tethering the ox cart to a post inside the palace with an intricate knot of bark fibres that cannot be undone. At least, it couldn’t until Alexander the Great cleaved the knot with a mighty stroke from his sword.
Modern biology has its own Gordian Knot: our genome. This is because the human genome – and that of any diploid organism – is made up of a tightly woven mix of maternal and paternal genes. Teasing out the individual genes given us by our mother and father is proving to be an epic challenge.
Scientists around the globe are currently attempting to unravel the knot, although progress has been laborious. However, one researcher, Dr Malcolm Simons, co-founder of Haplomic Technologies, thinks he might have a solution somewhat akin to Alexander’s. And if it works, it may help us gain a better understanding of how genes are related to disease, possibly leading to new methods of diagnosis, prevention and even treatment.
At the centre of this knot is the haplotype. This is a segment of chromosome carrying multiple genes that tend to remain linked together, despite the meiotic mill that generates unique new combinations of genes in every gamete. This makes a haplotype a kind of time capsule, which can preserve the linked set of alleles of its embedded genes over timescales of thousands to millions of years.
The normal rule for eukaryotes is that it takes two haplomes, one haploid chromosome set from each parent, to create a new, standard-issue, diploid genome. (Males of the eusocial insects in the order Hymenoptera – bees, wasps and ants – constitute a striking exception: they develop from unfertilised eggs laid by the colony queen and make do with an unpaired set of maternally inherited chromosomes.)
However, because natural selection did not think to colour haplotypes pink or blue for the convenience of geneticists, without parental reference sequences it’s difficult to determine which parent contributed which allele, or which haplotype, to the offspring. But this hasn’t stopped people from trying.
The International HapMap Project was initiated in 2002, and is attempting to develop a haplotype map of the human genome. The hope is that this will describe all the common patterns of human genetic variation, across all races and most ethnic groups. The HapMap project was envisaged as an oracular resource, free to geneticists around the world, enabling them to quickly identify gene variants affecting health and disease, and to predict how individuals may respond to drugs, pollution and other environmental insults and lifestyle factors.
However, Simons says that with currently available technology, it’s impossible, in principle, to separate out the haplotypes contributed by each parent. Thus, because of the lack of informative, overlapping sequences, the International HapMap Project is doomed to fall short of its lofty goal.
Inspiration
In the 1970s, before the discovery of single nucleotide polymorphisms (SNPs), researchers began to note strong statistical associations between patterns of cleaved DNA fragments, called restriction fragment length polymorphisms (RFLPs), and inherited genetic disorders.
These associations were typically made post hoc: affected members of families with a history of a particular genetic disorder were subsequently found to share DNA fragments of the same size, suggesting the restriction enzymes used to cleave their DNA had homed in on unique DNA sequences within, or proximate to, the locus of the mutant allele.
In 1987, Simons was analysing RFLP patterns from 107 white cell lines for the 10th International Histocompatibility and Immunogenetic Workshop in Princeton, US. His focus was on the Major Histocompatibility Complex (MHC), the most polymorphic locus in the human genome.
Most of the cell lines came from unrelated Caucasian subjects, but among them were Japanese, Chinese, African and Amerindian subjects. To his astonishment, some of these ethnically diverse individuals who shared MHC-associated disorders like type 1 diabetes, rheumatoid arthritis and ankylosing spondylitis exhibited very similar RFLP patterns at the MHC locus.
It was already clear by 1987 that only five per cent or less of the human genome coded for protein. It dawned on Simons that if 95 per cent of the genome was so-called ‘junk’, most of the cleavage sites in the MHC had to lie in non-coding DNA: within the introns that divided genes into protein-coding modules, or in the sea of ‘junk DNA’ between genes.
Several months later, on December 23, 1987, Simons was strolling down a San Francisco street with three friends: Dr Roger Lebo, a genome expert from the University of California, San Francisco, Australian businessman-entrepreneur Dr Mervyn Jacobson, and one of Jacobson’s employees, David Cunningham. Simons suddenly stopped in his tracks, as if struck by a lightning bolt.
Turning to Lebo, he said: “Roger, if haplotypes are stably inherited, they can substitute for families and allow unrelated individuals to be used for disease-gene mapping.”
It was a moment of pure inspiration. Geneticists would no longer have to find large, multi-generation family pedigrees to track down the mutant genes involved in inherited disorders. All they needed to do was to find a cohort of unrelated individuals with the same disorder and screen them for shared haplotypes.
If unrelated individuals from different ethnic groups shared a haplotype containing the same disease-causing allele, it should be possible to identify as few as two unique genetic markers delineating that haplotype, and they would serve as surrogates for the entire, unique combination of alleles in it.
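The idea of a couple of markers standing in for a whole haplotype can be sketched in a few lines of Python. This is only an illustration: the haplotype names, marker names and alleles below are invented, and the brute-force search is an assumption about how one might look for such surrogates, not a description of Simons' patented method.

```python
from itertools import combinations

# Hypothetical haplotypes at one locus: marker name -> allele.
HAPLOTYPES = {
    "H1": {"m1": "A", "m2": "G", "m3": "T", "m4": "C"},
    "H2": {"m1": "A", "m2": "A", "m3": "T", "m4": "C"},
    "H3": {"m1": "G", "m2": "G", "m3": "T", "m4": "T"},
    "H4": {"m1": "G", "m2": "A", "m3": "C", "m4": "C"},
}

def smallest_tag_sets(haplotypes, target):
    """Return the smallest marker subsets whose alleles single out `target`."""
    markers = sorted(haplotypes[target])
    for size in range(1, len(markers) + 1):
        hits = []
        for subset in combinations(markers, size):
            signature = tuple(haplotypes[target][m] for m in subset)
            # A subset fails if any other haplotype shows the same alleles.
            clash = any(
                tuple(h[m] for m in subset) == signature
                for name, h in haplotypes.items()
                if name != target
            )
            if not clash:
                hits.append(subset)
        if hits:
            return hits
    return []

print(smallest_tag_sets(HAPLOTYPES, "H1"))
# [('m1', 'm2'), ('m2', 'm4')]
```

In this toy table no single marker is unique to H1, but two pairs of markers are, which mirrors the claim that as few as two markers can serve as surrogates for the entire combination of alleles.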
Simons patented his brainchild, which is now owned by Melbourne biotechnology company Genetic Technologies (GTG). GTG inherited it from Swiss-based GeneType AG, the company that Simons founded in partnership with Mervyn Jacobson in 1989.
Simons’ seminal observation was that the non-coding regions of the genome harbour hidden, non-random order. Highly conserved DNA markers in what was previously dismissed as ‘junk DNA’ now underpin the widely practised gene-hunting technique called genome-wide association studies (GWAS).
Even two decades after he filed his patent, it remains an area of controversy within the genetics community. Geneticists and gene-testing companies that have reluctantly bought licences from Genetic Technologies to use its ‘junk DNA’ patents for gene testing still complain that Simons patented a technique that was already widely – and freely – used in the 1980s.
Such complaints give no credit to Simons’ invention, which was genuinely novel, non-obvious, and unprecedented: exhaustive literature searches have failed to identify anything that could be considered prior art.
While some geneticists had previously recognised associations between some genetic disorders and specific RFLP patterns, the associations were made post hoc. Simons made the logical leap that unique DNA markers located in ‘junk DNA’ offered a way of predicting normal and disease haplotypes.
The first paper to mention the possibility of using haplotypes for gene mapping in unrelated individuals by case-control comparison was not published until at least 1992 – three years after Simons lodged his provisional patent.
Simons claims no credit for the later, epochal discovery that ‘junk DNA’ actually codes for myriad small RNA sequences that regulate the function of the 24,000-odd human genes.
The haplome puzzle
For the past decade, though, Simons has focused on genetics’ Gordian Knot: the haplome. Simons argues that no sequencing technology or DNA-reassembly algorithm is up to the task of distinguishing how two separate parental haplomes team up to create a new diploid individual. No matter how many times an individual’s DNA is resequenced to eliminate errors, it simply can’t be done with mixed DNA; there will always be ambiguities.
“The mixed DNA problem arises from the idea that, ultimately, a SNP can mark what we need to know about the incidence of a mutation, and the individual’s risk of developing the disease,” says Simons. “Researchers were able to do whole-genome SNPing before they could do whole-genome exoming. For any exon in any gene, you usually require multiple SNPs.
“You identified the SNPs with whatever technology was available at the time, and in the past, you were limited to just a few SNPs. Which of these SNPs tagged specific haplotypes? If the individual was homozygous at that locus – the same allele occurs on the paternal and maternal chromosomes – there was no way of telling which chromosome you were dealing with. You only knew there were two alleles if they were heterozygous, but you couldn’t determine which one was paternally or maternally inherited.
“A haplotype is the unit of recombination. We inherit haplotypes, not individual genes, and some haplotypes are very large, extending over hundreds of kilobases. So researchers started using more and more SNPs, and they compared SNPs across races and ethnic groups. The more SNPs, the more informative the mapping needed to fine down risk identification.”
The haplotype problem arises, says Simons, when researchers have identified distinctive combinations of SNPs that they believe to represent distinct combinations of alleles in a haplotype.
“Unless these SNP patterns overlap, you’re going to have gaps between them, and it doesn’t matter whether you’re using SNPs or copy number variants, you can’t tell which chromosome they came from.
“If you don’t have informative intervening SNPs overlapping with the two haplotypes, you can’t determine phase: you can’t tell if they are in cis [on the same chromosome] or trans [on the opposite chromosome]. If you can’t confidently assign phase for two sequences, there’s no way you can do it across an entire genome.”
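The ambiguity Simons describes can be made concrete with a small Python sketch. It assumes a simplified representation, invented for this illustration, in which each SNP genotype is an unordered two-letter string such as "AG". The function lists every pair of haplotypes that could have produced the same mixed-DNA readout.

```python
from itertools import product

def possible_haplotype_pairs(genotypes):
    """List every haplotype pair consistent with unphased genotypes.

    `genotypes` holds one two-allele string per SNP, e.g. "AG" or "CC".
    """
    # Only heterozygous sites create phase ambiguity.
    het = [i for i, g in enumerate(genotypes) if g[0] != g[1]]
    pairs = set()
    for flips in product([False, True], repeat=len(het)):
        flip_at = dict(zip(het, flips))
        hap1, hap2 = [], []
        for i, g in enumerate(genotypes):
            a, b = (g[1], g[0]) if flip_at.get(i, False) else (g[0], g[1])
            hap1.append(a)
            hap2.append(b)
        # frozenset makes (x, y) and (y, x) count as the same pair.
        pairs.add(frozenset(["".join(hap1), "".join(hap2)]))
    return pairs

for pair in sorted(sorted(p) for p in possible_haplotype_pairs(["AG", "CC", "TC"])):
    print(pair)
# ['ACC', 'GCT']
# ['ACT', 'GCC']
```

With n heterozygous sites there are 2^(n-1) equally plausible pairs, so the number of candidate answers doubles with every additional heterozygous SNP unless something else, such as parental data or a physically separated chromosome, fixes the phase.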
Why is it important to determine phase? “In the past, people treated genes as if they were cookie cutouts – if you identified the start and stop codons, all the important parts of the gene lay between them. But we now know that regulatory DNA elements coding for regulatory RNAs that influence the gene’s function often occur remotely, up to 500 kilobases upstream from the gene’s promoter, and up to 400 kilobases downstream from the poly(A) [polyadenylation] tail.
“Researchers knew about these ‘enhancer elements’ in the early 1970s, and knew that they lay at a considerable remove from the genes whose expression they enhanced. Today we have learned a great deal about cis and trans effects – genes don’t just act in cis. Counterintuitively, the major effect on a gene’s activity can be via its trans interaction with the paired gene on the opposite chromosome.
“So you can never be confident that you know everything about the expression of one allele by looking at one chromosome. Understanding trans interactions is clinically important for a wide range of inherited disorders or susceptibilities.
“Around 2,200 articles have now been published on compound heterozygosity in inherited disorders. In cystic fibrosis, for example, there are hundreds of mutant alleles of the CF gene, and the particular combination of alleles influences the severity of the disorder.”
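Why phase matters for compound heterozygosity can be shown with a deliberately crude Python model. It assumes, purely for illustration, that any mutation knocks out the gene copy carrying it. The mutation names are hypothetical placeholders, not real cystic fibrosis alleles.

```python
def functional_copies(maternal, paternal):
    """Count gene copies that carry no mutation.

    Each argument is the set of mutations found on that parental chromosome.
    Simplifying assumption: one mutation is enough to disable a copy.
    """
    return sum(1 for chromosome in (maternal, paternal) if not chromosome)

# The same two hypothetical mutations, in different phase.
cis = functional_copies({"mutA", "mutB"}, set())   # both on one chromosome
trans = functional_copies({"mutA"}, {"mutB"})      # one on each chromosome

print(cis, trans)
# 1 0
```

An unphased test reports the same two mutations in both cases, yet in this toy model the cis arrangement leaves one working copy and the trans arrangement leaves none.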
Cleaving the knot
Simons argues that the only way to be certain whether two alleles – or two different, interacting genes – are in cis or trans phase is to sequence both haploid genomes.
“Researchers imagine they can solve the phase problem by deep sequencing single DNA molecules – doing the same sequence over and over again – and applying algorithmic tweaks to determine haplotypes. But I recently discussed the problem with some researchers at an Australian Genome Research Facility seminar on single-molecule sequencing, and they conceded that ever-deeper sequencing wouldn’t work.
“An alternative for determining haplotypes and studying phase interactions is to sequence entire chromosomes from haploid cells: sperm, which are readily available from half the population, and oocytes, which aren’t readily available from the other half. But it seems easier to me to establish phase from paired chromosomes where you find them – from metaphase cells.”
Metaphase is the stage of mitosis in which chromosomes align in the middle of the cell before dividing, making them an easier target. Simons has lodged provisional patents in the US on a suite of technologies for gene mapping and definitive haplotyping using haploid chromosomal DNA. He has been developing and testing the techniques with the help of researchers at Monash Medical Centre and the Australian Red Cross’ Victorian Transplantation and Immunogenetic Service labs.
The Haplomics approach involves isolating from cells single chromosomes that have been uniquely identified by fluorescence in-situ hybridisation (FISH) probes, and excised from the cells by laser – a technique Simons has dubbed ChromoPult. A second approach, which he calls ChromoSort, involves separating FISH-labelled chromosomes by microfluidic flow sorting.
After a chromosome’s several hundred-odd femtograms of DNA have been amplified by polymerase chain reaction into sufficient quantities, standard Sanger sequencing can be used to read the fragmented DNA.
However, Simons envisages that next-generation sequencing technologies, employing solid-phase oligonucleotide probe microarrays and single-molecule sequencing, will speed sequencing, even with relatively short read lengths of 50-plus nucleotides. Such multi-to-megabase sequencing platforms currently require as little as 10 nanograms of DNA, and picogram quantities will soon be feasible.
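The size of the gap between a single chromosome's DNA and a sequencer's input requirement is easy to estimate. The Python sketch below takes "several hundred-odd femtograms" to mean 300 femtograms, which is an assumption made only for the arithmetic, and treats amplification as idealised exponential growth. Real PCR efficiency falls short of a clean doubling per cycle and eventually plateaus.

```python
import math

def doublings_needed(start_grams, target_grams, efficiency=1.0):
    """Cycles needed if each cycle multiplies DNA mass by (1 + efficiency)."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in the range (0, 1]")
    fold = target_grams / start_grams
    return math.ceil(math.log(fold) / math.log(1 + efficiency))

START = 300e-15   # assumed starting mass: 300 femtograms
TARGET = 10e-9    # 10 nanograms, the platform requirement quoted in the text

print(round(TARGET / START))            # about 33,333-fold amplification
print(doublings_needed(START, TARGET))  # 16 cycles at perfect doubling
```

Under these assumptions the required amplification is roughly 33,000-fold, or 16 perfect doublings. If platforms reach picogram inputs, the fold change shrinks by three orders of magnitude.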
Simons is confident that single-chromosome amplification will soon bridge the gap between femtogram quantities of chromosomal DNA, and the nanogram amounts required for sequencing. Simple reassembly algorithms will then match up overlapping sequence data from fragments from several runs, to provide complete cis-phase sequences for single chromosomes, even from the genome’s most daunting terrain: the gene-dense MHC on chromosome 6.
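What a "simple reassembly algorithm" might look like can be sketched as a greedy overlap merge in Python. This is a textbook-style toy, not the software Simons has in mind: the reads are invented, they are assumed to be error-free, and the minimum overlap of three bases is an arbitrary choice. Production assemblers must also cope with sequencing errors and repeats.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the two reads with the longest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best = (0, None, None)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break  # no remaining overlaps: leave the contigs separate
        merged = reads[i] + reads[j][n:]
        reads = [r for k, r in enumerate(reads) if k not in (i, j)] + [merged]
    return reads

# Hypothetical short reads, all from one isolated chromosome.
print(greedy_assemble(["GATTACAG", "ACAGTTCA", "TTCAGGA"]))
# ['GATTACAGTTCAGGA']
```

Because every read comes from a single chromosome, any two reads that overlap must agree, so the merged sequence is in cis phase by construction. With mixed diploid DNA, the same merge could silently stitch maternal and paternal fragments together.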
Thus the problem of separating out haplotypes can be solved, he argues, by bypassing the whole, and doing things by half. Not unlike Alexander’s bold division of the knot, really.