Sequencing the cancer genome

By Kate McDonald
Friday, 14 November, 2008

It is probably useful in this time of economic uncertainty, when many of the charities that fund medical research are watching their money disappear down the holes that have opened up in the banking system, to have something to fall back on.

Dr Peter Campbell, a researcher at the Wellcome Trust Sanger Institute in Cambridge, at least has his medical degree, complete with clinical training, to support himself if things go belly up.

That’s not likely to happen to the world’s largest charity, and Campbell also has a statistics degree if clinical haematology is not of interest. For the moment, he is working with the Institute’s renowned Cancer Genome Project, analysing the data that is being generated in enormous volumes from the project’s quite remarkable and prolific work on sequencing the genomes of tumours.

The main aim of the project, Campbell says, is to document and catalogue the mutational profile of human malignancies. Up until now, that has predominantly been done through medium-throughput sequencing of PCR products.

“There’s a laboratory pipeline for generating PCR products and automating the sequencing,” he says. “And we’ve got a series of informatics tools that analyse the capillary sequencing data with the aim of identifying somatically acquired variants. That’s been the major thrust of the project for the last five or six years – developing the tools that underpin that effort.”

The group has also put substantial effort into studying copy number variations using oligonucleotide microarrays, work that is nearing completion. More recently, it has been investing heavily in new sequencing technologies and their application to cancer genomics.

One of those new technologies, Illumina’s Solexa sequencing platform, will be the topic of his talk to the Australasian Microarray and Associated Technologies Association’s (AMATA) conference in his hometown of Dunedin, New Zealand, in November. There, he will talk about how using genome-wide, massively parallel paired-end sequencing has caused a revolution in our understanding of the cancer genome.

In April this year, Campbell and his team published a paper in Nature Genetics reporting multiple germline structural variants and somatic rearrangements to the base-pair level of resolution in DNA from two individuals with lung cancer. “The results,” they wrote, “demonstrate the feasibility of systematic, genome-wide characterisation of rearrangements in complex human cancer genomes, raising the prospect of a new harvest of genes associated with cancer using this strategy.”

For Campbell, who has to analyse the data, the amount being extracted from these new technologies is phenomenal. “It’s perfectly set up for identifying rearrangements in DNA sequencing because you get millions and millions of sequences,” he says.

“At the moment we are getting 30 million sequences per run – you get 30 million sequences and you get 35 base pairs from either end of a fragment that can be up to 500 base-pairs long.

“We are now looking at using fragments that are up to 5 kb long. If you randomly shear your genomic DNA so these fragments are randomly scattered across the genome, then 30 million reads give you three or four times coverage of the genome. So you should in theory have three or four of every rearrangement in that genome, and that’s only from a single run.”
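The coverage figure in the quote can be checked with back-of-envelope arithmetic. The sketch below is not the project's own calculation: it assumes a human genome of roughly 3.1 billion base pairs, treats the 30 million sequences as 30 million paired-end fragments, and uses hypothetical function names. Under those assumptions, fragments of 300 to 400 base pairs span the genome about three to four times, even though the 35-base reads themselves cover well under one genome's worth of bases.

```python
GENOME_SIZE_BP = 3_100_000_000  # assumption: approximate human genome size


def physical_coverage(n_fragments, fragment_len_bp, genome_size_bp=GENOME_SIZE_BP):
    """Average number of fragments spanning any given position in the genome."""
    return n_fragments * fragment_len_bp / genome_size_bp


def sequence_coverage(n_fragments, read_len_bp, genome_size_bp=GENOME_SIZE_BP):
    """Average number of times each base is actually read (two reads per fragment)."""
    return n_fragments * 2 * read_len_bp / genome_size_bp


n_fragments = 30_000_000  # assumption: one run yields 30 million fragments

# Physical coverage for a few plausible fragment lengths (in base pairs)
for fragment_len in (300, 400, 500):
    print(fragment_len, round(physical_coverage(n_fragments, fragment_len), 2))

# Sequence coverage from 35-base reads at both ends of each fragment
print("sequence", round(sequence_coverage(n_fragments, 35), 2))
```

Rearrangement detection depends on the first figure rather than the second: a breakpoint can be picked up whenever a fragment spans it, whether or not either read touches it directly.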

The question then becomes, what do you do with all of that information and how do you make sense of it? “Ultimately, you are looking for recurrence,” he says. “For example, we are interested in finding fusion genes that are involved in a number of malignancies. If you see a chromosomal rearrangement or something that gives rise to the same fusion event in more than one cancer, then that gives you some confidence that what you are looking at is the genuine driver event rather than a random genetic change.”
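The recurrence test Campbell describes can be illustrated with a simple tally. This is a minimal sketch rather than the team's pipeline: the function name, the input layout and the placeholder gene names are all hypothetical, and it assumes fusion calls have already been made for each tumour sample.

```python
from collections import Counter


def recurrent_fusions(fusions_by_sample, min_samples=2):
    """Return fusions seen in at least `min_samples` different samples.

    `fusions_by_sample` maps a sample name to a list of
    (upstream_gene, downstream_gene) pairs called in that sample.
    """
    counts = Counter()
    for fusions in fusions_by_sample.values():
        # set() ensures a fusion is counted once per sample, however
        # many times it was called within that sample
        counts.update(set(fusions))
    return {fusion: n for fusion, n in counts.items() if n >= min_samples}


# Placeholder data for illustration only; these are not real findings
calls = {
    "tumour_1": [("GENE_A", "GENE_B"), ("GENE_C", "GENE_D")],
    "tumour_2": [("GENE_A", "GENE_B")],
    "tumour_3": [("GENE_E", "GENE_F"), ("GENE_E", "GENE_F")],
}

recurrent = recurrent_fusions(calls)
print(recurrent)
```

Counting per sample rather than per call matters here: a fusion detected twice in one tumour is still a single event, so only the pair shared by two different tumours passes the threshold.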

Drivers and passengers

In Nature in March last year, the team and its international colleagues, including some from Australia, looked at patterns of somatic mutation in human cancer genomes, examining more than 500 protein kinase genes in 210 different cancers. They found that while they could identify 1000 somatic mutations, most were what are called ‘passengers’ – those not likely to contribute to oncogenesis.

On the flip side, they found plenty – 120 in fact – that they think are ‘driver’ genes, and there were many more of them than expected. “If you went back a decade and tied your average clinical geneticist down, they’d say there were a handful, perhaps 10 or 20, that were responsible for the majority of genetic alterations driving malignancy,” Campbell says. “I think that paper has been quite important for refocusing that assumption.

“It is now generally accepted – this has been followed by a number of sequencing studies from other centres showing the same thing – that the spectrum of cancer genes is probably much wider than initially anticipated.”

Campbell’s team is also investigating the evolutionary history of individual cancers from individual patients using large-scale genome sequencing. Publishing in Genome Research in August last year, the team reported its first exploration of large-scale genomic variations at the rearrangement level. This project used more traditional technology – developing a BAC library and sequencing the ends of each BAC, then shotgun sequencing any BAC whose ends suggested a rearrangement had occurred.

“We knew from copy number changes that a lot of the genomic rearrangements must be fairly complex because the copy number changes were so wide, but that paper gave us the first insights into quite how wide it was and how chaotic it was.”

The team is also doing some ultra-deep sequencing, investigating the idea that cancer requires multiple mutations to generate a malignant phenotype. “If you have a situation where you have a non-selective mutation effect then eventually you will get all of these different clones, so that within a population of tumour cells you have a mix of subclones that show considerable genetic heterogeneity and that are competing with one another.

“In a sense, every patient’s cancer then becomes an independent experiment in Darwinian evolution. If you can understand the extent of that heterogeneity then you can begin to get closer to an understanding of both the biological processes that underpin the selection but also the mechanisms that lead to the triumph of one clone over another. That probably has quite a lot of therapeutic relevance.”

800 cancer cell lines

In yet another example of the Sanger Institute’s remarkably prolific output, in September they announced they had developed a catalogue of structural genomic changes in almost 800 cancer cell lines, using Affymetrix’s genome-wide human SNP array.

This set of samples contains most types of human cancer and the team was able to increase the resolution enormously over previous maps. Campbell says the aim of the core cell lines project is to document exactly what their genomes look like, information that will be released through a website called COSMIC.

“So if people are interested in particular pathways or therapeutic applications, they can understand the nature of the cell line they are looking at.”

In addition, the Institute is involved in the International Cancer Genome Consortium, with the aim of achieving wholesale, comprehensive analysis of about 500 different samples from each of 50 tumour types. Australia is involved in this project, with the NHMRC committing to backing one of the tumour-type projects. An announcement of which type has been selected is expected shortly.

At the AMATA conference, Campbell discussed the paired-end sequencing project but also took a look into the future. “The future – five, ten years down the track – it is likely that we will be sequencing entire genomes pretty much at will,” he says.

“The sequencing companies – if you believe the figures they are talking about – all of their predictions to date have more than come true. Then we will be able to sequence an entire genome, getting 20 to 30 times coverage of every single base in the genome for about US$10,000.

“It then becomes feasible to take 100, 200, 500 cancer samples and sequence all of them and then you will know where the most commonly mutated genes are, what they are, what the variants are, and what pathways are intimately involved. That’s where we’re headed.”
