Feature: Quest for the human proteome

By Tim Dean
Tuesday, 14 September, 2010

This feature appeared in the July/August 2010 issue of Australian Life Scientist.

Mapping the human genome, and its 21,000-odd protein-coding genes, was a mammoth undertaking, requiring a 10-year, multi-billion-dollar collaborative effort by teams from across the globe. Yet the Human Genome Project (HGP) was only the beginning.

The human genome is just the blueprint for the legions of proteins that are encoded by those genes, and it’s these proteins that are the business end of biology; if the genome is the plan, the proteins are the actual building blocks of life.

In Sydney in September, at the Human Proteome Organisation (HUPO) Annual World Congress, the next great undertaking in uncovering the foundations of human biology will begin, with the official launch of the Human Proteome Project (HPP).

It’s no accident that it will be launched in Sydney, says Professor Mark Baker, Chair of Proteomics at Macquarie University and co-chair of the HUPO 2010 committee. After years of discussions at various HUPO meetings, a consensus was reached that it was time to get a Human Proteome Project underway. “Then, of course, we realised we had come to the stage where we had to launch it officially,” he says.

“Because the word ‘proteome’ was defined here in Sydney, John Bergeron – who has been one of the leaders and is an ex-president of HUPO – thought it’d be great to launch it out of the birthplace of the proteome.”

It is hoped that the HPP will be a key stepping stone on the way to personalised medicine, says Professor Edouard Nice, who has a joint Monash University/Ludwig Institute for Cancer Research appointment in Melbourne. He is also co-chair of the HUPO 2010 meeting. “The data that the Human Proteome Project will generate will be of significant assistance in being able to do the types of analyses that will be required for personalised medicine,” he says.

“There are a number of diseases where improved early detection or surveillance will be greatly beneficial to the patient. A number of these diseases are heterogeneous, so you have to tailor your treatment to the actual disease itself. The ability to rapidly analyse those criteria will be fundamental to advances in medical research and disease treatment.”

Human proteome

The ambitious goal of the HPP is to generate a comprehensive map of each of the 21,000 or so proteins that are encoded by the human genome. The plan isn’t just to identify each of these proteins, but to provide detailed information on their function, abundance and sub-cellular localisation, and to characterise their various interactions.

There are a lot of gaps to fill in this picture, despite well over a decade of research into proteomics. “As of now, 7000 to 8000 – or roughly one-third – of the genes uncovered by the HGP are not linked to any proteins, and of the remaining two-thirds, many lack detailed information beyond their mere existence,” says Pierre Legrain of the Commissariat à l’Energie Atomique in France, Secretary General of HUPO and a driving force behind the HPP.


“This is probably due to the lack of sensitivity of mass spectrometry for very low abundance proteins – or proteins expressed in very few cells, in rarely studied samples or in atypical time frames – but most probably also due to the absence of accurate screening of all available mass spectrometry data,” Legrain says.

“The HPP will provide the means and the tools to address those points and increase the probability that at least one protein for each human gene will be characterised within the framework of the HPP.”

And all this is to be done against a 10-year deadline, with a first draft of the human proteome expected in 2020.

Complicating the endeavour is the fact that, unlike the human genome, the human proteome is ever-changing – it’s a ‘moving target’. Proteins aren’t static things. They have a canny habit of changing their guise in response to a whole host of factors. Or, as Baker puts it, proteins like to get ‘dressed up’.

“They wear earrings, they wear glasses, they get modified to do particular activities that only can be done in the modified format. They get decorated, so to speak,” he says.

These ‘decorations’ are post-translational modifications, including things like phosphorylation, sialylation or acetylation, or structural changes like cleavage or bridging.

“Part of understanding how proteins work is understanding which form of the protein is the active form. So you might have a particular protein, but unless you put a phosphate group on a particular tyrosine, it’s not active,” Legrain says.

Needless to say, this complicates the prospect of bringing the human proteome under a single umbrella. Suddenly, 21,000 proteins become many times that number once all their various permutations are included, not to mention localisation and interaction data.

Bumpy road to consensus

It’s for this reason that it took several years for HUPO to build consensus around the HPP. Not only was there controversy over the best way to tackle such a mammoth and multifaceted undertaking, but there was also the question of whether a centrally directed effort was even necessary.

After all, there were dozens of labs around the world already beavering away at the human proteome, populating databases such as PRIDE, Swiss-Prot/UniProtKB and the Human Protein Atlas.

Yet, many of these labs were approaching the work of mapping the proteome from different angles, using diverse techniques and releasing their results in disparate – often incommensurable – formats.

The volume of data being produced was prodigious, but weaving it all together into a coherent picture was proving to be a monumental challenge. And in its fragmented form, the data was of little use to the broader scientific or medical research community.


It was with the intention of bringing order to the increasingly chaotic world of proteomics that HUPO was created in 2001. By establishing common frameworks and standards, the world’s proteomics community could begin speaking the same language.

As stated in a Nature editorial in 2005: “Without the umbrella of HUPO, hopes for standardisation in proteomics would have been bleak, with researchers being more inclined to use their rivals’ toothbrushes than their protocols. HUPO is involving the entire international community in its discussions to ensure consensus, and has already brokered a surprising number of agreements, with journals ready to assist in enforcing standards.”

The HPP can thus be seen as the logical conclusion of this process: weaving together the various individual projects and the plethora of databases addressing different aspects of proteins, and grounding it all in the human genome.

The turning point for the HPP came with a white paper in August 2008, following intense discussion at the 4th International Barbados Proteomics Conference earlier that year. This white paper discussed the various approaches to the HPP and how they might be knit together under a common banner.

The white paper galvanised support for the vision of an HPP, although the question remained whether the HPP should take a broadly protein-centric approach or a gene-centric approach. This was a crucial question because it would establish the starting point and the direction of the project. Another year and a half of debate ensued.

In the blue corner, Denis Hochstrasser, from Geneva University and University Hospital, argued that an HPP needed to take a protein-centric approach if it was to capture all the various permutations of the proteins involved in human biology.

Given that disease is caused not only by genes but also by toxicants and microbes, a gene-centric approach “would capture only a fraction of the changes that can occur in human disease,” he wrote in the Journal of Proteome Research in 2008.

In the red corner was John Bergeron, from McGill University in Canada, who lobbied for a gene-centric view, grounding the human proteome in the human genome and building up from there.

He argued that starting with the genome would give the HPP a finite set of proteins as a solid foundation, on which it could then build to capture the vast complexity of the complete human proteome.

Eventually the gene-centric view won out and the goals of the HPP were outlined in a HUPO Views article published in Molecular & Cellular Proteomics in February this year.

Settling on the gene-centric approach won’t satisfy everyone, says Nice, but it does give a solid foundation upon which to base the HPP. “It’s about the art of the feasible,” he says. “You start off with the genome, and everything else will branch off from there.”

Three-pronged approach

The HUPO Views paper also laid out the three-pronged approach that will be taken by the HPP. First is a protein parts list, which involves “the identification and characterisation of at least one representative protein from every human gene with its abundance and major modifications. This would define the backbone of a human proteome encyclopaedia”.

Second is a protein distribution atlas of these proteins, including sub-cellular localisation data. The third is a protein pathway and network map to begin fleshing out the interactome, with data eventually extending to nucleic acids, lipids and other molecules.


Both antibody-based profiling and mass spectrometry will be used to identify and characterise these proteins, with the two approaches being complementary. “Both are fundamental, both have positives and negatives,” says Nice.

“One of the things that’s been talked about a lot is developing reagents that are well characterised and validated, which is fundamental for the antibody approach.” However, such reagents are currently time-consuming and expensive to produce, says Nice.

“With mass spectrometry, on the other hand, once you have the methods established, it’s essentially a generic method.” Once the mass spectrometry data for the protein or peptide being measured has been established, and the instrumentation is in place, the analysis is very cheap.

However, says Nice, currently it lacks the high throughput and the absolute sensitivity of the antibody-based approach. “They both have pros and cons, but they’re complementary.”

One of the major challenges in bringing all these elements together is managing the vast quantity and diversity of data, and making it all compatible. Even labs using the same techniques and performing the same experiments can produce different data, as was shown in a 2009 Nature Methods paper by the team at McGill University. Establishing firm standards for sample preparation, methodology and data formats will be essential, especially given the distributed nature of the HPP.

HUPO is currently working hard to build upon its earlier efforts to maintain standard procedures to ensure that everyone’s talking the same language. “There are a number of HUPO initiatives that have started already in relation to data,” says Baker.

“Some look at quality, some look at validation, some look at how we store it and the language that it’s in. Getting people to contribute in the correct format so it can be re-mined by others, that’s a big challenge.”

One approach taken by HUPO is to speak to the leading journal editors to encourage them to make sure that papers are submitted using the appropriate data standards, says Baker.


Outcomes

The data generated by the HPP, which will be made freely available to anyone as with the HGP, should make a substantial contribution to the future of personalised medicine. After all, says Legrain, it’s proteins, not genes, that are the key to understanding disease and health.

“One should always keep in mind that proteins are the main contributors to phenotypes, and diseases deal mostly with phenotypes, not genotypes! By better characterising human proteins, we are closer to the real biology and pathologies than with genes and genomic sequences. Thus any improvement in medicine – including in any molecular medicine – is dependent on improved knowledge of proteins.”

Baker also foresees a significant impact on human health from the HPP through personalised medicine. “What we’re going to end up with is new drugs and new targets that are much more specific to the individual, and the diseases that individual has,” he says.

“Instead of the very crude effects and crude drug approaches that we have now, we’ll have much more personalised medicine. We’ll also know a lot more about pathology of disease, and we’ll understand a lot more about health. We’ll have markers not just for being sick, but markers for health.”

In the end it’s expected we’ll see a first draft of the human proteome in 2020, although this won’t be the end of the process by any means. Once the foundations have been laid – characterising the proteins encoded by the human genome – there’s still the house to be built.

There are still all the various permutations of these proteins, as well as proteins from other sources that impact on human health, that need to be explored. But it’s the HPP that will lay the foundation for these future great endeavours in human biology and health.

