The life science IT wish list
Monday, 29 July, 2002
Researchers performing leading edge bioinformatics work are as likely to use a desktop machine as hold an account at a supercomputer centre. Whatever their hardware, the bio-IT wish list of most members of the research community is headed by the same item: software that is smarter on at least two counts.
The first count concerns more flexible programs, requiring less effort to adjust, so researchers can analyse data in ways not envisaged by a program's original designers. On the second count, they want IT tools capable of linking together the continually expanding range of new data sources. Beyond that wish list looms the issue of exponential growth in the sequence data held in DNA and protein databases.
"My suspicion is that the sequence databases are expanding faster than Moore's Law (governing expected growth in computer processor power)," says Dr Mervyn Thomas, biotechnology manager of CSIRO's Mathematics and Information Sciences division (CMIS). In other words, the rate at which more raw computer power becomes affordable isn't keeping up with the pace at which databases are growing in size.
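Thomas's suspicion can be made concrete with a back-of-the-envelope sketch. The doubling times below are assumptions for illustration only (the classic 18-month Moore's Law figure against a notional 12-month database doubling time), not measurements from the article:

```python
# Illustrative only: if a sequence database doubles every 12 months while
# affordable processing power doubles every 18 months, the compute cost of
# scanning the whole database per query grows steadily. Both doubling
# times are assumed figures, chosen to illustrate the trend Thomas describes.

def growth(years: float, doubling_time_years: float) -> float:
    """Relative size after `years`, given a fixed doubling time."""
    return 2.0 ** (years / doubling_time_years)

for years in (1, 2, 5):
    db = growth(years, 1.0)    # database size (assumed yearly doubling)
    cpu = growth(years, 1.5)   # processor power (18-month doubling)
    print(f"after {years} yr: database x{db:.1f}, CPU x{cpu:.1f}, "
          f"relative scan cost x{db / cpu:.2f}")
```

Even with these rough numbers, the gap compounds: a brute-force scan gets relatively more expensive every year, which is the case for smarter algorithms made in the next paragraph.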
The only way to avoid a progressive slowing of response times to the large sequence databases will be to construct smarter search algorithms, a bio-IT area in which Australian researchers have been active, including a search engine group at Melbourne's RMIT School of Computer Science and Information Technology.
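The article does not describe the algorithms involved, but one standard way a "smarter" search sidesteps raw database growth is to precompute an index so each query touches only a fraction of the data. A minimal hedged sketch, using an invented k-mer lookup table (this is not the RMIT group's actual method):

```python
from collections import defaultdict

# Hedged illustration: a one-off k-mer index makes query lookups cheap even
# as the database grows, whereas a naive scan re-reads every sequence per
# query. All names and the database contents here are invented.

K = 3  # k-mer length, kept small for illustration

def build_index(sequences):
    """Map every length-K substring to the (sequence, offset) pairs holding it."""
    index = defaultdict(list)
    for name, seq in sequences.items():
        for i in range(len(seq) - K + 1):
            index[seq[i:i + K]].append((name, i))
    return index

db = {"seqA": "ATGGCCATT", "seqB": "CCATTAGGA"}
index = build_index(db)
print(index["CAT"])  # → [('seqA', 5), ('seqB', 1)]
```

The indexing cost is paid once; afterwards each query is a dictionary lookup rather than a scan proportional to total database size.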
Skyrocketing datasets
CMIS harbours the largest bioinformatics research capabilities of CSIRO's 22 divisions. It is focused primarily on functional genomics and automated image analysis using expertise in massively multi-variate data mining.
CMIS does not yet need supercomputer access to do its functional genomics research because the dataset volumes generated by gene expression studies don't warrant it. For example, a large gene expression study at the moment might consist of 350 Affymetrix gene chips, each holding 10,000 oligonucleotides (short, single-stranded DNA fragments), or 3.5 million datapoints. However, dataset volumes are poised to skyrocket as high-throughput technology realises its potential, Thomas predicts. "If I had to bet, I would bet we would be looking at a two orders of magnitude increase in throughput over the next two years in drug discovery studies," he tips. CMIS is positioning itself for the explosion by developing scalable algorithms for its functional genomics work, he says.
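The arithmetic behind those figures is straightforward, and it shows why Thomas's projection matters:

```python
# Figures from the article; the projection applies Thomas's tip of a
# two-orders-of-magnitude throughput increase.
chips = 350                # gene chips in a large expression study today
oligos_per_chip = 10_000   # oligonucleotide probes per chip

datapoints = chips * oligos_per_chip
print(f"today: {datapoints:,} datapoints")      # → today: 3,500,000 datapoints

projected = datapoints * 10 ** 2                # two orders of magnitude
print(f"projected: {projected:,} datapoints")   # → projected: 350,000,000 datapoints
```

A jump from millions to hundreds of millions of datapoints per study is what motivates CMIS's emphasis on algorithms that scale.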
The largest hurdle facing CMIS lies in gaining early access to the large functional genomics studies that provide the raw material for its researchers. "We have the genomic data mining capacity to take on the world but there isn't the funding in Australia to generate the large data volumes that are coming out of the US," Thomas says.
Quality, not quantity
At the Australian Genome Research Facility (AGRF), quality, not quantity, is the issue when it comes to bioinformatics. The AGRF acts as a service centre for bioinformatics researchers. It uses analysis software to perform measurements on biological material such as DNA fragments and relays that data back to its scientific clients, who then crunch the numbers as they see fit.
"We would be happy if the software licences were all free, the software was completely developed and output results were instantaneous... but that is unrealistic," says AGRF Melbourne division manager Dr John Barlow. "The one thing we would most like to see change is the degree of maturity of fragment analysis software. It is okay but it could be improved."
Relieving the pressure
One bioinformatics user under pressure to move to larger hardware platforms is Sydney's Garvan Institute. Its need for CPU (central processing unit) cycles is rising because of demands imposed by work involving statistical analysis of gene expression results.
Sun Microsystems machines running the stats packages are "taking several days to go to completion and that is hampering our ability to move the research forward," says Garvan IT manager Jim McBride. The shortage of CPU cycles is also limiting the institute's ability to search for genes associated with particular diseases.
The work involves a computationally intensive statistical task assessing the likelihood of target genes existing at certain marker positions and "when we run it, it tends to dim the lights," McBride says. To relieve the pressure, Garvan is currently investigating the possibility of using facilities at Sydney's Australian Centre for Advanced Computation and Communications or at ANU's Supercomputing Facility.
Changing morphology
Machine size is not a burning issue for Dr Patty Solomon, who leads the microarray analysis group at the ARC Special Research Centre for Molecular Genetics of Development. "The limiting factor is not hardware but expertise in bringing information together from different sources and making inferences," says Solomon, an associate professor with Adelaide University's maths department whose field is statistical analysis.
"Biologists have always been able to sequence specific gene products but until now they have never been confronted with the huge interface of maths and statistics and computer science.
"They need collaborations with high-level stats and mathematical analysis specialists who can analyse their data.
"The whole morphology has changed, and people are still coming to grips with that."
Minimising housekeeping
The Centre for Bioinformation Science (CBiS) at ANU is one of the heaviest users of ANU's supercomputer facility. Senior researcher Dr Gavin Huttley, who coordinates much of that work, also sees the value of conducting research on in-house computing resources. For one thing, it avoids the large overheads in paperwork and time that applying for space on large, shared computing platforms can impose.
He's interested in new hardware such as Apple's Xserve, which promises to minimise the housekeeping effort required to keep systems going. "Systems designed to keep systems administration work to the lowest possible level are quite appealing," says Huttley.
"If we can free up people from things that don't contribute directly to our research output, that is an absolute win for us."
A PROACTIVE APPROACH TO SOFTWARE
The ANU's Centre for Bioinformation Science (CBiS) is taking a proactive approach to the demand for better bio-IT software. It is currently negotiating a contract with Singapore bioinformatics company HeliXense to develop a software toolkit for genomics researchers.
The kit will build on concepts and work from CBiS researchers Dr Gavin Huttley, Dr Alexander Isaev and Dr Hilary Booth. The Java-based toolkit will allow researchers engaged in computationally demanding genomics work to pull together a desired application quickly from genomics-oriented software objects.
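The article gives no API details, so the following is purely a hypothetical sketch of the composable-objects idea, written in Python for brevity rather than the toolkit's Java. Every class and method name is invented; none of this is HeliXense's or CBiS's actual design:

```python
# Hypothetical sketch of assembling an "application" from reusable sequence
# objects, in the spirit of the toolkit described above. All names invented.

class DnaSequence:
    COMPLEMENT = str.maketrans("ACGT", "TGCA")

    def __init__(self, bases: str):
        self.bases = bases.upper()

    def gc_content(self) -> float:
        """Fraction of G/C bases, a routine sequence statistic."""
        return sum(b in "GC" for b in self.bases) / len(self.bases)

    def reverse_complement(self) -> "DnaSequence":
        return DnaSequence(self.bases.translate(self.COMPLEMENT)[::-1])

class Pipeline:
    """Chains small analysis steps into an application-like whole."""
    def __init__(self, *steps):
        self.steps = steps

    def run(self, seq):
        result = seq
        for step in self.steps:
            result = step(result)
        return result

# Usage: compose two ready-made pieces into one analysis, no new code needed.
pipeline = Pipeline(DnaSequence.reverse_complement, DnaSequence.gc_content)
print(pipeline.run(DnaSequence("ATGGCC")))
```

The point of such a design is the one Huttley makes: the application-level glue becomes trivial, so researchers spend their time on the analysis rather than on plumbing.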
It is designed to break the development bottleneck currently facing researchers, Huttley said. CBiS "would have loved to do this locally" but turned to Singapore's HeliXense because the project did not fit the business model of Australian bioinformatics suppliers, he said. Specifications for the toolkit have been laid down, some functional prototypes are finished, and the kit "won't take that long" to deliver.
"We will build in some maths which will allow modelling of biological sequences in a way that most biologists will find intuitive." To meet the requirements of individual academic researchers, the kit will have a "very low entry price... ideally it would be free."
The three scientists are effectively donating their own IP. Under ANU's rules, the university is entitled to 70 per cent of the proceeds from CBiS's half share in the project; the remaining 30 per cent will be split equally between Huttley, Isaev and Booth.