Supercomputers to the rescue

By Pete Young
Monday, 15 July, 2002

Bio-researchers are burning through about one-quarter of the total computational cycles at some of Australia's largest computing centres.

The statistic, which supercomputer centre directors say is still climbing, is one measure of the strides being made by computational biology.

But the growth is not spread evenly across the high performance sector. Many supercomputer facilities still don't cater for bioinformatics at all. Others pay special attention to it, most notably the Australian National University's Supercomputing Facility, where the proportion of central processor usage devoted to bioscience research projects so far this year is running at a remarkable 47 per cent.

The demand by biotech researchers for high performance computing is being driven primarily by molecular modelling and genomics/proteomics projects. And the ongoing creation of new biotech institutes across the country suggests the number of computationally-intensive life science projects won't be levelling off soon.

Last year, bioscience accounted for about 24 per cent of the processor cycles at the national supercomputer facility of the Australian Partnership for Advanced Computing (APAC).

Sited in Canberra and managed by Dr Bob Gingold, head of the ANU Supercomputer Facility, APAC serves as a peak processor, software, support, mass storage, and visualisation systems resource that can be accessed by the eight high performance computing centres around the country that are APAC partners.

They include CSIRO, the Queensland Parallel Supercomputing Foundation (QPSF), the Australian Centre for Advanced Computing and Communications in NSW, the Australian National University (ANU), the Western Australian Interactive Virtual Environments Centre, Tasmanian Partnership for Advanced Computing, Victorian Partnership for Advanced Computing (VPAC) and the South Australian Partnership for Advanced Computing.

Of the eight, the high performance computing platforms managed by ANU, VPAC and QPSF are putting the greatest focus on computational biology. Their systems show the heaviest bioscience-related workloads - particularly in projects related to molecular modelling and genomics/proteomics - and demand is growing rapidly.

Within ANU, bioscience researchers were mainly interested in macromolecular modelling projects and used more of the available supercomputer capacity than researchers from any other field. Their closest rivals were the chemical sciences, which accounted for just over half the demand generated by the biotech brigade. Gingold noted that many projects straddled both fields, and that projects with both chemical and biological components were split equally for purposes of tracking central processor unit (CPU) usage.

Gingold expects the bioscience demand for ANU supercomputer resources to climb even higher as institutes such as the Centre for Bioinformation Science (CBiS) and Queensland Institute for Molecular Bioscience come up to speed.

The ANU high performance computing facility has made little effort to look for work in the private sector but would respond positively to initiatives from commercial companies, Gingold said.

He pointed out that ANU computing specialists have worked with computer giant Fujitsu for 15 years, optimising major chemistry packages for the IT vendor.

VPAC chief executive Prof Bill Appelbe estimated that life sciences research projects generate about 20 per cent of his centre's work, with molecular modelling (computational drug design) accounting for the lion's share.

Limitations

VPAC faces two limitations in catering for increasing pressure from computational biology researchers. One is a maxed-out resource base. Its own machines are running at full capacity and it is using its full quota on the APAC peak system in Canberra. But VPAC is in the middle of a major expansion which could ease the current pinch by doubling or trebling the centre's capacity by the end of the year.

A second limitation is caused by the avalanche of new computational biology software packages available to researchers.

"There is a constant barrage of new products coming out and the difficulty facing people is how to tie all the information together," says Appelbe. "They are hitting the wall when they try to interface and interpret data drawn from multiple sources."

VPAC is involved in efforts to set up pools of expertise so researchers in areas such as drug design can share experiences about the most effective software packages and exchange troubleshooting advice.

Such working groups might invite collaboration from commercial software vendors who would benefit from user inputs on how best to package toolsets, Appelbe suggested.

"People are leery of just taking vendor recommendations," he said. That opens an opportunity for non-profit organisations such as VPAC and the Victorian Bioinformatics Consortium, which can dispense independent, knowledgeable, third-party advice to researchers grappling with software problems.

Hardware

On the hardware side, simply providing access to faster and larger computing platforms is not necessarily a useful approach as far as informational biologists are concerned.

As VPAC's Appelbe points out, no supercomputer centre in the world has the raw grunt to cope with dynamic modelling of large protein interactions.

Brute processing power is not the answer, agrees computational biologist Prof Mark Ragan, head of the bioinformatics division of the Institute for Molecular Bioscience at the University of Queensland.

"Some of these things take you into a space that can be shown to be mathematically uncomputable. It is not that the computers aren't powerful enough but that the algorithmic space defines (the problem) as not solvable by force."

One way around the difficulty is to adopt an heuristic approach ("That does a pretty good job," said Ragan. "Near enough is good enough if you know how near 'near' is.") The other is to build cleverer algorithms that strip some of the complexities out of the task.

Ragan's research involves genomics and proteomics which throw up database issues that transcend mere number crunching. Specifically, he works with very large, very dense data sets and new data types that are not easy to integrate with each other.

"It is not just a string of DNA sequences but simulations, models and images that need to be integrated." That becomes a middleware problem involving both hardware and software, he says.

Like VPAC and ANU, the Queensland Parallel Supercomputing Foundation is finding that about 20 to 25 per cent of its computing capacity is taken up by bioresearch projects these days.

Queensland government investments in the biotech sector have spawned a dozen new research centres in the past two years. The activity is helping to send up demand for very high performance computing in the life sciences sector at more than a linear rate though it has not yet gone exponential, according to QPSF sources.

For many projects, biotech researchers don't require access to the nation's most powerful computing platform architectures.

Clusters of smaller platforms such as Intel-powered PCs or small Linux machines provide enough processing power to handle many projects.

"Just as in gardening, there are different tools for different jobs," says IMB's Ragan. "We have eight- to 12-processor machines for internal development and 128-processor machines for things not big enough to send off to APAC.

"And we have other problems that need big single memory architectures."
