Searching the gene database? Expect a wait, says expert

By Pete Young
Tuesday, 23 July, 2002

Hour-long waits for results from gene and protein database searches that now take a few minutes could become reality within a few years, according to a search engine expert.

That scenario is inevitable if more efficient search algorithms aren't applied, says Dr Hugh Williams, head of a software research group at RMIT's School of Computer Science and Information Technology.

Williams and his team are moving to commercialise a more efficient search engine, named Cafe. The project has received pre-seed funding from RMIT, and the team is now looking for outside investors.

Their work could be important because DNA sequence databases such as GenBank are doubling in size every 13 months. At that rate they are outpacing the capabilities of commonly used search algorithms like BLAST.

"I believe current search techniques are unsustainable," says Williams.

Sequence database searches that took 10 seconds to produce answers three years ago are now taking several minutes, he says.

Simple extrapolation suggests those times could blow out to half an hour or an hour in another three years unless corrective measures are applied.
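The arithmetic behind that extrapolation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: "several minutes" is read here as two to three minutes, the slowdown over the next three years is assumed to repeat the multiplicative slowdown of the past three, and the function name `extrapolate_search_time` is made up for this example.

```python
def extrapolate_search_time(past_s, now_s, periods_ahead=1):
    """Project a search time, assuming the same multiplicative
    slowdown per period as was observed over the last period."""
    factor = now_s / past_s
    return now_s * factor ** periods_ahead

past_s = 10.0  # from the article: about 10 seconds three years ago

# Assumption: "several minutes" is taken to mean 2 to 3 minutes.
for now_min in (2, 3):
    projected_s = extrapolate_search_time(past_s, now_min * 60.0)
    print(f"{now_min} min now -> {projected_s / 60:.0f} min in three years")
```

Under those assumptions the projection lands between roughly 24 and 54 minutes, which is in line with the "half an hour or an hour" figure.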

Williams believes his team will contribute to the solution.

In the past 12 years, it has built up an international reputation for its search engine prowess. One leading internet search engine company, Google, trawls for new recruits on the RMIT campus and regularly offers jobs to members of Williams' team, he says.

The team's specialty lies in building faster, more efficient engines and addressing scalability issues so database growth doesn't translate into longer search times.

"Our skills lie with speeding up the search process using data compression and algorithms designed to process data in faster ways."

About seven years ago, the group began applying the techniques it originally developed for Google to what it saw as the fruitful area of protein and DNA databases.

Williams says search algorithms like BLAST treat databases as one huge text file and don't scale well, meaning they become rapidly less efficient as database sizes increase.
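The difference between reading the whole database for every query and consulting an index first can be illustrated with a toy Python sketch. It is not a description of how BLAST or Cafe works: it does exact substring matching rather than similarity search, and the sample sequences, the key length `K` and all function names are invented for the example.

```python
from collections import defaultdict

K = 4  # hypothetical length of the substrings used as index keys

def scan_search(sequences, query):
    """Exhaustive search: every sequence is read for every query."""
    return [i for i, seq in enumerate(sequences) if query in seq]

def build_index(sequences, k=K):
    """Map each length-k substring to the sequences that contain it."""
    index = defaultdict(set)
    for i, seq in enumerate(sequences):
        for j in range(len(seq) - k + 1):
            index[seq[j:j + k]].add(i)
    return index

def indexed_search(sequences, index, query, k=K):
    """Use the index to pick candidates, then check only those."""
    if len(query) < k:
        # Query too short for the index, fall back to a full scan.
        return scan_search(sequences, query)
    candidates = index.get(query[:k], set())
    return sorted(i for i in candidates if query in sequences[i])

# Made-up sample data, for illustration only.
sequences = ["ACGTACGTGA", "TTGACCAGTA", "GGGACGTTTC", "CATCATCATC"]
index = build_index(sequences)

print(scan_search(sequences, "ACGTT"))            # reads all four sequences
print(indexed_search(sequences, index, "ACGTT"))  # checks only two candidates
```

Both calls return the same answer, but the scan touches every sequence, so its cost grows with the database, while the indexed version only examines the sequences that share the query's first four letters.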

Nor is faster hardware the answer. Some observers believe gene and protein information databases are exploding faster than Moore's Law (governing the growth of affordable processing power) can handle.

The techniques developed by Williams' group will make searches 100 times faster than current algorithms permit and will scale up as databases continue to expand.
