Biomedical research — why are potentially important genes ignored?
While it is known that most research on human genes concentrates on around 2000 of nearly 20,000 genes, the reasons why the rest are neglected are largely unknown.
To find out why, a team from Northwestern University (NU), led by Thomas Stoeger and Luís Amaral, compiled 36 distinct resources describing various aspects of biomedical research and analysed the large database for answers. The number of publications on individual genes, the year of the first publication about them, the extent of funding and the existence of related drugs can be predicted using machine learning methods, according to the team. Their findings have been published in PLOS Biology (https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.2006643).
But why do biomedical researchers keep studying the same 10% of genes? There are several distinct factors, which all favour research on well-established topics, said Stoeger. “For instance, justifying future research plans is easier if it is possible to draw a more specific picture of the anticipated outcome of a study. Being able to build upon prior knowledge, and to use existing reagents, or to participate in focused conferences, can facilitate future research and save time for individual investigations. Along these lines we find that individual researchers have more successful careers if they avoid the least studied genes.”
In short, historical bias, bolstered by research funding mechanisms and social forces, is a key reason why biomedical researchers continue to study the same 10% of genes while ignoring many genes known to play roles in disease. According to the team, biomedical research is guided primarily by a handful of generic chemical and biological characteristics of genes that made experimentation easier during the 1980s and 1990s, rather than by the physiological importance of individual genes or their relevance to human disease.
The Human Genome Project
The Human Genome Project — the identification and mapping of all human genes, completed in 2003 — promised to expand the scope of scientific study beyond the small group of genes scientists had studied since the 1980s. But the Northwestern researchers found that 30% of all genes have never been the focus of a scientific study and less than 10% of genes are the subject of more than 90% of published papers. And this despite the increasing availability of new techniques to study and characterise genes.
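The two proportions quoted above (the share of genes never studied, and the share of papers taken up by the most-studied genes) are simple to compute from a table of publication counts per gene. The sketch below is only an illustration: the function name `concentration_stats`, the gene labels and the counts are all made up for this example and are not taken from the Northwestern dataset or its methods.

```python
def concentration_stats(papers_per_gene, top_fraction=0.10):
    """Return (share of genes with no papers, share of papers on the top genes).

    papers_per_gene: dict mapping a gene label to its publication count.
    top_fraction: fraction of genes treated as "most studied" (assumed 10%).
    """
    # Sort counts from most-studied to least-studied gene.
    counts = sorted(papers_per_gene.values(), reverse=True)
    n_genes = len(counts)
    total_papers = sum(counts)

    # Share of genes that have never been the focus of a paper.
    unstudied_share = sum(1 for c in counts if c == 0) / n_genes

    # Share of all papers that fall on the top fraction of genes.
    n_top = max(1, round(n_genes * top_fraction))
    top_share = sum(counts[:n_top]) / total_papers if total_papers else 0.0
    return unstudied_share, top_share


# Hypothetical toy data: ten invented genes, not real publication counts.
toy_counts = {
    "GENE_A": 900, "GENE_B": 40, "GENE_C": 25, "GENE_D": 15, "GENE_E": 10,
    "GENE_F": 6, "GENE_G": 4, "GENE_H": 0, "GENE_I": 0, "GENE_J": 0,
}

unstudied, top_share = concentration_stats(toy_counts)
print(f"Unstudied genes: {unstudied:.0%}")
print(f"Papers on the top 10% of genes: {top_share:.0%}")
```

With real data, the same calculation would need a decision on what counts as a paper "focused" on a gene, which this toy example does not address.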
“The Human Genome Project has allowed a study such as ours, and it has promoted several large-scale studies, which have already identified interesting starting points for closer studies,” said Stoeger. However, presently we do not sufficiently promote the latter, he said. The Northwestern team’s observations are restricted to biology and show that there is huge potential for further discoveries. However, there is no reason to believe that the underlying processes are specific to biology, noted Stoeger.
With researchers focused on just 2000 human genes, the biology encoded by the remaining 18,000 genes is largely uncharacterised. Some of these genes, the researchers note, include an understudied breast cancer gene cluster and genes connected to lung cancer that could be at least as important as well-studied genes.
Stoeger and others (https://www.nature.com/articles/s41598-018-19333-x) have found examples of such understudied genes in many of the distinct processes examined by genome-wide association studies (GWAS). “Other cases that I, personally, find very interesting are unstudied genes relating to human growth, or Inflammatory Bowel Disease,” said Stoeger. Another example is C9orf72, a gene relevant to ALS that had not been widely studied before but is now attracting significant research attention.
Stoeger participated in the first genome-wide tissue-specific RNAi screen in Drosophila during his undergraduate studies (https://www.ncbi.nlm.nih.gov/pubmed/19363474). That work showed him that several genes important for the proper development of tissues had not been studied before. When Stoeger was pursuing his PhD, there was a limited budget for selecting candidate genes. Hence, he actively exploited biases in past research patterns to select candidate genes in a manner that would maximise the chances of timely, feasible follow-up experiments, and the chance to appeal to many readers and create a high-impact publication.
Knowing the unknown
“Since I believed that the unknown may strongly influence research and the current understanding of human health in ways that are presently only marginally understood, it seemed very important to understand why researchers would study certain genes and not others, and whether there might already be the potential for moving toward the understudied areas of biology. Since genes are perturbable, and represent a non-abstract entity, focusing on genes would allow us to bridge meta-studies of science with useful gene-specific advice that could be exploited by individual researchers, and help them to address the underlying bias,” said Stoeger.
The Northwestern team anticipates that it will take 20 more years before half of the protein-coding genes are the subject of focused research publications. “Since this would mark a low threshold for ‘understanding’ it could take several further decades until their physiological and molecular roles are truly understood,” said Stoeger.
Stoeger and his team show that large-scale efforts, which disrupt the function of individual genes, have already identified several largely uncharacterised, and under-characterised, genes, which are important to biology.
Further, the team show that many of these genes could already be studied by current technologies. “Biology is in the extremely lucky and exciting situation that — because since the human genome project we know the catalogue of human protein-coding genes — we could use data to identify gaps in our knowledge. One promising direction to mitigate the present situation would be to use data to provide extra freedom and security to those investigators that avoid the most-studied genes and hence do not profit from the same incentives as the researchers that study the 10% most-studied genes.”
Stoeger collaborated with Professor Luís Amaral in the department of chemical and biological engineering in Northwestern’s McCormick School of Engineering. Stoeger, Amaral, postdoctoral fellow Martin Gerlach and Richard I Morimoto, the Bill and Gayle Cook Professor of Molecular Biosciences in Northwestern’s Weinberg College of Arts and Sciences, conducted the study.
The researchers applied a systems approach to the data — which included chemical, physical, biological, historical and experimental data — to uncover underlying patterns. In addition to explaining why some genes are not studied, they can explain the level to which an individual gene is studied. And they can do that for approximately 15,000 genes. “Social forces and funding mechanisms reinforce a focus of present-day science on past research topics.”
A public resource
“In order to accelerate the pace of discovery, we propose the need for funding mechanisms of scientists and calls for proposals that encourage the pursuit of nonredundant and likely highly unpredictable research directions,” the researchers wrote in the PLOS Biology paper. “In order to counter the career forces currently pushing towards conformity, there would be a need for stable, long-term support for such innovators to focus on the unknown. Just as the Royal Society sponsored target studies of the unknown with an eye towards the economic potential of certain discoveries, we also predict that exploring the uncharted territories of unknown biology by investigating unstudied and understudied genes will yield satisfying observations that would contribute economically and medically. We believe that the resource presented here provides a jumping point for further systems-level investigation on the formation of scientific knowledge ... and a guide to researchers who want to identify promising but little-studied genes.”
Looking forward, the Northwestern team is developing a public resource that could help identify understudied genes that have the potential to be of critical importance to specific diseases. For each gene, the resource records any extraordinary chemical property, whether the gene is highly active in a specific tissue and whether it has a strong link to a disease.
The research was supported by the National Science Foundation, the Department of Defense’s Army Research Office, the National Institute on Aging, the National Institute of Allergy and Infectious Diseases, the Simons Foundation, the Daniel F. and Ada L. Rice Foundation and a gift from John and Leslie McQuown.