Millions of single cells can now be analysed simultaneously
Spanish researchers have developed a sophisticated computational framework for analysing single-cell gene expression levels that scales to millions of individual cells. Their work marks the first time a tool has been capable of analysing such a large single-cell RNA sequencing dataset, dramatically extending the limits of single-cell genome research.
All the cells in a human body share the same genome, but each cell can specialise for a particular tissue or organ through differences in gene expression. One of the current challenges of genome research is to analyse many individual cells in order to identify what differentiates one cell from another.
The analysis of individual cells using single-cell RNA sequencing has already revolutionised our understanding of the complexity of tissues, organs and organisms. Looking at the gene expression of one cell at a time, scientists are now able to describe a sample’s heterogeneity at unprecedented resolution and without prior knowledge of its composition.
Large-scale single-cell projects have since led to the identification of previously unknown cell types and to comprehensive cellular maps of organisms, an effort exemplified by the Human Cell Atlas project. However, such studies create massive amounts of sequencing data, and analysing such large datasets is a major challenge.
Scientists at the Centro Nacional de Análisis Genómico of the Centre for Genomic Regulation (CNAG-CRG), in collaboration with the University Pompeu Fabra and the Biomedical Research Consortium on Rare Diseases, have now developed an efficient computational framework known as ‘BigSCale’ that enables processing, analysis and interpretation of big-scale single-cell experiments.
“BigSCale is extremely powerful in identifying cell type specific genes, which greatly helps in the downstream interpretation of experiments,” said Dr Holger Heyn, CNAG-CRG team leader and senior author of the study, which has been published in the journal Genome Research.
The novelty of the BigSCale analytic tool lies in a numerical model that sensitively determines differences between single cells. Once the tool has charted how individual cells differ from each other, the cells can be grouped into populations to describe the cellular complexity of a given tissue. As virtually all tissues are composed of different cell types and subtypes, such an analysis can guide an unbiased in-depth characterisation without initial hypotheses. Marker genes that are differentially expressed between subpopulations help researchers to link cells to prior knowledge about the tissue anatomy or to describe the functions of newly discovered cell types.
In addition, the tool was designed to tackle future challenges of large datasets, with Dr Heyn noting, “The costs to derive single-cell profiles are decreasing and we are seeing studies of increasing cell numbers.” In this regard, a module in the BigSCale workflow enables the analysis of millions of cells through a directed convolution strategy. Here, single-cell transcriptomes from similar cells are merged into index cells, greatly reducing the amount of data to be processed.
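The idea of merging similar cells into index cells can be illustrated with a minimal sketch. This is not the BigSCale implementation: the function name `pool_into_index_cells`, the `group_size` parameter, the simple log-normalisation and the greedy nearest-neighbour grouping are all assumptions chosen for illustration. The sketch only shows how summing the counts of similar cells shrinks the matrix that later steps have to process.

```python
import numpy as np


def pool_into_index_cells(counts, group_size=4):
    """Greedily merge similar cells into pooled "index cells" (illustrative only).

    counts: array of shape (n_cells, n_genes) holding raw expression counts.
    Returns the pooled count matrix and, for each cell, the index cell it joined.
    """
    counts = np.asarray(counts, dtype=float)
    n_cells = counts.shape[0]

    # Assumed normalisation: scale each cell to 10,000 counts, then log-transform.
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0
    profiles = np.log1p(counts / totals * 1e4)

    unassigned = np.ones(n_cells, dtype=bool)
    labels = np.full(n_cells, -1, dtype=int)
    pooled = []

    for seed in range(n_cells):
        if not unassigned[seed]:
            continue
        # Rank the remaining cells by Euclidean distance to the seed cell.
        candidates = np.flatnonzero(unassigned)
        dists = np.linalg.norm(profiles[candidates] - profiles[seed], axis=1)
        members = candidates[np.argsort(dists, kind="stable")[:group_size]]

        # Merge the group: its raw counts are summed into one index cell.
        labels[members] = len(pooled)
        unassigned[members] = False
        pooled.append(counts[members].sum(axis=0))

    return np.vstack(pooled), labels


# Toy example: four cells, two genes, merged in pairs.
toy_counts = np.array([[10, 0], [9, 1], [0, 10], [1, 9]])
toy_pooled, toy_labels = pool_into_index_cells(toy_counts, group_size=2)
```

In this toy case four cells collapse into two index cells while the total number of counts is preserved. A real workflow would need a far more careful similarity measure and a strategy that avoids computing distances to every remaining cell.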
The group illustrated the power of their strategy by analysing one of the largest single-cell gene expression datasets: 1.3 million individual cells of the developing mouse brain, publicly available from 10x Genomics.
“BigSCale allowed us to look deep into the developmental processes of the mouse brain and to characterise even rare neuronal cell types,” stated first author Giovanni Iacono. Specifically, the high number of cells enabled the group to zoom into a small transient cell population called Cajal-Retzius cells and to describe major substructures related to distinct differentiation stages, spatial organisation and cellular function.
“The BigSCale framework provides a powerful solution for virtually any species and is even applicable outside the RNA sequencing context,” said Dr Heyn. “We expect it to contribute to the interpretation of large-scale studies, such as the Human Cell Atlas project.”