New deep learning technique offers a more accurate approach to single-cell genomics
A new ‘deep learning’ method, DeepCpG, has been designed by researchers at the Wellcome Trust Sanger Institute, the European Bioinformatics Institute and the Babraham Institute to help scientists better understand the epigenome – the biochemical activity around the genome. Reported today (11 April) in Genome Biology, DeepCpG leverages ‘deep neural networks’, a multi-layered machine learning model inspired by the brain, and provides a valuable tool for research into health and disease.
As a result of projects like 1000 Genomes, scientists now have a ‘book’ of the human genome divided up into chapters and annotated in parts. However, to fully understand how life works, scientists need to decipher both the genome – the set of instructions repeated in every cell – and the epigenome, the part that varies wildly between cells.
To better understand how DNA sequences relate to biological changes, the genomics community is turning to artificial neural networks – a class of machine learning methods first introduced in the 1980s and inspired by the wiring of the brain. More recently, these models have been rebranded as ‘deep neural networks’, which form the field of deep learning.
Scientists have leveraged the capacity of deep learning to fill in the gaps in single-cell genomics, an emerging technology that offers a close-up view of epigenetics.
A new technique, DeepCpG, has been designed to help scientists learn about the connections between DNA sequences and DNA methylation – a biochemical modification of the genome sequence that can act like an off-switch for individual genes. Methylation plays a key part in important biological processes, including cell development, ageing and cancer progression.
The new method uses genomic and epigenomic data to predict DNA methylation in single cells. This matters because current single-cell technologies capture only part of each cell’s methylation profile. With DeepCpG, researchers can obtain a more complete picture of DNA methylation. The model can also be used to gain new biological insights, for example into the connection between DNA sequence and methylation.
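To make the idea concrete, here is a minimal Python sketch of how sequence and methylation data might be combined into a single model input for an unmeasured site. It is an illustration only, not the DeepCpG code: the function names (`one_hot_dna`, `neighbour_states`, `build_input`), the window sizes and the 0.5 placeholder for missing neighbours are all assumptions made for this example.

```python
BASES = "ACGT"

def one_hot_dna(seq):
    """Encode a DNA string as 4-element rows; unknown bases (e.g. N) stay all zeros."""
    rows = []
    for base in seq.upper():
        row = [0.0, 0.0, 0.0, 0.0]
        j = BASES.find(base)
        if j >= 0:
            row[j] = 1.0
        rows.append(row)
    return rows

def neighbour_states(states, index, k=2, fill=0.5):
    """Return the k nearest observed states on each side of CpG site `index`.

    `states` holds 1 (methylated), 0 (unmethylated) or None (not measured).
    Sides with fewer than k observed neighbours are padded with `fill`
    (a hypothetical "unknown" value chosen for this sketch).
    """
    left = [s for s in reversed(states[:index]) if s is not None][:k]
    right = [s for s in states[index + 1:] if s is not None][:k]
    left += [fill] * (k - len(left))
    right += [fill] * (k - len(right))
    return [float(s) for s in left + right]

def build_input(seq_window, states, index, k=2):
    """Concatenate sequence features and neighbouring methylation features."""
    flat_sequence = [v for row in one_hot_dna(seq_window) for v in row]
    return flat_sequence + neighbour_states(states, index, k)

# Example: site 3 was not measured in this cell, so a model would be asked
# to predict it from the DNA around it and from the observed neighbours.
cell_states = [1, None, 0, None, 1, 1]
features = build_input("ACGTN", cell_states, index=3)
```

A vector like `features` could then be passed to any classifier trained on sites whose methylation state was measured; a deep neural network is one possible choice of classifier.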
“DeepCpG actually learns meaningful features in a data-driven manner. It has major advantages over previous methods, including the ability to more accurately predict DNA methylation and to study intercellular differences. By studying the wiring of the learnt network, we can understand how the biology of DNA methylation works. This has allowed us to recover known DNA sequence motifs that are important for methylation changes, as well as to discover new motifs, which are the starting point for future studies.”
Christof Angermueller, PhD candidate at EMBL-EBI
“We have demonstrated that DeepCpG enables us to accurately predict and analyse DNA methylation in single cells. However, DeepCpG is just one example of how we can apply deep learning to genomics and single-cell technologies. It is exciting to see the versatile applications deep learning has already found in genomics. I am looking forward to seeing more deep learning techniques come online. I believe it will make a big difference to how we study biology and has the potential to yield new answers about how life works.”
Dr Oliver Stegle, Group Leader at EMBL-EBI
“Single-cell epigenomics methods provide exciting insights into cell heterogeneity in development, ageing and disease; however, if you are just dealing with two genomes in a single cell, bits of information are often lost during the experiment. This new method recognises patterns of the epigenome in single cells and then reconstructs lost information, returning a data-rich single-cell epigenome.”
Professor Wolf Reik, from The Babraham Institute and Associate Faculty member at the Wellcome Trust Sanger Institute
“Deep learning is now the state of the art in many fields. We are exploring its utility for making sense of large-scale biological data. Pioneering studies, such as the one by Angermueller and colleagues, prove that there is a lot to be gained by using deep learning methods in computational biology.”
Dr Leopold Parts, Group Leader at the Sanger Institute
More information
Notes to Editors:
In a review of deep learning for computational biology, Angermueller, Stegle and their colleagues present different applications of deep neural networks in computational biology. These range from models for understanding the impact of disease mutations to methods for localising and classifying cancer cells in microscopy images.
However, they also point out that deep learning is not the ultimate Swiss Army knife. Instead, the choice of whether to apply deep learning or conventional models depends on the nature of the data and the problem to be solved. Read more about publicly available software in the Molecular Systems Biology Review.
Source articles
Angermueller C, et al. (2016) Deep Learning for computational biology. Mol. Sys. Biol. 12:878; published online 19 July.
Nikhil Buduma. Deep Learning in a nutshell. Blog Post – http://nikhilbuduma.com/2014/12/29/deep-learning-in-a-nutshell/
Funding:
Oliver Stegle is supported by the European Molecular Biology Laboratory (EMBL), the Wellcome Trust and the European Union.
Wolf Reik is supported by the UK Biotechnology and Biological Sciences Research Council (BBSRC), the Wellcome Trust and the EU.
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 635290.
Selected websites
European Bioinformatics Institute (EMBL-EBI)
The European Bioinformatics Institute (EMBL-EBI) is a global leader in the storage, analysis and dissemination of large biological datasets. EMBL-EBI helps scientists realise the potential of ‘big data’ by enhancing their ability to exploit complex information to make discoveries that benefit humankind. EMBL-EBI is at the forefront of computational biology research, with work spanning sequence analysis methods, multi-dimensional statistical analysis and data-driven biological discovery, from plant biology to mammalian development and disease. We are part of the European Molecular Biology Laboratory (EMBL), an international, innovative and interdisciplinary research organisation funded by 22 member states and two associate member states, and are located on the Wellcome Genome Campus, one of the world’s largest concentrations of scientific and technical expertise in genomics.
The Babraham Institute
The Babraham Institute, which receives strategic funding (a total of £27.3M in 2014-15) from the Biotechnology and Biological Sciences Research Council (BBSRC), undertakes international quality life sciences research to generate new knowledge of biological mechanisms underpinning ageing, development and the maintenance of health. The Institute’s research provides greater understanding of the biological events that underlie the normal functions of cells and the implication of failure or abnormalities in these processes. Research focuses on signalling and genome regulation, particularly the interplay between the two and how epigenetic signals can influence important physiological adaptations during the lifespan of an organism. By determining how the body reacts to dietary and environmental stimuli and manages microbial and viral interactions, we aim to improve wellbeing and support healthier ageing.
The Wellcome Trust Sanger Institute
The Wellcome Trust Sanger Institute is one of the world’s leading genome centres. Through its ability to conduct research at scale, it is able to engage in bold and long-term exploratory projects that are designed to influence and empower medical science globally. Institute research findings, generated through its own research programmes and through its leading role in international consortia, are being used to develop new diagnostics and treatments for human disease.
Wellcome
Wellcome exists to improve health for everyone by helping great ideas to thrive. We’re a global charitable foundation, both politically and financially independent. We support scientists and researchers, take on big problems, fuel imaginations and spark debate.