Over the past decade, multimodal neuroimaging and genomic techniques have advanced rapidly. As an interdisciplinary field, brain imaging genomics evaluates and characterizes genetic variants that influence phenotypic measures derived from structural and functional brain imaging. By treating imaging phenotypes as macroscopic intermediates, it can help reveal the complex mechanisms that link the genetic level to cognition and psychiatric disorders in humans. Machine learning is a powerful tool for data-driven association studies because it can exploit prior knowledge (the intercorrelated structure among imaging and genetic data) in association modelling. Such studies can also identify associations between risk genes and brain structure or function, supporting a better mechanistic understanding of behavior and disordered brain function. The research team of Prof. Zhang Dao-Qiang (Nanjing University of Aeronautics and Astronautics) reviews the background and fundamental work in imaging genomics. The paper then presents univariate learning approaches for association analysis, summarizes the main ideas and models in genetic-imaging association studies based on multivariate machine learning, and presents methods for joint association analysis and outcome prediction. Finally, it discusses prospects for future work.
In recent years, with the development of cognitive neuroscience, neuroimaging has brought new vitality to the study of how the human brain works. At the same time, with advances in noninvasive brain imaging technology, researchers hope to gain new insights into the imaging characteristics and molecular mechanisms of the brain, as well as their impact on normal and disordered brain function and behavior. Commonly used brain imaging techniques include structural magnetic resonance imaging (sMRI), functional magnetic resonance imaging (fMRI), diffusion tensor imaging (DTI), and positron emission tomography (PET). In addition, with advances in genetic technology, researchers can identify genetic markers associated with neurological and psychiatric diseases at a finer molecular level, such as single nucleotide polymorphisms (SNPs).
As an emerging data science, brain imaging genomics has achieved rapid growth, which is greatly attributed to the public availability of valuable imaging and genomics datasets. Due to the open-science nature of the Alzheimer’s Disease Neuroimaging Initiative (ADNI) project, hundreds of publications using ADNI imaging genomics data have been produced in the past decade, yielding innovative machine learning methods and novel biomedical discoveries. Similar to the ADNI, an increasing number of landmark studies are producing big data, including multi-dimensional imaging and omics modalities, making them available to the research community. These include the Enhancing Neuro Imaging Genetics through Meta Analysis (ENIGMA) Consortium, Philadelphia Neurodevelopmental Cohort (PNC) and Parkinson’s Progression Markers Initiative (PPMI).
Brain imaging genomics uses brain structure and function, measured with brain imaging, as phenotypes for evaluating genetic influence on individuals. It explores how genes affect the neural structure and function of the brain, as well as the resulting neurological pathology. Studying the association between genetics and brain structure and function, and building a visible bridge between “genes and brain”, can better reveal the pathogenesis of neuropsychiatric disease. Imaging genomics can also identify biological indicators or endophenotypes of a brain disease, which provides a more accurate basis for predicting and diagnosing the disease. Specifically, most studies use SNPs as the genotype data for association analysis. For endophenotypic data, researchers mostly analyze brain imaging data (e.g., MRI) acquired in clinical settings. For example, sMRI, an imaging technique that measures the structural organization of the brain, can quantify abnormalities in morphology (e.g., gray matter volume), and fMRI scans have been shown to be effective in revealing functional connectivity patterns of the brain. Depending on the imaging modality, imaging genomics currently focuses on association analysis between SNPs and brain structure, function, and connectivity.
Early imaging genomics relied on univariate, pairwise statistical analysis, in which many separate tests are run to find associations between SNPs or genes and complex diseases or measurable quantitative traits (QTs). A genome-wide association study (GWAS) uses whole-genome high-throughput sequencing technology to characterize sequence variation in the genomes of the study subjects, and then selects significant SNPs using biostatistical and bioinformatics methods. Since the first GWAS paper, on age-related macular degeneration, was published in Science in 2005, this method has also been applied to the analysis of psychiatric disorders. GWAS has played a major role in imaging genomics, but it has limitations. One is the strict multiple-testing correction, under which many small-effect variants fail to reach the corrected significance level. In addition, GWAS captures only the association between one genetic variant and one trait at a time, and so cannot fully explain the complex molecular mechanisms of the brain.
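The univariate scan and the multiple-testing correction described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the procedure of any particular GWAS tool: the function names, the simulated cohort, the Fisher z-approximation for the p-value, and the Bonferroni rule (threshold = alpha / number of SNPs) are all choices made here for illustration.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no association signal
    return sxy / math.sqrt(sxx * syy)


def assoc_p_value(snp, trait):
    """Two-sided p-value for one SNP-trait pair (Fisher z approximation)."""
    r = max(min(pearson_r(snp, trait), 0.999999), -0.999999)
    z = math.atanh(r) * math.sqrt(len(snp) - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def univariate_scan(genotypes, trait, alpha=0.05):
    """Test every SNP separately, then apply a Bonferroni threshold."""
    threshold = alpha / len(genotypes)
    p_values = [assoc_p_value(snp, trait) for snp in genotypes]
    hits = [j for j, p in enumerate(p_values) if p < threshold]
    return p_values, threshold, hits


def simulate_cohort(n=400, m=200, causal=3, effect=0.8, maf=0.3, seed=0):
    """Hypothetical data: m SNPs coded 0/1/2, one of which drives the trait."""
    rng = random.Random(seed)
    genotypes = [
        [(rng.random() < maf) + (rng.random() < maf) for _ in range(n)]
        for _ in range(m)
    ]
    trait = [effect * genotypes[causal][i] + rng.gauss(0, 1) for i in range(n)]
    return genotypes, trait


if __name__ == "__main__":
    genotypes, trait = simulate_cohort()
    p_values, threshold, hits = univariate_scan(genotypes, trait)
    print("Bonferroni threshold:", threshold)
    print("SNPs passing correction:", hits)
```

Because the threshold shrinks in proportion to the number of SNPs tested, lowering `effect` in the simulation quickly pushes the causal SNP below the corrected level, which is the small-effect problem noted above.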
In recent years, with the rapid development of machine learning in academia and industry, researchers have applied these data analysis tools to problems in many fields. In the association analysis of imaging genomics, multivariate machine learning models are now the most widely used approach alongside univariate statistical analysis, and they have identified disease-sensitive imaging and genetic biomarkers.
Several international groups have also reviewed related methods in imaging genomics. For example, Medland et al. raised the problems and challenges of using traditional univariate statistical models for large-scale genome-wide brain imaging association analysis, and reviewed research results from different central databases. Liu and Calhoun summarized the application of other multivariate methods, such as independent component analysis, in imaging genetics. Thompson et al. focused on association analysis between genetics and brain structural connectivity and functional networks. Building on these reviews, this article aims to provide comprehensive and up-to-date coverage of machine learning methods in brain imaging genomics.
Fig. 1 presents a schematic of the topics covered in brain imaging genomics. One of the main goals of machine-learning-based imaging genomics is to carry out association analyses that help explain mechanisms and pathways. This paper groups machine-learning-based imaging genomics methods into two categories.
The first category mainly uses regression models to identify complex multi-SNP and/or multi-QT associations. Most of these regression models can be described within a regularized loss function framework, and a sparsity-inducing regularization term is often included. The motivations are twofold. First, it is reasonable to hypothesize that only a small number of markers are relevant to a given imaging genomics association, and the sparsity term can help identify these relevant markers. Second, the sparsity constraint reduces model complexity and, in turn, the risk of overfitting.
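As one concrete instance of the regularized loss framework, the sketch below fits a lasso regression, which minimizes (1/2n)·||y − Xw||² + λ·||w||₁, by cyclic coordinate descent. The choice of the lasso, the function names, and the simulated data are assumptions made for illustration; here `X` stands in for a samples-by-SNPs matrix and `y` for a single imaging QT, and the models covered in the review may use other losses and structured penalties.

```python
import random


def soft_threshold(a, lam):
    """Proximal operator of the L1 penalty: shrink a toward zero by lam."""
    if a > lam:
        return a - lam
    if a < -lam:
        return a + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=100):
    """Minimize (1/2n)*||y - Xw||^2 + lam*||w||_1 by coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    resid = list(y)  # residual y - Xw, kept up to date as w changes
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            # Correlation of feature j with the residual that excludes j.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_wj - w[j]
            if delta != 0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                w[j] = new_wj
    return w


def simulate_regression(n=100, p=20, seed=0):
    """Hypothetical data: only features 2 and 7 influence the response."""
    rng = random.Random(seed)
    w_true = [0.0] * p
    w_true[2], w_true[7] = 1.5, -2.0
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [
        sum(X[i][j] * w_true[j] for j in range(p)) + 0.5 * rng.gauss(0, 1)
        for i in range(n)
    ]
    return X, y, w_true


if __name__ == "__main__":
    X, y, _ = simulate_regression()
    w = lasso_cd(X, y, lam=0.2)
    print("selected features:", [j for j, v in enumerate(w) if v != 0])
```

The soft-thresholding step sets weak coefficients exactly to zero, which is how the sparsity term both selects a small set of markers and limits model complexity. The price is that the surviving coefficients are shrunk toward zero by roughly `lam`.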
In addition to regression models, another prominent category of methods developed for brain imaging genomics is correlation models, such as sparse canonical correlation analysis (SCCA) and parallel independent component analysis (pICA). As in the regression models discussed above, sparsity is encouraged in these correlation models to reduce model complexity and the risk of overfitting, as well as to identify relevant biomarkers.
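To make the idea of a sparse correlation model concrete, the following sketch implements a simplified sparse CCA: alternating, soft-thresholded power iterations on the cross-covariance matrix between two data blocks. It is an illustrative simplification, not a specific SCCA algorithm from the review. It treats the within-block covariances as identity, and it uses a relative threshold (a fraction of the largest entry) instead of tuning an L1 constraint. All function names, parameters, and the simulated data are hypothetical.

```python
import math
import random


def center_columns(M):
    """Subtract the column mean from every column."""
    n, d = len(M), len(M[0])
    means = [sum(row[j] for row in M) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in M]


def cross_covariance(X, Y):
    """p-by-q matrix of covariances between columns of X and columns of Y."""
    n, p, q = len(X), len(X[0]), len(Y[0])
    return [
        [sum(X[i][j] * Y[i][k] for i in range(n)) / n for k in range(q)]
        for j in range(p)
    ]


def sparse_unit_vector(a, frac):
    """Soft-threshold a at frac * max|a|, then rescale to unit length."""
    lam = frac * max(abs(v) for v in a)
    s = [math.copysign(max(abs(v) - lam, 0.0), v) for v in a]
    norm = math.sqrt(sum(v * v for v in s))
    return [v / norm for v in s] if norm > 0 else s


def sparse_cca(X, Y, frac_u=0.5, frac_v=0.5, n_iter=50):
    """Return sparse weight vectors (u, v) for the two data blocks."""
    C = cross_covariance(center_columns(X), center_columns(Y))
    p, q = len(C), len(C[0])
    v = [1.0 / math.sqrt(q)] * q
    u = [0.0] * p
    for _ in range(n_iter):
        u = sparse_unit_vector(
            [sum(C[j][k] * v[k] for k in range(q)) for j in range(p)], frac_u
        )
        v = sparse_unit_vector(
            [sum(C[j][k] * u[j] for j in range(p)) for k in range(q)], frac_v
        )
    return u, v


def simulate_pair(n=200, p=15, q=10, seed=1):
    """Hypothetical data: X columns 0-1 and Y column 0 share one latent factor."""
    rng = random.Random(seed)
    X, Y = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)
        x = [rng.gauss(0, 1) for _ in range(p)]
        y = [rng.gauss(0, 1) for _ in range(q)]
        x[0] = z + 0.5 * rng.gauss(0, 1)
        x[1] = z + 0.5 * rng.gauss(0, 1)
        y[0] = z + 0.5 * rng.gauss(0, 1)
        X.append(x)
        Y.append(y)
    return X, Y


if __name__ == "__main__":
    X, Y = simulate_pair()
    u, v = sparse_cca(X, Y)
    print("nonzero X weights:", [j for j, a in enumerate(u) if a != 0])
    print("nonzero Y weights:", [k for k, a in enumerate(v) if a != 0])
```

In an imaging genomics setting, `X` would play the role of the SNP matrix and `Y` the matrix of imaging QTs, so the nonzero entries of `u` and `v` point to the genetic and imaging markers that drive the shared correlation.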
Overall, this article focuses on three types of learning problems. First, it discusses the limitations of univariate imaging genetics association analysis and presents the univariate learning approaches used for it. Second, it formulates the problem of multivariate imaging genetics association analysis and summarizes the main ideas and models in genetic-imaging association studies based on multivariate machine learning. Third, it reviews methods that predict an outcome of interest by combining imaging and genomics data, as well as methods for joint association analysis and outcome prediction. Finally, it discusses some unsolved problems in imaging genetics and directions for future research.
Download full text:
Machine Learning for Brain Imaging Genomics Methods: A Review
Mei-Ling Wang, Wei Shao, Xiao-Ke Hao, Dao-Qiang Zhang