Unsupervised cardiac MRI phenotyping with 3D diffusion autoencoders reveals novel genetic insights

¹Genomics Research Centre, Human Technopole, Milan, Italy
²Health Data Science Research Centre, Human Technopole, Milan, Italy
³Department of Mathematics, Politecnico di Milano, Italy
Email: {sara.ometto, soumick.chatterjee, craig.glastonbury}@fht.org
medRxiv 10.1101/2024.11.04.24316700

*Indicates Equal Contribution

Project overview

Abstract

Biobank-scale imaging provides a unique opportunity to characterise structural and functional cardiac phenotypes and how they relate to disease outcomes. However, deriving specific phenotypes from MRI data requires time-consuming expert annotation, which limits scalability and fails to exploit how information-dense such acquisitions are. In this study, we applied a 3D diffusion autoencoder to temporally resolved cardiac Magnetic Resonance Imaging (MRI) data from 71,021 UK Biobank participants to derive latent phenotypes representing the human heart in motion. These phenotypes were reproducible, heritable (h² = 4–18%), and significantly associated with cardiometabolic traits and outcomes, including atrial fibrillation (P = 8.5 × 10⁻²⁹) and myocardial infarction (P = 3.7 × 10⁻¹²). Using latent space manipulation techniques, we directly interpreted and visualised what specific latent phenotypes capture in a given MRI. To establish the genetic basis of these traits, we performed a genome-wide association study, identifying 89 significant common variants (P < 2.3 × 10⁻⁹) across 42 loci, including seven novel loci. Extensive multi-trait colocalisation analyses (PP.H4 > 0.8) linked these variants to various cardiac traits and diseases, revealing a shared genetic architecture spanning phenotypic scales. Polygenic Risk Scores (PRS) derived from latent phenotypes demonstrated predictive power for a range of cardiometabolic diseases, and high-risk individuals had substantially increased cumulative hazard rates for several of them. This study showcases diffusion autoencoding methods as powerful tools for unsupervised phenotyping, genetic discovery, and disease risk prediction from cardiac MRI data.



Associations between latent phenotypes and diseases (A) and continuous traits (B): volcano plot of effect sizes and significance


GWAS results for 182 DiffAE latent phenotypes. A: Manhattan plot summarising genome-wide associations across the 182 latent phenotypes. B: Dot plot showing the categories of traits previously associated with our lead variants, according to the GWAS Catalog.

Latent Manipulation

To interpret the semantic concept encoded in each latent phenotype and visualise how it affects cardiac structure, we employed latent space arithmetic techniques based on manipulation in the embedding space. We trained sparse linear and logistic regression models with L1 penalties, targeting multiple cardiac traits and diseases. By utilising the non-zero coefficients learned by these sparse linear models, we selected, via shrinkage, the specific latent phenotypes responsible for encoding a given cardiac trait or outcome. Therefore, by manipulating these specific subsets of latent phenotypes and decoding manipulated reconstructions of $z$, we can directly interpret on the scan what those latent phenotypes are capturing in relation to the heart's structure and function. We performed latent manipulation for a given target phenotype and latent factor $z = (z_{\mathrm{sem}}, x_T)$ following the formula $\tilde{z}_{\mathrm{sem}}^{\mathrm{norm}} = z_{\mathrm{sem}}^{\mathrm{norm}} + m \sqrt{D}\, \beta$, where $m$ denotes the manipulation magnitude, $D$ the dimension of $z_{\mathrm{sem}}$ (128 in our case), and $\beta$ the normalised regression coefficients for the considered target (possibly conditioned); the superscript norm indicates standard scaled vectors. The manipulated latent variable, $\tilde{z} = (\tilde{z}_{\mathrm{sem}}, x_T)$, was decoded, and the reconstruction was visually inspected. We repeated the latent manipulation process for different values of $m$, $z$, and targets.
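The manipulation step can be illustrated with a short, dependency-free Python sketch. It is not the authors' implementation: the function name `manipulate_latent` is hypothetical, the per-dimension `mean` and `std` used for standard scaling are assumed to be precomputed over the cohort, "normalised" coefficients are read here as scaled to unit L2 norm, and mapping the result back to the original scale before decoding is likewise an assumption. Only the semantic code is modified; the stochastic part of the latent and the decoder itself are left out.

```python
import math


def manipulate_latent(z_sem, mean, std, beta, m):
    """Shift a semantic latent code along a regression direction.

    All four sequences must have the same length D. `mean` and `std` are the
    standard-scaling statistics (assumed precomputed), and `beta` holds the
    sparse regression coefficients for the chosen target.
    """
    D = len(z_sem)

    # Assumption: "normalised" coefficients means unit L2 norm.
    norm = math.sqrt(sum(b * b for b in beta))
    if norm == 0:
        raise ValueError("beta has no non-zero coefficients")
    beta_unit = [b / norm for b in beta]

    # Standard-scale the latent code.
    z_norm = [(z - mu) / s for z, mu, s in zip(z_sem, mean, std)]

    # Apply the shift m * sqrt(D) * beta in the scaled space.
    z_tilde_norm = [zn + m * math.sqrt(D) * b for zn, b in zip(z_norm, beta_unit)]

    # Assumption: undo the scaling so the result can be passed to a decoder.
    return [zt * s + mu for zt, mu, s in zip(z_tilde_norm, mean, std)]


# Toy example with D = 4; only dimensions 1 and 3 were selected by the L1 penalty.
z_shifted = manipulate_latent(
    z_sem=[0.0, 0.0, 0.0, 0.0],
    mean=[0.0, 0.0, 0.0, 0.0],
    std=[1.0, 1.0, 1.0, 1.0],
    beta=[0.0, 3.0, 0.0, 4.0],
    m=0.5,
)
```

Because the coefficients are sparse, dimensions with a zero coefficient are left unchanged, so sweeping `m` over positive and negative values moves only the latent phenotypes selected for the target.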

BibTeX


          @article{Ometto2024.11.04.24316700,
            author       = {Ometto, Sara and Chatterjee, Soumick and Vergani, Andrea Mario and 
                            Landini, Arianna and Sharapov, Sodbo and Giacopuzzi, Edoardo and 
                            Visconti, Alessia and Bianchi, Emanuele and Santonastaso, Federica and 
                            Soda, Emanuel M and Cisternino, Francesco and Ieva, Francesca and 
                            Di Angelantonio, Emanuele and Pirastu, Nicola and Glastonbury, Craig A},
            title        = {Unsupervised cardiac MRI phenotyping with 3D diffusion autoencoders 
                            reveals novel genetic insights},
            elocation-id = {2024.11.04.24316700},
            year         = {2024},
            doi          = {10.1101/2024.11.04.24316700},
            publisher    = {Cold Spring Harbor Laboratory Press},
            url          = {https://www.medrxiv.org/content/early/2024/11/05/2024.11.04.24316700},
            journal      = {medRxiv}
          }