Hundreds of cardiac MRI traits derived using 3D diffusion autoencoders share a common genetic architecture

¹ Human Technopole, Milan, Italy
² Department of Mathematics, Politecnico di Milano, Italy
³ Department of Biomedical Sciences, Humanitas University, Milan, Italy
⁴ IRCCS Humanitas Research Hospital, Milan, Italy
Email: {sara.ometto, soumick.chatterjee, craig.glastonbury}@fht.org
Preprint will be available soon!

*Indicates Equal Contribution

Project overview

Abstract

Biobank-scale imaging provides an unprecedented opportunity to characterise thousands of organ phenotypes, how they vary in populations and how they relate to disease outcomes. However, deriving specific phenotypes from imaging data, such as Magnetic Resonance Imaging (MRI), requires time-consuming expert annotation, which limits scalability and fails to exploit how information-dense such image acquisitions are. In this study, we developed a 3D diffusion autoencoder to derive latent phenotypes from temporally resolved cardiac MRI data of 71,021 UK Biobank participants. These phenotypes were reproducible, heritable (h² = 4-18%), and significantly associated with cardiometabolic traits and outcomes, including atrial fibrillation (P = 8.5 × 10⁻²⁹) and myocardial infarction (P = 3.7 × 10⁻¹²). Using latent space manipulation techniques, we were able to learn, directly interpret and visualise what specific latent phenotypes capture in a given MRI. To establish the genetic basis of these traits, we performed a genome-wide association study, identifying 89 significant common variants (P < 2.3 × 10⁻⁹) across 42 loci, including seven novel loci. Extensive multi-trait colocalisation analyses (PP.H4 > 0.8) linked variants across phenotypic scales, from intermediate cardiac traits to cardiac disease endpoints. For example, rs142556838, which falls in CCDC141, colocalises with a latent imaging phenotype and a diastolic blood pressure locus. Using single-cell RNA-sequencing data, we map CCDC141 expression specifically to a population of ventricular cardiomyocytes. Finally, Polygenic Risk Scores (PRS) derived from latent phenotypes demonstrated predictive power for a range of cardiometabolic diseases and enabled us to stratify individuals into different risk groups. In conclusion, this study showcases diffusion autoencoding methods as powerful tools for unsupervised phenotyping, genetic discovery and disease risk prediction using cardiac MRI data.



Associations between latent phenotypes and diseases (A) and continuous traits (B): volcano plot of effect sizes and significance


GWAS results for 182 DiffAE latent phenotypes. (A) Manhattan plot summarising genome-wide associations across the 182 latent phenotypes. (B) Dot plot showing the categories of traits previously associated with our lead variants, according to the GWAS Catalog.

Latent Manipulation

To interpret the semantic concept encoded in each latent phenotype and visualise how it affects cardiac structure, we employed latent space arithmetic, that is, manipulation in the embedding space. We trained sparse linear and logistic regression models with L1 penalties, targeting multiple cardiac traits and diseases. The non-zero coefficients learned by these sparse models select, via shrinkage, the specific latent phenotypes that encode a given cardiac trait or outcome. By manipulating these subsets of latent phenotypes and decoding the manipulated version of z, we can directly interpret on the scan what those latent phenotypes capture in relation to the heart's structure and function. For a given target phenotype and latent factor $z = (z_{\text{sem}}, x_T)$, we performed latent manipulation following the formula $\tilde{z}_{\text{sem}}^{\text{norm}} = z_{\text{sem}}^{\text{norm}} + m \sqrt{D}\, \beta$, where $m$ denotes the manipulation magnitude, $D$ the dimension of $z_{\text{sem}}$ (128 in our case), and $\beta$ the normalised regression coefficients for the considered target (possibly conditioned); the superscript "norm" indicates standard-scaled vectors. The manipulated latent variable, $\tilde{z} = (\tilde{z}_{\text{sem}}, x_T)$, was decoded, and the reconstruction was visually inspected. We repeated the latent manipulation process for different values of $m$, $z$, and targets.
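
The manipulation step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes that "standard scaled" means per-dimension z-scoring over the cohort, that "normalised" β means unit L2 norm, and that the shifted vector is mapped back to the original latent scale before decoding. The function names (`fit_scaler`, `normalise`, `manipulate`) are hypothetical. β is taken as given (it would come from the L1-penalised regression), and the decoder is omitted. Only the standard library is used, so the sketch runs without extra dependencies.

```python
import math
from statistics import fmean, pstdev


def fit_scaler(latents):
    """Per-dimension mean and standard deviation over a cohort of latent vectors."""
    columns = list(zip(*latents))
    means = [fmean(col) for col in columns]
    # Constant dimensions have zero spread; fall back to 1.0 to avoid dividing by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds


def normalise(beta):
    """Scale coefficients to unit L2 norm (assumed meaning of 'normalised')."""
    norm = math.sqrt(sum(b * b for b in beta))
    if norm == 0.0:
        raise ValueError("all coefficients are zero; nothing to manipulate")
    return [b / norm for b in beta]


def manipulate(z_sem, means, stds, beta, m):
    """Shift z_sem along beta: z_norm + m * sqrt(D) * beta, then undo the scaling."""
    d = len(z_sem)                      # D, the dimension of z_sem
    direction = normalise(beta)
    step = m * math.sqrt(d)
    z_norm = [(z - mu) / sd for z, mu, sd in zip(z_sem, means, stds)]
    z_shift = [zn + step * b for zn, b in zip(z_norm, direction)]
    # Assumption: the decoder expects latents on their original scale.
    return [zs * sd + mu for zs, mu, sd in zip(z_shift, means, stds)]


# Toy cohort with D = 4 (the source uses D = 128); values are illustrative only.
cohort = [
    [0.0, 1.0, 2.0, 3.0],
    [2.0, 1.0, 4.0, 5.0],
    [4.0, 1.0, 6.0, 7.0],
]
means, stds = fit_scaler(cohort)

# Sparse coefficients: only dimensions 0 and 2 were selected by the L1 penalty.
beta = [3.0, 0.0, -4.0, 0.0]

z_manipulated = manipulate(cohort[1], means, stds, beta, m=0.5)
```

Latent dimensions whose coefficient was shrunk to zero are left unchanged, so only the selected latent phenotypes move. Setting m = 0 recovers the input vector up to floating-point rounding, and sweeping m over negative and positive values gives the series of reconstructions to inspect.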

BibTeX


          @article{Ometto2024.11.04.24316700,
            author       = {Ometto, Sara and Chatterjee, Soumick and Vergani, Andrea Mario and 
                            Landini, Arianna and Sharapov, Sodbo and Giacopuzzi, Edoardo and 
                            Visconti, Alessia and Bianchi, Emanuele and Santonastaso, Federica and 
                            Soda, Emanuel M and Cisternino, Francesco and Pivato, Carlo Andrea and Ieva, Francesca and 
                            Di Angelantonio, Emanuele and Pirastu, Nicola and Glastonbury, Craig A},
            title        = {Hundreds of cardiac MRI traits derived using 3D diffusion autoencoders share a common genetic architecture},
            elocation-id = {2024.11.04.24316700},
            year         = {2024},
            doi          = {10.1101/2024.11.04.24316700},
            publisher    = {Cold Spring Harbor Laboratory Press},
            url          = {https://www.medrxiv.org/content/10.1101/2024.11.04.24316700},
            journal      = {medRxiv}
          }