Hundreds of cardiac MRI traits derived using 3D diffusion autoencoders share a common genetic architecture

¹Human Technopole, Milan, Italy
²Department of Mathematics, Politecnico di Milano, Italy
³Department of Biomedical Sciences, Humanitas University, Milan, Italy
⁴IRCCS Humanitas Research Hospital, Milan, Italy
Email: {sara.ometto, soumick.chatterjee, craig.glastonbury}@fht.org
Preprint is available on medRxiv
*Indicates equal contribution

Abstract

Biobank-scale imaging provides an unprecedented opportunity to characterise thousands of organ phenotypes, how they vary in populations and how they relate to disease outcomes. However, deriving specific phenotypes from imaging data, such as Magnetic Resonance Imaging (MRI), requires time-consuming expert annotation, which limits scalability and does not exploit how information-dense such image acquisitions are. In this study, we developed a 3D diffusion autoencoder to derive latent phenotypes from temporally resolved cardiac MRI data of 71,021 UK Biobank participants. These phenotypes were reproducible, heritable (h² = 4–18%), and significantly associated with cardiometabolic traits and outcomes, including atrial fibrillation (P = 8.5 × 10^-29) and myocardial infarction (P = 3.7 × 10^-12). By using latent space manipulation techniques, we were able to learn, directly interpret and visualise what specific latent phenotypes are capturing in a given MRI. To establish the genetic basis of such traits, we performed a genome-wide association study, identifying 89 significant common variants (P < 2.3 × 10^-9) across 42 loci, including seven novel loci. Extensive multi-trait colocalisation analyses (PP.H₄ > 0.8) linked variants across phenotypic scales, from intermediate cardiac traits to cardiac disease endpoints. For example, rs142556838, which falls in CCDC141, colocalises with a latent imaging phenotype and a diastolic blood pressure locus. Using single-cell RNA-sequencing data, we map CCDC141 expression specifically to a population of ventricular cardiomyocytes. Finally, Polygenic Risk Scores (PRS) derived from latent phenotypes demonstrated predictive power for a range of cardiometabolic diseases and enabled us to stratify individuals into different risk groups.
In conclusion, this study showcases the use of diffusion autoencoding methods as powerful tools for unsupervised phenotyping, genetic discovery and disease risk prediction using cardiac MRI data.

Latent Manipulation

To interpret the semantic concept encoded in each latent phenotype and visualise how it affects cardiac structure, we employed latent space arithmetic, that is, direct manipulation in the embedding space. We trained sparse linear and logistic regression models with L1 penalties, targeting multiple cardiac traits and diseases. The non-zero coefficients that survive shrinkage in these sparse models identify the specific latent phenotypes that encode a given cardiac trait or outcome. By manipulating only these subsets of latent phenotypes and decoding the manipulated version of z, we can see directly on the scan what those latent phenotypes capture about the heart's structure and function. For a given target phenotype and latent factor z = (z_sem, x_T), we performed latent manipulation following the formula: z̃_sem^norm = z_sem^norm + m √D β, where m denotes the manipulation magnitude, D the dimension of z_sem (128 in our case), and β the normalised regression coefficients for the considered target (possibly conditioned); the superscript norm indicates standard-scaled vectors. The manipulated latent variable, z̃ = (z̃_sem, x_T), was decoded, and the reconstruction was visually inspected. We repeated the latent manipulation process for different values of m, z, and targets.
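The manipulation formula can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the study's code: the function name `manipulate_latent`, the synthetic latents, and the hand-picked sparse coefficients are hypothetical, and "normalised" β is assumed here to mean unit L2 norm. The decoding step is omitted because it depends on the trained diffusion autoencoder.

```python
import numpy as np


def manipulate_latent(z_sem, mean, std, beta, m):
    """Shift one semantic latent vector along a regression direction.

    z_sem     : (D,) semantic latent of one subject
    mean, std : (D,) cohort statistics used for standard scaling
    beta      : (D,) sparse regression coefficients for the target
    m         : signed manipulation magnitude
    """
    z_sem = np.asarray(z_sem, dtype=float)
    beta = np.asarray(beta, dtype=float)
    D = z_sem.shape[0]

    # Assumption: "normalised" coefficients means unit L2 norm.
    norm = np.linalg.norm(beta)
    if norm == 0:
        raise ValueError("beta has no non-zero coefficients")
    beta_unit = beta / norm

    # Standard-scale, apply z_norm + m * sqrt(D) * beta, then undo the scaling.
    z_norm = (z_sem - mean) / std
    z_norm_tilde = z_norm + m * np.sqrt(D) * beta_unit
    return z_norm_tilde * std + mean


rng = np.random.default_rng(0)
D = 128
latents = rng.normal(size=(500, D))  # synthetic stand-in for cohort latents
mean, std = latents.mean(axis=0), latents.std(axis=0)

# Hypothetical sparse coefficients, as an L1-penalised model might return.
beta = np.zeros(D)
beta[[3, 17, 42]] = [0.8, -0.5, 0.2]

z_tilde = manipulate_latent(latents[0], mean, std, beta, m=0.3)

# Only the latent dimensions with non-zero coefficients are moved.
changed = np.flatnonzero(~np.isclose(z_tilde, latents[0]))
```

Because β is sparse, only the selected latent dimensions move, and under the unit-norm assumption the shift in standard-scaled space has length |m| √D whichever target is chosen. The resulting `z_tilde` would then be paired with the unchanged x_T and passed to the decoder.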

Left ventricle end-diastolic volume manipulation: Subject 1

Left ventricle myocardial wall thickness manipulation: Subject 1

Left ventricle end-diastolic volume manipulation: Subject 2

Left ventricle myocardial wall thickness manipulation: Subject 2

Left ventricle end-diastolic volume manipulation: Subject 3

Left ventricle myocardial wall thickness manipulation: Subject 3

Left ventricle end-diastolic volume manipulation: Subject 4

Left ventricle myocardial wall thickness manipulation: Subject 4

Left ventricle end-diastolic volume manipulation: Subject 5

Left ventricle myocardial wall thickness manipulation: Subject 5

BibTeX

@article{Ometto2024.11.04.24316700,
  author = {Ometto, Sara and Chatterjee, Soumick and Vergani, Andrea Mario and Landini, Arianna and Sharapov, Sodbo and Giacopuzzi, Edoardo and Visconti, Alessia and Bianchi, Emanuele and Santonastaso, Federica and Soda, Emanuel M and Cisternino, Francesco and Pivato, Carlo Andrea and Ieva, Francesca and Di Angelantonio, Emanuele and Pirastu, Nicola and Glastonbury, Craig A},
  title = {Hundreds of cardiac MRI traits derived using 3D diffusion autoencoders share a common genetic architecture},
  elocation-id = {2024.11.04.24316700},
  year = {2024},
  doi = {10.1101/2024.11.04.24316700},
  publisher = {Cold Spring Harbor Laboratory Press},
  url = {https://www.medrxiv.org/content/10.1101/2024.11.04.24316700},
  journal = {medRxiv}
}