Biobank-scale imaging provides an unprecedented opportunity to characterise thousands of organ phenotypes, how they vary across populations and how they relate to disease outcomes. However, deriving specific phenotypes from imaging data, such as magnetic resonance imaging (MRI), requires time-consuming expert annotation, which limits scalability and fails to exploit how information-dense such image acquisitions are. In this study, we developed a 3D diffusion autoencoder to derive latent phenotypes from temporally resolved cardiac MRI data of 71,021 UK Biobank participants. These phenotypes were reproducible, heritable (h² = 4–18%), and significantly associated with cardiometabolic traits and outcomes, including atrial fibrillation (P = 8.5 × 10⁻²⁹) and myocardial infarction (P = 3.7 × 10⁻¹²). Using latent space manipulation techniques, we were able to learn, directly interpret and visualise what specific latent phenotypes capture in a given MRI. To establish the genetic basis of these traits, we performed a genome-wide association study, identifying 89 significant common variants (P < 2.3 × 10⁻⁹) across 42 loci, including seven novel loci. Extensive multi-trait colocalisation analyses (PP.H4 > 0.8) linked variants across phenotypic scales, from intermediate cardiac traits to cardiac disease endpoints. For example, rs142556838, which falls in CCDC141, colocalises with a latent imaging phenotype and a diastolic blood pressure locus. Using single-cell RNA-sequencing data, we mapped CCDC141 expression specifically to a population of ventricular cardiomyocytes. Finally, polygenic risk scores (PRS) derived from latent phenotypes demonstrated predictive power for a range of cardiometabolic diseases and enabled us to stratify individuals into different risk groups. In conclusion, this study showcases diffusion autoencoding methods as powerful tools for unsupervised phenotyping, genetic discovery and disease risk prediction using cardiac MRI data.