Diffusion models, framed by Sohl-Dickstein et al. (2015) and made practical for high-quality image synthesis by Ho et al. (2020), work in two phases: a forward process that progressively adds noise to data, and a learned reverse process that denoises random noise back into a coherent sample.
They underpin today's leading text-to-image systems and are increasingly used for audio and video generation.