The paper demonstrates high-quality image synthesis with diffusion probabilistic models and establishes a fundamental connection between the forward-and-reverse diffusion process, denoising score matching, and annealed Langevin dynamics. By training on a weighted variational bound with an epsilon-prediction parameterization, the authors achieve state-of-the-art or competitive results on CIFAR-10 and LSUN, and they show that sampling admits a progressive decoding view that resembles autoregressive decoding and supports a lossy compression interpretation. They introduce a simplified training objective that improves sample quality, and they analyze rate-distortion behavior and progressive sampling, showing that large-scale image structure emerges early in generation. The work positions diffusion models as a versatile generative framework that supports likelihood-based evaluation, with implications for compression, interpolation, and generation beyond images, and it is accompanied by a practical, open-source implementation.
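As a concrete reference point for the simplified objective mentioned above (stated here in the standard DDPM notation, with noise schedule $\beta_t$, $\alpha_t = 1 - \beta_t$, and $\bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s$), training reduces to regressing the Gaussian noise injected by the forward process:

$$ L_{\text{simple}}(\theta) = \mathbb{E}_{t,\, \mathbf{x}_0,\, \boldsymbol{\epsilon}} \left[ \left\| \boldsymbol{\epsilon} - \boldsymbol{\epsilon}_\theta\!\left( \sqrt{\bar{\alpha}_t}\, \mathbf{x}_0 + \sqrt{1 - \bar{\alpha}_t}\, \boldsymbol{\epsilon},\; t \right) \right\|^2 \right], \qquad \boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}),\ t \sim \mathrm{Uniform}\{1, \dots, T\}. $$

This is the weighted variational bound with its per-timestep weights dropped, which is what ties the objective to denoising score matching.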
Abstract
We present high-quality image synthesis results using diffusion probabilistic models, a class of latent variable models inspired by considerations from nonequilibrium thermodynamics. Our best results are obtained by training on a weighted variational bound designed according to a novel connection between diffusion probabilistic models and denoising score matching with Langevin dynamics, and our models naturally admit a progressive lossy decompression scheme that can be interpreted as a generalization of autoregressive decoding. On the unconditional CIFAR10 dataset, we obtain an Inception score of 9.46 and a state-of-the-art FID score of 3.17. On 256×256 LSUN, we obtain sample quality similar to ProgressiveGAN. Our implementation is available at https://github.com/hojonathanho/diffusion.
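To make the objective and the progressive (ancestral) sampling loop concrete, below is a minimal PyTorch sketch, not the authors' implementation (which is in TensorFlow): `eps_model(x_t, t)` is a hypothetical stand-in for the paper's U-Net noise predictor, the linear $\beta_t$ schedule matches the paper's setting, and the $\sigma_t^2 = \beta_t$ sampling variance is one of the two choices the paper considers.

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)      # linear variance schedule (paper's setting)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)  # \bar{alpha}_t = prod_s alpha_s

def training_loss(eps_model, x0):
    """Simplified objective: regress the noise injected at a random timestep."""
    t = torch.randint(0, T, (x0.shape[0],), device=x0.device)   # 0-indexed timesteps
    eps = torch.randn_like(x0)
    a_bar = alpha_bars.to(x0.device)[t].view(-1, 1, 1, 1)       # assumes NCHW images
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * eps        # q(x_t | x_0) in closed form
    return ((eps - eps_model(x_t, t)) ** 2).mean()

@torch.no_grad()
def sample(eps_model, shape):
    """Ancestral sampling: start from Gaussian noise and denoise step by step;
    large-scale structure appears at early (high-t) steps, details later."""
    x = torch.randn(shape)
    for t in reversed(range(T)):
        z = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        t_batch = torch.full((shape[0],), t, dtype=torch.long)
        coef = (1.0 - alphas[t]) / (1.0 - alpha_bars[t]).sqrt()
        mean = (x - coef * eps_model(x, t_batch)) / alphas[t].sqrt()
        x = mean + betas[t].sqrt() * z                          # sigma_t^2 = beta_t choice
    return x
```

Reading the sampling loop from the last step backward is what yields the progressive decoding interpretation: each iterate is a successively less compressed reconstruction of the final image.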