Implementation Details of Cartoon-VAE-Diffusion

A Variational Autoencoder (VAE) learns to compress data into a latent Gaussian space and reconstruct it in a single shot. A Denoising Diffusion Probabilistic Model (DDPM) tackles the same evidence-lower-bound objective from another direction: it starts from pure noise and iteratively denoises over hundreds of steps, trading speed for high-fidelity, stable synthesis. Both frameworks connect random noise to data, yet VAEs rely on an explicit encoder–decoder pair, whereas DDPMs use a learned Markov chain that inverts a forward noising process. This blog traces the progression from VAE to DDPM, clarifying their shared principles, with code examples available at this repository.

Problem Formulation for Image Generation Models

We posit that every image \(x \in \mathbb{R}^n\) is generated by first sampling a low-dimensional latent \(z \in \mathbb{R}^d\) and then “decoding” it into pixel-space:

  • Prior on latents:

    \[P_\theta(z)\]

    (usually \(\mathcal{N}(0,I)\), no dependence on \(x\)).

  • Decoder likelihood (a Gaussian around a neural-net mean):

    \[P_\theta(x \mid z) \;=\; \mathcal{N}\bigl(x;\;G_\theta(z),\;\sigma^2 I\bigr),\]

    where \(G_\theta(z)\) is the deterministic network output (the mean).

From these we get the joint,

\[P_\theta(x,z) \;=\; P_\theta(z)\,P_\theta(x\mid z),\]

and the marginal (a.k.a. evidence),

\[P_\theta(x) \;=\; \int P_\theta(x,z)\,dz \;=\; \int P_\theta(z)\,P_\theta(x\mid z)\,dz.\]

In words:

  • \(P_\theta(z)\): how we expect latents to be distributed, before seeing any image.
  • \(P_\theta(x\mid z)\): if the latent were \(z\), how likely is image \(x\)?
  • \(P_\theta(x,z)\): the joint chance of drawing \(z\) and then generating \(x\).
  • \(P_\theta(x)\): the overall likelihood of observing image \(x\) under our model.
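To make the generative story concrete, here is a minimal sampling sketch in PyTorch: draw \(z \sim \mathcal{N}(0, I)\) from the prior, run it through the decoder to get the Gaussian mean \(G_\theta(z)\), and add isotropic noise. The decoder architecture, latent size, and noise scale \(\sigma\) below are illustrative assumptions, not the modules used in the repository.

```python
import torch
import torch.nn as nn

# Hypothetical decoder G_theta mapping a latent z to the mean of P_theta(x | z);
# the layer sizes here are stand-ins, not the repository's architecture.
z_dim, img_dim, sigma = 512, 3 * 64 * 64, 0.1
G_theta = nn.Sequential(
    nn.Linear(z_dim, 1024),
    nn.ReLU(),
    nn.Linear(1024, img_dim),
)

# Ancestral sampling: z ~ P(z) = N(0, I), then x ~ P_theta(x | z) = N(G_theta(z), sigma^2 I).
z = torch.randn(1, z_dim)                      # latent drawn from the prior
x_mean = G_theta(z)                            # decoder output = Gaussian mean
x = x_mean + sigma * torch.randn_like(x_mean)  # add isotropic noise with std sigma
```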

Variational Autoencoders (VAEs)

  • Evidence Lower Bound (ELBO)

  • Final Objective Function

  • Code Example

    Figure: VAE sampling result (z_dim = 512, trained for 100 epochs).
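As a rough sketch of how the ELBO becomes the training objective used above, the snippet below implements the standard negative ELBO: a reconstruction term plus the closed-form KL between the approximate posterior \(q_\phi(z\mid x) = \mathcal{N}(\mu_\phi(x), \operatorname{diag}(\sigma_\phi^2(x)))\) and the prior \(\mathcal{N}(0, I)\), together with the reparameterization trick. The function names and the fixed-variance Gaussian decoder (which reduces the reconstruction term to an MSE) are assumptions for illustration; see the repository for the exact objective used to produce the result shown.

```python
import torch
import torch.nn.functional as F

def reparameterize(mu, log_var):
    """Sample z = mu + sigma * eps so the gradient flows through mu and log_var."""
    eps = torch.randn_like(mu)
    return mu + torch.exp(0.5 * log_var) * eps

def vae_loss(x, x_recon, mu, log_var):
    """Negative ELBO: reconstruction term + KL(q_phi(z|x) || N(0, I)).

    x, x_recon : input images and decoder means, shape (B, ...)
    mu, log_var: encoder outputs parameterizing q_phi(z|x), shape (B, z_dim)
    """
    # Fixed-variance Gaussian decoder -> the reconstruction term reduces to a (scaled) MSE.
    recon = F.mse_loss(x_recon, x, reduction="sum")
    # Closed-form KL between N(mu, diag(exp(log_var))) and the standard normal prior.
    kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())
    return recon + kl
```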

Diffusion Models

DDPM
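As a minimal sketch of the two ingredients DDPM training needs, the snippet below implements the closed-form forward noising \(q(x_t \mid x_0) = \mathcal{N}\bigl(\sqrt{\bar\alpha_t}\,x_0,\;(1-\bar\alpha_t) I\bigr)\) and the simplified noise-prediction loss from the DDPM paper. The linear \(\beta\) schedule, \(T = 1000\), and the placeholder `eps_model` (e.g. a U-Net) are common defaults assumed here rather than the repository's exact configuration.

```python
import torch
import torch.nn.functional as F

# Linear beta schedule; T = 1000 and these endpoints are common DDPM defaults (assumed here).
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)  # \bar{alpha}_t = prod_{s<=t} alpha_s

def q_sample(x0, t, noise):
    """Forward process in closed form: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    ab = alpha_bars.to(x0.device)[t].view(-1, 1, 1, 1)
    return ab.sqrt() * x0 + (1.0 - ab).sqrt() * noise

def ddpm_loss(eps_model, x0):
    """Simplified training objective: predict the injected noise with an MSE loss.

    eps_model(x_t, t) is a placeholder noise-prediction network (e.g. a U-Net);
    the actual network lives in the repository.
    """
    t = torch.randint(0, T, (x0.size(0),), device=x0.device)  # one random timestep per image
    noise = torch.randn_like(x0)
    x_t = q_sample(x0, t, noise)
    return F.mse_loss(eps_model(x_t, t), noise)
```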

  • Results
