Implementation Details of Cartoon-VAE-Diffusion

A Variational Autoencoder (VAE) learns to compress data into a latent Gaussian space and reconstruct it in a single shot. A Denoising Diffusion Probabilistic Model (DDPM) tackles the same evidence-lower-bound objective from another direction: it starts from pure noise and iteratively denoises over hundreds of steps, trading speed for high-fidelity, stable synthesis. Both frameworks connect random noise to data, yet VAEs rely on an explicit encoder–decoder pair, whereas DDPMs use a learned Markov chain that inverts a forward noising process. This blog traces the progression from VAE to DDPM, clarifying their shared principles, with code examples available at this repository.

Problem Formulation for Image Generation Models

We posit that every image \(x \in \mathbb{R}^n\) is generated by first sampling a low-dimensional latent \(z \in \mathbb{R}^d\) and then “decoding” it into pixel-space:

  • Prior on latents:

    \[P_\theta(z)\]

    (usually \(\mathcal{N}(0,I)\), no dependence on \(x\)).

  • Decoder likelihood (a Gaussian around a neural-net mean):

    \[P_\theta(x \mid z) \;=\; \mathcal{N}\bigl(x;\;G_\theta(z),\;\sigma^2 I\bigr),\]

    where \(G_\theta(z)\) is the deterministic network output (the mean).

From these we get the joint,

\[P_\theta(x,z) \;=\; P_\theta(z)\,P_\theta(x\mid z),\]

and the marginal (a.k.a. evidence),

\[P_\theta(x) \;=\; \int P_\theta(x,z)\,dz \;=\; \int P_\theta(z)\,P_\theta(x\mid z)\,dz.\]

In words:

  • \(P_\theta(z)\): how we expect latents to be distributed, before seeing any image.
  • \(P_\theta(x\mid z)\): if the latent were \(z\), how likely is image \(x\)?
  • \(P_\theta(x,z)\): the joint chance of drawing \(z\) and then generating \(x\).
  • \(P_\theta(x)\): the overall likelihood of observing image \(x\) under our model.
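To make the generative story concrete, here is a minimal sampling sketch in PyTorch: draw \(z \sim \mathcal{N}(0, I)\) from the prior, run it through the decoder to get the Gaussian mean \(G_\theta(z)\), and add isotropic noise. The decoder architecture, latent size, and noise scale \(\sigma\) below are illustrative assumptions, not the modules used in the repository.

```python
import torch
import torch.nn as nn

# Hypothetical decoder G_theta mapping a latent z to the mean of P_theta(x | z);
# the layer sizes here are stand-ins, not the repository's architecture.
z_dim, img_dim, sigma = 512, 3 * 64 * 64, 0.1
G_theta = nn.Sequential(
    nn.Linear(z_dim, 1024),
    nn.ReLU(),
    nn.Linear(1024, img_dim),
)

# Ancestral sampling: z ~ P(z) = N(0, I), then x ~ P_theta(x | z) = N(G_theta(z), sigma^2 I).
z = torch.randn(1, z_dim)                      # latent drawn from the prior
x_mean = G_theta(z)                            # decoder output = Gaussian mean
x = x_mean + sigma * torch.randn_like(x_mean)  # add isotropic noise with std sigma
```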

Variational Autoencoders (VAEs)

  • Evidence Lower Bound (ELBO)

  • Final Objective Function

  • Code Example

    Figure: VAE sampling result (z_dim = 512, trained for 100 epochs).
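As a rough sketch of how the ELBO becomes the training objective used above, the snippet below implements the standard negative ELBO: a reconstruction term plus the closed-form KL between the approximate posterior \(q_\phi(z\mid x) = \mathcal{N}(\mu_\phi(x), \operatorname{diag}(\sigma_\phi^2(x)))\) and the prior \(\mathcal{N}(0, I)\), together with the reparameterization trick. The function names and the fixed-variance Gaussian decoder (which reduces the reconstruction term to an MSE) are assumptions for illustration; see the repository for the exact objective used to produce the result shown.

```python
import torch
import torch.nn.functional as F

def reparameterize(mu, log_var):
    """Sample z = mu + sigma * eps so the gradient flows through mu and log_var."""
    eps = torch.randn_like(mu)
    return mu + torch.exp(0.5 * log_var) * eps

def vae_loss(x, x_recon, mu, log_var):
    """Negative ELBO: reconstruction term + KL(q_phi(z|x) || N(0, I)).

    x, x_recon : input images and decoder means, shape (B, ...)
    mu, log_var: encoder outputs parameterizing q_phi(z|x), shape (B, z_dim)
    """
    # Fixed-variance Gaussian decoder -> the reconstruction term reduces to a (scaled) MSE.
    recon = F.mse_loss(x_recon, x, reduction="sum")
    # Closed-form KL between N(mu, diag(exp(log_var))) and the standard normal prior.
    kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())
    return recon + kl
```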

Diffusion Models

DDPM
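As a minimal sketch of the two ingredients DDPM training needs, the snippet below implements the closed-form forward noising \(q(x_t \mid x_0) = \mathcal{N}\bigl(\sqrt{\bar\alpha_t}\,x_0,\;(1-\bar\alpha_t) I\bigr)\) and the simplified noise-prediction loss from the DDPM paper. The linear \(\beta\) schedule, \(T = 1000\), and the placeholder `eps_model` (e.g. a U-Net) are common defaults assumed here rather than the repository's exact configuration.

```python
import torch
import torch.nn.functional as F

# Linear beta schedule; T = 1000 and these endpoints are common DDPM defaults (assumed here).
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)  # \bar{alpha}_t = prod_{s<=t} alpha_s

def q_sample(x0, t, noise):
    """Forward process in closed form: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    ab = alpha_bars.to(x0.device)[t].view(-1, 1, 1, 1)
    return ab.sqrt() * x0 + (1.0 - ab).sqrt() * noise

def ddpm_loss(eps_model, x0):
    """Simplified training objective: predict the injected noise with an MSE loss.

    eps_model(x_t, t) is a placeholder noise-prediction network (e.g. a U-Net);
    the actual network lives in the repository.
    """
    t = torch.randint(0, T, (x0.size(0),), device=x0.device)  # one random timestep per image
    noise = torch.randn_like(x0)
    x_t = q_sample(x0, t, noise)
    return F.mse_loss(eps_model(x_t, t), noise)
```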

  • Results
