Posts by Tags

From Q-Learning to Deep Q-Learning and Deep Deterministic Policy Gradient (DDPG) Permalink

16 minute read

Published: March 10, 2025

Q-learning, an off-policy reinforcement learning algorithm, uses the Bellman equation to iteratively update state-action values, helping an agent determine the best actions to maximize cumulative rewards. Deep Q-learning improves upon Q-learning by leveraging deep Q network (DQN) to approximate Q-values, enabling it to handle continuous state spaces but it is still only suitable for discrete action spaces. Further advancement, Deep Deterministic Policy Gradient (DDPG), combines Q-learning’s principles with policy gradients, making it also suitable for continuous action spaces. This blog starts by discussing the basic components of reinforcement learning and gradually explore how Q-learning evolves into DQN and DDPG, with application for solving the cartpole environment in Isaac Gym simulator. Corresponding code can be found at this repository.

From Control Hamiltonian to Algebraic Riccati Equation and Pontryagin’s Maximum Principle Permalink

30 minute read

Published: July 31, 2024

Inspired by the Hamiltonian of classical mechanics, Lev Pontryagin introduced the Control Hamiltonian and formulated his celebrated Pontragin’s Maximum Principle (PMP). This blog will first discuss general cases of optimal control problems, including scenarios with both free final time and final states. Then, the derivation of Algebraic Riccati Equation (ARE) in the context of Continuous Linear Qudratic Regulator (LQR) from the perspective of the Control Hamiltonian will be introduced. It will then explain the PMP, finally followed by the example problem of Constrained Continous LQR.

From Control Hamiltonian to Algebraic Riccati Equation and Pontryagin’s Maximum Principle Permalink

30 minute read

Published: July 31, 2024

Inspired by the Hamiltonian of classical mechanics, Lev Pontryagin introduced the Control Hamiltonian and formulated his celebrated Pontragin’s Maximum Principle (PMP). This blog will first discuss general cases of optimal control problems, including scenarios with both free final time and final states. Then, the derivation of Algebraic Riccati Equation (ARE) in the context of Continuous Linear Qudratic Regulator (LQR) from the perspective of the Control Hamiltonian will be introduced. It will then explain the PMP, finally followed by the example problem of Constrained Continous LQR.

On Derivation of Hamilton-Jacobi-Bellman Equation and Its Application Permalink

13 minute read

Published: August 15, 2024

The Hamilton-Jacobi-Bellman (HJB) equation is arguably one of the most important cornerstones of optimal control theory and reinforcement learning. In this blog, we will first introduce the Hamilton-Jacobi Equation and Bellman’s Principle of Optimality. We will then delve into the derivation of the HJB equation. Finally, we will conclude with an example that shows the derivation of the famous Algebraic Riccati equation (ARE) from this perspective.

Implementation Details of Cartoon-VAE-Diffusion Permalink

1 minute read

Published: June 30, 2025

Variational Autoencoder (VAE) learns to compress data into a latent Gaussian space and reconstruct it in a single shot. Denoising Diffusion Probabilistic Model (DDPM) tackles the same evidence-lower-bound objective from another direction: it begins with pure noise and iteratively denoise through hundreds of steps, exchanging speed for high-fidelity, stable synthesis. Both frameworks connect random noise to data, yet VAE rely on an explicit encoder–decoder pair, whereas DDPM use a learned Markov chain that inverts a forward noising process. This blog traces the progression from VAE to DDPM, clarifying their shared principles, with code examples available at this repository.

Implementation Details of Basic-and-Fast-Neural-Style-Transfer Permalink

7 minute read

Published: September 01, 2024

Neural style transfer (NST) serves as an essential starting point those interested in deep learning, incorporating critical components or techniques such as convolutional neural networks (CNNs), VGG network, residual networks, upsampling and normalization. The basic neural style transfer method captures and manipulates image features using CNNs and the VGG network directly over the input image. In contrast, the fast neural style transfer method employs a dataset to train an inference neural network that can be used for real-time style transfer. This blogs provides the explanation of the implementation of both methods. Corresponding repository can be found here.

From Q-Learning to Deep Q-Learning and Deep Deterministic Policy Gradient (DDPG) Permalink

16 minute read

Published: March 10, 2025

Q-learning, an off-policy reinforcement learning algorithm, uses the Bellman equation to iteratively update state-action values, helping an agent determine the best actions to maximize cumulative rewards. Deep Q-learning improves upon Q-learning by leveraging deep Q network (DQN) to approximate Q-values, enabling it to handle continuous state spaces but it is still only suitable for discrete action spaces. Further advancement, Deep Deterministic Policy Gradient (DDPG), combines Q-learning’s principles with policy gradients, making it also suitable for continuous action spaces. This blog starts by discussing the basic components of reinforcement learning and gradually explore how Q-learning evolves into DQN and DDPG, with application for solving the cartpole environment in Isaac Gym simulator. Corresponding code can be found at this repository.

Implementation Details of Basic-and-Fast-Neural-Style-Transfer Permalink

7 minute read

Published: September 01, 2024

Neural style transfer (NST) serves as an essential starting point those interested in deep learning, incorporating critical components or techniques such as convolutional neural networks (CNNs), VGG network, residual networks, upsampling and normalization. The basic neural style transfer method captures and manipulates image features using CNNs and the VGG network directly over the input image. In contrast, the fast neural style transfer method employs a dataset to train an inference neural network that can be used for real-time style transfer. This blogs provides the explanation of the implementation of both methods. Corresponding repository can be found here.

Taxonomy of Robotic Manipulation: Inverse Kinematics (IK), Operational Space Control (OSC), Impedance Control, Riemannian Motion Policy (RMP) and Geometric Fabrics Permalink

less than 1 minute read

Published: August 30, 2025

Inverse Kinematics (IK) converts a desired end-effector pose into joint angles and is fundamental for basic motion planning. Operational Space Control (OSC) layers dynamics on top of IK, commanding Cartesian forces/velocities so the robot can track trajectories while rejecting disturbances. Impedance Control then shapes the robot’s apparent mass-spring-damper behavior, enabling compliant contact and human-robot interaction. Riemannian Motion Policy (RMP) generalizes impedance ideas across multiple task maps, blending goal-reaching, joint limits, and obstacle avoidance on curved manifolds. Finally, Geometric Fabrics build on RMP to sculpt global energy landscapes, yielding provably safe navigation around complex environments. This post aims to provide brief explanation for each concepts.

Implementation Details of Cartoon-VAE-Diffusion Permalink

1 minute read

Published: June 30, 2025

Variational Autoencoder (VAE) learns to compress data into a latent Gaussian space and reconstruct it in a single shot. Denoising Diffusion Probabilistic Model (DDPM) tackles the same evidence-lower-bound objective from another direction: it begins with pure noise and iteratively denoise through hundreds of steps, exchanging speed for high-fidelity, stable synthesis. Both frameworks connect random noise to data, yet VAE rely on an explicit encoder–decoder pair, whereas DDPM use a learned Markov chain that inverts a forward noising process. This blog traces the progression from VAE to DDPM, clarifying their shared principles, with code examples available at this repository.

Dwell on Differential Dynamic Programming (DDP) and Iterative Linear Quadratic Regulator (iLQR) Permalink

6 minute read

Published: October 10, 2024

Although optimal control and reinforcement learning appear to be distinct field, they are, in fact closely related. Differential Dynamic Programming (DDP) and Iterative Linear Quadratic Regulator (iLQR), two powerful algorithms commonly utilized in trajectory optimizations, exemplify how model-based reinforcement learning can bridge the gap between these domains. This blog begins by discussing the fundational principles, including Newton’s method and Bellman Equation. It then delves into the specifics of the DDP and iLQR algorithms, illustrating their application through the classical problem of double pendulum swing-up control.

On Derivation of Hamilton-Jacobi-Bellman Equation and Its Application Permalink

13 minute read

Published: August 15, 2024

The Hamilton-Jacobi-Bellman (HJB) equation is arguably one of the most important cornerstones of optimal control theory and reinforcement learning. In this blog, we will first introduce the Hamilton-Jacobi Equation and Bellman’s Principle of Optimality. We will then delve into the derivation of the HJB equation. Finally, we will conclude with an example that shows the derivation of the famous Algebraic Riccati equation (ARE) from this perspective.

On Derivation of Euluer-Lagrange Equation and Its Application Permalink

11 minute read

Published: June 30, 2024

Euler-Lagrange equation plays an essential role in calculus of variations and classical mechanics. Beyond its applications in deriving equations of motion, Euler-Lagrange equation is ubiquitous in the filed of trajectory optimization, serving as a critical stepping stone for many powerful optimal control techniques, such as Pontrygain’s Maximum Principle. In this blog, derivation of the Euler-Lagrange equation and two simple cases of its application are introduced.

On Derivation of Euluer-Lagrange Equation and Its Application Permalink

11 minute read

Published: June 30, 2024

Euler-Lagrange equation plays an essential role in calculus of variations and classical mechanics. Beyond its applications in deriving equations of motion, Euler-Lagrange equation is ubiquitous in the filed of trajectory optimization, serving as a critical stepping stone for many powerful optimal control techniques, such as Pontrygain’s Maximum Principle. In this blog, derivation of the Euler-Lagrange equation and two simple cases of its application are introduced.

Implementation Details of Cartoon-VAE-Diffusion Permalink

1 minute read

Published: June 30, 2025

Variational Autoencoder (VAE) learns to compress data into a latent Gaussian space and reconstruct it in a single shot. Denoising Diffusion Probabilistic Model (DDPM) tackles the same evidence-lower-bound objective from another direction: it begins with pure noise and iteratively denoise through hundreds of steps, exchanging speed for high-fidelity, stable synthesis. Both frameworks connect random noise to data, yet VAE rely on an explicit encoder–decoder pair, whereas DDPM use a learned Markov chain that inverts a forward noising process. This blog traces the progression from VAE to DDPM, clarifying their shared principles, with code examples available at this repository.

On Derivation of Euluer-Lagrange Equation and Its Application Permalink

11 minute read

Published: June 30, 2024

Euler-Lagrange equation plays an essential role in calculus of variations and classical mechanics. Beyond its applications in deriving equations of motion, Euler-Lagrange equation is ubiquitous in the filed of trajectory optimization, serving as a critical stepping stone for many powerful optimal control techniques, such as Pontrygain’s Maximum Principle. In this blog, derivation of the Euler-Lagrange equation and two simple cases of its application are introduced.

From Control Hamiltonian to Algebraic Riccati Equation and Pontryagin’s Maximum Principle Permalink

30 minute read

Published: July 31, 2024

Inspired by the Hamiltonian of classical mechanics, Lev Pontryagin introduced the Control Hamiltonian and formulated his celebrated Pontragin’s Maximum Principle (PMP). This blog will first discuss general cases of optimal control problems, including scenarios with both free final time and final states. Then, the derivation of Algebraic Riccati Equation (ARE) in the context of Continuous Linear Qudratic Regulator (LQR) from the perspective of the Control Hamiltonian will be introduced. It will then explain the PMP, finally followed by the example problem of Constrained Continous LQR.

Taxonomy of Robotic Manipulation: Inverse Kinematics (IK), Operational Space Control (OSC), Impedance Control, Riemannian Motion Policy (RMP) and Geometric Fabrics Permalink

less than 1 minute read

Published: August 30, 2025

Inverse Kinematics (IK) converts a desired end-effector pose into joint angles and is fundamental for basic motion planning. Operational Space Control (OSC) layers dynamics on top of IK, commanding Cartesian forces/velocities so the robot can track trajectories while rejecting disturbances. Impedance Control then shapes the robot’s apparent mass-spring-damper behavior, enabling compliant contact and human-robot interaction. Riemannian Motion Policy (RMP) generalizes impedance ideas across multiple task maps, blending goal-reaching, joint limits, and obstacle avoidance on curved manifolds. Finally, Geometric Fabrics build on RMP to sculpt global energy landscapes, yielding provably safe navigation around complex environments. This post aims to provide brief explanation for each concepts.

Implementation Details of Basic-and-Fast-Neural-Style-Transfer Permalink

7 minute read

Published: September 01, 2024

Neural style transfer (NST) serves as an essential starting point those interested in deep learning, incorporating critical components or techniques such as convolutional neural networks (CNNs), VGG network, residual networks, upsampling and normalization. The basic neural style transfer method captures and manipulates image features using CNNs and the VGG network directly over the input image. In contrast, the fast neural style transfer method employs a dataset to train an inference neural network that can be used for real-time style transfer. This blogs provides the explanation of the implementation of both methods. Corresponding repository can be found here.

On Derivation of Hamilton-Jacobi-Bellman Equation and Its Application Permalink

13 minute read

Published: August 15, 2024

The Hamilton-Jacobi-Bellman (HJB) equation is arguably one of the most important cornerstones of optimal control theory and reinforcement learning. In this blog, we will first introduce the Hamilton-Jacobi Equation and Bellman’s Principle of Optimality. We will then delve into the derivation of the HJB equation. Finally, we will conclude with an example that shows the derivation of the famous Algebraic Riccati equation (ARE) from this perspective.

Shed Some Light on Proximal Policy Optimization (PPO) and Its Application Permalink

15 minute read

Published: May 31, 2025

Proximal Policy Optimization (PPO) is a reinforcement learning algorithm that refines policy gradient methods like REINFORCE using importance sampling and a clipped surrogate objective to stabilize updates. PPO-Penalty explicitly penalizes KL divergence in the objective function, and PPO-Clip instead uses clipping to prevent large policy updates. In many robotics tasks, PPO is first used to train a base policy (potentially with privileged information). Then, a deployable controller is learned from this base policy using imitation learning, distillation, or other techniques. This blog explores PPO’s core principle, with code available at this repository.

Shed Some Light on Proximal Policy Optimization (PPO) and Its Application Permalink

15 minute read

Published: May 31, 2025

Proximal Policy Optimization (PPO) is a reinforcement learning algorithm that refines policy gradient methods like REINFORCE using importance sampling and a clipped surrogate objective to stabilize updates. PPO-Penalty explicitly penalizes KL divergence in the objective function, and PPO-Clip instead uses clipping to prevent large policy updates. In many robotics tasks, PPO is first used to train a base policy (potentially with privileged information). Then, a deployable controller is learned from this base policy using imitation learning, distillation, or other techniques. This blog explores PPO’s core principle, with code available at this repository.

Shed Some Light on Proximal Policy Optimization (PPO) and Its Application Permalink

15 minute read

Published: May 31, 2025

Proximal Policy Optimization (PPO) is a reinforcement learning algorithm that refines policy gradient methods like REINFORCE using importance sampling and a clipped surrogate objective to stabilize updates. PPO-Penalty explicitly penalizes KL divergence in the objective function, and PPO-Clip instead uses clipping to prevent large policy updates. In many robotics tasks, PPO is first used to train a base policy (potentially with privileged information). Then, a deployable controller is learned from this base policy using imitation learning, distillation, or other techniques. This blog explores PPO’s core principle, with code available at this repository.

From Q-Learning to Deep Q-Learning and Deep Deterministic Policy Gradient (DDPG) Permalink

16 minute read

Published: March 10, 2025

Q-learning, an off-policy reinforcement learning algorithm, uses the Bellman equation to iteratively update state-action values, helping an agent determine the best actions to maximize cumulative rewards. Deep Q-learning improves upon Q-learning by leveraging deep Q network (DQN) to approximate Q-values, enabling it to handle continuous state spaces but it is still only suitable for discrete action spaces. Further advancement, Deep Deterministic Policy Gradient (DDPG), combines Q-learning’s principles with policy gradients, making it also suitable for continuous action spaces. This blog starts by discussing the basic components of reinforcement learning and gradually explore how Q-learning evolves into DQN and DDPG, with application for solving the cartpole environment in Isaac Gym simulator. Corresponding code can be found at this repository.

Dwell on Differential Dynamic Programming (DDP) and Iterative Linear Quadratic Regulator (iLQR) Permalink

6 minute read

Published: October 10, 2024

Although optimal control and reinforcement learning appear to be distinct field, they are, in fact closely related. Differential Dynamic Programming (DDP) and Iterative Linear Quadratic Regulator (iLQR), two powerful algorithms commonly utilized in trajectory optimizations, exemplify how model-based reinforcement learning can bridge the gap between these domains. This blog begins by discussing the fundational principles, including Newton’s method and Bellman Equation. It then delves into the specifics of the DDP and iLQR algorithms, illustrating their application through the classical problem of double pendulum swing-up control.

On Derivation of Hamilton-Jacobi-Bellman Equation and Its Application Permalink

13 minute read

Published: August 15, 2024

The Hamilton-Jacobi-Bellman (HJB) equation is arguably one of the most important cornerstones of optimal control theory and reinforcement learning. In this blog, we will first introduce the Hamilton-Jacobi Equation and Bellman’s Principle of Optimality. We will then delve into the derivation of the HJB equation. Finally, we will conclude with an example that shows the derivation of the famous Algebraic Riccati equation (ARE) from this perspective.

Taxonomy of Robotic Manipulation: Inverse Kinematics (IK), Operational Space Control (OSC), Impedance Control, Riemannian Motion Policy (RMP) and Geometric Fabrics Permalink

less than 1 minute read

Published: August 30, 2025

Inverse Kinematics (IK) converts a desired end-effector pose into joint angles and is fundamental for basic motion planning. Operational Space Control (OSC) layers dynamics on top of IK, commanding Cartesian forces/velocities so the robot can track trajectories while rejecting disturbances. Impedance Control then shapes the robot’s apparent mass-spring-damper behavior, enabling compliant contact and human-robot interaction. Riemannian Motion Policy (RMP) generalizes impedance ideas across multiple task maps, blending goal-reaching, joint limits, and obstacle avoidance on curved manifolds. Finally, Geometric Fabrics build on RMP to sculpt global energy landscapes, yielding provably safe navigation around complex environments. This post aims to provide brief explanation for each concepts.

Dwell on Differential Dynamic Programming (DDP) and Iterative Linear Quadratic Regulator (iLQR) Permalink

6 minute read

Published: October 10, 2024

Although optimal control and reinforcement learning appear to be distinct field, they are, in fact closely related. Differential Dynamic Programming (DDP) and Iterative Linear Quadratic Regulator (iLQR), two powerful algorithms commonly utilized in trajectory optimizations, exemplify how model-based reinforcement learning can bridge the gap between these domains. This blog begins by discussing the fundational principles, including Newton’s method and Bellman Equation. It then delves into the specifics of the DDP and iLQR algorithms, illustrating their application through the classical problem of double pendulum swing-up control.

Lihan Lian

Posts by Tags

(Deep) Q-learning

Algebraic Riccati Equation

Calculus of Variations

Classical Mechanics

Computer Vision

Deep Deterministic Policy Gradient

Deep Learning

Differential Geometry

Diffusion Models

Dynamic Programming

Euler-Largrange Equation

Functional Analysis

Generative Models

Lagrangian Mechanics

Linear Qudratic Regulator

Motion Planning

Neural Style Transfer

Optimal Control

Policy Gradient

Proximal Policy Optimization

Reinforcement Learning

Robotic Manipulation

Trajectory Optimization