AUTHOREA
Yongjin Lee
On Analysis of Clipped Critic Loss in Proximal Policy Gradient
Yongjin Lee
Moonyoung Chung


November 12, 2024
Proximal Policy Optimization (PPO) is one of the most successful deep reinforcement learning methods, largely because of the clipped loss it uses for the actor. While the clipped actor loss has been studied extensively, its counterpart for the critic has received far less attention. This study analyzes the behavior of the clipped critic loss in detail and shows that it is misaligned with the trust region principle. Building on this analysis, we propose a refined variant that aligns closely with the trust region principle.
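
For readers unfamiliar with the term, the sketch below shows the clipped critic (value) loss in the form commonly found in open-source PPO implementations. It is an illustration under stated assumptions, not the authors' code: the function names, the per-sample formulation, the 0.5 scaling factor, and the default clip range `eps=0.2` are all assumptions, and the refined variant proposed in the paper is not reproduced here.

```python
def clip(x, lo, hi):
    """Clamp x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def clipped_critic_loss(value, old_value, target, eps=0.2):
    """Per-sample clipped value loss as commonly implemented for PPO.

    value     : current critic prediction V(s)
    old_value : critic prediction recorded when the data was collected
    target    : return target for the critic
    eps       : clip range (hypothetical default, chosen for illustration)
    """
    # Limit how far the new prediction may move from the old one.
    value_clipped = old_value + clip(value - old_value, -eps, eps)
    unclipped = (value - target) ** 2
    clipped = (value_clipped - target) ** 2
    # Take the larger (more pessimistic) of the two squared errors.
    return 0.5 * max(unclipped, clipped)


# The prediction has moved by 1.0, well beyond eps, and now matches the target.
# The unclipped error is zero, but the clipped term keeps the loss positive.
loss = clipped_critic_loss(value=1.0, old_value=0.0, target=1.0)
```

In this example the prediction equals the target, yet the loss stays positive because the clipped prediction (0.2) is still far from it. Cases like this make the behavior of the loss non-obvious.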
