Incentivizing Plausible Novel States: An Exploration Boosting Approach for Actor-Critic Algorithms

Actor-critic (AC) algorithms are model-free deep reinforcement learning techniques that have consistently demonstrated their effectiveness across various domains, especially in addressing continuous control challenges. Enhancing exploration (action entropy) and exploitation (expected return) through more efficient sample utilization is pivotal in AC algorithms. The fundamental strategy of a learning algorithm is to intelligently navigate the environment's state space, prioritizing the exploration of rarely visited states over frequently encountered ones. In alignment with this strategy, we propose an innovative approach to bolster exploration by employing an intrinsic reward based on a state's novelty and the potential benefits of exploring that state, which we term plausible novelty. Our approach is designed for seamless integration into off-policy AC algorithms. Through incentivized exploration of plausibly novel states, an AC algorithm can substantially enhance its sample efficiency and overall training performance. The new approach is verified through extensive simulations across various continuous control tasks within MuJoCo environments, utilizing a range of prominent off-policy AC algorithms.
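To make the idea of reward shaping with plausible novelty concrete, below is a minimal illustrative sketch of how such an intrinsic bonus could be folded into the rewards stored by an off-policy AC algorithm. The abstract does not specify the exact novelty or plausibility measures, so this sketch assumes a simple count-based novelty term and a squashed critic value as the plausibility term; the class name `PlausibleNoveltyBonus`, the weight `beta`, and both terms are hypothetical stand-ins, not the paper's formulation.

```python
import numpy as np

class PlausibleNoveltyBonus:
    """Illustrative intrinsic bonus: r_total = r_env + beta * novelty(s') * plausibility(s').

    Rewards states that are both rarely visited (novel) and promising under the
    current critic (plausible). All design choices here are assumptions.
    """

    def __init__(self, beta=0.1, bins=10, low=-1.0, high=1.0):
        self.beta = beta              # weight of the intrinsic term (assumed)
        self.bins = bins              # resolution of the state discretization
        self.low, self.high = low, high
        self.counts = {}              # visit counts over discretized states

    def _key(self, state):
        # Coarse discretization of the continuous state, used only for counting visits.
        idx = np.clip(
            ((state - self.low) / (self.high - self.low) * self.bins).astype(int),
            0, self.bins - 1,
        )
        return tuple(idx.tolist())

    def novelty(self, state):
        # Count-based novelty: rarely visited states receive a larger bonus.
        k = self._key(state)
        self.counts[k] = self.counts.get(k, 0) + 1
        return 1.0 / np.sqrt(self.counts[k])

    def shaped_reward(self, reward, next_state, critic_value):
        # Plausibility stand-in: a squashed critic value, so the bonus favours
        # novel states that the critic also estimates to be worth reaching.
        plausibility = 1.0 / (1.0 + np.exp(-critic_value))
        return reward + self.beta * self.novelty(next_state) * plausibility


# Example: shape a transition's reward before it is pushed into a replay buffer
# of an off-policy AC algorithm (e.g., SAC or TD3).
bonus = PlausibleNoveltyBonus(beta=0.1)
r_total = bonus.shaped_reward(reward=1.0,
                              next_state=np.array([0.2, -0.4]),
                              critic_value=3.5)
```

Because the shaping happens at the transition level, this kind of bonus can be dropped in front of any replay-buffer-based AC method without modifying its actor or critic updates, which is consistent with the seamless-integration claim above.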