Thomas CHAFFRE and 5 more authors

Deep Reinforcement Learning (DRL) methods are dominating the field of adaptive control, where they are used to adapt the controller response to disturbances. Nevertheless, the use of these methods on physical platforms is still limited by their data inefficiency and the performance drop they exhibit when facing unseen process variations. This is particularly noticeable in the Autonomous Underwater Vehicle (AUV) context studied here, where process observability is limited. To be effective, DRL-based AUV control systems require methods that are both data-efficient (to reach a satisfactory behavior with a sufficiently fast response time) and resilient (to ensure robustness to severe changes in operating conditions). With this ambition, we study in this paper the effect of the Experience Replay (ER) mechanism on the performance variation of a DRL-based stochastic adaptive controller. We propose a new ER method (denoted BIER) that takes inspiration from the biological replay mechanism, and compare it to the standard method (denoted CER). We apply it to Soft Actor-Critic, a maximum-entropy DRL algorithm, on an AUV maneuvering task that consists in stabilizing the vehicle at a given velocity and pose. Training results show that with BIER the controller exceeds the performance of its nonadaptive, optimal model-based counterpart in less than half the number of episodes required with CER. We propose evaluation scenarios of increasing complexity, as measured by the desired velocity value and the amplitude of the current disturbance. Our results suggest that the BIER method achieves improved learning stability and better generalization abilities.
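For readers unfamiliar with the ER mechanism discussed in the abstract, the sketch below illustrates a basic uniform replay buffer of the kind typically paired with off-policy algorithms such as Soft Actor-Critic. It is a generic illustration under standard assumptions, not the authors' BIER or CER implementation, and all class and method names are hypothetical.

```python
# Minimal sketch of a uniform experience-replay buffer (illustrative only;
# not the BIER or CER method described in the paper).
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity store of (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity=100_000):
        # Oldest transitions are discarded once capacity is exceeded.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling: every stored transition is equally likely to be replayed,
        # which decorrelates consecutive experiences before a gradient update.
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)
```

In an off-policy training loop, each interaction step would `push` a transition and, once the buffer holds enough data, `sample` a mini-batch for the critic and actor updates; replay methods such as the BIER variant studied here modify how transitions are selected for replay.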