Xiangkun He and 6 more authors

Ensuring safety and achieving human-level driving performance remain challenges for autonomous vehicles, especially in safety-critical situations. As a key component of artificial intelligence, reinforcement learning is promising and has shown great potential in many complex tasks; however, its lack of safety guarantees limits its real-world applicability. Hence, further advancing reinforcement learning, especially from the safety perspective, is of great importance for autonomous driving. As cognitive neuroscientists have revealed, the amygdala can elicit defensive responses to threats or hazards, which is crucial for survival in and adaptation to risky environments. Drawing inspiration from this scientific discovery, we present a fear-neuro-inspired reinforcement learning framework that realizes safe autonomous driving by modeling amygdala functionality. This new technique enables an agent to learn defensive behaviors and make safe decisions with fewer safety violations. Through experimental tests, we show that the proposed approach allows the autonomous driving agent to attain state-of-the-art performance relative to the baseline agents and to perform comparably to 30 certified human drivers across various safety-critical scenarios. The results demonstrate the feasibility and effectiveness of our framework and shed light on the crucial role of simulating amygdala function when applying reinforcement learning to safety-critical autonomous driving domains.
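
To make the idea concrete, below is a minimal sketch of how an amygdala-like fear signal could shape a reinforcement learning agent: a tabular Q-learner maintains a separate, faster-adapting threat estimate per state-action pair and acts on task value discounted by that fear. The toy ring-walk environment, the hazard states, and the fear-penalty weight lam are illustrative assumptions, not the published architecture.

import numpy as np

n_states, n_actions = 16, 4
rng = np.random.default_rng(0)

Q = np.zeros((n_states, n_actions))      # task value estimates
fear = np.zeros((n_states, n_actions))   # learned threat estimates (amygdala stand-in)

alpha, alpha_f, gamma, lam = 0.1, 0.2, 0.95, 2.0   # lam scales fear aversion

def step(s, a):
    # Toy environment (assumption): a ring walk with two hazard states.
    s_next = (s + (1 if a % 2 == 0 else -1)) % n_states
    crashed = s_next in (3, 11)
    reward = -10.0 if crashed else (1.0 if s_next == 15 else -0.1)
    return s_next, reward, crashed

for episode in range(500):
    s = 0
    for t in range(50):
        # Act greedily on the fear-shaped value: task value minus a threat penalty.
        shaped = Q[s] - lam * fear[s]
        a = int(np.argmax(shaped)) if rng.random() > 0.1 else int(rng.integers(n_actions))
        s_next, r, crashed = step(s, a)
        # Standard TD update for the task value.
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        # Fear update: move toward 1 on safety violations and propagate
        # discounted fear backward, loosely mimicking fear conditioning.
        target = 1.0 if crashed else gamma * fear[s_next].max()
        fear[s, a] += alpha_f * (target - fear[s, a])
        if crashed:
            break
        s = s_next

Acting on Q - lam * fear rather than on Q alone lets the agent avoid hazards before their cost has fully propagated into the task value, which is one way to read "learning defensive behaviors."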

Xiangkun He and 1 more author

With deep neural network-based function approximators, reinforcement learning holds the promise of learning complex end-to-end robotic controllers that map high-dimensional sensory information directly to control policies. A common challenge, however, especially in robotics, is sample-efficient learning from sparse rewards, where an agent must find a long sequence of "correct" actions to achieve a desired outcome; inevitable perturbations of the observations make this task still harder to solve. This paper therefore advances a novel robust goal-conditioned reinforcement learning approach for end-to-end robotic control in adversarial, sparse-reward environments. Specifically, a mixed adversarial attack scheme is presented that generates diverse adversarial perturbations of the observations by combining white-box and black-box attacks. Meanwhile, a hindsight experience replay technique that accounts for observation perturbations is developed to turn failed experiences into successful ones and to generate the policy trajectories perturbed by the mixed adversarial attacks. Additionally, a robust goal-conditioned actor-critic method is proposed to learn goal-conditioned policies while keeping the variation of the perturbed policy trajectories within bounds. Finally, the proposed method is evaluated on three tasks with adversarial attacks and sparse reward settings. The results indicate that our scheme ensures both robotic control performance and policy robustness on these adversarial, sparse-reward tasks.
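
A minimal sketch of the two attack-side ingredients named above, under assumed network sizes, an assumed perturbation budget eps, and a "final-state" hindsight relabeling rule (none of which are the paper's exact design): an FGSM-style white-box perturbation and a gradient-free black-box perturbation are mixed at random, and failed episodes are relabeled with the goal they actually achieved.

import torch
import torch.nn as nn

obs_dim, goal_dim, act_dim, eps = 8, 2, 2, 0.05   # assumed sizes and budget
policy = nn.Sequential(nn.Linear(obs_dim + goal_dim, 64), nn.Tanh(),
                       nn.Linear(64, act_dim))

def white_box_attack(obs, goal):
    # FGSM-style: perturb obs along the sign of the gradient of a surrogate loss.
    obs = obs.clone().requires_grad_(True)
    policy(torch.cat([obs, goal], dim=-1)).pow(2).sum().backward()
    return (obs + eps * obs.grad.sign()).detach()

def black_box_attack(obs):
    # Gradient-free stand-in for query-based attacks: bounded random noise.
    return obs + eps * torch.empty_like(obs).uniform_(-1, 1)

def mixed_attack(obs, goal):
    return white_box_attack(obs, goal) if torch.rand(()) < 0.5 else black_box_attack(obs)

def her_relabel(episode):
    # Hindsight: replay the episode as if its final achieved state were the goal,
    # so a failed rollout still yields at least one rewarding transition.
    new_goal = episode[-1]["achieved"]
    return [{**tr, "goal": new_goal,
             "reward": float(torch.allclose(tr["achieved"], new_goal, atol=1e-2))}
            for tr in episode]

# Usage: the policy only ever sees (possibly) attacked observations.
obs, goal = torch.randn(1, obs_dim), torch.randn(1, goal_dim)
action = policy(torch.cat([mixed_attack(obs, goal), goal], dim=-1))
episode = [{"achieved": torch.randn(goal_dim), "goal": goal.squeeze(0), "reward": 0.0}
           for _ in range(3)]
relabeled = her_relabel(episode)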

Xiangkun He and 3 more authors

Reinforcement learning has demonstrated its potential in a series of challenging domains. However, many real-world decision-making tasks involve unpredictable environmental changes or unavoidable perception errors that can mislead an agent into suboptimal decisions or even catastrophic failures. In light of these risks, applying reinforcement learning to the safety-critical autonomous driving domain remains difficult without ensuring robustness against environmental uncertainties (e.g., road adhesion changes or measurement noise). This paper therefore proposes a novel constrained adversarial reinforcement learning approach for robust decision making of autonomous vehicles at highway on-ramps. Environmental disturbance is modeled as an adversarial agent that learns an optimal adversarial policy to thwart the autonomous driving agent. Meanwhile, observation perturbations are approximated through a white-box adversarial attack technique so as to maximize the variation of the perturbed policy. Furthermore, a constrained adversarial actor-critic algorithm is presented to optimize the on-ramp merging policy while keeping the variations of the attacked driving policy and action-value function within bounds. Finally, the proposed robust on-ramp merging decision-making method is evaluated in three stochastic mixed traffic flows with different densities, and its effectiveness is demonstrated against competitive baselines.
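
The following sketch illustrates the structure of such a constrained adversarial update, with assumed dimensions, an assumed FGSM-style attack, and a simple return-weighted regression standing in for the actor objective (all illustrative, not the paper's exact algorithm): a Lagrange multiplier is raised whenever the policy's variation under attack exceeds a bound, so the optimizer trades task performance against robustness.

import torch
import torch.nn as nn

obs_dim, act_dim, eps = 6, 2, 0.05   # assumed sizes and perturbation budget
pi = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, act_dim))
# The adversarial agent would be a second policy trained on the negated
# driving reward (two-player zero-sum); its training loop is omitted here.
adversary = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, act_dim))
lagrange = torch.tensor(1.0, requires_grad=True)   # multiplier on the shift constraint
opt = torch.optim.Adam([*pi.parameters(), lagrange], lr=3e-4)

def attack(obs):
    # White-box FGSM-style step that tries to maximize the policy shift.
    obs_a = (obs + 1e-3 * torch.randn_like(obs)).requires_grad_(True)
    (pi(obs_a) - pi(obs).detach()).pow(2).sum().backward()
    return (obs_a + eps * obs_a.grad.sign()).detach()

def update(obs, actions, returns, limit=0.1):
    shift = (pi(attack(obs)) - pi(obs)).pow(2).mean()   # variation under attack
    # Return-weighted regression stands in for the actor objective.
    actor_loss = ((pi(obs) - actions).pow(2).sum(-1) * returns).mean()
    # Min over pi, max over lagrange: dual ascent raises the multiplier
    # whenever the measured shift exceeds the allowed limit.
    loss = actor_loss + lagrange.detach() * shift - lagrange * (shift.detach() - limit)
    opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():
        lagrange.clamp_(min=0.0)

update(torch.randn(32, obs_dim), torch.randn(32, act_dim), torch.randn(32))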

Xiangkun He and 3 more authors

Reinforcement learning holds the promise of allowing autonomous vehicles to learn complex decision-making behaviors by interacting with other traffic participants. However, many real-world driving tasks involve unpredictable perception errors or measurement noise that may mislead an autonomous vehicle into unsafe decisions or even catastrophic failures. In light of these risks, to ensure safety under perception uncertainty, autonomous vehicles must be able to cope with worst-case observation perturbations. This paper therefore proposes a novel observation adversarial reinforcement learning approach for robust lane change decision making of autonomous vehicles. A constrained observation-robust Markov decision process is presented to model the lane change decision-making behavior of autonomous vehicles under policy constraints and observation uncertainties. Meanwhile, a black-box attack technique based on Bayesian optimization is implemented to efficiently approximate the optimal adversarial observation perturbations. Furthermore, a constrained observation-robust actor-critic algorithm is advanced to optimize lane change policies while keeping the variation of the policies under the optimal adversarial observation perturbations within bounds. Finally, the robust lane change decision-making approach is evaluated in three stochastic mixed traffic flows with different densities. The results demonstrate that the proposed method not only enhances the performance of the autonomous vehicle but also improves the robustness of lane change policies against adversarial observation perturbations.
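
As a sketch of the black-box attack component, the snippet below uses Bayesian optimization (via skopt's gp_minimize, as a stand-in BO engine) to search an epsilon-ball for the observation perturbation that most shifts the policy's lane-change action distribution. The policy network, the per-feature bounds, and the query budget are illustrative assumptions.

import torch
import torch.nn as nn
from skopt import gp_minimize   # stand-in Bayesian optimization engine

obs_dim, act_dim, eps = 6, 3, 0.05   # e.g. actions: keep lane / left / right
policy = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, act_dim))

def neg_policy_shift(delta, obs):
    # Black-box objective: only forward queries of the policy, no gradients.
    # gp_minimize minimizes, so return the negated action-distribution shift.
    with torch.no_grad():
        clean = torch.softmax(policy(obs), dim=-1)
        pert = torch.softmax(policy(obs + torch.tensor(delta, dtype=torch.float32)), dim=-1)
    return -float((pert - clean).abs().sum())

obs = torch.randn(obs_dim)
result = gp_minimize(lambda d: neg_policy_shift(d, obs),
                     dimensions=[(-eps, eps)] * obs_dim,   # epsilon-bounded per feature
                     n_calls=30, random_state=0)
adv_obs = obs + torch.tensor(result.x, dtype=torch.float32)

During training, the constrained observation-robust actor-critic would then penalize the shift this attack finds, keeping the policy's variation under perturbation within the stated bound.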