Andrew Thomas

Public Documents 2
Multi-Agent Reinforcement Learning for Cyber Defence Transferability and Scalability
Andrew Thomas, Matthew Yates, and 2 more

March 17, 2025
Reinforcement learning (RL) has been shown to be effective for simple automated cyber defence (ACD) tasks. However, these approaches have limitations that prevent them from being deployed onto real-world hardware. Trained policies often have limited transferability across even small changes to the environment setup, and instability during training can prevent optimal learning, a problem that only grows as the environment scales and increases in complexity. In this work we address these limitations with a zero-shot transfer approach based on multi-agent reinforcement learning. Our approach partitions the task into smaller per-machine subtasks, with each agent learning the solution to its local problem. These local agents are trained on a small-scale network and then transferred to larger networks by mapping the agents to machines in the new network. We have found that this transfer method is effective for direct application to a number of ACD tasks. We show that its performance is robust to changes in network activity and attack scenario, and that it reduces the effect of network scale on performance.
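
The zero-shot transfer step described above can be pictured with a minimal sketch. The LocalAgent class, the machine "roles", and the network dictionaries below are illustrative assumptions for this sketch, not details taken from the paper.

```python
# Minimal sketch (not the authors' code): zero-shot transfer of per-machine
# defence agents from a small training network onto a larger network.
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LocalAgent:
    """Policy trained on a single-machine defence subtask."""
    role: str                                    # e.g. "server", "workstation"
    policy: Dict[tuple, int] = field(default_factory=dict)

    def act(self, local_observation: tuple) -> int:
        # Fall back to a no-op action for unseen local states.
        return self.policy.get(local_observation, 0)


def transfer_agents(trained: Dict[str, LocalAgent],
                    large_network: List[dict]) -> Dict[str, LocalAgent]:
    """Map agents trained on a small network onto machines of a larger one.

    Each machine in the larger network is assigned the agent whose training
    role matches the machine's role; no further training is performed
    (zero-shot transfer).
    """
    return {machine["name"]: trained[machine["role"]] for machine in large_network}


# Example: agents trained on a small network, deployed on a 10-machine one.
trained_agents = {"server": LocalAgent("server"),
                  "workstation": LocalAgent("workstation")}
large_network = ([{"name": f"host{i}", "role": "workstation"} for i in range(8)] +
                 [{"name": f"srv{i}", "role": "server"} for i in range(2)])
agents_on_large = transfer_agents(trained_agents, large_network)
```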
Meta Reinforcement Learning for Automated Cyber Defence
Andrew Thomas, Nick Tillyer, and 1 more

April 07, 2025
Reinforcement learning (RL) solutions have shown considerable promise for automating the defence of networks against cyber attacks. However, a limitation to their real-world deployment is the sample inefficiency and inflexibility of RL agents: even small changes to attack types require a new agent to be trained from scratch. Meta-learning for RL aims to improve the sample efficiency of training agents by encoding pre-training information that assists fast adaptation. This work focuses on two key meta-learning methods, MAML and ML3, which represent differing approaches to encoding meta-learning knowledge. Both are limited to sets of environments that share the same action and observation space. To overcome this, we also present an extension to ML3, Gen ML3, which removes this requirement by training the learned loss on reward information only. Experiments have been conducted on a distribution of network setups based on the PrimAITE environment. All approaches demonstrated improvements in sample efficiency over a PPO baseline for a range of automated cyber defence (ACD) tasks. We also show effective meta-learning across network topologies with Gen ML3.
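
The idea of a learned loss that does not depend on the observation or action space can be sketched as follows. The LearnedLoss network, the inner_update helper, and the choice of (reward, log-probability) inputs are assumptions made for illustration only, not the paper's implementation of Gen ML3.

```python
# Minimal sketch (my own illustration) of a learned loss in the spirit of
# ML3/Gen ML3: the loss network consumes only quantities that are independent
# of observation and action dimensionality (per-step reward and the policy's
# log-probability of the action it took), so the same learned loss can in
# principle be reused across different network topologies.
import torch
import torch.nn as nn


class LearnedLoss(nn.Module):
    """Maps (reward, log-prob) pairs from one trajectory to a scalar loss."""

    def __init__(self, hidden: int = 32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def forward(self, rewards: torch.Tensor, log_probs: torch.Tensor) -> torch.Tensor:
        # rewards, log_probs: shape (T,) for a single trajectory.
        per_step = self.net(torch.stack([rewards, log_probs], dim=-1))
        return per_step.mean()


def inner_update(policy: nn.Module, learned_loss: LearnedLoss,
                 rewards: torch.Tensor, log_probs: torch.Tensor,
                 lr: float = 1e-2) -> None:
    """One inner-loop policy update using the learned loss.

    The outer meta-training loop (not shown) would adjust LearnedLoss so that
    this update improves the true return. log_probs must be computed from the
    policy so gradients can flow back to its parameters.
    """
    loss = learned_loss(rewards, log_probs)
    grads = torch.autograd.grad(loss, list(policy.parameters()))
    with torch.no_grad():
        for p, g in zip(policy.parameters(), grads):
            p -= lr * g


# Example: a tiny policy whose observation/action sizes could differ between
# tasks; only its log-probs feed the learned loss.
policy = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 3))
obs = torch.randn(5, 4)
dist = torch.distributions.Categorical(logits=policy(obs))
actions = dist.sample()
log_probs = dist.log_prob(actions)      # shape (5,), depends on policy params
rewards = torch.randn(5)                # placeholder per-step rewards
inner_update(policy, LearnedLoss(), rewards, log_probs)
```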
