 independent rl


Large Language Model Integration with Reinforcement Learning to Augment Decision-Making in Autonomous Cyber Operations

arXiv.org Artificial Intelligence

Reinforcement Learning (RL) has shown great potential for autonomous decision-making in the cybersecurity domain, enabling agents to learn through direct environment interaction. However, RL agents in Autonomous Cyber Operations (ACO) typically learn from scratch, requiring them to execute undesirable actions to learn their consequences. In this study, we integrate external knowledge in the form of a Large Language Model (LLM) pretrained on cybersecurity data that our RL agent can directly leverage to make informed decisions. By guiding initial training with an LLM, we improve baseline performance and reduce the need for exploratory actions with obviously negative outcomes. We evaluate our LLM-integrated approach in a simulated cybersecurity environment, and demonstrate that our guided agent achieves over 2x higher rewards during early training and converges to a favorable policy approximately 4,500 episodes faster than the baseline.


A Comparative Evaluation of Teacher-Guided Reinforcement Learning Techniques for Autonomous Cyber Operations

arXiv.org Artificial Intelligence

Autonomous Cyber Operations (ACO) rely on Reinforcement Learning (RL) to train agents to make effective decisions in the cybersecurity domain. However, existing ACO applications require agents to learn from scratch, leading to slow convergence and poor early-stage performance. While teacher-guided techniques have demonstrated promise in other domains, they have not yet been applied to ACO. In this study, we implement four distinct teacher-guided techniques in the simulated CybORG environment and conduct a comparative evaluation. Our results demonstrate that teacher integration can significantly improve training efficiency in terms of early policy performance and convergence speed, highlighting its potential benefits for autonomous cybersecurity.


Reviews: A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning

Neural Information Processing Systems

Summary: "A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning" presents a novel, scalable algorithm that is shown to converge to better behaviours than previous methods in partially observable multi-agent reinforcement learning scenarios. The paper begins by describing the problem, namely that training reinforcement learning agents independently (i.e. each agent ignores the behaviours of the other agents and treats them as part of the environment) produces policies that can significantly overfit to the agent behaviours observed during training and fail to generalize when later set against new opponent behaviours. The paper then describes its solution, a generalization of the Double Oracle algorithm. The algorithm proceeds as follows: first, given a set of initial policies for each player, an empirical payoff tensor is constructed, and from it a meta-strategy is learnt for each player, namely the mixture over that player's initial policy set that achieves the highest value. Then, for each player i in turn, a new policy is trained against policies sampled from the meta-strategies of the other players.
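The loop described in this summary can be illustrated with a minimal sketch of the classic Double Oracle procedure on a two-player zero-sum matrix game. This is not the paper's implementation. It makes several simplifying assumptions: the game is Rock-Paper-Scissors, the "new policy" step is an exact best response over pure actions rather than a policy trained with reinforcement learning, and the meta-strategy is approximated with fictitious play. The names `PAYOFF`, `fictitious_play`, and `double_oracle` are hypothetical and chosen only for illustration.

```python
# Row player's payoffs in Rock-Paper-Scissors (zero-sum: the column player
# receives the negative). Indices: 0 = rock, 1 = paper, 2 = scissors.
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]


def fictitious_play(payoff, iters=2000):
    """Approximate meta-strategies for a zero-sum matrix game.

    Assumption: fictitious play stands in for whichever meta-solver is used;
    it returns the empirical play frequencies of both players.
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    row_counts = [0] * n_rows
    col_counts = [0] * n_cols
    row_counts[0] += 1  # arbitrary initial play
    col_counts[0] += 1
    for _ in range(iters):
        # Each player best-responds to the opponent's empirical play so far.
        row_vals = [sum(payoff[i][j] * col_counts[j] for j in range(n_cols))
                    for i in range(n_rows)]
        col_vals = [sum(payoff[i][j] * row_counts[i] for i in range(n_rows))
                    for j in range(n_cols)]
        row_counts[row_vals.index(max(row_vals))] += 1  # row maximizes
        col_counts[col_vals.index(min(col_vals))] += 1  # column minimizes
    row_total, col_total = sum(row_counts), sum(col_counts)
    return ([c / row_total for c in row_counts],
            [c / col_total for c in col_counts])


def double_oracle(payoff, max_rounds=10):
    """Grow each player's policy set with best responses to the meta-strategies."""
    row_pop, col_pop = [0], [0]  # initial policy sets: both players start with rock
    row_meta, col_meta = [1.0], [1.0]
    for _ in range(max_rounds):
        # Empirical payoff table restricted to the current policy sets.
        restricted = [[payoff[i][j] for j in col_pop] for i in row_pop]
        row_meta, col_meta = fictitious_play(restricted)

        # Oracle step: best response over ALL actions against the opponent's
        # meta-strategy (an exact stand-in for training a new policy).
        row_vals = [sum(q * payoff[i][j] for q, j in zip(col_meta, col_pop))
                    for i in range(len(payoff))]
        col_vals = [sum(p * payoff[i][j] for p, i in zip(row_meta, row_pop))
                    for j in range(len(payoff[0]))]
        row_br = row_vals.index(max(row_vals))
        col_br = col_vals.index(min(col_vals))

        grew = False
        if row_br not in row_pop:
            row_pop.append(row_br)
            grew = True
        if col_br not in col_pop:
            col_pop.append(col_br)
            grew = True
        if not grew:
            break  # neither player found a policy outside its current set
    # The meta-strategies refer to the policy sets at the start of the last round.
    return row_pop, col_pop, row_meta, col_meta


row_pop, col_pop, row_meta, col_meta = double_oracle(PAYOFF)
print("row policies:", row_pop, "meta-strategy:", row_meta)
print("col policies:", col_pop, "meta-strategy:", col_meta)
```

Starting from rock alone, each round adds the action that beats the opponent's current mixture, so both policy sets grow to cover all three actions before the loop stops. In the setting the summary describes, the exact best response would be replaced by a learned policy and the payoff table by empirical estimates from simulated play.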


Independent RL for Cooperative-Competitive Agents: A Mean-Field Perspective

arXiv.org Artificial Intelligence

In this paper, we address Reinforcement Learning (RL) among agents that are grouped into teams such that there is cooperation within each team but general-sum (non-zero sum) competition across different teams. To develop an RL method that provably achieves a Nash equilibrium, we focus on a linear-quadratic structure. Moreover, to tackle the non-stationarity induced by multi-agent interactions in the finite population setting, we consider the case where the number of agents within each team is infinite, i.e., the mean-field setting. This results in a General-Sum LQ Mean-Field Type Game (GS-MFTG). We characterize the Nash equilibrium (NE) of the GS-MFTG, under a standard invertibility condition. This MFTG NE is then shown to be $\mathcal{O}(1/M)$-NE for the finite population game where $M$ is a lower bound on the number of agents in each team. These structural results motivate an algorithm called Multi-player Receding-horizon Natural Policy Gradient (MRPG), where each team minimizes its cumulative cost independently in a receding-horizon manner. Despite the non-convexity of the problem, we establish that the resulting algorithm converges to a global NE through a novel problem decomposition into sub-problems using backward recursive discrete-time Hamilton-Jacobi-Isaacs (HJI) equations, in which independent natural policy gradient is shown to exhibit linear convergence under time-independent diagonal dominance. Experiments illuminate the merits of this approach in practice.
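For readers unfamiliar with the term, the standard notion of an approximate Nash equilibrium that the $\mathcal{O}(1/M)$-NE statement refers to can be written as follows. The notation is an assumption made here for illustration and is not taken from the paper: $J^i$ denotes the cumulative cost of team $i$, $u^i$ its policy, and $u^{-i}$ the policies of all other teams.

$$
J^i\big(u^{i*}, u^{-i*}\big) \;\le\; \inf_{u^i} J^i\big(u^i, u^{-i*}\big) + \epsilon \quad \text{for every team } i, \qquad \epsilon = \mathcal{O}(1/M).
$$

Under this reading, no team can lower its cost by more than $\epsilon$ by deviating unilaterally, and that gap shrinks as the minimum team size $M$ grows.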