Stewart, Matthew
An Empirical Game-Theoretic Analysis of Autonomous Cyber-Defence Agents
Palmer, Gregory, Swaby, Luke, Harrold, Daniel J. B., Stewart, Matthew, Hiles, Alex, Willis, Chris, Miles, Ian, Farmer, Sara
The recent rise in increasingly sophisticated cyber-attacks raises the need for robust and resilient autonomous cyber-defence (ACD) agents. Given the variety of cyber-attack tactics, techniques and procedures (TTPs) employed, learning approaches that can return generalisable policies are desirable. Meanwhile, the assurance of ACD agents remains an open challenge. We address both challenges via an empirical game-theoretic analysis of deep reinforcement learning (DRL) approaches for ACD using the principled double oracle (DO) algorithm. This algorithm relies on adversaries iteratively learning (approximate) best responses against each others' policies; a computationally expensive endeavour for autonomous cyber operations agents. In this work we introduce and evaluate a theoretically-sound, potential-based reward shaping approach to expedite this process. In addition, given the increasing number of open-source ACD-DRL approaches, we extend the DO formulation to allow for multiple response oracles (MRO), providing a framework for a holistic evaluation of ACD approaches.
Machine Theory of Mind for Autonomous Cyber-Defence
Swaby, Luke, Stewart, Matthew, Harrold, Daniel, Willis, Chris, Palmer, Gregory
Intelligent autonomous agents hold much potential for the domain of cyber security. However, due to many state-of-the-art approaches relying on uninterpretable black-box models, there is growing demand for methods that offer stakeholders clear and actionable insights into their latent beliefs and motivations. To address this, we evaluate Theory of Mind (ToM) approaches for Autonomous Cyber Operations. Upon learning a robust prior, ToM models can predict an agent's goals, behaviours, and contextual beliefs given only a handful of past behaviour observations. In this paper, we introduce a novel Graph Neural Network (GNN)-based ToM architecture tailored for cyber-defence, Graph-In, Graph-Out (GIGO)-ToM, which can accurately predict both the targets and attack trajectories of adversarial cyber agents over arbitrary computer network topologies. To evaluate the latter, we propose a novel extension of the Wasserstein distance for measuring the similarity of graph-based probability distributions. Whereas the standard Wasserstein distance lacks a fixed reference scale, we introduce a graph-theoretic normalization factor that enables a standardized comparison between networks of different sizes. We furnish this metric, which we term the Network Transport Distance (NTD), with a weighting function that emphasizes predictions according to custom node features, allowing network operators to explore arbitrary strategic considerations. Benchmarked against a Graph-In, Dense-Out (GIDO)-ToM architecture in an abstract cyber-defence environment, our empirical evaluations show that GIGO-ToM can accurately predict the goals and behaviours of various unseen cyber-attacking agents across a range of network topologies, as well as learn embeddings that can effectively characterize their policies.
Wake Vision: A Large-scale, Diverse Dataset and Benchmark Suite for TinyML Person Detection
Banbury, Colby, Njor, Emil, Stewart, Matthew, Warden, Pete, Kudlur, Manjunath, Jeffries, Nat, Fafoutis, Xenofon, Reddi, Vijay Janapa
Tiny machine learning (TinyML), which enables machine learning applications on extremely low-power devices, suffers from limited size and quality of relevant datasets. To address this issue, we introduce Wake Vision, a large-scale, diverse dataset tailored for person detection, the canonical task for TinyML visual sensing. Wake Vision comprises over 6 million images, representing a hundredfold increase compared to the previous standard, and has undergone thorough quality filtering. We provide two Wake Vision training sets: Wake Vision (Large) and Wake Vision (Quality), a smaller set with higher-quality labels. Our results demonstrate that using the Wake Vision (Quality) training set produces more accurate models than the Wake Vision (Large) training set, strongly suggesting that label quality is more important than quantity in our setting. We find use for the large training set for pre-training and knowledge distillation. To minimize label errors that can obscure true model performance, we manually label the validation and test sets, improving the test set error rate from 7.8% in the prior standard to only 2.2%. In addition to the dataset, we provide a collection of five detailed benchmark sets to facilitate the evaluation of model quality in challenging real world scenarios that are often ignored when focusing solely on overall accuracy. These novel fine-grained benchmarks assess model performance on specific segments of the test data, such as varying lighting conditions, distances from the camera, and demographic characteristics of subjects. Our results demonstrate that using Wake Vision for training results in a 2.49% increase in accuracy compared to the established dataset. We also show the importance of dataset quality for low-capacity models and the value of dataset size for high-capacity models. wakevision.ai
RobotPerf: An Open-Source, Vendor-Agnostic, Benchmarking Suite for Evaluating Robotics Computing System Performance
Mayoral-Vilches, Víctor, Jabbour, Jason, Hsiao, Yu-Shun, Wan, Zishen, Crespo-Álvarez, Martiño, Stewart, Matthew, Reina-Muñoz, Juan Manuel, Nagras, Prateek, Vikhe, Gaurav, Bakhshalipour, Mohammad, Pinzger, Martin, Rass, Stefan, Panigrahi, Smruti, Corradi, Giulio, Roy, Niladri, Gibbons, Phillip B., Neuman, Sabrina M., Plancher, Brian, Reddi, Vijay Janapa
We introduce RobotPerf, a vendor-agnostic benchmarking suite designed to evaluate robotics computing performance across a diverse range of hardware platforms using ROS 2 as its common baseline. The suite encompasses ROS 2 packages covering the full robotics pipeline and integrates two distinct benchmarking approaches: black-box testing, which measures performance by eliminating upper layers and replacing them with a test application, and grey-box testing, an application-specific measure that observes internal system states with minimal interference. Our benchmarking framework provides ready-to-use tools and is easily adaptable for the assessment of custom ROS 2 computational graphs. Drawing from the knowledge of leading robot architects and system architecture experts, RobotPerf establishes a standardized approach to robotics benchmarking. As an open-source initiative, RobotPerf remains committed to evolving with community input to advance the future of hardware-accelerated robotics.
Is TinyML Sustainable? Assessing the Environmental Impacts of Machine Learning on Microcontrollers
Prakash, Shvetank, Stewart, Matthew, Banbury, Colby, Mazumder, Mark, Warden, Pete, Plancher, Brian, Reddi, Vijay Janapa
The sustained growth of carbon emissions and global waste elicits significant sustainability concerns for our environment's future. The growing Internet of Things (IoT) has the potential to exacerbate this issue. However, an emerging area known as Tiny Machine Learning (TinyML) has the opportunity to help address these environmental challenges through sustainable computing practices. TinyML, the deployment of machine learning (ML) algorithms onto low-cost, low-power microcontroller systems, enables on-device sensor analytics that unlocks numerous always-on ML applications. This article discusses both the potential of these TinyML applications to address critical sustainability challenges, as well as the environmental footprint of this emerging technology. Through a complete life cycle analysis (LCA), we find that TinyML systems present opportunities to offset their carbon emissions by enabling applications that reduce the emissions of other sectors. Nevertheless, when globally scaled, the carbon footprint of TinyML systems is not negligible, necessitating that designers factor in environmental impact when formulating new devices. Finally, we outline research directions to enable further sustainable contributions of TinyML.
Datasheets for Machine Learning Sensors
Stewart, Matthew, Warden, Pete, Omri, Yasmine, Prakash, Shvetank, Santos, Joao, Hymel, Shawn, Brown, Benjamin, MacArthur, Jim, Jeffries, Nat, Plancher, Brian, Reddi, Vijay Janapa
Machine learning (ML) sensors offer a new paradigm for sensing that enables intelligence at the edge while empowering end-users with greater control of their data. As these ML sensors play a crucial role in the development of intelligent devices, clear documentation of their specifications, functionalities, and limitations is pivotal. This paper introduces a standard datasheet template for ML sensors and discusses its essential components including: the system's hardware, ML model and dataset attributes, end-to-end performance metrics, and environmental impact. We provide an example datasheet for our own ML sensor and discuss each section in detail. We highlight how these datasheets can facilitate better understanding and utilization of sensor data in ML applications, and we provide objective measures upon which system performance can be evaluated and compared. Together, ML sensors and their datasheets provide greater privacy, security, transparency, explainability, auditability, and user-friendliness for ML-enabled embedded systems. We conclude by emphasizing the need for standardization of datasheets across the broader ML community to ensure the responsible and effective use of sensor data.