Arnold, Thomas
RAAD-LLM: Adaptive Anomaly Detection Using LLMs and RAG Integration
Russell-Gilbert, Alicia, Mittal, Sudip, Rahimi, Shahram, Seale, Maria, Jabour, Joseph, Arnold, Thomas, Church, Joshua
Anomaly detection in complex industrial environments poses unique challenges, particularly in contexts characterized by data sparsity and evolving operational conditions. Predictive maintenance (PdM) in such settings demands methodologies that are adaptive, transferable, and capable of integrating domain-specific knowledge. In this paper, we present RAAD-LLM, a novel framework for adaptive anomaly detection that leverages large language models (LLMs) integrated with Retrieval-Augmented Generation (RAG). This approach addresses the aforementioned PdM challenges. By effectively utilizing domain-specific knowledge, RAAD-LLM enhances the detection of anomalies in time series data without requiring fine-tuning on specific datasets. The framework's adaptability mechanism enables it to adjust its understanding of normal operating conditions dynamically, thus increasing detection accuracy. We validate this methodology through a real-world application for a plastics manufacturing plant and on the Skoltech Anomaly Benchmark (SKAB). Results show significant improvements over our previous model, with accuracy increasing from 70.7% to 88.6% on the real-world dataset. By enriching the input series data with semantics, RAAD-LLM gains multimodal capabilities that facilitate more collaborative decision-making between the model and plant operators. Overall, our findings support RAAD-LLM's ability to revolutionize anomaly detection methodologies in PdM, potentially leading to a paradigm shift in how anomaly detection is implemented across various industries.
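The retrieval-augmented step the abstract describes can be sketched as follows. This is a minimal illustration only: the knowledge-base entries, the bag-of-words retriever, and the prompt template are placeholder assumptions (the paper does not publish its corpus or retriever), and the LLM call itself is omitted.

```python
from collections import Counter
from math import sqrt

# Toy knowledge base standing in for the domain documents a RAG pipeline
# would retrieve from; entries are invented for illustration.
KNOWLEDGE_BASE = [
    "extruder barrel temperature above 240 C indicates overheating fault",
    "screw speed oscillation during startup is normal for the first hour",
    "pressure spikes beyond 15 bar suggest a clogged die",
]

def bag_of_words(text):
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two bag-of-words count vectors.
    num = sum(a[w] * b[w] for w in set(a) & set(b))
    den = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def retrieve(query, k=1):
    # Rank documents by similarity to the query and keep the top k.
    q = bag_of_words(query)
    ranked = sorted(KNOWLEDGE_BASE, key=lambda d: cosine(q, bag_of_words(d)), reverse=True)
    return ranked[:k]

def build_prompt(window, query):
    # Fold retrieved domain knowledge and the serialized sensor window into
    # one prompt for the (not shown) LLM anomaly-detection call.
    context = "\n".join(retrieve(query))
    series = ", ".join(f"{v:.1f}" for v in window)
    return (f"Domain knowledge:\n{context}\n\n"
            f"Sensor readings: {series}\n"
            f"Is this window anomalous? Answer yes or no.")

print(build_prompt([238.0, 241.5, 243.2], "barrel temperature overheating"))
```

In a full pipeline the returned prompt would be sent to the LLM, whose answer is then checked against the plant's normal-operation profile.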
Multivariate Data Augmentation for Predictive Maintenance using Diffusion
Thompson, Andrew, Sommers, Alexander, Russell-Gilbert, Alicia, Cummins, Logan, Mittal, Sudip, Rahimi, Shahram, Seale, Maria, Jaboure, Joseph, Arnold, Thomas, Church, Joshua
Predictive maintenance has been used to optimize system repairs in the industrial, medical, and financial domains. This technique relies on the consistent ability to detect and predict anomalies in critical systems. AI models have been trained to detect system faults, improving predictive maintenance efficiency. Typically, there is a lack of fault data to train these models, because organizations work to keep fault occurrences and downtime to a minimum. For newly installed systems, no fault data exists since they have yet to fail. By using diffusion models for synthetic data generation, the complex training datasets for these predictive models can be supplemented with high-level synthetic fault data to improve their performance in anomaly detection. By learning the relationship between healthy and faulty data in similar systems, a diffusion model can attempt to apply that relationship to the healthy data of a newly installed system that has no fault data. The diffusion model can then generate useful fault data for the new system, enabling predictive models to be trained for predictive maintenance. This paper demonstrates a system for generating useful, multivariate synthetic data for predictive maintenance and shows how it can be applied to systems that have yet to fail.
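The forward (noising) half of a diffusion model, which synthetic-data generation of this kind builds on, can be sketched in a few lines. This is a generic textbook sketch under assumed toy values (the `betas` schedule and the `healthy` reading are invented), not the paper's architecture; the learned reverse model that would actually synthesize fault data is omitted.

```python
import math
import random

def forward_diffuse(x0, t, betas, rng):
    # Closed-form forward process:
    #   x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps,  eps ~ N(0, 1)
    # where abar_t is the cumulative product of (1 - beta_s) for s <= t.
    abar = 1.0
    for beta in betas[: t + 1]:
        abar *= 1.0 - beta
    return [math.sqrt(abar) * v + math.sqrt(1.0 - abar) * rng.gauss(0.0, 1.0)
            for v in x0]

rng = random.Random(0)
betas = [0.02] * 100              # toy constant noise schedule
healthy = [1.0, 0.9, 1.1, 1.0]    # stand-in for one healthy multivariate reading
noisy = forward_diffuse(healthy, 99, betas, rng)
```

Training a network to undo this corruption, conditioned on whether the target is a healthy or faulty reading, is what would let the model map a new system's healthy data toward plausible fault data.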
AAD-LLM: Adaptive Anomaly Detection Using Large Language Models
Russell-Gilbert, Alicia, Sommers, Alexander, Thompson, Andrew, Cummins, Logan, Mittal, Sudip, Rahimi, Shahram, Seale, Maria, Jaboure, Joseph, Arnold, Thomas, Church, Joshua
For data-constrained, complex, and dynamic industrial environments, there is a critical need for transferable and multimodal methodologies to enhance anomaly detection and thereby prevent the costs associated with system failures. Traditional PdM approaches are typically neither transferable nor multimodal. This work examines the use of Large Language Models (LLMs) for anomaly detection in complex and dynamic manufacturing systems. The research aims to improve the transferability of anomaly detection models by leveraging LLMs and seeks to validate the enhanced effectiveness of the proposed approach in data-sparse industrial applications. The research also seeks to enable more collaborative decision-making between the model and plant operators by enriching input series data with semantics. Additionally, the research aims to address the issue of concept drift in dynamic industrial settings by integrating an adaptability mechanism. The literature review examines the latest developments in LLM time series tasks alongside associated adaptive anomaly detection methods to establish a robust theoretical framework for the proposed architecture. This paper presents a novel model framework (AAD-LLM) that requires no training or fine-tuning on the dataset it is applied to and is multimodal. Results suggest that anomaly detection can be converted into a "language" task to deliver effective, context-aware detection in data-constrained industrial applications. This work therefore contributes significantly to advancements in anomaly detection methodologies.
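The conversion of anomaly detection into a "language" task can be illustrated with a simple serialization step. The function name, field layout, and sample readings below are hypothetical; the abstract does not specify AAD-LLM's actual prompt format.

```python
def serialize_window(names, window, condition):
    # Turn a multivariate sensor window plus an operator-supplied condition
    # description (the "semantics") into a plain-language detection query.
    lines = [f"{name}: " + ", ".join(f"{v:.2f}" for v in series)
             for name, series in zip(names, window)]
    return ("Operating condition: " + condition + "\n"
            + "\n".join(lines) + "\n"
            + "Given the condition above, is this window anomalous?")

prompt = serialize_window(
    ["temperature", "pressure"],
    [[231.0, 232.5, 240.1], [12.0, 12.1, 15.8]],
    "steady-state extrusion, recipe B",
)
print(prompt)
```

The operator-written condition string is what makes the query multimodal in the abstract's sense: the same numeric window can be judged normal or anomalous depending on the stated operating context.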
M4GT-Bench: Evaluation Benchmark for Black-Box Machine-Generated Text Detection
Wang, Yuxia, Mansurov, Jonibek, Ivanov, Petar, Su, Jinyan, Shelmanov, Artem, Tsvigun, Akim, Afzal, Osama Mohanned, Mahmoud, Tarek, Puccetti, Giovanni, Arnold, Thomas, Aji, Alham Fikri, Habash, Nizar, Gurevych, Iryna, Nakov, Preslav
The advent of Large Language Models (LLMs) has brought an unprecedented surge in machine-generated text (MGT) across diverse channels. This raises legitimate concerns about its potential misuse and societal implications. The need to identify and differentiate such content from genuine human-generated text is critical in combating disinformation, preserving the integrity of education and scientific fields, and maintaining trust in communication. In this work, we address this problem by introducing a new benchmark based on a multilingual, multi-domain, and multi-generator corpus of MGTs -- M4GT-Bench. The benchmark comprises three tasks: (1) monolingual and multilingual binary MGT detection; (2) multi-way detection, where one needs to identify which particular model generated the text; and (3) mixed human-machine text detection, where a word boundary delimiting MGT from human-written content must be determined. On the developed benchmark, we tested several MGT detection baselines and also conducted an evaluation of human performance. We observe that obtaining good performance in MGT detection usually requires access to training data from the same domains and generators. The benchmark is available at https://github.com/mbzuai-nlp/M4GT-Bench.
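A minimal baseline for the binary detection task (1) might look like the following toy unigram Naive Bayes classifier. The training snippets and labels are fabricated stand-ins, not M4GT-Bench data, and the baselines evaluated in the paper are far stronger; this only shows the shape of the problem.

```python
from collections import Counter
from math import log

class NaiveBayes:
    # Minimal add-one-smoothed unigram Naive Bayes for binary MGT detection.
    def fit(self, texts, labels):
        self.counts = {0: Counter(), 1: Counter()}
        self.priors = {0: 0, 1: 0}
        for text, y in zip(texts, labels):
            self.counts[y].update(text.lower().split())
            self.priors[y] += 1
        self.vocab = set(self.counts[0]) | set(self.counts[1])
        return self

    def predict(self, text):
        scores = {}
        for y in (0, 1):
            total = sum(self.counts[y].values())
            score = log(self.priors[y] / sum(self.priors.values()))
            for w in text.lower().split():
                score += log((self.counts[y][w] + 1) / (total + len(self.vocab)))
            scores[y] = score
        return max(scores, key=scores.get)

clf = NaiveBayes().fit(
    ["honestly i kinda liked it", "lol that was rough tbh",
     "furthermore the results demonstrate robustness",
     "in conclusion the proposed method is effective"],
    [0, 0, 1, 1],  # toy labels: 0 = human-written, 1 = machine-generated
)
print(clf.predict("furthermore the method is effective"))  # → 1
```

Even this crude model hints at the paper's finding: it only works on text resembling its training distribution, which is why cross-domain and cross-generator transfer is hard.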
A Survey of Transformer Enabled Time Series Synthesis
Sommers, Alexander, Cummins, Logan, Mittal, Sudip, Rahimi, Shahram, Seale, Maria, Jaboure, Joseph, Arnold, Thomas
Generative AI has received much attention in the image and language domains, with the transformer neural network continuing to dominate the state of the art. Application of these models to time series generation is less explored, however, and is of great utility to machine learning, privacy preservation, and explainability research. The present survey identifies this gap at the intersection of the transformer, generative AI, and time series data, and reviews works in this sparsely populated subdomain. The reviewed works show great variety in approach, and have not yet converged on a conclusive answer to the problems the domain poses. GANs, diffusion models, state space models, and autoencoders were all encountered alongside or surrounding the transformers which originally motivated the survey. While the domain is too open to offer conclusive insights, the works surveyed are quite suggestive, and several recommendations for best practice and suggestions for valuable future work are provided.
SemEval-2024 Task 8: Multidomain, Multimodel and Multilingual Machine-Generated Text Detection
Wang, Yuxia, Mansurov, Jonibek, Ivanov, Petar, Su, Jinyan, Shelmanov, Artem, Tsvigun, Akim, Afzal, Osama Mohammed, Mahmoud, Tarek, Puccetti, Giovanni, Arnold, Thomas, Whitehouse, Chenxi, Aji, Alham Fikri, Habash, Nizar, Gurevych, Iryna, Nakov, Preslav
We present the results and the main findings of SemEval-2024 Task 8: Multigenerator, Multidomain, and Multilingual Machine-Generated Text Detection. The task featured three subtasks. Subtask A is a binary classification task determining whether a text is written by a human or generated by a machine. This subtask has two tracks: a monolingual track focused solely on English texts and a multilingual track. Subtask B is to detect the exact source of a text, discerning whether it is written by a human or generated by a specific LLM. Subtask C aims to identify the changing point within a text, at which the authorship transitions from human to machine. The task attracted a large number of participants: subtask A monolingual (126), subtask A multilingual (59), subtask B (70), and subtask C (30). In this paper, we present the task, analyze the results, and discuss the system submissions and the methods they used. For all subtasks, the best systems used LLMs.
Quasi-Dilemmas for Artificial Moral Agents
Kasenberg, Daniel, Sarathy, Vasanth, Arnold, Thomas, Scheutz, Matthias, Williams, Tom
In this paper we describe moral quasi-dilemmas (MQDs): situations similar to moral dilemmas, but in which an agent is unsure whether exploring the plan space or the world may reveal a course of action that satisfies all moral requirements. We argue that artificial moral agents (AMAs) should be built to handle MQDs (in particular, by exploring the plan space rather than immediately accepting the inevitability of the moral dilemma), and that MQDs may be useful for evaluating AMA architectures.
Value Alignment or Misalignment -- What Will Keep Systems Accountable?
Arnold, Thomas (Tufts University) | Kasenberg, Daniel (Tufts University) | Scheutz, Matthias (Tufts University)
Machine learning's advances have led to new ideas about the feasibility and importance of machine ethics keeping pace, with increasing emphasis on safety, containment, and alignment. This paper addresses a recent suggestion that inverse reinforcement learning (IRL) could be a means to so-called "value alignment". We critically consider how such an approach can engage the social, norm-infused nature of ethical action and outline several features of ethical appraisal that go beyond simple models of behavior, including unavoidably temporal dimensions of norms and counterfactuals. We propose that a hybrid approach for computational architectures still offers the most promising avenue for machines acting in an ethical fashion.
Relational Enhancement: A Framework for Evaluating and Designing Human-Robot Relationships
Wilson, Jason R. (Tufts University) | Arnold, Thomas (Tufts University) | Scheutz, Matthias (Tufts University)
Much existing work examining the ethical behaviors of robots does not consider the impact and effects of long-term human-robot interactions. A robot teammate, collaborator, or helper is often expected to increase task performance, individually or of the team, but little discussion is usually devoted to how such a robot should balance the task requirements with building and maintaining a “working relationship” with a human partner, much less appropriate social relations outside that team. We propose the “Relational Enhancement” framework for the design and evaluation of long-term interactions, which is composed of the interrelated concepts of efficiency, solidarity, and prosocial concern. We discuss how this framework can be used to evaluate common existing approaches in cognitive architectures for robots and then examine how social norms and mental simulation may contribute to each component of the framework.