Cope, Dylan
Hidden in Plain Text: Emergence & Mitigation of Steganographic Collusion in LLMs
Mathew, Yohan, Matthews, Ollie, McCarthy, Robert, Velja, Joan, de Witt, Christian Schroeder, Cope, Dylan, Schoots, Nandi
The rapid proliferation of frontier model agents promises significant societal advances but also raises concerns about systemic risks arising from unsafe interactions. Collusion to the disadvantage of others has been identified as a central form of undesirable agent cooperation. The use of information hiding (steganography) in agent communications could render collusion practically undetectable. This underscores the need for evaluation frameworks to monitor and mitigate steganographic collusion capabilities. We address a crucial gap in the literature by demonstrating, for the first time, that robust steganographic collusion in LLMs can arise indirectly from optimization pressure. To investigate this problem we design two approaches -- a gradient-based reinforcement learning (GBRL) method and an in-context reinforcement learning (ICRL) method -- for reliably eliciting sophisticated LLM-generated linguistic text steganography. Importantly, we find that emergent steganographic collusion can be robust to both passive steganalytic oversight of model outputs and active mitigation through communication paraphrasing. We contribute a novel model evaluation framework and discuss limitations and future work. Our findings imply that effective risk mitigation from steganographic collusion post-deployment requires innovation in passive and active oversight techniques.
Training Neural Networks for Modularity aids Interpretability
Golechha, Satvik, Cope, Dylan, Schoots, Nandi
An approach to improve network interpretability is via clusterability, i.e., splitting a model into disjoint clusters that can be studied independently. We find pretrained models to be highly unclusterable and thus train models to be more modular using an 'enmeshment loss' function that encourages the formation of non-interacting clusters. Using automated interpretability measures, we show that our method finds clusters that learn different, disjoint, and smaller circuits for CIFAR-10 labels. Our approach provides a promising direction for making neural networks easier to interpret.
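The abstract does not reproduce the specific enmeshment loss, so the following is only a hedged sketch of the general idea under an assumed form: a regulariser that penalises weights connecting units assigned to different clusters, so that minimising the task loss plus this penalty encourages non-interacting clusters. The function name and cluster-assignment tensors are illustrative assumptions, not the authors' implementation.

```python
import torch

def cross_cluster_penalty(weight: torch.Tensor,
                          out_clusters: torch.Tensor,
                          in_clusters: torch.Tensor) -> torch.Tensor:
    """Sum of squared weights that cross cluster boundaries (assumed form).

    weight:       (out_features, in_features) weight matrix of one layer
    out_clusters: (out_features,) integer cluster label for each output unit
    in_clusters:  (in_features,)  integer cluster label for each input unit
    """
    # mask[i, j] = 1 where output unit i and input unit j sit in different clusters
    mask = (out_clusters.unsqueeze(1) != in_clusters.unsqueeze(0)).float()
    return (weight.pow(2) * mask).sum()

# Assumed training objective: task loss plus a weighted sum of per-layer penalties,
# e.g. loss = task_loss + lam * sum(cross_cluster_penalty(W, c_out, c_in) per layer)
```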
Mimicry and the Emergence of Cooperative Communication
Cope, Dylan, McBurney, Peter
In many situations, communication between agents is a critical component of cooperative multi-agent systems; however, it can be difficult to learn or evolve. In this paper, we investigate a simple way in which the emergence of communication may be facilitated. Namely, we explore the effects of allowing agents to mimic preexisting, externally generated useful signals. The key idea is that these signals incentivise listeners to develop positive responses, which can then also be invoked by speakers mimicking those signals. This investigation starts with formalising the problem and demonstrating that this form of mimicry changes optimisation dynamics and may provide the opportunity to escape non-communicative local optima. We then explore the problem empirically with a simulation in which spatially situated agents must communicate to collect resources. Our results show that both evolutionary optimisation and reinforcement learning may benefit from this intervention.
Learning Translations: Emergent Communication Pretraining for Cooperative Language Acquisition
Cope, Dylan, McBurney, Peter
In Emergent Communication (EC), agents learn to communicate with one another, but the protocols that they develop are specialised to their training community. This observation led to research into Zero-Shot Coordination (ZSC) for learning communication strategies that are robust to agents not encountered during training. However, ZSC typically assumes that no prior data is available about the agents that will be encountered in the zero-shot setting. In many cases, this presents an unnecessarily hard problem and rules out communication via pre-established conventions. We propose a novel AI challenge called a Cooperative Language Acquisition Problem (CLAP) in which the ZSC assumptions are relaxed by allowing a 'joiner' agent to learn from a dataset of interactions between agents in a target community. We propose and compare two methods for solving CLAPs: Imitation Learning (IL), and Emergent Communication pretraining and Translation Learning (ECTL), in which an agent is trained in self-play with EC and then learns from the data to translate between the emergent protocol and the target community's protocol.
Improving Activation Steering in Language Models with Mean-Centring
Jorgensen, Ole, Cope, Dylan, Schoots, Nandi, Shanahan, Murray
Recent work in activation steering has demonstrated the potential to better control the outputs of Large Language Models (LLMs), but it requires finding steering vectors. This is difficult because engineers do not typically know how features are represented in these models. We seek to address this issue by applying the idea of mean-centring to steering vectors. We find that taking the average of activations associated with a target dataset, and then subtracting the mean of all training activations, results in effective steering vectors. We test this method on a variety of models on natural language tasks, steering away from generating toxic text and steering the completion of a story towards a target genre. We also apply mean-centring to extract function vectors, triggering the execution of a range of natural language tasks significantly more effectively than previous baselines. This suggests that mean-centring can be used to easily improve the effectiveness of activation steering in a wide range of contexts.
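As a minimal sketch of the mean-centring recipe described in the abstract (the helper names and the scaling factor are assumptions, and obtaining the activations themselves is model-specific): the steering vector is the mean activation over the target dataset minus the mean activation over the broader training data, and it is added to a chosen layer's hidden state at generation time.

```python
import torch

def mean_centred_steering_vector(target_acts: torch.Tensor,
                                 train_acts: torch.Tensor) -> torch.Tensor:
    """Mean activation on the target dataset minus the mean over all training
    activations. Both tensors have shape (num_samples, hidden_dim)."""
    return target_acts.mean(dim=0) - train_acts.mean(dim=0)

def apply_steering(hidden_state: torch.Tensor, steering_vec: torch.Tensor,
                   alpha: float = 1.0) -> torch.Tensor:
    """Add the (optionally scaled) steering vector to a layer's hidden state."""
    return hidden_state + alpha * steering_vec
```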
Low-Entropy Latent Variables Hurt Out-of-Distribution Performance
Schoots, Nandi, Cope, Dylan
We study the relationship between the entropy of intermediate representations and a model's robustness to distributional shift. We train models consisting of two feed-forward networks, separated by a discrete n-bit channel, end-to-end on an unsupervised contrastive learning task. Different masking strategies are applied after training that remove a proportion of low-entropy bits, high-entropy bits, or randomly selected bits, and the effects on performance are compared to the baseline accuracy with no mask. We hypothesize that the entropy of a bit serves as a guide to its usefulness out-of-distribution (OOD). Through experiments on three OOD datasets we demonstrate that the removal of low-entropy bits can notably benefit OOD performance. Conversely, we find that top-entropy masking disproportionately harms performance both in-distribution (InD) and OOD. The key challenge that we seek to address is that of identifying learned features in a model's intermediate representations that are more or less likely to be robust to distributional shift.
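A hedged sketch of the masking procedure described above, assuming the channel outputs are recorded as a dataset of 0/1 codes (the function names are illustrative): estimate each bit's empirical entropy across the dataset, then drop the lowest-entropy bits.

```python
import numpy as np

def bit_entropies(codes: np.ndarray) -> np.ndarray:
    """Empirical Shannon entropy of each channel bit.

    codes: (num_samples, n_bits) array of 0/1 values.
    """
    p = codes.mean(axis=0).clip(1e-12, 1 - 1e-12)  # P(bit = 1) for each bit
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

def low_entropy_mask(codes: np.ndarray, drop_fraction: float) -> np.ndarray:
    """Boolean mask (True = keep) that removes the lowest-entropy bits."""
    ent = bit_entropies(codes)
    n_drop = int(drop_fraction * codes.shape[1])
    mask = np.ones(codes.shape[1], dtype=bool)
    mask[np.argsort(ent)[:n_drop]] = False  # drop the n_drop lowest-entropy bits
    return mask
```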
A Measure of Explanatory Effectiveness
Cope, Dylan, McBurney, Peter
The term explanation in artificial intelligence (AI) is often conflated with the concepts of interpretability and explainable AI (XAI), but there are important distinctions to be made. Miller (2019) defines interpretability and XAI as the process of building AI systems that humans can understand. In other words, by design, the AI's decision-making process is inherently transparent to a human. In contrast, explicitly explaining the decision-making to an arbitrary human is explanation generation. The latter is the subject of this paper. More specifically, we are working towards developing a formal framework for the automated generation and assessment of explanations. Firstly, some key terminology: an explanation is generated through a dialectical interaction whereby one agent, the explainer, seeks to 'explain' some phenomenon, called the explanandum, to another agent, the explainee. In this article, we propose a novel measure of explanatory effectiveness that can be used to motivate artificial agents to generate good explanations (e.g. in the form of a reward signal), or to analyse the behaviours of existing communicating agents. We then define explanation games as cooperative games where two (or more) agents seek to maximise the effectiveness measure.
Joining the Conversation: Towards Language Acquisition for Ad Hoc Team Play
Cope, Dylan, McBurney, Peter
In this paper, we propose and consider the problem of cooperative language acquisition as a particular form of the ad hoc team play problem. We then present a probabilistic model for inferring a speaker's intentions and a listener's semantics from observing communications between a team of language-users. This model builds on the assumptions that speakers are engaged in positive signalling and listeners are exhibiting positive listening, which is to say that the messages convey information hidden from the listener, which in turn causes them to change their behaviour. Further, it accounts for potential sub-optimality in the speaker's ability to convey the right information (according to the given task). Finally, we discuss further work for testing and developing this framework.
Learning to Communicate with Strangers via Channel Randomisation Methods
Cope, Dylan, Schoots, Nandi
We introduce two methods for improving the performance of agents meeting for the first time to accomplish a communicative task. The methods are: (1) 'message mutation' during the generation of the communication protocol; and (2) random permutations of the communication channel. These proposals are tested using a simple two-player game involving a 'teacher' who generates a communication protocol and sends a message, and a 'student' who interprets the message. After training multiple agents via self-play, we analyse the performance of these agents when they are matched with a stranger, i.e. their zero-shot communication performance. We find that both message mutation and channel permutation positively influence performance, and we discuss their effects.
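A minimal sketch of the channel-permutation idea (the vocabulary size and function names are assumptions): relabel the discrete message symbols with a random bijection between the teacher's output and the student's input, so that no fixed symbol labelling can be relied upon.

```python
import numpy as np

def random_channel_permutation(vocab_size: int,
                               rng: np.random.Generator) -> np.ndarray:
    """A random bijection over the discrete message symbols."""
    return rng.permutation(vocab_size)

def permute_message(message: np.ndarray, perm: np.ndarray) -> np.ndarray:
    """Relabel every symbol in the teacher's message via the permutation."""
    return perm[message]

# Example: permute a 3-symbol message drawn from a 5-symbol channel
rng = np.random.default_rng(0)
perm = random_channel_permutation(5, rng)
print(permute_message(np.array([0, 3, 3]), perm))
```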