AITopics | Schoots, Nandi

Collaborating Authors

Schoots, Nandi


Relating Piecewise Linear Kolmogorov Arnold Networks to ReLU Networks

Schoots, Nandi, Villani, Mattia Jacopo, uit de Bos, Niels

arXiv.org Artificial Intelligence, Mar-3-2025

Kolmogorov-Arnold Networks (KANs) are a new family of neural network architectures that hold promise for overcoming the curse of dimensionality and offer interpretability benefits (Liu et al., 2024). In this paper, we explore the connection between KANs with piecewise linear (univariate, real) functions and ReLU networks. We provide completely explicit constructions to convert a piecewise linear KAN into a ReLU network and vice versa.
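Any conversion between the two architectures rests on a standard identity: a univariate piecewise linear function equals an affine term plus a weighted sum of shifted ReLUs, one per interior breakpoint, weighted by the change in slope there. The Python sketch below illustrates only that identity for a single KAN edge function; it is not the paper's explicit construction, and the names `pl_to_relu` and `relu_net` are hypothetical. It assumes the function is specified by sorted breakpoints and values and is extended linearly beyond the end breakpoints.

```python
def relu(z):
    return max(0.0, z)

def pl_to_relu(breakpoints, values):
    """Return (bias, linear_slope, terms) such that
    f(x) = bias + linear_slope * x + sum(c * relu(x - b) for b, c in terms)."""
    if len(breakpoints) < 2 or len(breakpoints) != len(values):
        raise ValueError("need at least two breakpoints with matching values")
    # Slope of each linear segment between consecutive breakpoints.
    slopes = [
        (values[i + 1] - values[i]) / (breakpoints[i + 1] - breakpoints[i])
        for i in range(len(breakpoints) - 1)
    ]
    # The leftmost segment, extended to all x, gives the affine part.
    linear_slope = slopes[0]
    bias = values[0] - slopes[0] * breakpoints[0]
    # Each interior breakpoint adds a ReLU weighted by the change in slope.
    terms = [(breakpoints[i], slopes[i] - slopes[i - 1]) for i in range(1, len(slopes))]
    return bias, linear_slope, terms

def relu_net(params, x):
    """Evaluate the equivalent one-hidden-layer ReLU network at x."""
    bias, linear_slope, terms = params
    # The affine term itself uses two ReLUs, since x = relu(x) - relu(-x).
    out = bias + linear_slope * (relu(x) - relu(-x))
    for b, c in terms:
        out += c * relu(x - b)
    return out

# Hat function: 0 at x=0, 1 at x=1, 0 at x=2, extended linearly outside [0, 2].
params = pl_to_relu([0.0, 1.0, 2.0], [0.0, 1.0, 0.0])
print([relu_net(params, x) for x in (0.5, 1.0, 2.0, 3.0)])
```

Because a KAN node sums its incoming edge functions, the same identity applied edge by edge yields a ReLU layer, which is the intuition behind the KAN-to-ReLU direction.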

activation function, artificial intelligence, machine learning, (11 more...)

arXiv.org Artificial Intelligence

2503.01702

Country: North America > United States > Colorado (0.14)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)


Modular Training of Neural Networks aids Interpretability

Golechha, Satvik, Chaudhary, Maheep, Velja, Joan, Abate, Alessandro, Schoots, Nandi

arXiv.org Artificial Intelligence, Feb-6-2025

One approach to improving neural network interpretability is clusterability, i.e., splitting a model into disjoint clusters that can be studied independently. We define a measure for clusterability and show that pre-trained models form highly enmeshed clusters via spectral graph clustering. We thus train models to be more modular using a "clusterability loss" function that encourages the formation of non-interacting clusters. Using automated interpretability techniques, we show that our method can help train models that are more modular and learn different, disjoint, and smaller circuits. We investigate CNNs trained on MNIST and CIFAR, small transformers trained on modular addition, and language models. Our approach provides a promising direction for training neural networks that learn simpler functions and are easier to interpret.
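To make "clusterability" concrete, the Python sketch below assumes neurons have already been assigned to clusters (the paper obtains clusters with spectral graph clustering, which is not reproduced here) and measures the share of absolute weight that stays inside clusters for one layer. The paired penalty, the absolute weight crossing cluster boundaries, is a hypothetical stand-in for a clusterability loss; the paper's exact measure and loss may differ, and both function names are illustrative only.

```python
import numpy as np

def within_cluster_fraction(W, in_labels, out_labels):
    """Fraction of total |W| on connections whose endpoints share a cluster.
    W has shape (n_out, n_in); labels are integer cluster ids per neuron."""
    W = np.abs(np.asarray(W, dtype=float))
    # same[i, j] is True when output neuron i and input neuron j share a cluster.
    same = np.asarray(out_labels)[:, None] == np.asarray(in_labels)[None, :]
    total = W.sum()
    if total == 0:
        return 1.0  # an all-zero layer is treated as perfectly clusterable
    return float(W[same].sum() / total)

def cross_cluster_penalty(W, in_labels, out_labels):
    """Hypothetical regulariser: total |W| on cross-cluster connections.
    Adding it to the task loss pushes inter-cluster weights toward zero."""
    W = np.abs(np.asarray(W, dtype=float))
    same = np.asarray(out_labels)[:, None] == np.asarray(in_labels)[None, :]
    return float(W[~same].sum())

W = np.array([[1.0, 1.0], [1.0, 1.0]])
print(within_cluster_fraction(W, [0, 1], [0, 1]))  # half the weight is cross-cluster
```

In actual training, the penalty would be written in an autodiff framework so its gradient reaches the weights; NumPy is used here only to show the computation.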

artificial intelligence, machine learning, modular training, (14 more...)

arXiv.org Artificial Intelligence

2502.0247

Country: Europe > Austria > Vienna (0.14)

Genre: Research Report (0.83)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.40)
Health & Medicine > Therapeutic Area > Immunology (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.89)


Open Problems in Mechanistic Interpretability

Sharkey, Lee, Chughtai, Bilal, Batson, Joshua, Lindsey, Jack, Wu, Jeff, Bushnaq, Lucius, Goldowsky-Dill, Nicholas, Heimersheim, Stefan, Ortega, Alejandro, Bloom, Joseph, Biderman, Stella, Garriga-Alonso, Adria, Conmy, Arthur, Nanda, Neel, Rumbelow, Jessica, Wattenberg, Martin, Schoots, Nandi, Miller, Joseph, Michaud, Eric J., Casper, Stephen, Tegmark, Max, Saunders, William, Bau, David, Todd, Eric, Geiger, Atticus, Geva, Mor, Hoogland, Jesse, Murfet, Daniel, McGrath, Tom

arXiv.org Artificial Intelligence, Jan-27-2025

Mechanistic interpretability aims to understand the computational mechanisms underlying neural networks' capabilities in order to accomplish concrete scientific and engineering goals. Progress in this field thus promises to provide greater assurance over AI system behavior and shed light on exciting scientific questions about the nature of intelligence. Despite recent progress toward these goals, there are many open problems in the field that require solutions before many scientific and practical benefits can be realized: Our methods require both conceptual and practical improvements to reveal deeper insights; we must figure out how best to apply our methods in pursuit of specific goals; and the field must grapple with socio-technical challenges that influence and are influenced by our work. This forward-facing review discusses the current frontier of mechanistic interpretability and the open problems that the field may benefit from prioritizing. This review collects the perspectives of its various authors and represents a synthesis of their views by Apollo Research on behalf of Schmidt Sciences. The perspectives presented here do not necessarily reflect the views of any individual author or the institutions with which they are affiliated.

data mining, large language model, machine learning, (23 more...)

arXiv.org Artificial Intelligence

2501.16496

Country:

Europe > United Kingdom (0.92)
North America > United States > Texas > Kleberg County (0.24)
North America > United States > Texas > Chambers County (0.24)
(2 more...)

Genre:

Overview (1.00)
Research Report > New Finding (0.45)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Leisure & Entertainment (0.67)
(4 more...)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(7 more...)


The Propensity for Density in Feed-forward Models

Schoots, Nandi, Jackson, Alex, Kholmovaia, Ali, McBurney, Peter, Shanahan, Murray

arXiv.org Artificial Intelligence, Oct-18-2024

Does the process of training a neural network to solve a task tend to use all of the available weights even when the task could be solved with fewer weights? To address this question we study the effects of pruning fully connected, convolutional and residual models while varying their widths. We find that the proportion of weights that can be pruned without degrading performance is largely invariant to model size. Increasing the width of a model has little effect on the density of the pruned model relative to the increase in absolute size of the pruned network. In particular, we find substantial prunability across a large range of model sizes, where our biggest model is 50 times as wide as our smallest model. We explore three hypotheses that could explain these findings.
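The quantity studied here is the density of a pruned model: the fraction of weights that survive pruning. The abstract does not specify the pruning criterion, so the Python sketch below swaps in global magnitude pruning, a common choice, purely as an illustrative assumption; `global_magnitude_masks` and `density` are hypothetical names.

```python
import numpy as np

def global_magnitude_masks(weights, sparsity):
    """Keep-masks that remove the `sparsity` fraction of weights with the
    smallest absolute value, ranked globally across all layers."""
    flat = np.concatenate([np.abs(w).ravel() for w in weights])
    k = int(round(sparsity * flat.size))  # number of weights to remove
    keep_flat = np.ones(flat.size, dtype=bool)
    keep_flat[np.argsort(flat, kind="stable")[:k]] = False
    # Split the flat mask back into per-layer masks with the original shapes.
    masks, start = [], 0
    for w in weights:
        masks.append(keep_flat[start:start + w.size].reshape(w.shape))
        start += w.size
    return masks

def density(masks):
    """Fraction of weights that survive pruning."""
    kept = sum(int(m.sum()) for m in masks)
    total = sum(m.size for m in masks)
    return kept / total

layers = [np.array([[0.1, -2.0], [0.5, 3.0]]), np.array([1.0, -0.2])]
masks = global_magnitude_masks(layers, sparsity=0.5)
print(density(masks))
```

Estimating how prunable a model is would then mean sweeping `sparsity`, re-evaluating each pruned model, and recording the lowest density at which performance is unchanged; that sweep depends on the training setup and is omitted.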

artificial intelligence, effective density, machine learning, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.3233/FAIA240819

2410.14461

Country:

Europe (0.68)
North America > United States (0.29)
North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report > New Finding (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)


Hidden in Plain Text: Emergence & Mitigation of Steganographic Collusion in LLMs

Mathew, Yohan, Matthews, Ollie, McCarthy, Robert, Velja, Joan, de Witt, Christian Schroeder, Cope, Dylan, Schoots, Nandi

arXiv.org Artificial Intelligence, Oct-2-2024

The rapid proliferation of frontier model agents promises significant societal advances but also raises concerns about systemic risks arising from unsafe interactions. Collusion to the disadvantage of others has been identified as a central form of undesirable agent cooperation. The use of information hiding (steganography) in agent communications could render collusion practically undetectable. This underscores the need for evaluation frameworks to monitor and mitigate steganographic collusion capabilities. We address a crucial gap in the literature by demonstrating, for the first time, that robust steganographic collusion in LLMs can arise indirectly from optimization pressure. To investigate this problem we design two approaches -- a gradient-based reinforcement learning (GBRL) method and an in-context reinforcement learning (ICRL) method -- for reliably eliciting sophisticated LLM-generated linguistic text steganography. Importantly, we find that emergent steganographic collusion can be robust to both passive steganalytic oversight of model outputs and active mitigation through communication paraphrasing. We contribute a novel model evaluation framework and discuss limitations and future work. Our findings imply that effective risk mitigation from steganographic collusion post-deployment requires innovation in passive and active oversight techniques.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2410.03768

Country: North America > United States (0.28)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology > Security & Privacy (0.67)
Education > Educational Setting (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Training Neural Networks for Modularity aids Interpretability

Golechha, Satvik, Cope, Dylan, Schoots, Nandi

arXiv.org Artificial Intelligence, Sep-24-2024

One approach to improving network interpretability is clusterability, i.e., splitting a model into disjoint clusters that can be studied independently. We find pretrained models to be highly unclusterable and thus train models to be more modular using an "enmeshment loss" function that encourages the formation of non-interacting clusters. Using automated interpretability measures, we show that our method finds clusters that learn different, disjoint, and smaller circuits for CIFAR-10 labels. Our approach provides a promising direction for making neural networks easier to interpret.

artificial intelligence, machine learning, neural network, (13 more...)

arXiv.org Artificial Intelligence

2409.15747

Country: North America > United States (0.14)

Genre: Research Report (0.83)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.40)
Health & Medicine > Therapeutic Area > Immunology (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.48)


Extending Activation Steering to Broad Skills and Multiple Behaviours

van der Weij, Teun, Poesio, Massimo, Schoots, Nandi

arXiv.org Artificial Intelligence, Mar-8-2024

Current large language models have dangerous capabilities, which are likely to become more problematic in the future. Activation steering techniques can be used to reduce risks from these capabilities. In this paper, we investigate the efficacy of activation steering for broad skills and multiple behaviours. First, by comparing the effects of reducing performance on general coding ability and Python-specific ability, we find that steering broader skills is competitive with steering narrower skills. Second, we steer models to become more or less myopic and wealth-seeking, among other behaviours. In our experiments, combining steering vectors for multiple different behaviours into one steering vector is largely unsuccessful. On the other hand, injecting individual steering vectors at different places in a model simultaneously is promising.
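The two multi-behaviour strategies compared above can be contrasted in a toy residual model: add one summed vector at a single layer, or add each behaviour's vector at its own layer. The Python sketch below is a hypothetical illustration, not the paper's setup; the "layers" are simple NumPy callables rather than transformer blocks, and `run_with_steering` is an illustrative name. It shows that, once layers are nonlinear, the two strategies generally produce different outputs.

```python
import numpy as np

def run_with_steering(layers, x, injections):
    """Run a toy residual model, adding steering vectors to the residual stream.
    layers: list of callables h -> update.
    injections: {layer_index: [vectors added right after that layer]}."""
    h = np.asarray(x, dtype=float)
    for i, layer in enumerate(layers):
        h = h + layer(h)            # residual update from the layer
        for v in injections.get(i, []):
            h = h + v               # steering: shift the residual stream
    return h

layers = [np.tanh, np.tanh]          # toy nonlinear "blocks"
v_a = np.array([0.5, 0.0])           # hypothetical vector for behaviour A
v_b = np.array([0.5, 0.0])           # hypothetical vector for behaviour B

combined = run_with_steering(layers, [0.0, 0.0], {0: [v_a + v_b]})
separate = run_with_steering(layers, [0.0, 0.0], {0: [v_a], 1: [v_b]})
print(combined, separate)  # differ because the second tanh sees different inputs
```

In a real language model the same addition would typically be done with a forward hook on the chosen layer's residual output.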

large language model, machine learning, natural language, (11 more...)

arXiv.org Artificial Intelligence

2403.05767

Genre: Research Report > New Finding (1.00)

Industry: Government (0.32)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.36)


Dissecting Language Models: Machine Unlearning via Selective Pruning

Pochinkov, Nicholas, Schoots, Nandi

arXiv.org Artificial Intelligence, Mar-2-2024

Understanding and shaping the behaviour of Large Language Models (LLMs) is increasingly important as applications become more powerful and more frequently adopted. This paper introduces a machine unlearning method specifically designed for LLMs. We introduce a selective pruning method for LLMs that removes neurons based on their relative importance on a targeted capability compared to overall network performance. This approach is a compute- and data-efficient method for identifying and removing neurons that enable specific behaviours. Our findings reveal that both feed-forward and attention neurons in LLMs are specialized; that is, for specific tasks, certain neurons are more crucial than others.
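The core idea of selective pruning, removing neurons that matter much more for a targeted capability than for overall performance, can be sketched as a ratio of importance scores. The Python sketch below assumes importance is the mean absolute activation of each neuron on a "forget" dataset versus a "retain" dataset; that importance measure, the ratio score, and the name `selective_prune_mask` are illustrative assumptions, and the paper's exact scoring may differ.

```python
import numpy as np

def selective_prune_mask(forget_acts, retain_acts, frac, eps=1e-8):
    """Keep-mask over neurons that prunes the `frac` of neurons whose importance
    on the forget data is highest relative to the retain data.
    *_acts: arrays of shape (n_samples, n_neurons)."""
    # Assumed importance measure: mean absolute activation per neuron.
    imp_forget = np.abs(np.asarray(forget_acts, dtype=float)).mean(axis=0)
    imp_retain = np.abs(np.asarray(retain_acts, dtype=float)).mean(axis=0)
    score = imp_forget / (imp_retain + eps)  # high = specialised for the target
    k = int(round(frac * score.size))
    keep = np.ones(score.size, dtype=bool)
    if k > 0:
        keep[np.argsort(score)[-k:]] = False  # drop the k highest-scoring neurons
    return keep

forget = np.array([[1.0, 0.0, 2.0], [1.0, 0.0, 2.0]])
retain = np.array([[1.0, 1.0, 0.1], [1.0, 1.0, 0.1]])
print(selective_prune_mask(forget, retain, frac=1 / 3))  # neuron 2 is pruned
```

Pruning a neuron then amounts to zeroing its output weights (or masking its activation), which is why the approach needs no gradient updates.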

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2403.01267

Country:

Europe (0.67)
North America > United States > Hawaii (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Research Report > New Finding (0.66)

Industry: Information Technology > Security & Privacy (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)


Improving Activation Steering in Language Models with Mean-Centring

Jorgensen, Ole, Cope, Dylan, Schoots, Nandi, Shanahan, Murray

arXiv.org Artificial Intelligence, Dec-6-2023

Recent work in activation steering has demonstrated the potential to better control the outputs of Large Language Models (LLMs), but it involves finding steering vectors. This is difficult because engineers do not typically know how features are represented in these models. We seek to address this issue by applying the idea of mean-centring to steering vectors. We find that taking the average of activations associated with a target dataset, and then subtracting the mean of all training activations, results in effective steering vectors. We test this method on a variety of models on natural language tasks by steering away from generating toxic text, and steering the completion of a story towards a target genre. We also apply mean-centring to extract function vectors, which trigger the execution of a range of natural language tasks significantly more effectively than previous baselines. This suggests that mean-centring can be used to easily improve the effectiveness of activation steering in a wide range of contexts.
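The mean-centring recipe described above is a single subtraction: the mean activation over the target dataset minus the mean activation over all training data, taken at the same layer. One intuition is that activations share a large common offset, which the subtraction cancels, leaving the direction specific to the target. A minimal NumPy sketch, assuming activations have already been collected as arrays of shape `(n_samples, d_model)`; the function name is hypothetical.

```python
import numpy as np

def mean_centred_steering_vector(target_acts, all_acts):
    """Mean of target-dataset activations minus mean of all training activations.
    Both inputs have shape (n_samples, d_model) and come from the same layer."""
    target_mean = np.asarray(target_acts, dtype=float).mean(axis=0)
    global_mean = np.asarray(all_acts, dtype=float).mean(axis=0)
    return target_mean - global_mean

target = np.array([[2.0, 0.0], [4.0, 0.0]])  # e.g. activations on target-genre text
everything = np.array([[1.0, 1.0], [3.0, 1.0]])
v = mean_centred_steering_vector(target, everything)
print(v)  # [ 1. -1.]

# A constant offset shared by every activation cancels out:
offset = np.array([10.0, -5.0])
print(mean_centred_steering_vector(target + offset, everything + offset))  # [ 1. -1.]
```

The resulting vector would then be added, possibly scaled, to the residual stream at that layer during generation.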

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2312.03813

Country:

North America > United States (0.93)
Asia > Middle East > Qatar (0.14)

Genre:

Research Report (0.50)
Personal (0.46)

Industry: Leisure & Entertainment > Sports (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)


Comparing Optimization Targets for Contrast-Consistent Search

Fry, Hugo, Fallows, Seamus, Fan, Ian, Wright, Jamie, Schoots, Nandi

arXiv.org Artificial Intelligence, Nov-1-2023

We investigate the optimization target of Contrast-Consistent Search (CCS), which aims to recover a large language model's internal representation of truth. We present a new loss function that we call the Midpoint-Displacement (MD) loss function. We demonstrate that for a certain hyper-parameter value this MD loss function leads to a probe with very similar weights to CCS. We further show that this hyper-parameter is not optimal and that with a better hyper-parameter the MD loss function attains a higher test accuracy than CCS.
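For reference, the CCS baseline from Burns et al. (2022) trains a linear probe on the activations of a contrast pair (a statement and its negation) with two terms: a consistency term, asking that the two probabilities sum to one, and a confidence term, discouraging the degenerate answer p = 0.5 for both. A minimal NumPy sketch of that loss follows; the function names are illustrative, the usual per-class normalisation of activations is omitted, and the MD loss is not reproduced because its definition is not given here.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ccs_loss(theta, b, x_pos, x_neg):
    """CCS loss for a linear probe p(x) = sigmoid(x @ theta + b).
    x_pos, x_neg: activations for the positive / negated statements,
    arrays of shape (n_pairs, d)."""
    p_pos = sigmoid(x_pos @ theta + b)
    p_neg = sigmoid(x_neg @ theta + b)
    consistency = (p_pos - (1.0 - p_neg)) ** 2   # p(x+) should equal 1 - p(x-)
    confidence = np.minimum(p_pos, p_neg) ** 2   # penalise p(x+) = p(x-) = 0.5
    return float(np.mean(consistency + confidence))

# An uninformative probe outputs 0.5 for both sides: loss = 0 + 0.25.
print(ccs_loss(np.zeros(1), 0.0, np.array([[3.0]]), np.array([[-3.0]])))
```

A probe that separates the pair confidently and consistently drives both terms toward zero.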

artificial intelligence, contrast-consistent search, natural language, (1 more...)

arXiv.org Artificial Intelligence

2311.00488

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language (0.53)
