AITopics | Mayne, Harry

Collaborating Authors

Mayne, Harry

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

LINGOLY-TOO: Disentangling Memorisation from Reasoning with Linguistic Templatisation and Orthographic Obfuscation

Khouja, Jude, Korgul, Karolina, Hellsten, Simi, Yang, Lingyi, Neacsu, Vlad, Mayne, Harry, Kearns, Ryan, Bean, Andrew, Mahdi, Adam

arXiv.org Artificial Intelligence, Mar-7-2025

Assessing the reasoning capabilities of large language models (LLMs) is susceptible to overestimation due to data exposure of evaluation benchmarks. We introduce a framework for producing linguistic reasoning problems that reduces the effect of memorisation in model performance estimates and apply this framework to develop LINGOLY-TOO, a challenging benchmark for linguistic reasoning. By developing orthographic templates, we dynamically obfuscate the writing systems of real languages to generate numerous question variations. These variations preserve the reasoning steps required for each solution while reducing the likelihood of specific problem instances appearing in model training data. Our experiments demonstrate that frontier models, including Claude 3.7 Sonnet, o1-preview and DeepSeek R1, struggle with advanced reasoning. Our analysis also shows that LLMs exhibit noticeable variance in accuracy across permutations of the same problem, and on average perform better on questions appearing in their original orthography. Our findings highlight the opaque nature of response generation in LLMs and provide evidence that prior data exposure contributes to overestimating the reasoning capabilities of frontier models.
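To make the idea of obfuscating a writing system concrete, the sketch below rewrites the source-language side of a toy puzzle with a random, consistent letter substitution, so the underlying morphology (here a plural suffix) survives while the surface forms change. This is a deliberately simplified stand-in, not the paper's orthographic templates: the vowel/consonant inventories, the helpers `make_substitution` and `obfuscate`, and the toy puzzle are all hypothetical.

```python
import random

VOWELS = "aeiou"        # assumed toy inventory (hypothetical)
CONSONANTS = "klmnpst"  # assumed toy inventory (hypothetical)


def make_substitution(seed: int) -> dict[str, str]:
    """Hypothetical helper: permute vowels among vowels and consonants among
    consonants, so obfuscated words keep a pronounceable shape."""
    rng = random.Random(seed)
    mapping: dict[str, str] = {}
    for group in (VOWELS, CONSONANTS):
        shuffled = list(group)
        rng.shuffle(shuffled)
        mapping.update(zip(group, shuffled))
    return mapping


def obfuscate(word: str, mapping: dict[str, str]) -> str:
    """Rewrite a word letter by letter; characters outside the mapping are kept."""
    return "".join(mapping.get(ch, ch) for ch in word)


# Toy puzzle: only the source-language side is obfuscated; English glosses stay readable.
pairs = [("pato", "dog"), ("patoki", "dogs"), ("lumo", "cat")]
query = "lumoki"

for seed in range(3):
    m = make_substitution(seed)
    shown = [(obfuscate(src, m), gloss) for src, gloss in pairs]
    print(shown, "->", obfuscate(query, m), "= ?")
```

Because the mapping is one-to-one and applied consistently, every variant still requires the same inference (the shared suffix marks the plural), while the exact strings are unlikely to have appeared in training data.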

large language model, machine learning, obfuscation, (18 more...)

2503.02972

Country:

Europe > United Kingdom (0.28)
North America > United States (0.28)
Europe > Middle East > Malta (0.14)
Asia > Middle East > UAE (0.14)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Beyond Toxic Neurons: A Mechanistic Analysis of DPO for Toxicity Reduction

Yang, Yushi, Sondej, Filip, Mayne, Harry, Mahdi, Adam

arXiv.org Artificial Intelligence, Dec-13-2024

Safety fine-tuning algorithms are widely used to reduce harmful outputs in language models, but how they achieve this remains unclear. For the Direct Preference Optimization (DPO) algorithm applied to toxicity reduction, current explanations claim that DPO works by dampening the activations of toxic MLP neurons. However, through activation patching, we show that this explanation is incomplete. Projections onto a toxicity probe's direction show that only 4.9% of the toxicity reduction comes from dampened toxic neurons. Instead, DPO reduces toxicity through distributed activation shifts across a majority of neurons, progressively shifting MLP layer outputs away from toxicity. These shifts accumulate across four neuron groups: two reducing toxicity and two promoting anti-toxicity. Activation patching validates the cumulative roles of these groups: patching all identified groups effectively replicates DPO's effects. These findings illustrate DPO's mechanism: it reduces toxicity by accumulating small activation shifts across many neurons throughout the layers. Our findings provide new mechanistic insights into how safety fine-tuning reduces harmful outputs in language models.
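The "4.9% of toxicity reduction" figure is a share of the drop in the MLP output's projection onto a probe direction. A minimal sketch of that bookkeeping for a single layer and token is shown below, assuming the MLP output is the activation-weighted sum of per-neuron output vectors; the helper names, the top-16 definition of "toxic neurons", and all data are hypothetical synthetic stand-ins, not the paper's model or code.

```python
import numpy as np


def probe_contributions(acts, w_out, probe):
    """Each neuron's contribution to the MLP output projected onto the unit probe.

    acts:  (n_neurons,) neuron activations for one token
    w_out: (n_neurons, d_model) row i is neuron i's output vector
    probe: (d_model,) toxicity probe direction
    """
    unit = probe / np.linalg.norm(probe)
    return acts * (w_out @ unit)


def share_of_reduction(acts_base, acts_dpo, w_out, probe, mask):
    """Fraction of the total change in probe projection due to neurons in `mask`."""
    delta = (probe_contributions(acts_dpo, w_out, probe)
             - probe_contributions(acts_base, w_out, probe))
    return delta[mask].sum() / delta.sum()


# Synthetic stand-ins for real weights and activations (hypothetical data).
rng = np.random.default_rng(0)
n_neurons, d_model = 512, 64
w_out = rng.normal(size=(n_neurons, d_model))
probe = rng.normal(size=d_model)
alignment = w_out @ (probe / np.linalg.norm(probe))
acts_base = rng.normal(size=n_neurons)
# Small shifts on every neuron, each moving its contribution away from the probe.
acts_dpo = acts_base - 0.1 * rng.random(n_neurons) * np.sign(alignment)

# Hypothetical "toxic neurons": the 16 whose output vectors align most with the probe.
toxic = np.zeros(n_neurons, dtype=bool)
toxic[np.argsort(alignment)[-16:]] = True
share = share_of_reduction(acts_base, acts_dpo, w_out, probe, toxic)
print(f"share of reduction from 'toxic' neurons: {share:.3f}")
```

Because the per-neuron changes add up exactly to the total change, the shares of any partition of the neurons sum to one, which is what makes a statement like "only a small fraction comes from group X" well defined.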

large language model, machine learning, natural language, (19 more...)

2411.06424

Country: Europe (0.15)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)

Can sparse autoencoders be used to decompose and interpret steering vectors?

Mayne, Harry, Yang, Yushi, Mahdi, Adam

arXiv.org Artificial Intelligence, Nov-13-2024

Steering vectors are a promising approach to control the behaviour of large language models. However, their underlying mechanisms remain poorly understood. While sparse autoencoders (SAEs) may offer a potential method to interpret steering vectors, recent findings show that SAE-reconstructed vectors often lack the steering properties of the original vectors. This paper investigates why directly applying SAEs to steering vectors yields misleading decompositions, identifying two reasons: (1) steering vectors fall outside the input distribution for which SAEs are designed, and (2) steering vectors can have meaningful negative projections in feature directions, which SAEs are not designed to accommodate. These limitations hinder the direct use of SAEs for interpreting steering vectors.
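Reason (2) above can be seen with a toy ReLU sparse autoencoder. The sketch assumes the common architecture `f = ReLU(W_enc (x - b_dec) + b_enc)`, `x_hat = W_dec f + b_dec`, and uses a hypothetical orthonormal dictionary with zero biases so the effect is easy to read: a steering vector with a negative component along a feature direction cannot be represented, because the ReLU clamps that coefficient to zero.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def sae_reconstruct(x, w_enc, b_enc, w_dec, b_dec):
    """Forward pass of a standard ReLU SAE (assumed architecture)."""
    codes = relu(w_enc @ (x - b_dec) + b_enc)  # non-negative feature activations
    return w_dec @ codes + b_dec, codes


# Toy SAE: each feature is one basis direction (hypothetical weights).
d_model = n_features = 8
w_dec = np.eye(d_model)            # shape (d_model, n_features); columns are features
w_enc = w_dec.T                    # encoder is the transpose for this toy case
b_enc = np.zeros(n_features)
b_dec = np.zeros(d_model)

# Steering vector: positive along feature 0, negative along feature 1.
steer = 2.0 * w_dec[:, 0] - 3.0 * w_dec[:, 1]
recon, codes = sae_reconstruct(steer, w_enc, b_enc, w_dec, b_dec)
print(codes)  # feature 1's negative coefficient has been clamped to zero
print(recon)  # only the feature-0 component survives
```

In a trained SAE, the non-zero `b_dec` and the fact that steering vectors are differences of activations rather than activations themselves add the distribution-shift problem described in reason (1); this toy omits that.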

large language model, machine learning, natural language, (18 more...)

2411.08790

Country: Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

LINGOLY: A Benchmark of Olympiad-Level Linguistic Reasoning Puzzles in Low-Resource and Extinct Languages

Bean, Andrew M., Hellsten, Simi, Mayne, Harry, Magomere, Jabez, Chi, Ethan A., Chi, Ryan, Hale, Scott A., Kirk, Hannah Rose

arXiv.org Artificial Intelligence, Jun-11-2024

In this paper, we present the LingOly benchmark, a novel benchmark for advanced reasoning abilities in large language models. Using challenging Linguistic Olympiad puzzles, we evaluate (i) capabilities for in-context identification and generalisation of linguistic patterns in very low-resource or extinct languages, and (ii) abilities to follow complex task instructions. The LingOly benchmark covers more than 90 mostly low-resource languages, minimising issues of data contamination, and contains 1,133 problems across 6 formats and 5 levels of human difficulty. We assess performance with both direct accuracy and comparison to a no-context baseline to penalise memorisation. Scores from 11 state-of-the-art LLMs show that the benchmark is challenging, and models perform poorly on the higher-difficulty problems. On harder problems, even the top model achieved only 38.7% accuracy, a 24.7% improvement over the no-context baseline. Large closed models typically outperform open models, and in general, the higher-resource the language, the better the scores. These results indicate that, in the absence of memorisation, true multi-step out-of-domain reasoning remains a challenge for current language models.
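The no-context baseline can be illustrated as the difference between a model's score when it sees the full puzzle and its score when it sees only the questions: any credit earned without the context is attributed to memorisation or guessing. The sketch below assumes case-insensitive exact-match scoring and a mean difference; `exact_match`, `no_context_gain` and the answers are hypothetical examples, not the benchmark's released code or data.

```python
def exact_match(pred: str, gold: str) -> float:
    """1.0 if the normalised prediction equals the reference answer (assumed metric)."""
    return float(pred.strip().lower() == gold.strip().lower())


def mean(xs: list[float]) -> float:
    return sum(xs) / len(xs)


def no_context_gain(preds_ctx: list[str], preds_noctx: list[str], golds: list[str]) -> float:
    """Mean score with the puzzle context minus mean score without it."""
    with_ctx = [exact_match(p, g) for p, g in zip(preds_ctx, golds)]
    no_ctx = [exact_match(p, g) for p, g in zip(preds_noctx, golds)]
    return mean(with_ctx) - mean(no_ctx)


# Hypothetical answers for four questions.
golds = ["dogs", "the man sees", "water", "big house"]
preds_ctx = ["dogs", "the man sees", "fire", "big house"]   # 3/4 correct with context
preds_noctx = ["dogs", "a man saw", "fire", "small house"]  # 1/4 correct without context
print(no_context_gain(preds_ctx, preds_noctx, golds))       # 0.75 - 0.25 = 0.5
```

A model that answers correctly without the puzzle text gets no extra credit here, which is how the comparison penalises memorisation.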

large language model, machine learning, natural language, (20 more...)

2406.06196

Country:

Europe > United Kingdom (0.46)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report > New Finding (0.93)

Industry:

Education (1.00)
Leisure & Entertainment (0.67)
Media > Film (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Unsupervised Learning Approaches for Identifying ICU Patient Subgroups: Do Results Generalise?

Mayne, Harry, Parsons, Guy, Mahdi, Adam

arXiv.org Artificial Intelligence, Mar-5-2024

The use of unsupervised learning to identify patient subgroups has emerged as a potentially promising direction to improve the efficiency of Intensive Care Units (ICUs). By identifying subgroups of patients with similar levels of medical resource need, ICUs could be restructured into a collection of smaller subunits, each catering to a specific group. However, it is unclear whether common patient subgroups exist across different ICUs, which would determine whether ICU restructuring could be operationalised in a standardised manner. In this paper, we tested the hypothesis that common ICU patient subgroups exist by examining whether the results from one existing study generalise to a different dataset. We extracted 16 features representing medical resource need and used consensus clustering to derive patient subgroups, replicating the previous study. We found limited similarities between our results and those of the previous study, providing evidence against the hypothesis. Our findings imply that there is significant variation between ICUs; thus, a standardised restructuring approach is unlikely to be appropriate. Instead, potential efficiency gains might be greater when the number and nature of the subunits are tailored to each ICU individually.
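Consensus clustering repeatedly clusters random subsamples and records how often each pair of patients lands in the same cluster; stable subgroups show up as blocks of values near 1 in the resulting consensus matrix. The sketch below uses a plain NumPy k-means as the base clusterer and the proportion of ambiguous clustering (PAC) as one common heuristic for comparing numbers of clusters. The base algorithm, subsampling fraction, PAC thresholds and the synthetic 16-feature data are assumptions for illustration, not the settings of either study.

```python
import numpy as np


def kmeans(x, k, rng, n_iter=50):
    """Plain Lloyd's k-means; returns one label per row of x."""
    centers = x[rng.choice(len(x), size=k, replace=False)]  # fancy indexing copies
    for _ in range(n_iter):
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean(axis=0)
    return labels


def consensus_matrix(x, k, n_resamples=100, frac=0.8, seed=0):
    """M[i, j] = times i and j were co-clustered / times both were subsampled."""
    rng = np.random.default_rng(seed)
    n = len(x)
    together = np.zeros((n, n))
    sampled = np.zeros((n, n))
    for _ in range(n_resamples):
        idx = rng.choice(n, size=int(frac * n), replace=False)
        labels = kmeans(x[idx], k, rng)
        together[np.ix_(idx, idx)] += labels[:, None] == labels[None, :]
        sampled[np.ix_(idx, idx)] += 1.0
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(sampled > 0, together / sampled, 0.0)


def pac(m, lo=0.1, hi=0.9):
    """Proportion of ambiguous clustering: share of off-diagonal entries in (lo, hi)."""
    off = m[~np.eye(len(m), dtype=bool)]
    return float(((off > lo) & (off < hi)).mean())


# Synthetic stand-in for 16 resource-need features per patient (hypothetical data).
rng = np.random.default_rng(42)
features = rng.normal(size=(60, 16))
features = (features - features.mean(axis=0)) / features.std(axis=0)  # z-score
for k in range(2, 6):
    print(k, round(pac(consensus_matrix(features, k, n_resamples=50)), 3))
```

A lower PAC suggests a more stable partition. Checking whether subgroups generalise then amounts to running the same pipeline on a second ICU's data and comparing the chosen number of clusters and the subgroup profiles.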

admission, artificial intelligence, machine learning, (12 more...)

2403.02945

Country:

North America > United States > California (0.14)
North America > United States > Massachusetts (0.14)
Europe > United Kingdom > England (0.14)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Health Care Providers & Services (1.00)
Health & Medicine > Therapeutic Area > Neurology (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.47)