AITopics | Metcalf, Katherine

Collaborating Authors

Metcalf, Katherine

Information about AI from the News, Publications, and Conferences

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

Steering into New Embedding Spaces: Analyzing Cross-Lingual Alignment Induced by Model Interventions in Multilingual Language Models

Sundar, Anirudh, Williamson, Sinead, Metcalf, Katherine, Theobald, Barry-John, Seto, Skyler, Fedzechkina, Masha

arXiv.org Artificial Intelligence, Feb-21-2025

Aligned representations across languages are a desired property in multilingual large language models (mLLMs), as alignment can improve performance in cross-lingual tasks. Typically, alignment requires fine-tuning a model, which is computationally expensive, and sizable language data, which often may not be available. A data-efficient alternative to fine-tuning is model interventions, a method for manipulating model activations to steer generation in the desired direction. We analyze the effect of a popular intervention (finding experts) on the alignment of cross-lingual representations in mLLMs. We identify the neurons to manipulate for a given language and introspect the embedding space of mLLMs pre- and post-manipulation. We show that modifying the mLLM's activations changes its embedding space such that cross-lingual alignment is enhanced. Further, we show that the changes to the embedding space translate into improved downstream performance on retrieval tasks, with up to 2x improvements in top-1 accuracy on cross-lingual retrieval.
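
The abstract reports gains in top-1 accuracy on cross-lingual retrieval. The sketch below shows one common way such a metric is computed, assuming parallel sentence embeddings whose indices match across the two languages and cosine similarity as the retrieval score. It is not the paper's evaluation code; the function names and toy vectors are hypothetical, and extracting embeddings from an mLLM is out of scope.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def top1_retrieval_accuracy(src_embs, tgt_embs):
    """Fraction of source sentences whose nearest target embedding is their translation.

    Assumes src_embs[i] and tgt_embs[i] embed the same sentence in two languages.
    """
    correct = 0
    for i, query in enumerate(src_embs):
        scores = [cosine(query, cand) for cand in tgt_embs]
        best = max(range(len(scores)), key=scores.__getitem__)
        correct += int(best == i)
    return correct / len(src_embs)


# Toy 2-D "embeddings" for three parallel sentences (illustrative values only).
src = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
tgt = [[0.9, 0.1], [0.1, 0.9], [0.6, 0.8]]
print(top1_retrieval_accuracy(src, tgt))  # 1.0
```

Under this metric, better cross-lingual alignment means a translation lands closer to its source than any other candidate does, which is why changes to the embedding space show up directly as retrieval accuracy.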

intervention, large language model, natural language

2502.15639

Country:

North America > United States (0.14)
North America > Canada (0.14)
Europe > Italy (0.14)

Genre:

Research Report > New Finding (0.68)
Research Report > Experimental Study (0.67)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.89)

Analyze the Neurons, not the Embeddings: Understanding When and Where LLM Representations Align with Humans

Fedzechkina, Masha, Gualdoni, Eleonora, Williamson, Sinead, Metcalf, Katherine, Seto, Skyler, Theobald, Barry-John

arXiv.org Artificial Intelligence, Feb-20-2025

Modern large language models (LLMs) achieve impressive performance on some tasks, while exhibiting distinctly non-human-like behaviors on others. This raises the question of how well the LLM's learned representations align with human representations. In this work, we introduce a novel approach to the study of representation alignment: we adopt a method from research on activation steering to identify neurons responsible for specific concepts (e.g., 'cat') and then analyze the corresponding activation patterns. Our findings reveal that LLM representations closely align with human representations inferred from behavioral data. Notably, this alignment surpasses that of word embeddings, which have been center stage in prior work on human and model alignment. Additionally, our approach enables a more granular view of how LLMs represent concepts. Specifically, we show that LLMs organize concepts in a way that reflects hierarchical relationships interpretable to humans (e.g., 'animal'-'dog').
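
The abstract describes adopting a method from activation-steering research to identify neurons responsible for a concept. A minimal sketch of that kind of neuron ranking follows, assuming each neuron is scored by how well its activation separates sentences containing the concept from sentences that do not. AUROC is used here as the separation score; the paper may use a different metric (earlier expert-finding work has used average precision, for example). The activations are hypothetical pre-extracted values, and `find_concept_neurons` is an illustrative name, not the authors' code.

```python
def auroc(scores, labels):
    """AUROC of `scores` for binary `labels` (1 = concept present), computed as the
    Mann-Whitney probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def find_concept_neurons(activations, labels, k):
    """Rank neurons by how well their activation separates concept sentences.

    activations[i][j] is the (hypothetical, pre-extracted) activation of neuron j on
    sentence i; returns the indices of the top-k neurons and all per-neuron AUROCs.
    """
    n_neurons = len(activations[0])
    per_neuron = [auroc([row[j] for row in activations], labels) for j in range(n_neurons)]
    ranked = sorted(range(n_neurons), key=lambda j: per_neuron[j], reverse=True)
    return ranked[:k], per_neuron


acts = [[0.9, 0.1, 0.5], [0.8, 0.2, 0.4], [0.1, 0.3, 0.6], [0.2, 0.9, 0.5]]
labels = [1, 1, 0, 0]  # first two sentences mention the concept (e.g., 'cat')
top, scores = find_concept_neurons(acts, labels, k=1)
print(top, scores)  # [0] [1.0, 0.0, 0.125]
```

The activation patterns of the selected neurons, rather than a single word embedding, are then what gets compared against human data.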

large language model, machine learning, natural language

2502.1509

Country:

Asia (0.46)
Europe (0.28)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report > New Finding (1.00)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

On the Way to LLM Personalization: Learning to Remember User Conversations

Magister, Lucie Charlotte, Metcalf, Katherine, Zhang, Yizhe, ter Hoeve, Maartje

arXiv.org Artificial Intelligence, Nov-20-2024

Large Language Models (LLMs) have quickly become invaluable assistants for a variety of tasks. However, their effectiveness is constrained by their ability to tailor responses to human preferences and behaviors via personalization. Prior work in LLM personalization has largely focused on style transfer or incorporating small factoids about the user, as knowledge injection remains an open challenge. In this paper, we explore injecting knowledge of prior conversations into LLMs to enable future work on less redundant, personalized conversations. We identify two real-world constraints: (1) conversations are sequential in time and must be treated as such during training, and (2) per-user personalization is only viable in parameter-efficient settings. To this end, we propose PLUM, a pipeline that performs data augmentation to up-sample conversations as question-answer pairs, which are then used to fine-tune a low-rank adaptation (LoRA) adapter with a weighted cross-entropy loss. Even in this first exploration of the problem, we perform competitively with baselines such as retrieval-augmented generation (RAG), attaining an accuracy of 81.5% across 100 conversations.
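
The abstract says the adapter is trained with a weighted cross-entropy loss but does not say how the weights are assigned. The sketch below shows a generic token-weighted cross-entropy, assuming one scalar weight per target token supplied by the caller. The weighting values are hypothetical placeholders, not PLUM's scheme, and the LoRA adapter itself is not modeled.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def weighted_cross_entropy(logits_per_token, targets, weights):
    """Weighted mean of per-token negative log-likelihoods.

    weights are hypothetical per-token importances supplied by the caller; a weight
    of 0 removes a token from the loss entirely.
    """
    total, norm = 0.0, 0.0
    for logits, y, w in zip(logits_per_token, targets, weights):
        total += -w * log_softmax(logits)[y]
        norm += w
    return total / norm


# Two tokens over a 2-word vocabulary: one confidently correct, one uniform.
logits = [[10.0, 0.0], [0.0, 0.0]]
targets = [0, 0]
print(weighted_cross_entropy(logits, targets, [1.0, 1.0]))  # mean over both tokens
print(weighted_cross_entropy(logits, targets, [0.0, 1.0]))  # only the uniform token: log 2
```

Normalizing by the sum of weights keeps the loss on the same scale as an unweighted mean, so changing the weighting shifts emphasis between tokens without changing the overall loss magnitude.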

accuracy, large language model, machine learning

2411.13405

Country:

Asia (0.68)
North America > Mexico > Mexico City (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

PREDICT: Preference Reasoning by Evaluating Decomposed preferences Inferred from Candidate Trajectories

Aroca-Ouellette, Stephane, Mackraz, Natalie, Theobald, Barry-John, Metcalf, Katherine

arXiv.org Artificial Intelligence, Oct-8-2024

Accommodating human preferences is essential for creating AI agents that deliver personalized and effective interactions. Recent work has shown the potential for LLMs to infer preferences from user interactions, but they often produce broad and generic preferences, failing to capture the unique and individualized nature of human preferences. This paper introduces PREDICT, a method designed to enhance the precision and adaptability of preference inference. PREDICT incorporates three key elements: (1) iterative refinement of inferred preferences, (2) decomposition of preferences into constituent components, and (3) validation of preferences across multiple trajectories. We evaluate PREDICT in two distinct environments: a gridworld setting and a new text-domain environment (PLUME).

large language model, machine learning, natural language

2410.06273

Country: North America > United States > Colorado > Boulder County > Boulder (0.14)

Genre: Research Report (0.84)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.71)

Whispering Experts: Neural Interventions for Toxicity Mitigation in Language Models

Suau, Xavier, Delobelle, Pieter, Metcalf, Katherine, Joulin, Armand, Apostoloff, Nicholas, Zappella, Luca, Rodríguez, Pau

arXiv.org Artificial Intelligence, Jul-2-2024

An important issue with Large Language Models (LLMs) is their undesired ability to generate toxic language. In this work, we show that the neurons responsible for toxicity can be determined by their power to discriminate toxic sentences, and that toxic language can be mitigated by reducing their activation levels proportionally to this power. We propose AUROC adaptation (AurA), an intervention that can be applied to any pre-trained LLM to mitigate toxicity. As the intervention is proportional to the ability of each neuron to discriminate toxic content, it is free of any model-dependent hyperparameters. We show that AurA can achieve up to 2.2× reduction in toxicity with only a 0.72 perplexity increase. We also show that AurA is effective with models of different scale (from 1.5B to 40B parameters), and its effectiveness in mitigating toxic language, while preserving common-sense zero-shot abilities, holds across all scales. AurA can be combined with pre-prompting strategies, boosting its average mitigation potential from 1.28× to 2.35×. Moreover, AurA can counteract adversarial pre-prompts that maliciously elicit toxic content, making it an effective method for deploying safer and less toxic models.
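
The abstract says each neuron's activation is reduced in proportion to its AUROC for discriminating toxic sentences. A minimal sketch of such a dampening rule follows, assuming the factor is 1 − Gini with Gini = 2·AUROC − 1 for neurons above chance, and that neurons at or below chance are left untouched. That exact rule is an assumption here and should be checked against the paper. The per-neuron AUROCs are taken as given, and the function names are hypothetical.

```python
def aura_factors(aurocs):
    """Per-neuron multiplicative factors from each neuron's toxicity AUROC.

    Assumed rule: neurons at or below chance (AUROC <= 0.5) are left untouched
    (factor 1.0); otherwise the factor is 1 - Gini, with Gini = 2 * AUROC - 1,
    so a perfect toxicity discriminator (AUROC = 1.0) is silenced (factor 0.0).
    """
    factors = []
    for a in aurocs:
        if a > 0.5:
            gini = 2.0 * a - 1.0  # rescales AUROC from (0.5, 1] to (0, 1]
            factors.append(1.0 - gini)
        else:
            factors.append(1.0)
    return factors


def dampen(activations, factors):
    """Scale one layer's activations element-wise by the precomputed factors."""
    return [x * f for x, f in zip(activations, factors)]


factors = aura_factors([0.5, 0.75, 1.0, 0.3])
print(factors)                                 # [1.0, 0.5, 0.0, 1.0]
print(dampen([2.0, 2.0, 2.0, 2.0], factors))  # [2.0, 1.0, 0.0, 2.0]
```

Because each factor depends only on the neuron's own AUROC, there is no global intervention-strength knob to tune, which is consistent with the abstract's claim that the method has no model-dependent hyperparameters.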

large language model, machine learning, natural language

2407.12824

Country:

Europe (0.67)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report > New Finding (0.92)

Industry: Law > Civil Rights & Constitutional Law (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Hindsight PRIORs for Reward Learning from Human Preferences

Verma, Mudit, Metcalf, Katherine

arXiv.org Artificial Intelligence, Apr-12-2024

Preference-based reinforcement learning (PbRL) removes the need to hand-specify a reward function by learning a reward from preference feedback over policy behaviors. Current approaches to PbRL do not address the credit assignment problem inherent in determining which parts of a behavior contributed most to a preference, which results in data-intensive approaches and subpar reward functions. We address these limitations by introducing a credit assignment strategy (Hindsight PRIOR) that uses a world model to approximate state importance within a trajectory and then guides rewards to be proportional to state importance through an auxiliary predicted-return redistribution objective. Incorporating state importance into reward learning improves the speed of policy learning, overall policy performance, and reward recovery on both locomotion and manipulation tasks. For example, Hindsight PRIOR recovers significantly more reward on average (p < 0.05) on MetaWorld (20%) and DMC (15%). The performance gains and our ablations demonstrate the benefits even a simple credit assignment strategy can have on reward learning, and show that state importance in forward dynamics prediction is a strong proxy for a state's contribution to a preference decision. Code repository can be found at https://github.com/apple/

Preference-based reinforcement learning (PbRL) learns a policy from preference feedback, removing the need to hand-specify a reward function. Compared to other methods that avoid hand-specifying a reward function (e.g. … Additionally, PbRL can be deployed with a human in the loop, allowing guidance to adapt on the fly to sub-optimal policies, and it has been shown to be highly effective for complex tasks where reward specification is not feasible (e.g. …
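
The abstract describes an auxiliary objective that guides predicted rewards to be proportional to state importance by redistributing the predicted return. A minimal sketch of that idea follows. It assumes the state-importance weights are already available (the paper derives them from a world model, which is not modeled here) and uses a mean-squared-error penalty toward the redistributed targets. Both the loss form and the function names are illustrative assumptions, not the authors' implementation.

```python
def redistribution_targets(pred_rewards, importance):
    """Spread a trajectory's predicted return over its states in proportion to
    (assumed non-negative, not all zero) state-importance weights."""
    predicted_return = sum(pred_rewards)
    total_importance = sum(importance)
    return [predicted_return * w / total_importance for w in importance]


def redistribution_loss(pred_rewards, importance):
    """Mean squared error pulling each predicted reward toward its redistributed target."""
    targets = redistribution_targets(pred_rewards, importance)
    return sum((r - t) ** 2 for r, t in zip(pred_rewards, targets)) / len(pred_rewards)


# The middle state is judged twice as important as its neighbours (illustrative weights).
rewards = [2.0, 0.0, 2.0]
importance = [1.0, 2.0, 1.0]
print(redistribution_targets(rewards, importance))  # [1.0, 2.0, 1.0]
print(redistribution_loss(rewards, importance))     # 2.0
```

The targets preserve the trajectory's total predicted return and only change how it is spread across states, so the auxiliary term addresses credit assignment without altering which trajectory the reward model prefers.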

large language model, machine learning, reinforcement learning

2404.08828

Country: North America > United States > Arizona (0.14)

Genre:

Research Report > New Finding (0.49)
Research Report > Experimental Study (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)

Sample-Efficient Preference-based Reinforcement Learning with Dynamics Aware Rewards

Metcalf, Katherine, Sarabia, Miguel, Mackraz, Natalie, Theobald, Barry-John

arXiv.org Artificial Intelligence, Feb-27-2024

Preference-based reinforcement learning (PbRL) aligns robot behavior with human preferences via a reward function learned from binary feedback over agent behaviors. We show that dynamics-aware reward functions improve the sample efficiency of PbRL by an order of magnitude. In our experiments we iterate between: (1) learning a dynamics-aware state-action representation z^{sa} via a self-supervised temporal consistency task, and (2) bootstrapping the preference-based reward function from z^{sa}, which results in faster policy learning and better final policy performance. For example, on quadruped-walk, walker-walk, and cheetah-run, with 50 preference labels we achieve the same performance as existing approaches with 500 preference labels, and we recover 83% and 66% of ground truth reward policy performance versus only 38% and 21%. The performance gains demonstrate the benefits of explicitly learning a dynamics-aware reward model. Repo: https://github.com/apple/ml-reed
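
The abstract says the reward function is learned from binary feedback over pairs of agent behaviors. A widely used way to do this in PbRL, which this sketch assumes, is a Bradley-Terry model over the summed per-step rewards of two behavior segments, trained with binary cross-entropy. The dynamics-aware representation z^{sa} that the paper bootstraps from is not modeled; the per-step reward predictions are hypothetical inputs.

```python
import math


def softplus(x):
    """log(1 + exp(x)) without overflow for large |x|."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def preference_probability(rewards_a, rewards_b):
    """Bradley-Terry probability that segment A is preferred over segment B,
    i.e. exp(R_A) / (exp(R_A) + exp(R_B)) with R the summed predicted reward."""
    diff = sum(rewards_a) - sum(rewards_b)
    return math.exp(-softplus(-diff))  # sigmoid(diff) = exp(-log(1 + exp(-diff)))


def preference_loss(rewards_a, rewards_b, label):
    """Binary cross-entropy on one labelled pair; label is 1 if A was preferred, else 0."""
    diff = sum(rewards_a) - sum(rewards_b)
    # -log sigmoid(diff) = softplus(-diff); -log(1 - sigmoid(diff)) = softplus(diff)
    return softplus(-diff) if label == 1 else softplus(diff)


seg_a, seg_b = [0.5, 0.5], [0.0, 0.0]  # hypothetical per-step reward predictions
print(preference_probability(seg_a, seg_b))  # about 0.731
print(preference_loss(seg_a, seg_b, 1) < preference_loss(seg_a, seg_b, 0))  # True
```

Each label supervises only the difference between two summed returns. This makes the number of labels the bottleneck, and it is the cost that a better reward representation aims to reduce.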

artificial intelligence, machine learning, reinforcement learning

2402.17975

Country: North America > United States (0.14)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Large Language Models as Generalizable Policies for Embodied Tasks

Szot, Andrew, Schwarzer, Max, Agrawal, Harsh, Mazoure, Bogdan, Talbott, Walter, Metcalf, Katherine, Mackraz, Natalie, Hjelm, Devon, Toshev, Alexander

arXiv.org Artificial Intelligence, Oct-26-2023

We show that large language models (LLMs) can be adapted to be generalizable policies for embodied visual tasks. Our approach, called Large LAnguage model Reinforcement Learning Policy (LLaRP), adapts a pre-trained frozen LLM to take as input text instructions and visual egocentric observations and output actions directly in the environment. Using reinforcement learning, we train LLaRP to see and act solely through environmental interactions. We show that LLaRP is robust to complex paraphrasings of task instructions and can generalize to new tasks that require novel optimal behavior. In particular, on 1,000 unseen tasks it achieves a 42% success rate, 1.7x the success rate of other common learned baselines or zero-shot applications of LLMs. Finally, to aid the community in studying language-conditioned, massively multi-task, embodied AI problems, we release a novel benchmark, Language Rearrangement, consisting of 150,000 training and 1,000 testing tasks for language-conditioned rearrangement. Video examples of LLaRP on unseen Language Rearrangement instructions are at https://llm-rl.github.io.

large language model, machine learning, natural language

2310.17722

Genre: Research Report (1.00)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Mirroring to Build Trust in Digital Assistants

Metcalf, Katherine, Theobald, Barry-John, Weinberg, Garrett, Lee, Robert, Jonsson, Ing-Marie, Webb, Russ, Apostoloff, Nicholas

arXiv.org Artificial Intelligence, Apr-2-2019

We describe experiments towards building a conversational digital assistant that considers the preferred conversational style of the user. In particular, these experiments are designed to measure whether users prefer and trust an assistant whose conversational style matches their own. To this end we conducted a user study where subjects interacted with a digital assistant that responded in a way that either matched their conversational style, or did not. Using self-reported personality attributes and subjects' feedback on the interactions, we built models that can reliably predict a user's preferred conversational style.

artificial intelligence, interaction, participant

1904.01664

Country: North America > United States (0.14)

Genre:

Questionnaire & Opinion Survey (1.00)
Research Report > Experimental Study (0.68)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)

Learning Sharing Behaviors with Arbitrary Numbers of Agents

Metcalf, Katherine, Theobald, Barry-John, Apostoloff, Nicholas

arXiv.org Machine Learning, Dec-10-2018

We propose a method for modeling and learning turn-taking behaviors for accessing a shared resource. We model the individual behavior for each agent in an interaction and then use a multi-agent fusion model to generate a summary over the expected actions of the group, which renders the model independent of the number of agents. The individual behavior models are weighted finite state transducers (WFSTs) with weights dynamically updated during interactions, and the multi-agent fusion model is a logistic regression classifier. We test our models in a multi-agent tower-building environment, where a Q-learning agent learns to interact with rule-based agents. Our approach accurately models the underlying behavior patterns of the rule-based agents, with accuracy ranging between 0.63 and 1.0 depending on the stochasticity of the other agents' behaviors. In addition, using KL divergence, we show that the model accurately captures the distribution of next actions when interacting with both a single agent (KL divergence < 0.1) and multiple agents (KL divergence < 0.37). Finally, we demonstrate that our behavior model can be used by a Q-learning agent to take turns in an interactive turn-taking environment.
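
The abstract uses KL divergence to check how closely the model's predicted next-action distribution matches the observed one. The abstract does not state the direction or log base, so this sketch assumes KL(observed || predicted) in nats. It estimates the observed distribution by counting actions, and the predicted distribution is a hypothetical stand-in for the behavior model's output.

```python
import math


def empirical_distribution(actions, n_actions):
    """Relative frequency of each action index in an observed action sequence."""
    counts = [0] * n_actions
    for a in actions:
        counts[a] += 1
    return [c / len(actions) for c in counts]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats over the same discrete action set.

    Actions with p[a] == 0 contribute nothing; eps guards against q[a] == 0.
    """
    return sum(pa * math.log(pa / max(qa, eps)) for pa, qa in zip(p, q) if pa > 0)


observed = empirical_distribution([0, 1, 1, 1], n_actions=2)  # [0.25, 0.75]
predicted = [0.25, 0.75]  # hypothetical model output for the same context
print(kl_divergence(observed, predicted))     # 0.0
print(kl_divergence([1.0, 0.0], [0.5, 0.5]))  # log 2, about 0.693
```

A value of 0 means the two distributions match exactly. Smaller values, such as the thresholds reported above, indicate the predicted next-action distribution is closer to what the agents actually did.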

agent, artificial intelligence, machine learning

1812.04145

Country: North America > United States > Indiana (0.14)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.88)
