
Collaborating Authors

Chanin, David


A is for Absorption: Studying Feature Splitting and Absorption in Sparse Autoencoders

arXiv.org Artificial Intelligence

Sparse Autoencoders (SAEs) have emerged as a promising approach to decompose the activations of Large Language Models (LLMs) into human-interpretable latents. In this paper, we pose two questions. First, to what extent do SAEs extract monosemantic and interpretable latents? Second, to what extent does varying the sparsity or the size of the SAE affect monosemanticity and interpretability? By investigating these questions in the context of a simple first-letter identification task where we have complete access to ground truth labels for all tokens in the vocabulary, we are able to provide more detail than prior investigations. Critically, we identify a problematic form of feature splitting we call feature absorption, where seemingly monosemantic latents fail to fire in cases where they clearly should. Our investigation suggests that varying SAE size or sparsity is insufficient to solve this issue, and that there are deeper conceptual issues in need of resolution.
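The core mechanism the abstract describes can be illustrated with a toy SAE: a ReLU encoder that produces sparse nonnegative latents and a linear decoder that reconstructs the activation. This is a minimal sketch with illustrative dimensions and random weights, not the paper's trained models.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: a 16-dim "model activation" decomposed
# into 64 SAE latents (real SAEs are far larger).
d_model, d_sae = 16, 64
W_enc = rng.normal(0, 0.1, (d_model, d_sae))
b_enc = np.zeros(d_sae)
W_dec = rng.normal(0, 0.1, (d_sae, d_model))

def encode(x):
    # ReLU yields a sparse, nonnegative latent code
    return np.maximum(x @ W_enc + b_enc, 0.0)

def decode(z):
    # Linear reconstruction of the original activation
    return z @ W_dec

x = rng.normal(size=d_model)
z = encode(x)
x_hat = decode(z)

# Training an SAE trades off these two quantities: how few latents
# fire (L0 sparsity) versus how well the activation is reconstructed.
l0 = int((z > 0).sum())
mse = float(((x - x_hat) ** 2).mean())
```

Feature absorption would show up here as a latent that appears to track a concept (e.g. "starts with the letter A") yet stays at zero on some tokens where the concept clearly holds.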


Analyzing the Generalization and Reliability of Steering Vectors -- ICML 2024

arXiv.org Artificial Intelligence

Steering vectors (SVs) are a new approach to efficiently adjust language model behaviour at inference time by intervening on intermediate model activations. They have shown promise in terms of improving both capabilities and model alignment. However, the reliability and generalisation properties of this approach are unknown. In this work, we rigorously investigate these properties, and show that steering vectors have substantial limitations both in- and out-of-distribution. In-distribution, steerability is highly variable across different inputs. Depending on the concept, spurious biases can substantially contribute to how effective steering is for each input, presenting a challenge for the widespread use of steering vectors. Out-of-distribution, while steering vectors often generalise well, for several concepts they are brittle to reasonable changes in the prompt and fail to generalise. Overall, our findings show that while steering can work well in the right circumstances, there remain many technical difficulties in applying steering vectors to guide models' behaviour at scale.
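The intervention itself is simple to sketch: a steering vector is commonly derived as the difference of mean activations on contrastive prompts, then added (scaled) to hidden states at one layer during inference. The arrays below are random stand-ins for real model activations; shapes and the scale `alpha` are illustrative assumptions.

```python
import numpy as np

# Stand-in for one layer's hidden states: (num_tokens, d_model)
hidden = np.random.default_rng(1).normal(size=(4, 8))

# A common recipe: mean activation on "positive" prompts minus mean
# activation on "negative" prompts for the target concept.
pos_acts = np.random.default_rng(2).normal(loc=0.5, size=(10, 8))
neg_acts = np.random.default_rng(3).normal(loc=-0.5, size=(10, 8))
steering_vec = pos_acts.mean(axis=0) - neg_acts.mean(axis=0)

def steer(h, v, alpha=2.0):
    """Add a scaled steering vector to every token's activation."""
    return h + alpha * v

steered = steer(hidden, steering_vec)
```

The variability the abstract reports concerns this exact operation: the same `steering_vec` can shift behaviour strongly on one input and barely at all on another.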


Identifying Linear Relational Concepts in Large Language Models

arXiv.org Artificial Intelligence

Transformer language models (LMs) have been shown to represent concepts as directions in the latent space of hidden activations. However, for any given human-interpretable concept, how can we find its direction in the latent space? We present a technique called linear relational concepts (LRC) for finding concept directions corresponding to human-interpretable concepts at a given hidden layer in a transformer LM by first modeling the relation between subject and object as a linear relational embedding (LRE). While prior LRE work was mainly presented as an exercise in understanding model representations, we find that inverting the LRE while using earlier object layers results in a powerful technique to find concept directions that both work well as classifiers and causally influence model outputs.
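The key step can be sketched as linear algebra: an LRE approximates the model's subject-to-object map as an affine function o ≈ Ws + b, and inverting it for a target object representation recovers a direction in subject space. This is a minimal sketch with random matrices in place of fitted LRE weights and real hidden states.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # illustrative hidden dimension

# LRE: the relation (e.g. "capital of") approximated as o = W s + b,
# where s is a subject representation and o an object representation.
W = rng.normal(size=(d, d))
b = rng.normal(size=d)

# Target object representation (e.g. hidden state for "Paris").
o_target = rng.normal(size=d)

# Inverting the LRE: solve W s + b = o_target for s, then normalise.
concept_dir = np.linalg.solve(W, o_target - b)
concept_dir /= np.linalg.norm(concept_dir)

def concept_score(s):
    """Dot-product classifier along the recovered concept direction."""
    return float(s @ concept_dir)
```

In practice the inversion uses a (pseudo)inverse of the fitted LRE Jacobian rather than an exact solve, but the principle, mapping an object representation back into subject space, is the same.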


Open-source Frame Semantic Parsing

arXiv.org Artificial Intelligence

Frame semantic parsing (Gildea and Jurafsky, 2002) is a natural language understanding (NLU) task involving finding structured semantic frames and their arguments in natural language text as formalized by the FrameNet project (Baker et al., 1998). Frame semantics has proved useful in understanding user intent from text, finding use in modern voice assistants (Chen et al., 2019), dialog systems (Chen et al., 2013), and even text analysis (Zhao et al., 2023). A semantic frame in FrameNet describes an event, relation, or situation and its participants. When a frame occurs in a sentence, there is typically a "trigger" word in the sentence which is said to evoke the frame. In addition, a frame contains a list of arguments known as frame elements which describe the semantic roles that pertain to the frame. A sample sentence parsed for frame and frame elements is shown in Figure 1. FrameNet provides a list of lexical units (LUs) for each frame, which are word senses that may evoke the frame when they occur in a sentence. For instance, the frame "Attack" has lexical units such as "ambush.n".
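The structure described above (a frame, its trigger span, and its frame elements) can be captured in a small data model. This is an illustrative sketch, not the FrameNet API; class and field names are assumptions, and the example annotation mirrors the "Attack" frame mentioned in the abstract.

```python
from dataclasses import dataclass, field

@dataclass
class FrameElement:
    name: str    # semantic role, e.g. "Assailant"
    span: tuple  # (start, end) character offsets in the sentence

@dataclass
class FrameAnnotation:
    frame: str      # FrameNet frame name, e.g. "Attack"
    trigger: tuple  # span of the evoking word
    elements: list = field(default_factory=list)

sent = "The rebels ambushed the convoy at dawn."
parse = FrameAnnotation(
    frame="Attack",
    trigger=(11, 19),  # "ambushed" evokes the frame
    elements=[
        FrameElement("Assailant", (0, 10)),  # "The rebels"
        FrameElement("Victim", (20, 30)),    # "the convoy"
        FrameElement("Time", (31, 38)),      # "at dawn"
    ],
)
```

A parser for this task must jointly identify the trigger, disambiguate which frame it evokes, and label the argument spans, which is why the outputs are grouped into one annotation object.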


Neuro-symbolic Commonsense Social Reasoning

arXiv.org Artificial Intelligence

Social norms underlie all human social interactions, yet formalizing and reasoning with them remains a major challenge for AI systems. We present a novel system for taking social rules of thumb (ROTs) in natural language from the Social Chemistry 101 dataset and converting them to first-order logic, where reasoning is performed using a neuro-symbolic theorem prover. We accomplish this in several steps. First, ROTs are converted into Abstract Meaning Representation (AMR), which is a graphical representation of the concepts in a sentence, and the AMR is aligned with RoBERTa embeddings. We then generate alternate simplified versions of the AMR via a novel algorithm, recombining and merging embeddings for added robustness against different wordings of text and incorrect AMR parses. The AMR is then converted into first-order logic and is queried with a neuro-symbolic theorem prover. The goal of this paper is to develop and evaluate a neuro-symbolic method which performs explicit reasoning about social situations in a logical form.
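The final step of the pipeline, querying a first-order rule derived from a ROT, can be sketched with a tiny backward-chaining prover. This is a purely symbolic stand-in for the neuro-symbolic prover (no embedding similarity), and the rule, predicates, and facts are invented for illustration.

```python
# A ROT like "It is rude to talk loudly in a library" rendered as a
# first-order rule: rude(X) :- talks_loudly(X), in_library(X).
rules = [
    # (head, body): the head holds if every body atom can be proven
    (("rude", "?x"), [("talks_loudly", "?x"), ("in_library", "?x")]),
]
facts = {("talks_loudly", "alice"), ("in_library", "alice")}

def subst(atom, binding):
    """Replace bound ?variables in an atom with their values."""
    return tuple(binding.get(t, t) for t in atom)

def prove(goal, binding=None):
    """Naive backward chaining over ground facts and one-level rules."""
    binding = binding or {}
    g = subst(goal, binding)
    if g in facts:
        return True
    for head, body in rules:
        # Match the goal against the rule head, binding ?variables.
        b = dict(binding)
        ok = True
        for h, t in zip(head, g):
            if h.startswith("?"):
                b[h] = t
            elif h != t:
                ok = False
                break
        if ok and all(prove(atom, b) for atom in body):
            return True
    return False
```

In the paper's system the hard fact-matching step is softened: atoms are matched via RoBERTa embedding similarity rather than exact symbol equality, which is what makes the prover robust to different wordings.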