Schulz, Eric
Testing the limits of fine-tuning to improve reasoning in vision language models
Buschoff, Luca M. Schulze, Voudouris, Konstantinos, Akata, Elif, Bethge, Matthias, Tenenbaum, Joshua B., Schulz, Eric
Pre-trained vision language models still fall short of human visual cognition. In an effort to improve visual cognition in these models and align them with human behavior, we introduce visual stimuli and human judgments on visual cognition tasks, allowing us to systematically evaluate performance across cognitive domains within a consistent environment. We fine-tune models on ground truth data for intuitive physics and causal reasoning and find that this improves model performance in the respective fine-tuning domain. Furthermore, it can improve model alignment with human behavior. However, we find that fine-tuning does not lead to robust, human-like generalization to data with other visual characteristics or to tasks in other cognitive domains.
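To make the evaluation concrete, here is a minimal sketch, not the authors' code, of how one could score a fine-tuned model's accuracy against ground truth and its alignment with human judgments inside and outside the fine-tuning domain; all data and domain names in the snippet are illustrative placeholders.

```python
# Minimal sketch (not the authors' evaluation code): compare a model's answers
# against ground truth and against human judgments per cognitive domain.
# All data below are random placeholders.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)

def evaluate_domain(model_answers, ground_truth, human_judgments):
    """Return accuracy against ground truth and Spearman alignment with humans."""
    accuracy = float(np.mean(model_answers == ground_truth))
    alignment, _ = spearmanr(model_answers, human_judgments)
    return accuracy, alignment

for domain in ["intuitive_physics (fine-tuned)", "causal_reasoning (held out)"]:
    model_answers = rng.integers(0, 2, size=100)
    ground_truth = rng.integers(0, 2, size=100)
    human_judgments = rng.integers(0, 2, size=100)
    acc, align = evaluate_domain(model_answers, ground_truth, human_judgments)
    print(f"{domain}: accuracy={acc:.2f}, human alignment={align:.2f}")
```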
Discovering Chunks in Neural Embeddings for Interpretability
Wu, Shuchen, Alaniz, Stephan, Schulz, Eric, Akata, Zeynep
Understanding neural networks is challenging due to their high-dimensional, interacting components. Inspired by human cognition, which processes complex sensory data by chunking it into recurring entities, we propose leveraging this principle to interpret artificial neural population activities. Biological and artificial intelligence share the challenge of learning from structured, naturalistic data, and we hypothesize that the cognitive mechanism of chunking can provide insights into artificial systems. We first demonstrate this concept in recurrent neural networks (RNNs) trained on artificial sequences with imposed regularities, observing that their hidden states reflect these patterns, which can be extracted as a dictionary of chunks that influence network responses. Extending this to large language models (LLMs) like LLaMA, we identify similar recurring embedding states corresponding to concepts in the input, with perturbations to these states activating or inhibiting the associated concepts. By exploring methods to extract dictionaries of identifiable chunks across neural embeddings of varying complexity, our findings introduce a new framework for interpreting neural networks, framing their population activity as structured reflections of the data they process.
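As a rough illustration of the idea of a chunk dictionary (not the paper's exact extraction procedure), one could cluster recurring hidden-state patterns and treat the cluster prototypes as chunks; the hidden states and dictionary size below are hypothetical.

```python
# A minimal sketch of extracting a "dictionary of chunks" from recurrent hidden
# states: recurring population states are grouped by clustering, and each
# timestep is labelled with its nearest chunk. Activations here are random
# placeholders standing in for real RNN or LLM hidden states.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
hidden_states = rng.normal(size=(5000, 128))  # hypothetical (timesteps, units) activations

n_chunks = 20                                 # assumed dictionary size
kmeans = KMeans(n_clusters=n_chunks, n_init=10, random_state=0).fit(hidden_states)

chunk_dictionary = kmeans.cluster_centers_    # one prototype population state per chunk
chunk_labels = kmeans.labels_                 # which chunk each timestep falls into

# Recurring chunks should reappear across the sequence; inspect their frequency.
print(np.bincount(chunk_labels, minlength=n_chunks))
```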
Towards Automation of Cognitive Modeling using Large Language Models
Rmus, Milena, Jagadish, Akshay K., Mathony, Marvin, Ludwig, Tobias, Schulz, Eric
Computational cognitive models, which formalize theories of cognition, enable researchers to quantify cognitive processes and arbitrate between competing theories by fitting models to behavioral data. Traditionally, these models are handcrafted, which requires significant domain knowledge, coding expertise, and time investment. Previous work has demonstrated that Large Language Models (LLMs) are adept at pattern recognition in-context, solving complex problems, and generating executable code. In this work, we leverage these abilities to explore the potential of LLMs for automating the generation of cognitive models from behavioral data. We evaluated the LLM on two different tasks: model identification (relating data to a source model) and model generation (generating the underlying cognitive model). We performed these tasks across two cognitive domains: decision making and learning. For data simulated from canonical cognitive models, we found that the LLM successfully identified and generated the ground truth model. For human data, where behavioral noise and lack of knowledge of the true underlying process pose significant challenges, the LLM generated models that were identical or close to the winning model from the cognitive science literature. Our findings suggest that LLMs can have a transformative impact on cognitive modeling. With this project, we aim to contribute to the ongoing effort to automate scientific discovery in cognitive science.
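For context, the handcrafted workflow that the paper asks an LLM to automate typically looks like the following sketch: fit a canonical model (here, a Rescorla-Wagner learner with a softmax choice rule) to choice data by maximum likelihood and score it with BIC. The behavioral data and starting values are made up for illustration.

```python
# Illustrative sketch of the classical cognitive-modeling workflow: fit a
# canonical model, here a Rescorla-Wagner learner with a softmax choice rule,
# to choice data by maximum likelihood and score it with BIC.
import numpy as np
from scipy.optimize import minimize

def negative_log_likelihood(params, choices, rewards, n_options=2):
    alpha, beta = params
    q = np.zeros(n_options)
    nll = 0.0
    for choice, reward in zip(choices, rewards):
        probs = np.exp(beta * q) / np.sum(np.exp(beta * q))  # softmax choice rule
        nll -= np.log(probs[choice] + 1e-12)
        q[choice] += alpha * (reward - q[choice])             # Rescorla-Wagner update
    return nll

# Hypothetical behavioral data: chosen option and observed reward per trial.
rng = np.random.default_rng(1)
choices = rng.integers(0, 2, size=200)
rewards = rng.integers(0, 2, size=200).astype(float)

result = minimize(negative_log_likelihood, x0=[0.3, 2.0],
                  args=(choices, rewards), bounds=[(1e-3, 1.0), (1e-3, 20.0)])
bic = 2 * result.fun + 2 * np.log(len(choices))  # two free parameters
print(f"alpha={result.x[0]:.2f}, beta={result.x[1]:.2f}, BIC={bic:.1f}")
```

In practice one would fit several candidate models this way and prefer the one with the lowest BIC; the paper's contribution is to have an LLM propose and write such models directly from the data.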
Centaur: a foundation model of human cognition
Binz, Marcel, Akata, Elif, Bethge, Matthias, Brändle, Franziska, Callaway, Fred, Coda-Forno, Julian, Dayan, Peter, Demircan, Can, Eckstein, Maria K., Éltető, Noémi, Griffiths, Thomas L., Haridi, Susanne, Jagadish, Akshay K., Ji-An, Li, Kipnis, Alexander, Kumar, Sreejan, Ludwig, Tobias, Mathony, Marvin, Mattar, Marcelo, Modirshanechi, Alireza, Nath, Surabhi S., Peterson, Joshua C., Rmus, Milena, Russek, Evan M., Saanum, Tankred, Scharfenberg, Natalia, Schubert, Johannes A., Buschoff, Luca M. Schulze, Singhi, Nishad, Sui, Xin, Thalmann, Mirko, Theis, Fabian, Truong, Vuong, Udandarao, Vishaal, Voudouris, Konstantinos, Wilson, Robert, Witte, Kristin, Wu, Shuchen, Wulff, Dirk, Xiong, Huadong, Schulz, Eric
Establishing a unified theory of cognition has been a major goal of psychology. While there have been previous attempts to instantiate such theories by building computational models, we currently do not have one model that captures the human mind in its entirety. Here we introduce Centaur, a computational model that can predict and simulate human behavior in any experiment expressible in natural language. We derived Centaur by finetuning a state-of-the-art language model on a novel, large-scale data set called Psych-101. Psych-101 reaches an unprecedented scale, covering trial-by-trial data from over 60,000 participants performing over 10,000,000 choices in 160 experiments. Centaur not only captures the behavior of held-out participants better than existing cognitive models, but also generalizes to new cover stories, structural task modifications, and entirely new domains. Furthermore, we find that the model's internal representations become more aligned with human neural activity after finetuning. Taken together, Centaur is the first real candidate for a unified model of human cognition. We anticipate that it will have a disruptive impact on the cognitive sciences, challenging the existing paradigm for developing computational models.
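A heavily hedged sketch of the general recipe, parameter-efficient fine-tuning of a base language model on natural-language transcripts of experiments, is shown below; "gpt2" stands in for the actual base model, and the single transcript string is invented rather than taken from Psych-101.

```python
# Hedged sketch of parameter-efficient fine-tuning on an experiment transcript.
# "gpt2" is a small stand-in for the actual base model; the transcript string
# is a made-up illustration, not Psych-101 data.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Low-rank adapters keep the fine-tune cheap relative to full fine-tuning.
lora = LoraConfig(r=8, lora_alpha=16, target_modules=["c_attn"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora)
model.print_trainable_parameters()

transcript = "You see two slot machines. You choose machine <<B>> and win 1 point."
batch = tokenizer(transcript, return_tensors="pt")
loss = model(**batch, labels=batch["input_ids"]).loss  # next-token loss on the transcript
loss.backward()  # an optimizer step over many such transcripts would follow
```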
Building, Reusing, and Generalizing Abstract Representations from Concrete Sequences
Wu, Shuchen, Thalmann, Mirko, Dayan, Peter, Akata, Zeynep, Schulz, Eric
Humans excel at learning abstract patterns across different sequences, filtering out irrelevant details, and transferring these generalized concepts to new sequences. In contrast, many sequence learning models lack the ability to abstract, which leads to memory inefficiency and poor transfer. We introduce a non-parametric hierarchical variable learning model (HVM) that learns chunks from sequences and abstracts contextually similar chunks as variables. HVM efficiently organizes memory while uncovering abstractions, leading to compact sequence representations. When trained on language datasets such as BabyLM, HVM learns a more efficient dictionary than standard compression algorithms such as Lempel-Ziv. In a sequence recall task requiring the acquisition and transfer of variables embedded in sequences, we demonstrate that HVM's sequence likelihood correlates with human recall times. In contrast, large language models (LLMs) struggle to transfer abstract variables as effectively as humans. Using HVM's adjustable layer of abstraction, we demonstrate that the model realizes a precise trade-off between compression and generalization. Our work offers a cognitive model that captures the learning and transfer of abstract representations in human cognition and differentiates itself from the behavior of large language models.
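The chunking half of this idea can be illustrated with a simple merge procedure over sequences; the full HVM additionally abstracts contextually similar chunks into variables, which is omitted here, and the example sequence is arbitrary.

```python
# Illustrative sketch of chunk learning: repeatedly merge the most frequent
# adjacent pair of units into a new chunk, building a dictionary from raw
# sequences. (The full HVM also abstracts similar chunks into variables.)
from collections import Counter

def learn_chunks(sequence, n_merges=2):
    seq = list(sequence)
    dictionary = set(seq)
    for _ in range(n_merges):
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        (a, b), _count = pairs.most_common(1)[0]
        chunk = a + b
        dictionary.add(chunk)
        # Rewrite the sequence using the newly merged chunk.
        merged, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and seq[i] == a and seq[i + 1] == b:
                merged.append(chunk)
                i += 2
            else:
                merged.append(seq[i])
                i += 1
        seq = merged
    return seq, dictionary

seq, chunks = learn_chunks("abcabcabxabc")
print(seq)     # ['abc', 'abc', 'ab', 'x', 'abc']
print(chunks)  # includes the learned chunks 'ab' and 'abc'
```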
Next state prediction gives rise to entangled, yet compositional representations of objects
Saanum, Tankred, Buschoff, Luca M. Schulze, Dayan, Peter, Schulz, Eric
Compositional representations are thought to enable humans to generalize across combinatorially vast state spaces. Models with learnable object slots, which encode information about objects in separate latent codes, have shown promise for this type of generalization but rely on strong architectural priors. Models with distributed representations, on the other hand, use overlapping, potentially entangled neural codes, and their ability to support compositional generalization remains underexplored. In this paper we examine whether distributed models can develop linearly separable representations of objects, like slotted models, through unsupervised training on videos of object interactions. We show that, surprisingly, models with distributed representations often match or outperform models with object slots in downstream prediction tasks. Furthermore, we find that linearly separable object representations can emerge without object-centric priors, with auxiliary objectives like next-state prediction playing a key role. Finally, we observe that distributed models' object representations are never fully disentangled, even if they are linearly separable: multiple objects can be encoded through partially overlapping neural populations while still being highly separable with a linear classifier. We hypothesize that maintaining partially shared codes enables distributed models to better compress object dynamics, potentially enhancing generalization.
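The linear-separability test mentioned above can be sketched as a simple linear probe; the latent vectors below are synthetic placeholders rather than outputs of the trained models.

```python
# Sketch of a linear probe: train a linear classifier to read out object
# identity from (hypothetical) latent vectors of a distributed model. High
# held-out accuracy indicates linearly separable object information even
# without object slots.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_objects, dim = 5, 64
# Placeholder latents: each object's code is a noisy direction in latent space.
prototypes = rng.normal(size=(n_objects, dim))
labels = rng.integers(0, n_objects, size=2000)
latents = prototypes[labels] + 0.5 * rng.normal(size=(2000, dim))

x_train, x_test, y_train, y_test = train_test_split(latents, labels, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(x_train, y_train)
print("linear probe accuracy:", probe.score(x_test, y_test))
```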
Sparse Autoencoders Reveal Temporal Difference Learning in Large Language Models
Demircan, Can, Saanum, Tankred, Jagadish, Akshay K., Binz, Marcel, Schulz, Eric
In-context learning, the ability to adapt based on a few examples in the input prompt, is a ubiquitous feature of large language models (LLMs). However, as LLMs' in-context learning abilities continue to improve, understanding this phenomenon mechanistically becomes increasingly important. In particular, it is not well-understood how LLMs learn to solve specific classes of problems, such as reinforcement learning (RL) problems, in-context. Through three different tasks, we first show that Llama 3 70B can solve simple RL problems in-context. We then analyze the residual stream of Llama using Sparse Autoencoders (SAEs) and find representations that closely match temporal difference (TD) errors. Notably, these representations emerge despite the model only being trained to predict the next token. We verify that these representations are indeed causally involved in the computation of TD errors and $Q$-values by performing carefully designed interventions on them. Taken together, our work establishes a methodology for studying and manipulating in-context learning with SAEs, paving the way for a more mechanistic understanding.
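For reference, the temporal difference error the authors report finding in the residual stream is the standard quantity from tabular Q-learning, sketched below with placeholder states, actions, and rewards; this is not the SAE analysis itself.

```python
# The temporal-difference quantity referenced above, written out explicitly as
# standard tabular Q-learning. States, actions, and rewards are placeholders.
import numpy as np

n_states, n_actions = 4, 2
q = np.zeros((n_states, n_actions))
alpha, gamma = 0.1, 0.9  # learning rate and discount factor (illustrative values)

def td_step(state, action, reward, next_state):
    td_error = reward + gamma * np.max(q[next_state]) - q[state, action]
    q[state, action] += alpha * td_error
    return td_error

# One illustrative transition.
print(td_step(state=0, action=1, reward=1.0, next_state=2))
```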
$\texttt{metabench}$ -- A Sparse Benchmark to Measure General Ability in Large Language Models
Kipnis, Alex, Voudouris, Konstantinos, Buschoff, Luca M. Schulze, Schulz, Eric
Large Language Models (LLMs) vary in their abilities on a range of tasks. Initiatives such as the $\texttt{Open LLM Leaderboard}$ aim to quantify these differences with several large benchmarks (sets of test items to which an LLM can respond either correctly or incorrectly). However, high correlations within and between benchmark scores suggest that (1) there exists a small set of common underlying abilities that these benchmarks measure, and (2) items tap into redundant information and the benchmarks may thus be considerably compressed. We use data from $n > 5000$ LLMs to identify the most informative items of six benchmarks, ARC, GSM8K, HellaSwag, MMLU, TruthfulQA and WinoGrande (with $d=28,632$ items in total). From them we distill a sparse benchmark, $\texttt{metabench}$, that has less than $3\%$ of the original size of all six benchmarks combined. This new sparse benchmark goes beyond point scores by yielding estimators of the underlying benchmark-specific abilities. We show that these estimators (1) can be used to reconstruct each original $\textit{individual}$ benchmark score with, on average, $1.5\%$ root mean square error (RMSE), (2) reconstruct the original $\textit{total}$ score with $0.8\%$ RMSE, and (3) have a single underlying common factor whose Spearman correlation with the total score is $r = 0.93$.
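The psychometric idea behind the sparse benchmark can be sketched with a two-parameter logistic IRT model, in which an LLM's latent ability is estimated from its responses to a handful of informative items; the item parameters and response pattern below are invented for illustration.

```python
# Sketch of 2-parameter logistic IRT: estimate a latent ability theta from a
# small set of item responses. Item parameters and responses are made up.
import numpy as np
from scipy.optimize import minimize_scalar

def p_correct(theta, discrimination, difficulty):
    return 1.0 / (1.0 + np.exp(-discrimination * (theta - difficulty)))

def estimate_theta(responses, discrimination, difficulty):
    def nll(theta):
        p = p_correct(theta, discrimination, difficulty)
        return -np.sum(responses * np.log(p) + (1 - responses) * np.log(1 - p))
    return minimize_scalar(nll, bounds=(-4, 4), method="bounded").x

rng = np.random.default_rng(0)
discrimination = rng.uniform(0.5, 2.0, size=50)   # hypothetical item parameters
difficulty = rng.normal(size=50)
responses = rng.integers(0, 2, size=50)           # hypothetical correct/incorrect pattern
print("estimated ability:", estimate_theta(responses, discrimination, difficulty))
```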
CogBench: a large language model walks into a psychology lab
Coda-Forno, Julian, Binz, Marcel, Wang, Jane X., Schulz, Eric
Large language models (LLMs) have significantly advanced the field of artificial intelligence. Yet, evaluating them comprehensively remains challenging. We argue that this is partly due to the predominant focus on performance metrics in most benchmarks. This paper introduces CogBench, a benchmark that includes ten behavioral metrics derived from seven cognitive psychology experiments. This novel approach offers a toolkit for phenotyping LLMs' behavior. We apply CogBench to 35 LLMs, yielding a rich and diverse dataset. We analyze this data using statistical multilevel modeling techniques, accounting for the nested dependencies among fine-tuned versions of specific LLMs. Our study highlights the crucial role of model size and reinforcement learning from human feedback (RLHF) in improving performance and aligning with human behavior. Interestingly, we find that open-source models are less risk-prone than proprietary models and that fine-tuning on code does not necessarily enhance LLMs' behavior. Finally, we explore the effects of prompt-engineering techniques. We discover that chain-of-thought prompting improves probabilistic reasoning, while take-a-step-back prompting fosters model-based behaviors.
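A hedged sketch of the kind of multilevel analysis described, a behavioral metric regressed on model properties with a random intercept per base-model family to handle nested fine-tuned variants, might look as follows; the data frame is synthetic and the predictors are assumptions, not the paper's exact specification.

```python
# Sketch of a multilevel (mixed-effects) model: a behavioral metric regressed
# on model size and RLHF status, with a random intercept per base-model family.
# The data frame is synthetic.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 35
df = pd.DataFrame({
    "metric": rng.normal(size=n),                          # e.g. a risk-taking score
    "log_size": rng.normal(loc=10, scale=2, size=n),       # log parameter count
    "rlhf": rng.integers(0, 2, size=n),                    # RLHF fine-tuned or not
    "base_model": rng.choice(["family_a", "family_b", "family_c"], size=n),
})

model = smf.mixedlm("metric ~ log_size + rlhf", df, groups=df["base_model"]).fit()
print(model.summary())
```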
In-context learning agents are asymmetric belief updaters
Schubert, Johannes A., Jagadish, Akshay K., Binz, Marcel, Schulz, Eric
We study the in-context learning dynamics of large language models (LLMs) using three instrumental learning tasks adapted from cognitive psychology. We find that LLMs update their beliefs in an asymmetric manner and learn more from better-than-expected outcomes than from worse-than-expected ones. Furthermore, we show that this effect reverses when learning about counterfactual feedback and disappears when no agency is implied. We corroborate these findings by investigating idealized in-context learning agents derived through meta-reinforcement learning, where we observe similar patterns. Taken together, our results contribute to our understanding of how in-context learning works by highlighting that the framing of a problem significantly influences how learning occurs, a phenomenon also observed in human cognition.
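The reported asymmetry corresponds to a two-learning-rate update rule in which positive prediction errors are weighted differently from negative ones; the sketch below uses illustrative parameter values, not estimates from the paper.

```python
# Asymmetric belief updating as a two-learning-rate rule: better-than-expected
# outcomes (positive prediction errors) are weighted more heavily than
# worse-than-expected ones. Parameter values are illustrative only.
def asymmetric_update(value, reward, alpha_plus=0.4, alpha_minus=0.1):
    prediction_error = reward - value
    alpha = alpha_plus if prediction_error > 0 else alpha_minus
    return value + alpha * prediction_error

value = 0.0
for reward in [1.0, 0.0, 1.0, 1.0, 0.0]:
    value = asymmetric_update(value, reward)
    print(round(value, 3))
```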