AITopics | Large Language Model

Birth of a Transformer: AMemory Viewpoint

Neural Information Processing SystemsApr-24-2026, 07:53:30 GMT

Large language models based on transformers have achieved great empirical successes. However, as they are deployed more widely, there is a growing need to better understand their internal mechanisms in order to make them more reliable. These models appear to store vast amounts of knowledge from their training data, and to adapt quickly to new information provided in their context or prompt. We study how transformers balance these two types of knowledge by considering a synthetic setup where tokens are generated from either global or context-specific bigram distributions. By a careful empirical analysis of the training process on a simplified two-layer transformer, we illustrate the fast learning of global bigrams and the slower development of an "induction head" mechanism for the in-context bigrams. We highlight the role of weight matrices as associative memories, provide theoretical insights on how gradients enable their learning during training, and study the role of data-distributional properties.

large language model, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

01c4593d60a020fed5607944330106b1-Paper-Conference.pdf

Neural Information Processing SystemsApr-24-2026, 07:37:22 GMT

arxiv preprint, large language model, machine learning, (16 more...)

Neural Information Processing Systems

Country:

Europe (1.00)
North America > United States > California (0.28)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)

Add feedback

00d1f03b87a401b1c7957e0cc785d0bc-Paper-Conference.pdf

Neural Information Processing SystemsApr-24-2026, 07:36:03 GMT

T annotation of o tackle questions this of problem, and visual answers question-answer recent for methods videos, . In pretrained here particular build on on, a fr W promising ozen eb-scale bidir approach te ectional xt-only language adapts data to fr multi-modal ozen models autor (BiLM) egr inputs.

large language model, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Industry: Education (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.98)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Passive learning of active causal strategies in agents and language models

Neural Information Processing SystemsApr-24-2026, 06:51:34 GMT

What can be learned about causality and experimentation from passive data? This question is salient given recent successes of passively-trained language models in interactive domains such as tool use. Passive learning is inherently limited. However, we show that purely passive learning can in fact allow an agent to learn generalizable strategies for determining and using causal structures, as long as the agent can intervene at test time. We formally illustrate that, under certain assumptions, learning a strategy of first experimenting, then seeking goals, can allow generalization from passive learning in principle.

large language model, machine learning, reinforcement learning, (19 more...)

Neural Information Processing Systems

Country: North America > United States (0.46)

Genre: Research Report (0.68)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)

Add feedback

034d7bfeace2a9a258648b16fc626298-Paper-Conference.pdf

Neural Information Processing SystemsApr-24-2026, 06:24:03 GMT

arxiv preprint arxiv, large language model, machine learning, (17 more...)

Neural Information Processing Systems

Industry:

Leisure & Entertainment > Sports (1.00)
Leisure & Entertainment > Games > Computer Games (1.00)
Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.72)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

029df12a9363313c3e41047844ecad94-Paper-Conference.pdf

Neural Information Processing SystemsApr-24-2026, 05:58:35 GMT

information retrieval, large language model, machine learning, (19 more...)

Neural Information Processing Systems

Country: North America > United States > California > Los Angeles County (0.28)

Genre: Workflow (0.47)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(6 more...)

Add feedback

Detecting Any Human-Object Interaction Relationship: Universal HOIDetector with Spatial Prompt Learning on Foundation Models

Neural Information Processing SystemsApr-24-2026, 05:31:16 GMT

Human-object interaction (HOI) detection aims to comprehend the intricate relationships between humans and objects, predicting < human,action,object >triplets, and serving as the foundation for numerous computer vision tasks. The complexity and diversity of human-object interactions in the real world, however, pose significant challenges for both annotation and recognition, particularly in recognizing interactions within an open world context. This study explores the universal interaction recognition in an open-world setting through the use of Vision-Language (VL) foundation models and large language models (LLMs). The proposed method is dubbed as UniHOI. We conduct a deep analysis of the three hierarchical features inherent in visual HOI detectors and propose a method for high-level relation extraction aimed at VL foundation models, which we call HO prompt-based learning. Our design includes an HOPrompt-guided Decoder (HOPD), facilitates the association of high-level relation representations in the foundation model with various HO pairs within the image. Furthermore, we utilize a LLM (i.e.

large language model, machine learning, natural language, (13 more...)

Neural Information Processing Systems

Industry: Leisure & Entertainment > Sports (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Large language models transition from integrating across position-yoked, exponential windows to structure-yoked, power-law windows

Neural Information Processing SystemsApr-24-2026, 05:30:02 GMT

Modern language models excel at integrating across long temporal scales needed to encode linguistic meaning and show non-trivial similarities to biological neural systems. Prior work suggests that human brain responses to language exhibit hierarchically organized "integration windows" that substantially constrain the overall influence of an input token (e.g., a word) on the neural response. However, little prior work has attempted to use integration windows to characterize computations in large language models (LLMs). We developed a simple word-swap procedure for estimating integration windows from black-box language models that does not depend on access to gradients or knowledge of the model architecture (e.g., attention weights). Using this method, we show that trained LLMs exhibit stereotyped integration windows that are well-fit by a convex combination of an exponential and a power-law function, with a partial transition from exponential to power-law dynamics across network layers. We then introduce a metric for quantifying the extent to which these integration windows vary with structural boundaries (e.g., sentence boundaries), and using this metric, we show that integration windows become increasingly yoked to structure at later network layers. None of these findings were observed in an untrained model, which as expected integrated uniformly across its input. These results suggest that LLMs learn to integrate information in natural language using a stereotyped pattern: integrating across position-yoked, exponential windows at early layers, followed by structure-yoked, power-law windows at later layers. The methods we describe in this paper provide a general-purpose toolkit for understanding temporal integration in language models, facilitating cross-disciplinary research at the intersection of biological and artificial intelligence.

large language model, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Country:

Europe > Italy (0.28)
North America > United States (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.88)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.97)

Add feedback

China's DeepSeek unveils latest models a year after upending global tech

Al JazeeraApr-24-2026, 05:17:27 GMT

China's DeepSeek unveils latest models a year after upending global tech China's DeepSeek has unveiled the latest versions of its signature artificial intelligence-powered chatbot, a year after its flagship model sent shockwaves through the global tech scene. The Chinese start-up launched preview versions of DeepSeek-V4-Pro and DeepSeek-V4-Flash on Friday as it touted its ability to go toe-to-toe with US rivals such as OpenAI and Google. The "flash" model has similar reasoning abilities to the "pro" version, while offering faster response times and more cost-effective pricing, the Hangzhou-based startup said. Like DeepSeek's previous chatbots, V4-Pro and V4-Flash follow an open-source model, meaning developers are free to use and modify them at will. The release comes after DeepSeek-R1 stunned the tech sector upon its launch in January last year with capabilities broadly comparable with those of ChatGPT and Gemini.

large language model, machine learning, natural language, (11 more...)

Al Jazeera

Country:

South America (0.42)
North America > Central America (0.42)
North America > Canada (0.42)
(15 more...)

Industry:

Government > Regional Government > North America Government > United States Government (0.34)
Information Technology > Security & Privacy (0.32)
Government > Military (0.32)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Bootstrapping Vision-Language Learning with Decoupled Language Pre-training

Neural Information Processing SystemsApr-24-2026, 04:17:19 GMT

We present a novel methodology aimed at optimizing the application of frozen large language models (LLMs) for resource-intensive vision-language (VL) pre-training. The current paradigm uses visual features as prompts to guide language models, with a focus on determining the most relevant visual features for corresponding text. Our approach diverges by concentrating on the language component, specifically identifying the optimal prompts to align with visual features. We introduce the Prompt-Transformer (P-Former), a model that predicts these ideal prompts, which is trained exclusively on linguistic data, bypassing the need for image-text pairings.

large language model, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Industry: Education > Curriculum > Subject-Specific Education (0.40)

Technology: