AITopics | Large Language Model

Collaborating Authors

Large Language Model

News Overviews Instructional Materials AI-Alerts Classics

TOKENSWAP: ALightweight Method to Disrupt Memorized Sequences in LLMs

Neural Information Processing SystemsJun-19-2026, 11:58:05 GMT

As language models scale, their performance improves dramatically across a wide range of tasks, but so does their tendency to memorize and regurgitate parts of their training data verbatim. This tradeoff poses serious legal, ethical, and safety concerns, especially in real-world deployments. Existing mitigation techniques, such as differential privacy or model unlearning, often require retraining or access to internal weights making them impractical for most users. In this work, we introduce TOKENSWAP, a lightweight, post-hoc defense designed for realistic settings where the user can only access token-level outputs. Our key insight is that while large models are necessary for high task performance, small models (e.g., DistilGPT-2) are often sufficient to assign fluent, grammatically plausible probabilities to common function words - and crucially, they memorize far less. By selectively swapping token probabilities between models, TOKENSWAP preserves the capabilities of large models while reducing their propensity for verbatim reproduction.

large language model, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Genre: Research Report > Experimental Study (1.00)

Industry: Law > Intellectual Property & Technology Law (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models

Neural Information Processing SystemsJun-19-2026, 11:57:05 GMT

Sparse Autoencoders (SAEs) have recently gained attention as a means to improve the interpretability and steerability of Large Language Models (LLMs), both of which are essential for AI safety. In this work, we extend the application of SAEs to Vision-Language Models (VLMs), such as CLIP, and introduce a comprehensive framework for evaluating monosemanticity at the neuron-level in visual representations. To ensure that our evaluation aligns with human perception, we propose a benchmark derived from a large-scale user study. Our experimental results reveal that SAEs trained on VLMs significantly enhance the monosemanticity of individual neurons, with sparsity and wide latents being the most influential factors. Further, we demonstrate that applying SAE interventions on CLIP's vision encoder directly steers multimodal LLM outputs (e.g., LLaVA), without any modifications to the underlying language model. These findings emphasize the practicality and efficacy of SAEs as an unsupervised tool for enhancing both interpretability and control of VLMs.

large language model, natural language, proceedings, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.84)

Add feedback

GUI-G1: Understanding R1-Zero-Like Training for Visual Grounding in GUIAgents

Neural Information Processing SystemsJun-19-2026, 11:56:27 GMT

Recent Graphical User Interface (GUI) agents replicate the R1-Zero paradigm, coupling online Reinforcement Learning (RL) with explicit chain-of-thought reasoning prior to object grounding and thereby achieving substantial performance gains.

arxiv preprint arxiv, large language model, machine learning, (21 more...)

Neural Information Processing Systems

Genre: Research Report > Experimental Study (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Graphics (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)

Add feedback

Can Knowledge-Graph-based Retrieval Augmented Generation Really Retrieve What You Need?

Neural Information Processing SystemsJun-19-2026, 11:48:37 GMT

Retrieval-Augmented Generation (RAG) based on knowledge graphs (KGs) enhances large language models (LLMs) with structural and textual external knowledge. Yet, existing KG-based RAG methods struggle to retrieve accurate and diverse information when handling complex queries. By modeling KG-based retrieval as a multi-step decision process, Process Reward Models (PRMs) offer a promising solution to align the retrieval behavior with the query-specific knowledge requirements. However, PRMs heavily rely on process-level supervision signals that are expensive and hard to obtain on KGs. To address this challenge, we propose GraphFlow, a framework that efficiently retrieves accurate and diverse knowledge required for complex queries from text-rich KGs. GraphFlow employs a detailed balance objective with local exploration to jointly optimize a retrieval policy and a flow estimator.

arxiv preprint arxiv, large language model, machine learning, (17 more...)

Neural Information Processing Systems

Country: Asia (0.28)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

VideoVLA: Video Generators Can Be Generalizable Robot Manipulators

Neural Information Processing SystemsJun-19-2026, 11:39:45 GMT

Generalization in robot manipulation is essential for deploying robots in open-world environments and advancing toward artificial general intelligence. While recent vision-language-action models leverage large pre-trained understanding models for perception and instruction following, their ability to generalize to novel tasks, objects, and settings remains limited. In this work, we present VideoVLA, a simple approach that explores the potential of directly transforming large video generation models into robotic VLA manipulators. Given a language instruction and an image, VideoVLA predicts an action sequence as well as the future visual outcomes. Built on a multi-modal Diffusion Transformer, VideoVLA jointly models video, language, and action modalities, using pre-trained video generative models for joint visual and action forecasting. Our experiments show that high-quality imagined futures correlate with reliable action predictions and task success, highlighting the importance of visual imagination in manipulation. VideoVLA demonstrates strong generalization, including imitating other embodiments' skills and handling novel objects. This dual-prediction strategy--forecasting both actions and their visual consequences--explores a paradigm shift in robot learning and unlocks generalization capabilities in manipulation systems.

arxiv preprint arxiv, large language model, machine learning, (18 more...)

Neural Information Processing Systems

Country: Asia > China (0.28)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.66)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.46)

Add feedback

894403f9604374a7a003063e480f65b9-Paper-Conference.pdf

Neural Information Processing SystemsJun-19-2026, 11:31:29 GMT

Transformers have theoretical limitations in modeling certain sequence-to-sequence tasks, yet it remains largely unclear if these limitations play a role in large-scale pretrained LLMs, or whether LLMs might effectively overcome these constraints in practice due to the scale of both the models themselves and their pretraining data. We explore how these architectural constraints manifest after pretraining, by studying a family of retrieval and copying tasks inspired by Liu et al. [2024a]. We use a recently proposed framework for studying length generalization [Huang et al., 2025] to provide guarantees for each of our settings.

large language model, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Country:

Europe > Austria (0.28)
North America > United States (0.27)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

Noise Hypernetworks: Amortizing Test-Time Compute in Diffusion Models

Neural Information Processing SystemsJun-19-2026, 11:23:05 GMT

The new paradigm of test-time scaling has yielded remarkable breakthroughs in Large Language Models (LLMs) (e.g.

arxiv preprint arxiv, large language model, machine learning, (18 more...)

Neural Information Processing Systems

Genre:

Research Report > New Finding (0.46)
Research Report > Experimental Study (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

AION-1: Omnimodal Foundation Model for Astronomical Sciences

Neural Information Processing SystemsJun-19-2026, 11:22:32 GMT

While foundation models have shown promise across a variety of fields, astronomy lacks a unified framework for joint modeling across its highly diverse data modalities.

large language model, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country:

Asia (0.93)
North America > United States > California (0.28)
North America > Canada > Ontario > Toronto (0.27)
Europe > United Kingdom > England (0.27)

Genre:

Research Report > New Finding (0.67)
Research Report > Experimental Study (0.46)

Industry:

Energy (1.00)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(5 more...)

Add feedback

Probing Large Language Models for Embodied Navigation

Neural Information Processing SystemsJun-19-2026, 11:21:26 GMT

Multimodal Large Language Models (MLLMs) have demonstrated strong generalization in vision-language tasks, yet their ability to understand and act within embodied environments remains underexplored. We present NavBench, a benchmark to evaluate the embodied navigation capabilities of MLLMs under zero-shot settings. NavBench consists of two components: (1) navigation comprehension, assessed through three cognitively grounded tasks including global instruction alignment, temporal progress estimation, and local observation-action reasoning, covering 3,200 question-answer pairs; and (2) step-by-step execution in 432 episodes across 72 indoor scenes, stratified by spatial, cognitive, and execution complexity. To support real-world deployment, we introduce a pipeline that converts MLLMs' outputs into robotic actions. We evaluate both proprietary and open-source models, finding that GPT-4o performs well across tasks, while lighter open-source models succeed in simpler cases. Results also show that models with higher comprehension scores tend to achieve better execution performance. Providing map-based context improves decision accuracy, especially in medium-difficulty scenarios. However, most models struggle with temporal understanding, particularly in estimating progress during navigation, which may pose a key challenge.

large language model, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Genre:

Research Report > Experimental Study (1.00)
Workflow (0.87)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Add feedback

ALittle Depth Goes a Long Way: The Expressive Power of Log-Depth Transformers

Neural Information Processing SystemsJun-19-2026, 11:12:46 GMT

Recent theoretical results show transformers cannot express sequential reasoning problems over long inputs, intuitively because their computational depth is bounded. However, prior work treats the depth as a constant, leaving it unclear to what degree bounded depth may suffice for solving problems over short inputs, or how increasing the transformer's depth affects its expressive power. We address these questions by analyzing transformers whose depth can grow minimally with context length n. We show even highly uniform transformers with depth Θ(logn) can express two important problems: recognizing regular languages, which captures state tracking abilities and was known to be expressible only by an unconventional, non-uniform model of transformers, and graph connectivity, which underlies multistep reasoning. Notably, both of these problems cannot be expressed by fixed-depth transformers under standard complexity conjectures, demonstrating the expressivity benefit of growing depth. Moreover, our theory quantitatively predicts how depth must grow with input length to express these problems, showing that depth scaling is more efficient than scaling width or chain-of-thought steps. Empirically, our detailed experiments designed to bridge the expressivity vs. learnability gap reveal that our theoretical depth requirements for regular language recognition closely match the practical depth requirements for successfully training transformers. Thus, our results clarify how depth affects a transformer's reasoning capabilities, and provide practical guidance for effective depth selection for sequential reasoning.

large language model, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Country: North America (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback