AITopics | Country

Country

From Noise to Narrative: Tracing the Origins of Hallucinations in Transformers

Neural Information Processing Systems | Jun-15-2026, 13:52:21 GMT

As generative AI systems become competent and democratized in science, business, and government, deeper insight into their failure modes is now an acute need. The occasional volatility in their behavior, such as the propensity of transformer models to hallucinate, impedes trust and adoption of emerging AI solutions in high-stakes areas. In the present work, we establish how and when hallucinations arise in pre-trained transformer models through concept representations captured by sparse autoencoders, under scenarios with experimentally controlled uncertainty in the input space. Our systematic experiments reveal that the number of semantic concepts used by the transformer model grows as the input information becomes increasingly unstructured. In the face of growing uncertainty in the input space, the transformer model becomes prone to activate coherent yet input-insensitive semantic features, leading to hallucinated output. At its extreme, for pure-noise inputs, we identify a wide variety of robustly triggered and meaningful concepts in the intermediate activations of pre-trained transformer models, whose functional integrity we confirm through targeted steering. We also show that hallucinations in the output of a transformer model can be reliably predicted from the concept patterns embedded in transformer layer activations. This collection of insights on transformer internal processing mechanics has immediate consequences for aligning AI models with human values and for AI safety, exposes an attack surface for potential adversarial attacks, and provides a basis for automatic quantification of a model's hallucination risk.

artificial intelligence, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country:

North America > United States (0.67)
Europe > United Kingdom (0.46)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Industry:

Health & Medicine > Therapeutic Area > Oncology (0.46)
Information Technology > Security & Privacy (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.88)

What does the US-Iran deal mean for Lebanon and Israel?

BBC News | Jun-15-2026, 13:50:39 GMT

Watch: What does the US-Iran deal to end war mean for Lebanon and Israel? A deal has been agreed between the US and Iran to end the war they are in. The deal includes an end to military operations in Lebanon, but Israel says its forces will remain in the country indefinitely. As some Beirut residents attempt to return to their homes after previously fleeing, the BBC's international editor Jeremy Bowen takes a look at what the agreed deal might mean for those involved. Travelling with a humanitarian convoy, the BBC's Hugo Bachega has been given rare access to a part of Lebanon under Israeli occupation.

artificial intelligence, football 2026, home news football 2026, (11 more...)

BBC News

Country:

Asia > Middle East > Iran (1.00)
Asia > Middle East > Israel (0.96)
Asia > Middle East > Lebanon > Beirut Governorate > Beirut (0.26)

Industry:

Leisure & Entertainment (1.00)
Government > Regional Government > Asia Government > Middle East Government (0.50)

Technology: Information Technology > Artificial Intelligence (0.49)

207be3da143f1043336627c5d25aae50-Paper-Conference.pdf

Neural Information Processing Systems | Jun-15-2026, 13:47:14 GMT

Multi-modal Large Language Models (LLMs) have advanced conversational abilities but struggle to provide live, interactive step-by-step guidance, a key capability for future AI assistants. Effective guidance requires not only delivering instructions but also detecting their successful execution, as well as identifying and alerting users to mistakes, all of which has to happen in real time. This requires models that are not turn-based but can react asynchronously to a video stream, as well as video data showing users performing tasks, including mistakes and their corrections. To this end, we introduce Qualcomm Interactive Cooking, a new benchmark and dataset built upon CaptainCook4D, which contains user mistakes during task execution. Our dataset and benchmark feature densely annotated, timed instructions and feedback messages, specifically including mistake alerts precisely timestamped to their visual occurrence in the video. We evaluate state-of-the-art multi-modal LLMs on the Qualcomm Interactive Cooking benchmark and introduce LIVEMAMBA, a streaming multi-modal LLM designed for interactive instructional guidance. This work provides the first dedicated benchmark and a strong baseline for developing and evaluating live, situated coaching.

large language model, machine learning, natural language, (14 more...)

Neural Information Processing Systems

Country: North America (0.28)

Genre: Research Report > Experimental Study (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

HubGT: Fast Graph Transformer with Decoupled Hierarchy Labeling

Neural Information Processing Systems | Jun-15-2026, 13:46:55 GMT

Graph Transformers (GTs) leverage the powerful Transformer architecture to learn from graph-structured data. However, effectively representing graph information while ensuring efficiency remains challenging, as our analysis reveals that graph-scale operations still constitute the computational bottleneck in current GT designs and limit their applications to large graphs. In this work, we tackle the GT scalability issue by proposing HubGT, which is boosted by decoupled graph computation and hierarchical graph representations. HubGT represents graph information with a novel hub labeling scheme, which encompasses enriched neighborhoods for node token generation, and fast computation for distance-based positional encoding. Notably, the precomputation and training of HubGT achieve complexities linear in the number of graph edges and nodes, respectively, while the training stage completely removes graph-related computations, leading to favorable mini-batch capability and GPU utilization. Extensive experiments demonstrate that HubGT offers efficient computation and mini-batch capability over existing GT designs on large-scale datasets while achieving top-tier effectiveness. Our code is available at: https://github.com/gdmnl/HubGT.
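
The distance lookup that hub labeling makes fast can be illustrated with a minimal sketch. This is the classic 2-hop labeling query, not HubGT's own scheme, which the abstract does not spell out. The function name `hub_distance` and the hand-built `LABELS` table are hypothetical and exist only for illustration.

```python
from math import inf


def hub_distance(labels, u, v):
    """Distance query on a 2-hop labeling: min over shared hubs h of d(u,h) + d(h,v)."""
    label_u, label_v = labels[u], labels[v]
    # Scan the smaller label and probe the larger one.
    if len(label_u) > len(label_v):
        label_u, label_v = label_v, label_u
    best = inf
    for hub, dist_to_hub in label_u.items():
        other = label_v.get(hub)
        if other is not None:
            best = min(best, dist_to_hub + other)
    return best


# Hypothetical labels for the unit-weight path graph a - b - c - d.
# Each node stores distances to a few hubs (including itself).
LABELS = {
    "a": {"a": 0, "b": 1},
    "b": {"b": 0},
    "c": {"b": 1, "c": 0},
    "d": {"b": 2, "c": 1, "d": 0},
}

if __name__ == "__main__":
    print(hub_distance(LABELS, "a", "d"))
```

Because a query only touches the two stored labels, it needs no graph traversal at lookup time, which is the property that makes label-based distances attractive as precomputed positional information.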

artificial intelligence, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Country: Asia (0.28)

Genre: Research Report > Experimental Study (1.00)

Industry: Information Technology (0.92)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)

Multi-Scale Finetuning for Encoder-based Time Series Foundation Models

Neural Information Processing Systems | Jun-15-2026, 13:45:54 GMT

Time series foundation models (TSFMs) demonstrate impressive zero-shot performance for time series forecasting. However, an important yet underexplored challenge is how to effectively finetune TSFMs on specific downstream tasks. While naive finetuning can yield performance gains, we argue that it falls short of fully leveraging TSFMs' capabilities, often resulting in overfitting and suboptimal performance. Given the diverse temporal patterns across sampling scales and the inherent multi-scale forecasting capabilities of TSFMs, we adopt a causal perspective to analyze the finetuning process, through which we highlight the critical importance of explicitly modeling multiple scales and reveal the shortcomings of naive approaches. Focusing on encoder-based TSFMs, we propose MultiScale FineTuning (MSFT), a simple yet general framework that explicitly integrates multi-scale modeling into the finetuning process. Experimental results on three different backbones (MOIRAI, MOMENT and UNITS) demonstrate that TSFMs finetuned with MSFT not only outperform naive and typical parameter-efficient finetuning methods but also surpass state-of-the-art deep learning methods. Code is available at https://github.com/zqiao11/MSFT.

data mining, large language model, machine learning, (21 more...)

Neural Information Processing Systems

Country: Asia (0.28)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.92)

Industry: Information Technology (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Approximating Shapley Explanations in Reinforcement Learning

Neural Information Processing Systems | Jun-15-2026, 13:45:40 GMT

Reinforcement learning has achieved remarkable success in complex decision-making environments, yet its lack of transparency limits its deployment in practice, especially in safety-critical settings. Shapley values from cooperative game theory provide a principled framework for explaining reinforcement learning; however, the computational cost of Shapley explanations is an obstacle to their use. We introduce FastSVERL, a scalable method for explaining reinforcement learning by approximating Shapley values. FastSVERL is designed to handle the unique challenges of reinforcement learning, including temporal dependencies across multi-step trajectories, learning from off-policy data, and adapting to evolving agent behaviours in real time. FastSVERL introduces a practical, scalable approach for principled and rigorous interpretability in reinforcement learning.
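
To make the cost problem concrete: exact Shapley values average a player's marginal contribution over all orderings, which grows factorially with the number of players. The sketch below shows generic Monte Carlo permutation sampling, a standard approximation, and is not FastSVERL's method. The names `shapley_permutation_estimate` and `toy_value`, and the toy game itself, are assumptions made for illustration.

```python
import random


def shapley_permutation_estimate(players, value, num_samples=2000, seed=0):
    """Estimate Shapley values by averaging marginal contributions over random orderings."""
    rng = random.Random(seed)
    totals = {p: 0.0 for p in players}
    for _ in range(num_samples):
        order = list(players)
        rng.shuffle(order)
        coalition = set()
        previous = value(frozenset(coalition))
        for player in order:
            coalition.add(player)
            current = value(frozenset(coalition))
            totals[player] += current - previous  # marginal contribution
            previous = current
    return {p: totals[p] / num_samples for p in players}


def toy_value(coalition):
    """Hypothetical game: additive weights plus a bonus when x and y cooperate."""
    weights = {"x": 1.0, "y": 2.0, "z": 3.0}
    total = sum(weights[p] for p in coalition)
    if "x" in coalition and "y" in coalition:
        total += 1.0
    return total


if __name__ == "__main__":
    print(shapley_permutation_estimate(["x", "y", "z"], toy_value))
```

In this toy game the exact values are 1.5, 2.5 and 3.0 for x, y and z, since the cooperation bonus is split evenly between x and y; the estimates for x and y approach those values as `num_samples` grows.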

artificial intelligence, machine learning, reinforcement learning, (17 more...)

Neural Information Processing Systems

Country: Europe (0.28)

Genre: Research Report > Experimental Study (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Parallelizing MCMC Across the Sequence Length

Neural Information Processing Systems | Jun-15-2026, 13:37:36 GMT

Markov chain Monte Carlo (MCMC) methods are foundational algorithms for Bayesian inference and probabilistic modeling. However, most MCMC algorithms are inherently sequential and their time complexity scales linearly with the sequence length. Previous work on adapting MCMC to modern hardware has therefore focused on running many independent chains in parallel. Here, we take an alternative approach: we propose algorithms to evaluate MCMC samplers in parallel across the chain length. To do this, we build on recent methods for parallel evaluation of nonlinear recursions that formulate the state sequence as a solution to a fixed-point problem and solve for the fixed-point using a parallel form of Newton's method. We show how this approach can be used to parallelize Gibbs, Metropolis-adjusted Langevin, and Hamiltonian Monte Carlo sampling across the sequence length. In several examples, we demonstrate the simulation of up to hundreds of thousands of MCMC samples with only tens of parallel Newton iterations. Additionally, we develop two new parallel quasi-Newton methods to evaluate nonlinear recursions with lower memory costs and reduced runtime. We find that the proposed parallel algorithms accelerate MCMC sampling across multiple examples, in some cases by more than an order of magnitude compared to sequential evaluation.
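
The fixed-point idea can be sketched on a toy chain. Assumptions: the sampler is an unadjusted Langevin update (no accept/reject step) on a hypothetical one-dimensional target with density proportional to 1/cosh(s), the noise draws are fixed in advance, and all function names are illustrative. Each Newton iteration linearizes every step around the current guess for the whole trajectory, which leaves a linear recursion. A plain loop evaluates that recursion here; on parallel hardware a linear recursion of this form can be computed with an associative scan. This is not the paper's implementation.

```python
import numpy as np

STEP = 0.1  # hypothetical Langevin step size


def langevin_step(s, eps):
    """One unadjusted Langevin update; grad log p(s) = -tanh(s) for p(s) ~ 1/cosh(s)."""
    return s - STEP * np.tanh(s) + np.sqrt(2.0 * STEP) * eps


def langevin_step_grad(s):
    """Derivative of langevin_step with respect to the previous state s."""
    return 1.0 - STEP * (1.0 - np.tanh(s) ** 2)


def run_sequential(s0, eps):
    """Reference: evaluate the chain one step at a time."""
    states = np.empty(len(eps))
    s = s0
    for t in range(len(eps)):
        s = langevin_step(s, eps[t])
        states[t] = s
    return states


def run_newton(s0, eps, tol=1e-12):
    """Solve s_t = f(s_{t-1}, eps_t) for all t at once by Newton iteration."""
    T = len(eps)
    states = np.zeros(T)  # initial guess for the whole trajectory
    for iteration in range(1, T + 1):
        prev = np.concatenate(([s0], states[:-1]))  # guessed s_{t-1} for each t
        a = langevin_step_grad(prev)                # per-step Jacobians
        b = langevin_step(prev, eps) - a * prev     # affine offsets
        # Linearized system: new_t = a_t * new_{t-1} + b_t, with new_{-1} = s0.
        new_states = np.empty(T)
        carry = s0
        for t in range(T):
            carry = a[t] * carry + b[t]
            new_states[t] = carry
        converged = np.max(np.abs(new_states - states)) < tol
        states = new_states
        if converged:
            break
    return states, iteration


if __name__ == "__main__":
    noise = np.random.default_rng(0).standard_normal(64)
    reference = run_sequential(0.5, noise)
    solved, iterations = run_newton(0.5, noise)
    print(iterations, float(np.max(np.abs(solved - reference))))
```

Because each state depends only on its predecessor, iteration k reproduces at least the first k states exactly, so the loop needs at most T iterations; the abstract's point is that far fewer usually suffice.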

artificial intelligence, iteration, machine learning, (17 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

High-Performance Arithmetic Circuit Optimization via Differentiable Architecture Search

Neural Information Processing Systems | Jun-15-2026, 13:37:18 GMT

Arithmetic circuit optimization remains a fundamental challenge in modern integrated circuit design. Recent advances have cast this problem within the Learning to Optimize (L2O) paradigm, where intelligent agents autonomously explore high-performance design spaces with encouraging results. However, existing approaches predominantly target coarse-grained architectural configurations, while the crucial interconnect optimization stage is often relegated to oversimplified proxy models or a heuristic approach. This disconnect undermines design quality, leading to suboptimal solutions in the circuit topology search space. To bridge this gap, we present ARITH-DAS, a Differentiable Architecture Search framework for Arithmetic circuits. To the best of our knowledge, ARITH-DAS is the first to formulate interconnect optimization within arithmetic circuits as a differentiable edge prediction problem over a multi-relational directed acyclic graph, enabling fine-grained, proxy-free optimization at the interconnection level. We evaluate ARITH-DAS on a suite of representative arithmetic circuits, including multipliers and multiply-accumulate units. Experiments show substantial improvements over state-of-the-art L2O and conventional methods, achieving up to 27.05% gain in hypervolume of area-delay Pareto frontiers, a standard metric for evaluating multi-objective optimization performance.
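
For readers unfamiliar with the hypervolume metric, a minimal two-objective sketch follows. It assumes both area and delay are minimized and that a reference point worse than every design is supplied; the function name `hypervolume_2d` and the sample design points are hypothetical and unrelated to the paper's data.

```python
def hypervolume_2d(points, reference):
    """Area dominated by a set of (area, delay) points, bounded by a reference point.

    Both objectives are minimized; a larger hypervolume means a better frontier.
    """
    ref_area, ref_delay = reference
    # Keep only points that improve on the reference in both objectives.
    candidates = sorted((a, d) for a, d in points if a < ref_area and d < ref_delay)
    volume = 0.0
    best_delay = ref_delay
    for area, delay in candidates:  # sweep in order of increasing area
        if delay < best_delay:      # point is on the Pareto frontier
            volume += (ref_area - area) * (best_delay - delay)
            best_delay = delay
    return volume


if __name__ == "__main__":
    designs = [(1, 3), (2, 2), (3, 1), (3, 3)]  # (3, 3) is dominated
    print(hypervolume_2d(designs, reference=(4, 4)))
```

A relative gain such as the one reported is then the ratio of two such hypervolumes computed against the same reference point.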

artificial intelligence, impr, optimization problem, (15 more...)

Neural Information Processing Systems

Country: Europe (0.28)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)

Enhanced Expert Merging for Mixture-of-Experts in Graph Foundation Models

Neural Information Processing Systems | Jun-15-2026, 13:36:14 GMT

Graph foundation models (GFMs) have emerged as a promising paradigm for learning transferable knowledge across diverse graph-structured data. The inherent heterogeneity in features and graph structures poses significant challenges for building scalable and generalizable GFMs. Existing research has employed mixture-of-experts (MoE) models to handle the challenges, assigning the most suitable expert to each graph. Despite this, the underlying mechanisms of MoE within the context of GFMs remain insufficiently explored. In this work, we conduct an in-depth experimental study on an MoE-based GFM and uncover an intriguing finding: the experts ranked second and third assigned by the router perform better than the top-ranked expert.
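
What "ranked by the router" means can be shown with a generic sketch of softmax routing. The paper's router is not described in this abstract, so the function name `rank_experts` and the sample logits are assumptions for illustration only.

```python
import numpy as np


def rank_experts(router_logits):
    """Return expert indices sorted by routing probability (best first) and the probabilities."""
    logits = np.asarray(router_logits, dtype=float)
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerically stable softmax
    exp = np.exp(shifted)
    probs = exp / exp.sum(axis=-1, keepdims=True)
    order = np.argsort(-probs, axis=-1)
    return order, probs


if __name__ == "__main__":
    # One graph, three hypothetical experts.
    order, probs = rank_experts([[0.1, 2.0, 1.0]])
    print(order[0], probs[0])
```

Under this view, the reported finding concerns the experts at positions 1 and 2 of `order` (second and third rank) rather than position 0.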

data mining, machine learning, natural language, (22 more...)

Neural Information Processing Systems

Country: Asia > China (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Education (0.67)
Information Technology (0.47)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)

VQ-Seg: Vector-Quantized Token Perturbation for Semi-Supervised Medical Image Segmentation

Neural Information Processing Systems | Jun-15-2026, 13:36:00 GMT

Consistency learning with feature perturbation is a widely used strategy in semi-supervised medical image segmentation. However, many existing perturbation methods rely on dropout and thus require careful manual tuning of the dropout rate, a sensitive hyperparameter that is often difficult to optimize and may lead to suboptimal regularization. To overcome this limitation, we propose VQ-Seg, the first approach to employ vector quantization (VQ) to discretize the feature space and to introduce a novel and controllable Quantized Perturbation Module (QPM) that replaces dropout.
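
The discretization step that vector quantization performs can be sketched as a nearest-codeword lookup. This covers only generic VQ; the Quantized Perturbation Module is not detailed in this excerpt and is not reproduced here. The function name `vector_quantize` and the tiny codebook are hypothetical.

```python
import numpy as np


def vector_quantize(features, codebook):
    """Map each feature vector to its nearest codebook entry (squared Euclidean distance).

    features: array of shape (n, d); codebook: array of shape (k, d).
    Returns the chosen indices, shape (n,), and the quantized vectors, shape (n, d).
    """
    features = np.asarray(features, dtype=float)
    codebook = np.asarray(codebook, dtype=float)
    diffs = features[:, None, :] - codebook[None, :, :]  # (n, k, d) via broadcasting
    sq_dists = np.sum(diffs ** 2, axis=-1)               # (n, k)
    indices = np.argmin(sq_dists, axis=1)
    return indices, codebook[indices]


if __name__ == "__main__":
    codebook = [[0.0, 0.0], [1.0, 1.0]]
    features = [[0.1, -0.1], [0.9, 1.2]]
    indices, quantized = vector_quantize(features, codebook)
    print(indices, quantized)
```

Once features live on a finite codebook, a perturbation can be expressed as a change of index rather than as randomly zeroed activations, which is the kind of control that a dropout rate does not offer.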

artificial intelligence, machine learning, segmentation, (17 more...)

Neural Information Processing Systems

Country: Asia > China (0.46)

Genre: