AITopics | Large Language Model

Collaborating Authors

Large Language Model

News Overviews Instructional Materials AI-Alerts Classics

bd96a50dfd2314e48787581840a07a1a-Supplemental-Datasets_and_Benchmarks_Track.pdf

Neural Information Processing SystemsJun-22-2026, 12:12:15 GMT

We use prompts to LLMs to act as language tools for two types of tasks in our work. The first being to798 read through and retrieve the relevant information from news articles to caption our image sequences,799 figures 6 and 7 The second being utilizing our captions to generate event specific question-answer800 pairs, figures 8 and 9.801 We conducted human validation on 144 events sampled across 15 disaster types to assess caption803 quality. Human evaluators were asked to classify each event as: (1) clear alignment between images,804 captions, and sources, (2) mismatch, or (3) inconclusive where imagery was insufficient to verify805 caption details. Overall results showed 65.3% clear alignment between images, captions, and sources,806 18.8% had mismatches, and 16.0% were inconclusive where imagery was insufficient to verify807 caption details. Excluding inconclusive cases, 77.7% of determinable events showed alignment,808 demonstrating reasonable caption quality for LLM-generated annotations.809

artificial intelligence, large language model, natural language, (18 more...)

Neural Information Processing Systems

Country:

North America > United States > Kansas (0.20)
North America > United States > Texas (0.17)
North America > United States > Louisiana (0.14)

Industry: Energy (0.70)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.45)

Add feedback

D: 2022-05-27 Ours: D MONITRS: Multimodal Observations of Natural Incidents Through Remote SensingC: 2022-06-06

Neural Information Processing SystemsJun-22-2026, 12:12:12 GMT

B: 2022-06-01Natural disasters cause devastating damage to communities and infrastructure everyyearareas.

large language model, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Country:

North America > United States > Kansas (0.15)
North America > United States > Texas (0.15)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.68)

Industry: Energy (0.98)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(2 more...)

Add feedback

On Epistemic Uncertainty of Visual Tokens for Object Hallucinations in Large Vision-Language Models

Neural Information Processing SystemsJun-22-2026, 12:07:08 GMT

Large vision-language models (LVLMs), which integrate a vision encoder (VE) with a large language model, have achieved remarkable success across various tasks. However, there are still crucial challenges in LVLMs such as object hallucination, generating descriptions of objects that are not in the input image. Here, we argue that uncertain visual tokens within the VE is a key factor that contributes to object hallucination. Our statistical analysis found that there are positive correlations between visual tokens with high epistemic uncertainty and the occurrence of hallucinations. Furthermore, we show theoretically and empirically that visual tokens in early VE layers that exhibit large representation deviations under small adversarial perturbations indicate high epistemic uncertainty. Based on these findings, we propose a simple yet effective strategy to mitigate object hallucination by modifying the VE only. Our method comprises a proxy method with adversarial perturbations for identifying uncertain visual tokens efficiently and a method to mask these uncertain visual tokens during the self-attention process in the middle layers of the VE, suppressing their influence on visual encoding and thus alleviating hallucinations. Extensive experiments show that our method significantly reduces object hallucinations in LVLMs and can synergistically work with other prior arts.

large language model, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Transportation > Passenger (0.67)
Leisure & Entertainment (0.67)
Transportation > Ground > Road (0.46)
Transportation > Ground > Rail (0.45)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Trust, But Verify: ASelf-Verification Approach to Reinforcement Learning with Verifiable Rewards

Neural Information Processing SystemsJun-22-2026, 11:57:49 GMT

However, a prevalent issue is "superficial self-reflection", where models fail to robustly verify their own outputs. We introduce RISE (Reinforcing Reasoning with Self-Verification), a novel online RL framework designed to tackle this. RISE explicitly and simultaneously trains an LLM to improve both its problemsolving and self-verification abilities within a single, integrated RL process. The core mechanism involves leveraging verifiable rewards from an outcome verifier to provide on-the-fly feedback for both solution generation and self-verification tasks. In each iteration, the model generates solutions, then critiques its own onpolicy generated solutions, with both trajectories contributing to the policy update. Extensive experiments on diverse mathematical reasoning benchmarks show that RISE consistently improves model's problem-solving accuracy while concurrently fostering strong self-verification skills. Our analyses highlight the advantages of online verification and the benefits of increased verification compute.

arxiv preprint arxiv, large language model, machine learning, (19 more...)

Neural Information Processing Systems

Country: Asia > China (0.28)

Genre: Research Report > Experimental Study (1.00)

Industry: Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Zero-Shot Trajectory Planning for Signal Temporal Logic Tasks

Neural Information Processing SystemsJun-22-2026, 11:48:12 GMT

Signal Temporal Logic (STL) is a powerful specification language for describing complex temporal behaviors of continuous signals, making it well-suited for high-level robotic task descriptions. However, generating executable plans for STL tasks is challenging, as it requires consideration of the coupling between the task specification and the system dynamics. Existing approaches either follow a model-based setting that explicitly requires knowledge of the system dynamics or adopt a task-oriented data-driven approach to learn plans for specific tasks. In this work, we address the problem of generating executable STL plans for systems with unknown dynamics. We propose a hierarchical planning framework that enables zero-shot generalization to new STL tasks by leveraging only task-agnostic trajectory data during offline training. The framework consists of three key components: (i) decomposing the STL specification into several progresses and time constraints, (ii) searching for timed waypoints that satisfy all progresses under time constraints, and (iii) generating trajectory segments using a pre-trained diffusion model and stitching them into complete trajectories. We formally prove that our method guarantees STL satisfaction, and simulation results demonstrate its effectiveness in generating dynamically feasible trajectories across diverse long-horizon STL tasks.

large language model, machine learning, natural language, (7 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.40)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.33)

Add feedback

Analyzing Vulnerabilities of MoE Based LLMs via Stable Safety critical Expert Identification

Neural Information Processing SystemsJun-22-2026, 11:47:34 GMT

Large language models with Mixture-of-Experts (MoE) architectures achieve efficiency and scalability, yet their routing mechanisms introduce safety alignment challenges insufficiently addressed by techniques developed for dense models. In this work, the MoE-specific safety risk of positional vulnerability--that safetyaligned behaviors rely on specific expert modules--is formalized and systematically analyzed. An analytical framework, SAFEX, is presented to robustly identify, characterize, and validate safety-critical experts via a stability-based expert selection procedure, and to decompose them into two functional groups: the Harmful Content Detection Group (HCDG), which specializes in identifying and recognizing harmful content within user inputs, and the Harmful Response Control Group (HRCG), which specializes in controlling and enforcing model behaviors to generate appropriate safety responses. Expert-level interventions are conducted to probe causality and to test mitigation. Targeted masking of SAFEX-selected experts reveals that safety behavior is highly concentrated. On Qwen3-30B-A3B, configured with 48 MoE-FFN layers and 128 experts per layer under top-8 routing (48 128 = 6,144 experts in total), disabling 12 selected experts reduces the refusal rate by 22%. In addition, lightweight adaptation is performed using LoRA under three configurations--the HRCG, the union of HCDG and HRCG, and all experts--and the resulting updates are composed through negative weight merging targeted at the HRCG, leading to improved refusal under adversarial prompts without full-model retraining. These results establish positional vulnerability as a distinct MoE-specific safety challenge and provide a practical, computeefficient pathway for expert-level safety interventions within routed architectures (https://github.com/Bearisbug/SAFEx).

arxiv preprint arxiv, large language model, machine learning, (16 more...)

Neural Information Processing Systems

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Government (1.00)
Information Technology > Security & Privacy (0.93)
Law (0.68)
Banking & Finance (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Conformal Linguistic Calibration: Trading-off between Factuality and Specificity

Neural Information Processing SystemsJun-22-2026, 11:33:05 GMT

Language model outputs are not always reliable, thus prompting research into how to adapt model responses based on uncertainty. Common approaches include: abstention, where models refrain from generating responses when uncertain; and linguistic calibration, where models hedge their statements using uncertainty quantifiers. However, abstention can withhold valuable information, while linguistically calibrated responses are often challenging to leverage in downstream tasks. We propose a unified view, Conformal Linguistic Calibration (CLC), which reinterprets linguistic calibration as answer set prediction. First we present a framework connecting abstention and linguistic calibration through the lens of linguistic pragmatics. We then describe an implementation of CLC that allows for controlling the level of imprecision in model responses. Results demonstrate our method produces calibrated outputs with conformal guarantees on factual accuracy. Further, our approach enables fine-tuning models to perform uncertainty-aware adaptive claim rewriting, offering a controllable balance between factuality and specificity.1

arxiv preprint arxiv, large language model, machine learning, (17 more...)

Neural Information Processing Systems

Country:

Europe (1.00)
Asia (1.00)
North America > United States > California (0.46)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Overview (0.93)

Industry:

Government (0.93)
Leisure & Entertainment > Sports > Soccer (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Information Management (0.92)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.71)
(2 more...)

Add feedback

bcdaaa1aec3ae2aa39542acefdec4e4b-Paper-Conference.pdf

Neural Information Processing SystemsJun-22-2026, 11:32:14 GMT

LRMs we By inno training directly vatively LRMs on fine-tune the to origina accurately the l reasoning model verify solely task the correctness the o reasoning verthinking.

arxiv preprint arxiv, large language model, machine learning, (19 more...)

Neural Information Processing Systems

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.68)

Add feedback

Remarkable Robustness of LLMs: Stages of Inference?

Neural Information Processing SystemsJun-22-2026, 11:21:36 GMT

We investigate the robustness of Large Language Models (LLMs) to structural interventions by deleting and swapping adjacent layers during inference. Surprisingly, models retain 72-95\% of their original top-1 prediction accuracy without any fine-tuning. We find that performance degradation is not uniform across layers: interventions to the early and final layers cause the most degradation, while the model is remarkably robust to dropping middle layers. This pattern of localized sensitivity motivates our hypothesis of four stages of inference, observed across diverse model families and sizes: (1) detokenization, where local context is integrated to lift raw token embeddings into higher-level representations; (2) feature engineering, where task-and entity-specific features are iteratively refined; (3) prediction ensembling, where hidden states are aggregated into plausible next-token predictions; and (4) residual calibration, where irrelevant features are suppressed to finalize the output distribution. Synthesizing behavioral and mechanistic evidence, we provide a hypothesis for interpreting depth-dependent computations in LLMs.

artificial intelligence, large language model, natural language, (9 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.88)

Add feedback

ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models

Neural Information Processing SystemsJun-22-2026, 11:13:04 GMT

Cinematography, the fundamental visual language of film, is essential for conveying narrative, emotion, and aesthetic quality. While recent Vision-Language Models (VLMs) demonstrate strong general visual understanding, their proficiency in comprehending the nuanced cinematic grammar embedded within individual shots remains largely unexplored and lacks robust evaluation.

arxiv preprint arxiv, large language model, machine learning, (20 more...)

Neural Information Processing Systems

Country: