AITopics | Large Language Model

Collaborating Authors

Large Language Model

News Overviews Instructional Materials AI-Alerts Classics

On the Robustness of Transformers against Context Hijacking for Linear Classification

Neural Information Processing SystemsJun-17-2026, 21:27:39 GMT

Transformer-based Large Language Models (LLMs) have demonstrated powerful in-context learning capabilities. However, their predictions can be disrupted by factually correct context, a phenomenon known as context hijacking, revealing a significant robustness issue. To understand this phenomenon theoretically, we explore an in-context linear classification problem based on recent advances in linear transformers. In our setup, context tokens are designed as factually correct query-answer pairs, where the queries are similar to the final query but have opposite labels. Then, we develop a general theoretical analysis on the robustness of the linear transformers, which is formulated as a function of the model depth, training context lengths, and number of hijacking context tokens. A key finding is that a well-trained deeper transformer can achieve higher robustness, which aligns with empirical observations. We show that this improvement arises because deeper layers enable more fine-grained optimization steps, effectively mitigating interference from context hijacking. This is also well supported by our numerical and real-world experiments. Our findings provide theoretical insights into the benefits of deeper architectures and contribute to enhancing the understanding of transformer architectures.

large language model, machine learning, natural language, (16 more...)

Neural Information Processing Systems

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Law Enforcement & Public Safety > Terrorism (1.00)
Education (0.69)
Information Technology > Security & Privacy (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

AnomalyCoT: AMulti-Scenario Chain-of-Thought Dataset for Multimodal Large Language Models

Neural Information Processing SystemsJun-17-2026, 21:27:17 GMT

Industrial Anomaly Detection (IAD) is an indispensable quality control technology in modern production processes. Recently, on account of the outstanding visual comprehension and cross-domain knowledge transfer capabilities of Multimodal Large Language Models (MLLMs), existing studies have explored the application of MLLMs in the IAD domain and established some multimodal IAD datasets. However, although the latest datasets contain various fundamental IAD tasks, they formulate tasks in a general question-and-answer format lacking a rigorous reasoning process, and they are relatively limited in the diversity of scenarios, which restricts their reliability in practical applications. In this paper, we propose AnomalyCoT, a multimodal Chain-of-Thought (CoT) dataset for multi-scenario IAD tasks. It consists of 37,565 IAD samples with the CoT data and is defined by challenging composite IAD tasks. Meanwhile, the CoT data for each sample provides precise coordinates of anomaly regions, thereby improving visual comprehension of defects across different types. AnomalyCoT is constructed through a systematic pipeline and involves multiple manual operations. Based on AnomalyCoT, we conducted a comprehensive evaluation of various mainstream MLLMs and fine-tuned representative models in different ways. The final results show that Gemini-2.0flash

large language model, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Country:

North America > United States (0.68)
Europe (0.67)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Energy (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

ACCO: Accumulate While You Communicate for Communication-Overlapped Sharded LLMTraining

Neural Information Processing SystemsJun-17-2026, 21:16:31 GMT

Training LLMs relies on distributed implementations using multiple GPUs to compute gradients in parallel with sharded optimizers. However, synchronizing gradients in data parallel setups introduces communication overhead that grows with the number of workers, limiting parallelization efficiency. Local optimization algorithms reduce communications but incur high memory costs as they prevent optimizer state sharding, hindering scalability. To address this, we propose ACcumulate while COmmunicate (ACCO), a memory-efficient optimization algorithm for distributed LLM training. By synchronizing delayed gradients while computing new ones, ACCO reduces GPU idle time and supports heterogeneous hardware. To mitigate the convergence issues caused by delayed updates, we introduce a novel technique ensuring training dynamics align with standard distributed optimization. Compared to ZeRO-1, our approach is significantly faster and scales effectively across heterogeneous hardware.

large language model, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Country:

Europe (0.67)
North America > United States (0.46)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

CoreGuard: Safeguarding Foundational Capabilities of LLMs Against Model Stealing in Edge Deployment

Neural Information Processing SystemsJun-17-2026, 21:15:12 GMT

Proprietary large language models (LLMs) exhibit strong generalization capabilities across diverse tasks and are increasingly deployed on edge devices for efficiency and privacy reasons. However, deploying proprietary LLMs at the edge without adequate protection introduces critical security threats. Attackers can extract model weights and architectures, enabling unauthorized copying and misuse. Even when protective measures prevent full extraction of model weights, attackers may still perform advanced attacks, such as fine-tuning, to further exploit the model. Existing defenses against these threats typically incur significant computational and communication overhead, making them impractical for edge deployment. To safeguard the edge-deployed LLMs, we introduce CoreGuard, a computationand communication-efficient protection method. CoreGuard employs an efficient protection protocol to reduce computational overhead and minimize communication overhead via a propagation protocol. Extensive experiments show that CoreGuard achieves upper-bound security protection with negligible overhead.

coreguard, large language model, machine learning, (21 more...)

Neural Information Processing Systems

Country: Asia (0.46)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Towards Visualization-of-Thought Jailbreak Attack against Large Visual Language Models

Neural Information Processing SystemsJun-17-2026, 21:14:54 GMT

As Visual Language Models (VLMs) continue to evolve, they have demonstrated increasingly sophisticated logical reasoning capabilities and multimodal thought generation, opening doors to widespread applications. However, this advancement raises serious concerns about content security, particularly when these models process complex multimodal inputs requiring intricate reasoning. When faced with these safety challenges, the critical competition between logical reasoning and safety objectives of VLMs is often overlooked in previous works. In this paper, we introduce Visualization-of-Thought Attack (VoTA), a novel and automated attack framework that strategically constructs chains of images with risky visual thoughts to challenge victim models.

large language model, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Genre:

Workflow (1.00)
Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Overview (0.67)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Government (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.97)

Add feedback

MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning

Neural Information Processing SystemsJun-17-2026, 21:04:20 GMT

Chain-of-Thought (CoT) has widely enhanced mathematical reasoning in Large Language Models (LLMs), but it still remains challenging for extending it to multimodal domains. Existing works either adopt a similar textual reasoning for image input, or seek to interleave visual signals into mathematical CoT. However, they face three key limitations for math problem-solving: reliance on coarsegrained box-shaped image regions, limited perception of vision encoders on math content, and dependence on external capabilities for visual modification. In this paper, we propose MINT-CoT, introducing Mathematical INterleaved Tokens for Chain-of-Thought visual reasoning. MINT-CoT adaptively interleaves relevant visual tokens into textual reasoning steps via an Interleave Token, which dynamically selects visual regions of any shapes within math figures. To empower this capability, we construct the MINT-CoT dataset, containing 54K mathematical problems aligning each reasoning step with visual regions at the token level, accompanied by a rigorous data generation pipeline. We further present a threestage MINT-CoT training strategy, progressively combining text-only CoTSFT, interleaved CoTSFT, and interleaved CoTRL, which derives our MINT-CoT-7B model. Extensive experiments demonstrate the effectiveness of our method for effective visual interleaved reasoning in mathematical domains, where MINTCoT-7B outperforms the baseline model by +34.08% on MathVista, +28.78% on GeoQA, and +23.2% on MMStar, respectively.

arxiv preprint arxiv, large language model, machine learning, (18 more...)

Neural Information Processing Systems

Country: Asia (0.28)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Flatten Graphs as Sequences: Transformers are Scalable Graph Generators

Neural Information Processing SystemsJun-17-2026, 21:02:56 GMT

We introduce AUTOGRAPH, a scalable autoregressive model for attributed graph generation using decoder-only transformers. By flattening graphs into random sequences of tokens through a reversible process, AUTOGRAPH enables modeling graphs as sequences without relying on additional node features that are expensive to compute, in contrast to diffusion-based approaches.

large language model, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Country: North America > United States (0.46)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(2 more...)

Add feedback

MLZero: AMulti-Agent System for End-to-end Machine Learning Automation

Neural Information Processing SystemsJun-17-2026, 21:02:35 GMT

Existing AutoML systems have advanced the automation of machine learning (ML); however, they still require substantial manual configuration and expert input, particularly when handling multimodal data. We introduce MLZero, a novel multi-agent framework powered by Large Language Models (LLMs) that enables end-to-end ML automation across diverse data modalities with minimal human intervention. A cognitive perception module is first employed, transforming raw multimodal inputs into perceptual context that effectively guides the subsequent workflow. To address key limitations of LLMs, such as hallucinated code generation and outdated API knowledge, we enhance the iterative code generation process with semantic and episodic memory. MLZero demonstrates superior performance on MLE-Bench Lite, outperforming all competitors in both success rate and solution quality, securing six gold medals. Additionally, when evaluated on our Multimodal AutoML Agent Benchmark, which includes 25 more challenging tasks spanning diverse data modalities, MLZero outperforms the competing methods by a large margin with a success rate of 0.92 (+263.6%)

large language model, machine learning, natural language, (14 more...)

Neural Information Processing Systems

Country:

North America > United States (0.27)
North America > Canada (0.27)
Asia > Japan (0.27)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Instructional Material > Course Syllabus & Notes (0.92)

Industry:

Transportation > Air (1.00)
Information Technology (1.00)
Education (1.00)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

When Semantics Mislead Vision: Mitigating Large Multimodal Models Hallucinations in Scene Text Spotting and Understanding

Neural Information Processing SystemsJun-17-2026, 20:57:54 GMT

Large Multimodal Models (LMMs) have achieved impressive progress in visual perception and reasoning. However, when confronted with visually ambiguous or non-semantic scene text, they often struggle to accurately spot and understand the content, frequently generating semantically plausible yet visually incorrect answers, which we refer to as semantic hallucination. In this work, we investigate the underlying causes of semantic hallucination and identify a key finding: Transformer layers in LLM with stronger attention focus on scene text regions are less prone to producing semantic hallucinations. Thus, we propose a training-free semantic hallucination mitigation framework comprising two key components: (1) ZoomText, a coarse-to-fine strategy that identifies potential text regions without external detectors; and (2) Grounded Layer Correction, which adaptively leverages the internal representations from layers less prone to hallucination to guide decoding, correcting hallucinated outputs for non-semantic samples while preserving the semantics of meaningful ones. To enable rigorous evaluation, we introduce TextHalu-Bench, a benchmark of 1,740 samples spanning both semantic and nonsemantic cases, with manually curated question-answer pairs designed to probe model hallucinations. Extensive experiments demonstrate that our method not only effectively mitigates semantic hallucination but also achieves strong performance on public benchmarks for scene text spotting and understanding.

arxiv preprint arxiv, large language model, machine learning, (20 more...)

Neural Information Processing Systems

Country: Asia > China (0.28)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

MMTU: AMassive Multi-Task Table Understanding and Reasoning Benchmark

Neural Information Processing SystemsJun-17-2026, 20:56:31 GMT

Tables and table-based use cases play a crucial role in many important real-world applications, such as spreadsheets, databases, and computational notebooks, which traditionally require expert-level users like data engineers, data analysts, and database administrators to operate. Although LLMs have shown remarkable progress in working with tables (e.g., in spreadsheet and database copilot scenarios), comprehensive benchmarking of such capabilities remains limited. In contrast to an extensive and growing list of NLP benchmarks, evaluations of table-related tasks are scarce, and narrowly focus on tasks like NL-to-SQL and Table-QA, overlooking the broader spectrum of real-world tasks that professional users face. This gap limits our understanding and model progress in this important area. In this work, we introduce MMTU, a large-scale benchmark with around 28K questions across 25 real-world table tasks, designed to comprehensively evaluate models ability to understand, reason, and manipulate real tables at the expertlevel. These tasks are drawn from decades' worth of computer science research on tabular data, with a focus on complex table tasks faced by professional users. We show that MMTU require a combination of skills - including table understanding, reasoning, and coding - that remain challenging for today's frontier models, where even frontier reasoning models like OpenAIGPT-5 and DeepSeek R1 score only around 69% and 57% respectively, suggesting significant room for improvement. We highlight key findings in our evaluation using MMTU and hope that this benchmark drives further advances in understanding and developing foundation models for structured data processing and analysis.

arxiv preprint arxiv, large language model, machine learning, (22 more...)

Neural Information Processing Systems

Genre:

Research Report > Experimental Study (1.00)
Overview (1.00)
Research Report > New Finding (0.92)

Industry: Information Technology (1.00)

Technology:

Information Technology > Software (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback