AITopics | dsa

DSAS: A Universal Plug-and-Play Framework for Attention Optimization in Multi-Document Question Answering

Neural Information Processing Systems, Jun-23-2026, 04:35:46 GMT

While large language models (LLMs) show considerable promise across various fields, they have notable limitations in multi-document question answering (Multi-doc QA). The first challenge is long-range dependency modeling: LLMs struggle to focus on key information in long texts, which weakens important semantic connections. Second, most LLMs suffer from the "lost-in-the-middle" issue, where they have difficulty processing information in the middle of long inputs. Current solutions either truncate global dependencies or demand costly fine-tuning, so a universal and simple solution to these challenges is still lacking. To address these limitations, we propose Dual-Stage Adaptive Sharpening (DSAS), which contains two modules.

large language model, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country:

Europe (1.00)
North America > United States (0.67)
North America > Canada (0.46)
Asia > China (0.46)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Media > Film (0.67)
Leisure & Entertainment (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

DSAS: A Universal Plug-and-Play Framework for Attention Optimization in Multi-Document Question Answering

Neural Information Processing Systems, Jun-14-2026, 08:16:41 GMT

While large language models (LLMs) show considerable promise across various fields, they have notable limitations in multi-document question answering (Multi-doc QA). The first challenge is long-range dependency modeling: LLMs struggle to focus on key information in long texts, which weakens important semantic connections. Second, most LLMs suffer from the "lost-in-the-middle" issue, where they have difficulty processing information in the middle of long inputs. Current solutions either truncate global dependencies or demand costly fine-tuning, so a universal and simple solution to these challenges is still lacking. To address these limitations, we propose Dual-Stage Adaptive Sharpening (DSAS), which contains two modules.

artificial intelligence, large language model, natural language, (8 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

6ac807c9b296964409b277369e55621a-Supplemental-Conference.pdf

Neural Information Processing Systems, Feb-13-2026, 11:35:24 GMT

artificial intelligence, machine learning, matrix, (18 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.48)

Beyond Geometry: Comparing the Temporal Structure of Computation in Neural Circuits with Dynamical Similarity Analysis

Mitchell Ostrow, Adam Eisen, Leo Kozachkov, Ila Fiete

Neural Information Processing Systems, Feb-13-2026, 11:35:20 GMT

Yet in recurrent networks, computations are implemented at the level of dynamics, and two networks performing the same computation with equivalent dynamics need not exhibit the same geometry.
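To make the dynamics-versus-geometry distinction concrete, here is a small sketch that uses a discrete-time linear system as a stand-in for the continuous systems discussed in the paper (a simplifying assumption). Two systems related by an invertible but non-orthogonal change of coordinates, B = C A C^-1, share similarity invariants such as trace and determinant (hence the same eigenvalues), yet their state trajectories have different pairwise-distance geometry. The matrices A and C, the initial state, and the helper names `rollout` and `pairwise_distances` are illustrative choices, not taken from the paper.

```python
import numpy as np

def rollout(M, x0, steps):
    """Iterate x_{t+1} = M x_t and return the (steps + 1, n) trajectory."""
    xs = [np.asarray(x0, dtype=float)]
    for _ in range(steps):
        xs.append(M @ xs[-1])
    return np.stack(xs)

def pairwise_distances(X):
    """Euclidean distance between every pair of states in a trajectory."""
    diff = X[:, None, :] - X[None, :, :]
    return np.linalg.norm(diff, axis=-1)

# A damped rotation with eigenvalues 0.9 +/- 0.2i
A = np.array([[0.9, -0.2], [0.2, 0.9]])
# An invertible, non-orthogonal change of coordinates (a shear)
C = np.array([[1.0, 1.0], [0.0, 1.0]])
B = C @ A @ np.linalg.inv(C)

x0 = np.array([1.0, 0.0])
X = rollout(A, x0, 20)
Y = rollout(B, C @ x0, 20)  # equivalently y_t = C x_t

# Same dynamics: similarity invariants (trace, determinant) agree
same_dynamics = bool(np.isclose(np.trace(A), np.trace(B))) and bool(
    np.isclose(np.linalg.det(A), np.linalg.det(B))
)
# Different geometry: the shear changes distances between states
same_geometry = np.allclose(pairwise_distances(X), pairwise_distances(Y))
print(same_dynamics, same_geometry)
```

A geometry-based comparison (distances or orthogonal alignment of states) would treat X and Y as different, while a comparison up to similarity transforms treats them as the same dynamical system.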

artificial intelligence, machine learning, trajectory, (18 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Europe > Switzerland > Zürich > Zürich (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Appendix of A Deep Learning Dataloader with Shared Data Preparation

Neural Information Processing Systems, Feb-9-2026, 16:05:34 GMT

In this part, we show the I/O speed in the synchronous and asynchronous cases. Figure 3a shows the I/O speed for four jobs that start at different moments. We then compare RefCnt with the generic cache policy in the above cases. D = sample([0, 13333], 10000) denotes a subset D of size 10000 sampled uniformly at random from [0, 13333]. DSA always achieves the minimum number of misses. (36th Conference on Neural Information Processing Systems, NeurIPS 2022.)
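A small sketch of the sampling notation D = sample([0, 13333], 10000), treating the interval as inclusive of both ends (an assumption). It also includes an LRU miss counter as a stand-in for a "generic cache policy"; the text does not name that policy, and neither RefCnt nor DSA is reproduced here. The function names `sample_subset` and `lru_misses` are hypothetical.

```python
import random
from collections import OrderedDict

def sample_subset(low, high, size, seed=None):
    """D = sample([low, high], size): draw `size` distinct indices uniformly
    at random from the closed interval [low, high] (inclusive, assumed)."""
    rng = random.Random(seed)
    return rng.sample(range(low, high + 1), size)

def lru_misses(accesses, capacity):
    """Count cache misses for an access sequence under LRU eviction
    (LRU is an assumed example of a generic cache policy)."""
    cache = OrderedDict()
    misses = 0
    for key in accesses:
        if key in cache:
            cache.move_to_end(key)          # mark as most recently used
        else:
            misses += 1
            cache[key] = True
            if len(cache) > capacity:
                cache.popitem(last=False)   # evict least recently used
    return misses

D = sample_subset(0, 13333, 10000, seed=0)
# Two hypothetical jobs reading the same subset in independent shuffled orders,
# interleaved to mimic concurrent access to a shared cache
job_a = random.Random(1).sample(D, len(D))
job_b = random.Random(2).sample(D, len(D))
interleaved = [k for pair in zip(job_a, job_b) for k in pair]
print(lru_misses(interleaved, capacity=5000))
```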

artificial intelligence, deep learning, machine learning, (19 more...)

Neural Information Processing Systems

Country:

North America > United States > Illinois > Champaign County > Urbana (0.04)
Asia > China > Jiangsu Province > Nanjing (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)

Dynamically Scaled Activation Steering

Ferrando, Alex, Suau, Xavier, Gonzàlez, Jordi, Rodriguez, Pau

arXiv.org Artificial Intelligence, Dec-4-2025

Activation steering has emerged as a powerful method for guiding the behavior of generative models towards desired outcomes such as toxicity mitigation. However, most existing methods apply interventions uniformly across all inputs, degrading model performance when steering is unnecessary. We introduce Dynamically Scaled Activation Steering (DSAS), a method-agnostic steering framework that decouples when to steer from how to steer. DSAS adaptively modulates the strength of existing steering transformations across layers and inputs, intervening strongly only when undesired behavior is detected. At generation time, DSAS computes context-dependent scaling factors that selectively adjust the strength of any steering method. We also show how DSAS can be jointly optimized end-to-end together with the steering function. When combined with existing steering methods, DSAS consistently improves the Pareto front with respect to steering alone, achieving a better trade-off between toxicity mitigation and utility preservation. We further demonstrate DSAS's generality by applying it to a text-to-image diffusion model, showing how adaptive steering allows the modulation of specific concepts. Finally, DSAS introduces minimal computational overhead while improving interpretability, pinpointing which tokens require steering and by how much.
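The idea of decoupling when to steer from how to steer can be sketched as a per-token scaling factor that interpolates between the original activations and the output of any steering transformation. This is a minimal sketch under stated assumptions: the scaling factor comes from a hypothetical linear detector (`w`, `b`) passed through a sigmoid, the interpolation form h + alpha * (T(h) - h) is our choice, and the additive steering vector is only a toy example of "any steering method". It does not reproduce the paper's scaling function or its end-to-end joint optimization.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def additive_steer(h, v):
    """Example steering transformation: shift every token along direction v."""
    return h + v

def dynamically_scaled_steer(h, steer_fn, w, b):
    """Scale an existing steering transformation per token.

    h: (T, d) hidden states of T tokens at one layer.
    steer_fn: any steering transformation mapping (T, d) -> (T, d).
    w, b: parameters of a hypothetical linear detector for undesired behavior.
    Returns the modulated states and the per-token scaling factors alpha.
    """
    alpha = sigmoid(h @ w + b)                 # (T,), each value in (0, 1)
    steered = steer_fn(h)
    # alpha near 0 leaves a token untouched; alpha near 1 applies full steering
    return h + alpha[:, None] * (steered - h), alpha

h = np.array([[1.0, 0.0], [0.0, 1.0]])        # two toy tokens
v = np.array([0.0, -2.0])                     # toy steering direction
w = np.array([0.0, 10.0])                     # detector fires on dimension 1
out, alpha = dynamically_scaled_steer(h, lambda x: additive_steer(x, v), w, -5.0)
print(alpha)
```

Here only the second token, which the toy detector flags, is steered strongly; the first token is left almost unchanged, which is the behavior the abstract describes as intervening strongly only when undesired behavior is detected.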

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2512.03661

Country:

North America > United States (0.28)
Asia (0.28)

Genre: Research Report > New Finding (0.93)

Industry:

Media (0.67)
Transportation > Ground (0.46)
Health & Medicine > Consumer Health (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Communications > Social Media (0.93)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)

DSAS: A Universal Plug-and-Play Framework for Attention Optimization in Multi-Document Question Answering

Li, Jiakai, Wang, Rongzheng, Ma, Yizhuo, Liang, Shuang, Luo, Guangchun, Qin, Ke

arXiv.org Artificial Intelligence, Oct-15-2025

While large language models (LLMs) show considerable promise across various fields, they have notable limitations in multi-document question answering (Multi-doc QA). The first challenge is long-range dependency modeling: LLMs struggle to focus on key information in long texts, which weakens important semantic connections. Second, most LLMs suffer from the "lost-in-the-middle" issue, where they have difficulty processing information in the middle of long inputs. Current solutions either truncate global dependencies or demand costly fine-tuning, so a universal and simple solution to these challenges is still lacking. To address these limitations, we propose Dual-Stage Adaptive Sharpening (DSAS), which contains two modules. (i) The Contextual Gate Weighting (CGW) module alleviates the "lost-in-the-middle" issue by assessing paragraph relevance through layer-wise attention tracking and position-aware weighting. (ii) The Reciprocal Attention Suppression (RAS) module enhances focus on critical paragraphs by suppressing information exchange between key and irrelevant texts, thus mitigating the limitations in long-range dependency modeling. Notably, DSAS is a plug-and-play solution that requires no architectural modifications or extra training parameters. Extensive experiments on four benchmarks demonstrate DSAS's efficacy across mainstream LLMs (Llama, Qwen, Mistral, and Deepseek), with an average F1-score improvement of 4.2% in Multi-doc QA tasks on Llama-3.1-8B-Instruct and Qwen2.5-14B-Instruct. Ablation studies confirm the essential contributions of both the CGW and RAS modules. In addition, detailed discussions in the Appendix further validate the robustness and scalability of DSAS.
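The two stages described in this abstract can be sketched roughly as follows. This is a minimal illustration under our own assumptions, not the authors' implementation: the function names, the averaging of per-layer attention mass, the specific position weight that boosts middle paragraphs, the top-k choice of key paragraphs, and the additive logit penalty are all hypothetical. It only shows the shape of the idea: score paragraphs from layer-wise attention with a position-aware weight, then suppress attention in both directions between key and irrelevant paragraphs.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def gate_scores(layer_para_attn, gamma):
    """CGW-style paragraph relevance (hypothetical form).

    layer_para_attn: (num_layers, num_paras) attention mass that question
    tokens place on each paragraph, one row per layer.
    gamma: strength of a position weight that boosts middle paragraphs.
    """
    relevance = layer_para_attn.mean(axis=0)            # track attention across layers
    num_paras = layer_para_attn.shape[1]
    rel_pos = np.linspace(-1.0, 1.0, num_paras)         # -1 = first, +1 = last paragraph
    pos_weight = 1.0 + gamma * (1.0 - np.abs(rel_pos))  # largest in the middle
    return relevance * pos_weight

def reciprocal_suppression(logits, para_ids, key_paras, penalty):
    """RAS-style suppression (hypothetical form).

    logits: (T, T) pre-softmax attention scores (causal masking assumed upstream).
    para_ids: (T,) paragraph index per token, -1 for non-paragraph tokens.
    key_paras: paragraph indices judged relevant.
    penalty: amount subtracted from logits between key and irrelevant tokens.
    """
    para_ids = np.asarray(para_ids)
    is_key = np.isin(para_ids, key_paras)
    is_irrelevant = (para_ids >= 0) & ~is_key
    # Suppress both directions: key -> irrelevant and irrelevant -> key
    cross = (is_key[:, None] & is_irrelevant[None, :]) | (
        is_irrelevant[:, None] & is_key[None, :]
    )
    return softmax(logits - penalty * cross, axis=-1)

# Toy example: 3 paragraphs, attention mass from 2 layers
scores = gate_scores(np.array([[0.2, 0.5, 0.3], [0.4, 0.3, 0.3]]), gamma=0.5)
key = np.argsort(scores)[::-1][:1]        # keep the top-1 paragraph as "key"
# 7 tokens: one question token (-1) followed by two tokens per paragraph
attn = reciprocal_suppression(np.zeros((7, 7)), [-1, 0, 0, 1, 1, 2, 2], key, penalty=5.0)
```

Using a finite penalty rather than a hard mask keeps some residual information flow; whether DSAS suppresses softly or hard is not stated in the abstract, so this is a design assumption.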

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2510.12251

Country:

Asia (1.00)
North America > United States (0.46)
North America > Canada (0.28)
(2 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Discovering Sparsity Allocation for Layer-wise Pruning of Large Language Models

Lujun Li

Neural Information Processing Systems, Oct-10-2025, 22:49:35 GMT

In this paper, we present DSA, the first automated framework for discovering sparsity allocation schemes for layer-wise pruning in Large Language Models (LLMs). LLMs have become increasingly powerful, but their large parameter counts make them computationally expensive. Existing pruning methods for compressing LLMs primarily focus on evaluating redundancies and removing element-wise weights. However, these methods fail to allocate adaptive layer-wise sparsities, leading to performance degradation in challenging tasks.
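To illustrate what a layer-wise sparsity allocation is, here is one hand-written allocation function of the kind an automated search might explore; DSA's contribution is discovering such functions automatically, which this sketch does not attempt. The standardize-and-shift formula, the source of the per-layer importance scores, and the use of magnitude pruning as the element-wise method are all assumptions, and the function names are hypothetical.

```python
import numpy as np

def allocate_sparsity(layer_scores, target, alpha):
    """Map per-layer importance scores to per-layer sparsity ratios.

    Hypothetical allocation function: standardize the scores, then give
    more important layers lower sparsity. Because the standardized scores
    have zero mean, the average sparsity equals `target` (before clipping).
    """
    s = np.asarray(layer_scores, dtype=float)
    z = (s - s.mean()) / (s.std() + 1e-12)
    return np.clip(target - alpha * z, 0.0, 1.0)

def magnitude_prune(W, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of entries in W."""
    flat = np.array(W, dtype=float).ravel()   # np.array copies, W is untouched
    k = int(round(sparsity * flat.size))
    if k > 0:
        flat[np.argsort(np.abs(flat))[:k]] = 0.0
    return flat.reshape(np.shape(W))

scores = [1.0, 2.0, 3.0, 4.0]     # e.g. one importance score per layer
ratios = allocate_sparsity(scores, target=0.5, alpha=0.1)
W = np.arange(1.0, 11.0).reshape(2, 5)
pruned = magnitude_prune(W, ratios[0])  # least important layer gets pruned most
print(ratios)
```

A uniform allocation would give every layer sparsity 0.5; the point of an adaptive allocation is that the same overall budget is redistributed toward layers whose weights matter less.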

allocation function, language model, operation, (15 more...)

Neural Information Processing Systems

Country:

Asia > China > Hong Kong (0.04)
Asia > China > Heilongjiang Province > Harbin (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)
Asia > China > Guangdong Province > Guangzhou (0.04)

Genre: Research Report > Experimental Study (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Supplementary Information: 5 Dynamical Similarity. We consider the problem of determining when two linear autonomous dynamical systems on ℝⁿ, ẋ = Ax and ẏ = By (9)

Neural Information Processing Systems, Oct-8-2025, 20:37:38 GMT

Unfortunately, this measure of similarity requires solving a nonconvex optimization problem. All elements of O(n)/SO(n) are bijective maps between itself and SO(n). Then we can show by the same logic that PD O(n)/SO(n). According to Williams et al. [2021], only two things are needed. Then we must show that the similarity transform is an isometry: g(T(A), T(B)) = g(A, B) (17), where T(A) is a map that, in our case, is the similarity transform. Here we demonstrate how vector fields transform.
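A numerical sketch of the two points in this passage, under stated assumptions: the metric g is taken to be the Frobenius distance, the similarity transform is T(M) = C M C^-1, and the nonconvex minimization over orthogonal transforms is replaced by a brute-force grid search over 2-D rotations only (SO(2); reflections in O(n) \ SO(2) are omitted for brevity). The check shows that an orthogonal similarity transform preserves the distance while a non-orthogonal one (a shear) does not, and that a rotated copy of a system is found at distance close to zero. Helper names are hypothetical.

```python
import numpy as np

def rotation(theta):
    """2-D rotation matrix, an element of SO(2)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def conjugate(M, C):
    """Similarity transform T(M) = C M C^{-1}."""
    return C @ M @ np.linalg.inv(C)

def fro(M):
    return np.linalg.norm(M, "fro")

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[0.0, 1.0], [1.0, 0.0]])

# Isometry check: an orthogonal similarity transform preserves the distance
R = rotation(0.3)
orthogonal_ok = bool(np.isclose(fro(conjugate(A, R) - conjugate(B, R)), fro(A - B)))

# A non-orthogonal transform (a shear) does not preserve it in general
S = np.array([[1.0, 1.0], [0.0, 1.0]])
shear_ok = bool(np.isclose(fro(conjugate(A, S) - conjugate(B, S)), fro(A - B)))

def rotation_distance(A, B, num=10_000):
    """Brute-force min over SO(2) of ||A - R B R^T||_F (grid search, 2-D only)."""
    thetas = np.linspace(0.0, 2 * np.pi, num, endpoint=False)
    return min(fro(A - conjugate(B, rotation(t))) for t in thetas)

# B2 is A expressed in rotated coordinates, so its distance to A is close to 0
B2 = conjugate(A, rotation(1.0))
print(orthogonal_ok, shear_ok, rotation_distance(A, B2))
```

The grid search only works because the 2-D rotation group is one-dimensional; in higher dimensions the minimization over O(n) is the nonconvex problem the text refers to and needs a proper optimizer.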

artificial intelligence, machine learning, optimization problem, (16 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.34)

6ac807c9b296964409b277369e55621a-Paper-Conference.pdf

Neural Information Processing Systems, Oct-8-2025, 20:37:35 GMT

artificial intelligence, machine learning, trajectory, (18 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Europe > Switzerland > Zürich > Zürich (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
