Appendix of A Deep Learning Dataloader with Shared Data Preparation

Neural Information Processing Systems

In this part, we show the I/O speed in the synchronous and asynchronous cases. Figure 3a shows the I/O speed for four jobs that start at different moments. We then compare RefCnt with the generic cache policy in the above cases. D = sample([0, 13333], 10000) means sampling a subset D of size 10000 from [0, 13333] uniformly at random. DSA always attains the minimum number of misses.

36th Conference on Neural Information Processing Systems (NeurIPS 2022).
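The sampling notation D = sample([0, 13333], 10000) can be illustrated with a minimal sketch. It assumes the interval is inclusive and that the subset holds distinct integer sample indices; the helper name `sample_subset` and the `seed` parameter are hypothetical and not taken from the paper.

```python
import random


def sample_subset(low, high, size, seed=None):
    """Draw `size` distinct integers uniformly at random from [low, high]."""
    rng = random.Random(seed)
    # Assumption: the interval is inclusive, hence high + 1.
    return rng.sample(range(low, high + 1), size)


# D = sample([0, 13333], 10000)
D = sample_subset(0, 13333, 10000, seed=0)
```

Sampling is done without replacement here because the text speaks of a subset; a dataloader that permits repeated indices would need a different call.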


Dynamically Scaled Activation Steering

arXiv.org Artificial Intelligence

Activation steering has emerged as a powerful method for guiding the behavior of generative models towards desired outcomes such as toxicity mitigation. However, most existing methods apply interventions uniformly across all inputs, degrading model performance when steering is unnecessary. We introduce Dynamically Scaled Activation Steering (DSAS), a method-agnostic steering framework that decouples when to steer from how to steer. DSAS adaptively modulates the strength of existing steering transformations across layers and inputs, intervening strongly only when undesired behavior is detected. At generation time, DSAS computes context-dependent scaling factors that selectively adjust the strength of any steering method. We also show how DSAS can be jointly optimized end-to-end together with the steering function. When combined with existing steering methods, DSAS consistently improves the Pareto front with respect to steering alone, achieving a better trade-off between toxicity mitigation and utility preservation. We further demonstrate DSAS's generality by applying it to a text-to-image diffusion model, showing how adaptive steering allows the modulation of specific concepts. Finally, DSAS introduces minimal computational overhead while improving interpretability, pinpointing which tokens require steering and by how much.


DSAS: A Universal Plug-and-Play Framework for Attention Optimization in Multi-Document Question Answering

arXiv.org Artificial Intelligence

While large language models (LLMs) show considerable promise across various fields, they have notable limitations in handling multi-document question answering (Multi-doc QA) tasks. The first challenge is long-range dependency modeling, where LLMs struggle to focus on key information in long texts, which weakens important semantic connections. Second, most LLMs suffer from the "lost-in-the-middle" issue, where they have difficulty processing information in the middle of long inputs. Current solutions either truncate global dependencies or demand costly finetuning, ultimately lacking a universal and simple solution for these challenges. To resolve these limitations, we propose Dual-Stage Adaptive Sharpening (DSAS) containing two modules. (i) The Contextual Gate Weighting (CGW) module alleviates "lost-in-the-middle" by assessing paragraph relevance through layer-wise attention tracking and position-aware weighting. (ii) The Reciprocal Attention Suppression (RAS) module enhances focus on critical paragraphs by suppressing information exchange between key and irrelevant texts, thus mitigating the limitations in long-range dependency modeling. Notably, DSAS functions as a plug-and-play solution requiring no architectural modifications or extra training parameters. Extensive experiments on four benchmarks demonstrate DSAS's efficacy across mainstream LLMs (Llama, Qwen, Mistral, and Deepseek), with an average F1-score improvement of 4.2% in Multi-doc QA tasks on Llama-3.1-8B-Instruct and Qwen2.5-14B-Instruct. Ablation studies confirm the essential contributions of both the CGW and RAS modules. In addition, detailed discussions in the Appendix further validate the robustness and scalability of DSAS.


Discovering Sparsity Allocation for Layer-wise Pruning of Large Language Models

Lujun Li

Neural Information Processing Systems

In this paper, we present DSA, the first automated framework for discovering sparsity allocation schemes for layer-wise pruning in Large Language Models (LLMs). LLMs have become increasingly powerful, but their large parameter counts make them computationally expensive. Existing pruning methods for compressing LLMs primarily focus on evaluating redundancies and removing element-wise weights. However, these methods fail to allocate adaptive layer-wise sparsities, leading to performance degradation in challenging tasks.
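To make the notion of layer-wise sparsity allocation concrete, here is a minimal sketch that applies a different sparsity ratio to each layer using plain magnitude pruning. Magnitude pruning stands in for whatever element-wise criterion is actually used, and the `allocation` values are hand-picked for illustration; they are not the allocations discovered by DSA.

```python
def prune_layer(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of a flat weight list."""
    k = int(len(weights) * sparsity)  # number of weights to remove
    if k == 0:
        return list(weights)
    # Indices ordered by increasing magnitude; the first k are dropped.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


# Toy two-layer model with flattened weights (hypothetical values).
layers = {
    "layer0": [0.9, -0.1, 0.4, 0.05],
    "layer1": [0.2, -0.8, 0.03, 0.6],
}
# Hypothetical non-uniform allocation; the average sparsity is still 50%.
allocation = {"layer0": 0.25, "layer1": 0.75}

pruned = {name: prune_layer(w, allocation[name]) for name, w in layers.items()}
```

A uniform scheme would set every entry of `allocation` to 0.5; an adaptive scheme keeps the same overall budget but spends it unevenly across layers.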


Supplementary Information: 5 Dynamical Similarity. We consider the problem of determining when two linear autonomous dynamical systems on $\mathbb{R}^n$, $\dot{x} = Ax$ and $\dot{y} = By$ (9)

Neural Information Processing Systems

Unfortunately, this measure of similarity requires solving a nonconvex optimization problem. All elements of O(n)/SO(n) are bijective maps between itself and SO(n). Then we can show by the same logic that PD O(n)/SO(n). According to Williams et al. [2021], only two things are needed. Then we must show that the similarity transform is an isometry: $g(T(A), T(B)) = g(A, B)$ (17), where $T(A)$ is a map that in our case is the similarity transform. Here we demonstrate how vector fields transform.
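As an illustration of the isometry condition in (17), consider one concrete instance. The excerpt does not define $g$ or $T$, so the following choices are assumptions: $g(A, B) = \lVert A - B \rVert_F$ is the Frobenius distance and $T(A) = C A C^{\top}$ is conjugation by a fixed orthogonal matrix $C \in O(n)$, so that $C^{\top} C = I$. Under these assumptions the condition follows from the cyclic property of the trace:

```latex
\begin{aligned}
g(T(A), T(B))
  &= \lVert C A C^{\top} - C B C^{\top} \rVert_F
   = \lVert C (A - B) C^{\top} \rVert_F \\
  &= \sqrt{\operatorname{tr}\!\big( C (A - B)^{\top} C^{\top} C (A - B) C^{\top} \big)} \\
  &= \sqrt{\operatorname{tr}\!\big( (A - B)^{\top} (A - B) \big)}
   = \lVert A - B \rVert_F
   = g(A, B).
\end{aligned}
```

The metric actually used in the paper may differ; the sketch only shows the kind of argument the isometry requirement calls for.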



Communication Bias in Large Language Models: A Regulatory Perspective

arXiv.org Artificial Intelligence

Large language models (LLMs) are a prominent subset of AI, built on advanced neural network architectures that can generate new data, including text, images, and audio. LLMs utilize various technologies to identify patterns in a given set of training data, without requiring explicit instructions about what to look for [12, 35]. LLMs typically assume that the training data follows a probability distribution, and once they have identified existing patterns, they can generate new instances that are similar to the original data. By drawing from and combining training data, LLMs can create new content that transcends the initial dataset [17].