AITopics

Genre: Research Report (0.47)

Technology:

Information Technology > Information Management > Search (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.67)

Neural Information Processing Systems, Feb-9-2026, 07:25:16 GMT

6f46dd176364ccec308c2760189a4605-Paper.pdf

alignment, computational linguistic, permutation, (12 more...)

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Oceania > Australia > Victoria > Melbourne (0.04)
(7 more...)

Genre: Research Report (0.93)

Industry: Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

arXiv.org Artificial Intelligence, Nov-25-2025

VLM in a flash: I/O-Efficient Sparsification of Vision-Language Model via Neuron Chunking

Yang, Kichang, Kim, Seonjun, Kim, Minjae, Zhang, Nairan, Zhang, Chi, Lee, Youngki

Edge deployment of large Vision-Language Models (VLMs) increasingly relies on flash-based weight offloading, where activation sparsification is used to reduce I/O overhead. However, conventional sparsification remains model-centric, selecting neurons solely by activation magnitude and neglecting how access patterns influence flash performance. We present Neuron Chunking, an I/O-efficient sparsification strategy that operates on chunks (i.e., groups of contiguous neurons in memory) and couples neuron importance with storage access cost. The method models I/O latency through a lightweight abstraction of access contiguity and selects chunks with high utility, defined as neuron importance normalized by estimated latency. By aligning sparsification decisions with the underlying storage behavior, Neuron Chunking improves I/O efficiency by up to 4.65x and 5.76x on Jetson Orin Nano and Jetson AGX Orin, respectively.
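The utility-based selection described in this abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the function names `estimate_latency` and `select_chunks`, the fixed chunk size, the linear latency model (a fixed per-read overhead plus a per-neuron transfer cost), and the greedy selection under a latency budget.

```python
from typing import List, Tuple


def estimate_latency(chunk_len: int, per_request: float = 1.0,
                     per_neuron: float = 0.1) -> float:
    """Hypothetical cost model: fixed overhead per read plus a per-neuron cost."""
    return per_request + per_neuron * chunk_len


def select_chunks(importance: List[float], chunk_size: int,
                  budget: float) -> List[Tuple[int, int]]:
    """Greedily pick contiguous chunks by utility = importance / estimated latency.

    Returns the chosen [start, end) neuron ranges, sorted by position.
    """
    candidates = []
    for start in range(0, len(importance), chunk_size):
        end = min(start + chunk_size, len(importance))
        latency = estimate_latency(end - start)
        utility = sum(importance[start:end]) / latency
        candidates.append((utility, start, end, latency))

    # Highest utility first; ties keep their original (positional) order.
    candidates.sort(key=lambda c: c[0], reverse=True)

    chosen, spent = [], 0.0
    for _, start, end, latency in candidates:
        if spent + latency <= budget:  # assumed stopping rule: total latency budget
            chosen.append((start, end))
            spent += latency
    return sorted(chosen)


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.1, 0.0, 0.7, 0.6, 0.05, 0.05]
    print(select_chunks(scores, chunk_size=2, budget=2.5))
```

Because each read pays the fixed overhead once, a chunk of moderately important contiguous neurons can outrank an isolated high-magnitude neuron, which is the intuition behind coupling importance with access cost.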

large language model, latency, machine learning, (18 more...)

2511.18692

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Neural Information Processing Systems, Nov-20-2025, 18:03:19 GMT

TETRIS: TilE-matching the TRemendous Irregular Sparsity

Yu Ji, Ling Liang, Lei Deng, Youyang Zhang, Youhui Zhang, Yuan Xie

Compressing neural networks by pruning weights with small magnitudes can significantly reduce the computation and storage cost. Although pruning makes the model smaller, it is difficult to get a practical speedup in modern computing platforms such as CPU and GPU due to the irregularity.

machine learning, natural language, sparsity, (20 more...)

Country:

North America > United States > California > Santa Barbara County > Santa Barbara (0.04)
Asia > China > Beijing > Beijing (0.04)
North America > Canada > Quebec > Montreal (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)

Genre: Research Report (0.46)

Industry: Leisure & Entertainment > Games > Computer Games (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.93)

Raimondi, Bianca, Gabbrielli, Maurizio

Exploiting Primacy Effect To Improve Large Language Models

arXiv.org Artificial Intelligence, Oct-23-2025

Large Language Models (LLMs) have become essential in many Natural Language Processing (NLP) tasks, leveraging extensive pre-training and fine-tuning to achieve high accuracy. However, like humans, LLMs exhibit biases, particularly positional biases such as primacy and recency effects, which can influence the accuracy of the answers. The primacy effect, where items presented first are more likely to be remembered or selected, plays a key role in Multiple Choice Question Answering (MCQA), where the order of answer options can affect prediction outcomes. This study focuses on primacy bias in fine-tuned LLMs: we first show that fine-tuning amplifies this bias, probably due to exposure to human-like patterns. Hence, we strategically leverage this effect by reordering response options based on semantic similarity to the query, without requiring knowledge of the correct answer. Our experimental results show that this approach significantly improves performance in MCQA. More generally, our findings underscore the dual nature of biases as both challenges and opportunities, offering insights for bias-aware model design and NLP applications.
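A minimal sketch of the reordering idea follows. It rests on assumptions that the abstract does not confirm: options are placed in descending order of similarity to the question, and a bag-of-words cosine stands in for whatever semantic similarity model the authors use. The helper names `bow`, `cosine`, and `reorder_options` are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List


def bow(text: str) -> Counter:
    """Lowercased bag-of-words vector (a crude stand-in for a sentence embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def reorder_options(question: str, options: List[str]) -> List[str]:
    """Place the options most similar to the question first (assumed direction).

    No knowledge of the correct answer is used; ties keep their original order.
    """
    q = bow(question)
    return sorted(options, key=lambda o: cosine(q, bow(o)), reverse=True)


if __name__ == "__main__":
    question = "Which planet is known as the red planet?"
    print(reorder_options(question, ["Venus", "The red planet Mars", "Jupiter"]))
```

In practice the similarity function would be swapped for an embedding model; only the sort step depends on the primacy assumption.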

accuracy, large language model, machine learning, (20 more...)

doi: 10.26615/978-954-452-098-4-113

2507.13949

Country: North America > Mexico (0.28)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.75)

arXiv.org Artificial Intelligence, Oct-10-2025

Efficient and Adaptable Overlapping for Computation and Communication via Signaling and Reordering

Hong, Ke, Li, Xiuhong, Liu, Minxu, Mao, Qiuli, Wu, Tianqi, Huang, Zixiao, Chen, Lufang, Wang, Zhong, Zhang, Yichong, Zhu, Zhenhua, Dai, Guohao, Wang, Yu

Generative models have achieved remarkable success across various applications, driving the demand for multi-GPU computing. Inter-GPU communication becomes a bottleneck in multi-GPU computing systems, particularly on consumer-grade GPUs. By exploiting concurrent hardware execution, overlapping computation and communication latency becomes an effective technique for mitigating the communication overhead. We identify that an efficient and adaptable overlapping design should satisfy (1) tile-wise overlapping to maximize the overlapping opportunity, (2) interference-free computation to maintain the original computational performance, and (3) communication agnosticism to reduce the development burden against varying communication primitives. Nevertheless, current designs fail to simultaneously optimize for all of those features. To address the issue, we propose FlashOverlap, which utilizes a novel signaling mechanism: when part of the output finishes, the computation kernel sends a signal to trigger the communication of that part, while continuing the computation of the remaining part (interference-free computation). Consequently, the communication of the finished part and the computation of the remaining part can be overlapped. On top of the signaling mechanism, FlashOverlap comprises two key components: (1) the determination of the signaling timing to boost the overlap efficiency (tile-wise overlapping), and (2) a pre-communication reordering to create the contiguous address for finished data, enabling communication by simply calling NCCL APIs (communication agnosticism), and a post-communication reordering to correct the data order. Experiments show that FlashOverlap achieves up to 1.65x speedup through overlap, outperforming existing works in most cases. Code is available at https://github.com/infinigence/FlashOverlap.
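The pre- and post-communication reordering mentioned in this abstract can be pictured with a small host-side sketch. This is not FlashOverlap's GPU implementation: the function names `pack_finished` and `restore_order`, the use of Python lists as tiles, and the `finish_order` argument are all assumptions for illustration. The point is only that tiles are packed contiguously in the order they finish, and the inverse permutation restores the original layout afterwards.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def pack_finished(tiles: Sequence[T], finish_order: Sequence[int]) -> List[T]:
    """Pre-communication reordering: lay tiles out contiguously in finish order."""
    return [tiles[i] for i in finish_order]


def restore_order(packed: Sequence[T], finish_order: Sequence[int]) -> List[T]:
    """Post-communication reordering: apply the inverse permutation."""
    restored: List[T] = [None] * len(packed)  # type: ignore[list-item]
    for pos, original_index in enumerate(finish_order):
        restored[original_index] = packed[pos]
    return restored


if __name__ == "__main__":
    tiles = ["t0", "t1", "t2", "t3"]
    finish_order = [2, 0, 3, 1]  # hypothetical completion order of the tiles
    packed = pack_finished(tiles, finish_order)
    # Any prefix of `packed` is a contiguous buffer of finished tiles.
    print(packed, restore_order(packed, finish_order))
```

Keeping the finished data contiguous is what would let a standard collective operate on a plain buffer slice, at the cost of one extra permutation on each side.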

large language model, machine learning, natural language, (20 more...)

doi: 10.1145/3767295.3769370

2504.19519

Country:

Asia > China (0.28)
Europe > United Kingdom > Scotland (0.16)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.95)
(2 more...)

Neural Information Processing Systems, Aug-19-2025, 21:38:10 GMT

fb44a668c2d4bc984e9d6ca261262cbb-Paper-Conference.pdf

data mining, machine learning, reordering, (21 more...)

Country:

North America > United States > Texas > Harris County > Houston (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
(2 more...)

Neural Information Processing Systems, Aug-17-2025, 05:23:31 GMT

clarification and will make sure to improve on every aspect of our paper

We greatly appreciate the reviewers for the time and expertise they have invested in the reviews. For example, we showed in Section 5.2 that the performance of LaPerm responded monotonically w.r.t. We thank all the reviewers for the careful observations. We will revise the main text and expand the paper's references and appendix. We will pursue them in future works.

artificial intelligence, machine learning, reviewer, (15 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.72)

Neural Information Processing Systems, Aug-15-2025, 02:19:59 GMT

6f46dd176364ccec308c2760189a4605-Paper.pdf

alignment, arxiv preprint arxiv, permutation, (12 more...)

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Oceania > Australia > Victoria > Melbourne (0.04)
(9 more...)

Genre: Research Report (0.93)

Industry: Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

arXiv.org Artificial Intelligence, Aug-5-2025

Rec-AD: An Efficient Computation Framework for FDIA Detection Based on Tensor Train Decomposition and Deep Learning Recommendation Model

Li, Yunfeng, Liu, Junhong, Yang, Zhaohui, Liao, Guofu, Zhang, Chuyun

Deep learning models have been widely adopted for False Data Injection Attack (FDIA) detection in smart grids due to their ability to capture unstructured and sparse features. However, the increasing system scale and data dimensionality introduce significant computational and memory burdens, particularly in large-scale industrial datasets, limiting detection efficiency. To address these issues, this paper proposes Rec-AD, a computationally efficient framework that integrates Tensor Train decomposition with the Deep Learning Recommendation Model (DLRM). Rec-AD enhances training and inference efficiency through embedding compression, optimized data access via index reordering, and a pipeline training mechanism that reduces memory communication overhead. Fully compatible with PyTorch, Rec-AD can be integrated into existing FDIA detection systems without code modifications. Experimental results show that Rec-AD significantly improves computational throughput and real-time detection performance, narrowing the attack window and increasing attacker cost. These advancements strengthen edge computing capabilities and scalability, providing robust technical support for smart grid security.

artificial intelligence, machine learning, rec-ad, (20 more...)

2507.14668

Country: North America > United States > California (0.28)

Genre: Research Report > New Finding (0.34)

Industry:

Energy > Power Industry (1.00)
Information Technology > Security & Privacy (0.93)
Energy > Renewable > Solar (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)