AITopics

Country: North America > Canada > Ontario > Toronto (0.04)

Genre: Research Report (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Neural Information Processing SystemsFeb-17-2026, 10:08:08 GMT

ad885a9caafff30ee9cafdf0ee42fda2-Paper-Conference.pdf

artificial intelligence, machine learning, natural language, (21 more...)

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > United States > New Mexico > Santa Fe County > Santa Fe (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
North America > United States > Maryland > Baltimore (0.04)

Genre: Research Report > Experimental Study (1.00)

Industry: Information Technology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Software (0.67)

Neural Information Processing SystemsOct-10-2025, 13:05:03 GMT

Optimistic Verifiable Training by Controlling Hardware Nondeterminism

We are currently in the "large-scale era" of machine learning (ML), where the exciting capabilities of

auditor, computation, trainer, (17 more...)

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > United States > New Mexico > Santa Fe County > Santa Fe (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
North America > United States > Maryland > Baltimore (0.04)

Genre: Research Report > Experimental Study (1.00)

Industry: Information Technology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Neural Information Processing SystemsOct-9-2025, 03:35:11 GMT

A Unified Fast Gradient Clipping Framework for DP-SGD

A well-known numerical bottleneck in the differentially-private stochastic gradient descent (DP-SGD) algorithm is the computation of the gradient norm for each example in a large input batch. When the loss function in DP-SGD consists of an intermediate linear operation, existing methods in the literature have proposed decompositions of gradients that are amenable to fast norm computations. In this paper, we present a framework that generalizes the above approach to arbitrary (possibly nonlinear) intermediate operations. Moreover, we show that for certain operations, such as fully-connected and embedding layer computations, further improvements to the runtime and storage costs of existing decompositions can be deduced using certain components of our framework. Finally, preliminary numerical experiments are given to demonstrate the substantial effects of the aforementioned improvements.

artificial intelligence, gradient, machine learning, (16 more...)

Country: North America > United States (0.04)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)

Hu, Jingzhi, Li, Geoffrey Ye

Distillation-Enabled Knowledge Alignment Protocol for Semantic Communication in AI Agent Networks

arXiv.org Artificial IntelligenceSep-29-2025

Abstract--Future networks are envisioned to connect massive artificial intelligence (AI) agents, enabling their extensive collaboration on diverse tasks. Compared to traditional entities, these agents naturally suit the semantic communication (SC), which can significantly enhance the bandwidth efficiency. Nevertheless, SC requires the knowledge among agents to be aligned, while agents have distinct expert knowledge for their individual tasks in practice. In this paper, we propose a distillation-enabled knowledge alignment protocol (DeKAP), which distills the expert knowledge of each agent into parameter-efficient low-rank matrices, allocates them across the network, and allows agents to simultaneously maintain aligned knowledge for multiple tasks. We formulate the joint minimization of alignment loss, communication overhead, and storage cost as a large-scale integer linear programming problem and develop a highly efficient greedy algorithm. From computer simulation, the DeKAP establishes knowledge alignment with the lowest communication and computation resources compared to conventional approaches. Future communication networks will usher in a new era of the "Internet of Intelligence," where human beings, devices, and a wide range of artificial intelligence (AI) agents are seamlessly interconnected [1].

artificial intelligence, expert knowledge, knowledge, (17 more...)

doi: 10.1109/LCOMM.2025.3601995

2505.1703

Country:

Europe (1.00)
South America > Brazil (0.28)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)

arXiv.org Artificial IntelligenceMar-31-2025

Better wit than wealth: Dynamic Parametric Retrieval Augmented Generation for Test-time Knowledge Enhancement

Tan, Yuqiao, He, Shizhu, Liao, Huanxuan, Zhao, Jun, Liu, Kang

Retrieval-augmented generation (RAG) enhances large language models (LLMs) by retrieving relevant documents from external sources and incorporating them into the context. While it improves reliability by providing factual texts, it significantly increases inference costs as context length grows and introduces challenging issue of RAG hallucination, primarily caused by the lack of corresponding parametric knowledge in LLMs. An efficient solution is to enhance the knowledge of LLMs at test-time. Parametric RAG (PRAG) addresses this by embedding document into LLMs parameters to perform test-time knowledge enhancement, effectively reducing inference costs through offline training. However, its high training and storage costs, along with limited generalization ability, significantly restrict its practical adoption. To address these challenges, we propose Dynamic Parametric RAG (DyPRAG), a novel framework that leverages a lightweight parameter translator model to efficiently convert documents into parametric knowledge. DyPRAG not only reduces inference, training, and storage costs but also dynamically generates parametric knowledge, seamlessly enhancing the knowledge of LLMs and resolving knowledge conflicts in a plug-and-play manner at test-time. Extensive experiments on multiple datasets demonstrate the effectiveness and generalization capabilities of DyPRAG, offering a powerful and practical RAG paradigm which enables superior knowledge fusion and mitigates RAG hallucination in real-world applications. Our code is available at https://github.com/Trae1ounG/DyPRAG.

large language model, machine learning, natural language, (18 more...)

2503.23895

Country:

Asia > China > Beijing > Beijing (0.04)
North America > Canada (0.04)
Asia > Thailand > Bangkok > Bangkok (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Leisure & Entertainment (0.67)
Media > Film (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial IntelligenceJan-13-2025

FlexQuant: Elastic Quantization Framework for Locally Hosted LLM on Edge Devices

Chai, Yuji, Kwen, Mujin, Brooks, David, Wei, Gu-Yeon

Deploying LLMs on edge devices presents serious technical challenges. Memory elasticity is crucial for edge devices with unified memory, where memory is shared and fluctuates dynamically. Existing solutions suffer from either poor transition granularity or high storage costs. We propose FlexQuant, a novel elasticity framework that generates an ensemble of quantized models, providing an elastic hosting solution with 15x granularity improvement and 10x storage reduction compared to SoTA methods. FlexQuant works with most quantization methods and creates a family of trade-off options under various storage limits through our pruning method. It brings great performance and flexibility to the edge deployment of LLMs.

large language model, machine learning, natural language, (20 more...)

2501.07139

Country: North America > United States (0.28)

Genre: Research Report (0.64)

Industry: Information Technology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)

Warmerdam, Alfonso Taboada, Caron, Mathilde, Asano, Yuki M.

Self-Masking Networks for Unsupervised Adaptation

arXiv.org Artificial IntelligenceSep-11-2024

With the advent of billion-parameter foundation models, efficient fine-tuning has become increasingly important for the adaptation of models to downstream tasks. However, especially in computer vision, it can be hard to achieve good performance when access to quality labeled data is lacking. In this work, we propose a method adapting pretrained generalist models in a self-supervised manner by learning binary masks. These self-supervised masking networks (SMNs) are up to 79x more efficient to store and significantly improve performance on label-efficient downstream tasks. We validate the usefulness of learning binary masks as a fine-tuning method on 8 datasets and 3 model architectures, and we demonstrate the effectiveness of SMNs in 3 label-efficient settings.

experiment, international conference, self-masking network, (16 more...)

2409.07577

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Europe > Netherlands > North Holland > Amsterdam (0.04)
Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)
(3 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(2 more...)

arXiv.org Artificial IntelligenceMay-31-2024

LCQ: Low-Rank Codebook based Quantization for Large Language Models

Cai, Wen-Pu, Li, Wu-Jun

Large language models (LLMs) have recently demonstrated promising performance in many tasks. However, the high storage and computational cost of LLMs has become a challenge for deploying LLMs. Weight quantization has been widely used for model compression, which can reduce both storage and computational cost. Most existing weight quantization methods for LLMs use a rank-one codebook for quantization, which results in substantial accuracy loss when the compression ratio is high. In this paper, we propose a novel weight quantization method, called low-rank codebook based quantization (LCQ), for LLMs. LCQ adopts a low-rank codebook, the rank of which can be larger than one, for quantization. Experiments show that LCQ can achieve better accuracy than existing methods with a negligibly extra storage cost.

codebook, post-training quantization, quantization, (16 more...)

2405.20973

Country: Asia > China > Jiangsu Province > Nanjing (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)