AITopics

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.59)
Information Technology > Architecture > Distributed Systems (0.59)

Neural Information Processing SystemsFeb-18-2026, 16:23:00 GMT

f1f9962f76581ce8bf38d04c6d6c96b1-Paper-Conference.pdf

amm, machine learning, natural language, (17 more...)

Country:

North America > United States > California > San Francisco County > San Francisco (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre:

Research Report > Experimental Study (0.93)
Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
(2 more...)

Neural Information Processing SystemsOct-10-2025, 21:17:27 GMT

EDT: An Efficient Diffusion Transformer Framework Inspired by Human-like Sketching

Transformer-based Diffusion Probabilistic Models (DPMs) have shown more potential than CNN-based DPMs, yet their extensive computational requirements hinder widespread practical applications.

amm, training strategy, transformer, (14 more...)

Country:

North America > United States > California > San Francisco County > San Francisco (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre:

Research Report > Experimental Study (0.93)
Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)

Khalaf, Malik, Shamshoum, Yara, Hodos, Nitzan, Sieradzki, Yuval, Schuster, Assaf

QKV Projections Require a Fraction of Their Memory

arXiv.org Artificial IntelligenceSep-30-2025

The Multi-Head Attention mechanism is central to LLM operation, and multiple works target its compute and memory efficiency during training. While most works focus on approximating the scaled dot product, the memory consumption of the linear projections that compute the $Q$, $K$, and $V$ tensors from the input $x$ is often overlooked. To address this, we propose Point-Approximate Matrix Multiplication (PAMM), a novel tensor compression technique that reduces memory consumption of the $Q,K,V$ projections in attention layers by a factor of up to $\times 512$, effectively erasing their memory footprint, while achieving similar or better final perplexity. PAMM is fully composable with efficient attention techniques such as FlashAttention, making it a practical and complementary method for memory-efficient LLM training.

amm, large language model, machine learning, (18 more...)

2506.02939

Country: North America > United States (0.28)

Genre: Research Report (0.85)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

arXiv.org Artificial IntelligenceSep-29-2025

TAMMs: Temporal-Aware Multimodal Model for Satellite Image Change Understanding and Forecasting

Guo, Zhongbin, Wang, Yuhao, Jian, Ping, Li, Chengzhi, Chen, Xinyue, Yang, Zhen, E, Ertai

Temporal Change Description (TCD) and Future Satellite Image Forecasting (FSIF) are critical, yet historically disjointed tasks in Satellite Image Time Series (SITS) analysis. Both are fundamentally limited by the common challenge of modeling long-range temporal dynamics. To explore how to improve the performance of methods on both tasks simultaneously by enhancing long-range temporal understanding capabilities, we introduce TAMMs, the first unified framework designed to jointly perform TCD and FSIF within a single MLLM-diffusion architecture. TAMMs introduces two key innovations: Temporal Adaptation Modules (TAM) enhance frozen MLLM's ability to comprehend long-range dynamics, and Semantic-Fused Control Injection (SFCI) mechanism translates this change understanding into fine-grained generative control. This synergistic design makes the understanding from the TCD task to directly inform and improve the consistency of the FSIF task. Extensive experiments demonstrate TAMMs significantly outperforms state-of-the-art specialist baselines on both tasks.

artificial intelligence, machine learning, natural language, (18 more...)

2506.18862

Genre: Research Report > New Finding (0.46)

Industry:

Energy (0.48)
Transportation (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
(3 more...)

Neural Information Processing SystemsMay-27-2025, 04:49:00 GMT

LibAMM: Empirical Insights into Approximate Computing for Accelerating Matrix Multiplication

accelerating matrix multiplication, large language model, natural language, (8 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.62)
Information Technology > Architecture > Distributed Systems (0.62)

arXiv.org Artificial IntelligenceFeb-26-2025

Optimal Approximate Matrix Multiplication over Sliding Windows

Yao, Ziqi, Chen, Mingsong, Chen, Cheng

We explore the problem of approximate matrix multiplication (AMM) within the sliding window model, where algorithms utilize limited space to perform large-scale matrix multiplication in a streaming manner. This model has garnered increasing attention in the fields of machine learning and data mining due to its ability to handle time sensitivity and reduce the impact of outdated data. However, despite recent advancements, determining the optimal space bound for this problem remains an open question. In this paper, we introduce the DS-COD algorithm for AMM over sliding windows. This novel and deterministic algorithm achieves optimal performance regarding the space-error tradeoff. We provide theoretical error bounds and the complexity analysis for the proposed algorithm, and establish the corresponding space lower bound for the AMM sliding window problem. Additionally, we present an adaptive version of DS-COD, termed aDS-COD, which improves computational efficiency and demonstrates superior empirical performance. Extensive experiments conducted on both synthetic and real-world datasets validate our theoretical findings and highlight the practical effectiveness of our methods.

algorithm, matrix, snapshot, (12 more...)

2502.1883

Country: Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Communications (0.93)
Information Technology > Data Science > Data Mining (0.88)

Neural Information Processing SystemsFeb-9-2025, 16:04:39 GMT

(Almost) No Label No Cry

Giorgio Patrini, Richard Nock, Tiberio Caetano, Paul Rivera

In Learning with Label Proportions (LLP), the objective is to learn a supervised classifier when, instead of labels, only label proportions for bags of observations are known. This setting has broad practical relevance, in particular for privacy preserving data processing. We first show that the mean operator, a statistic which aggregates all labels, is minimally sufficient for the minimization of many proper scoring losses with linear (or kernelized) classifiers without using labels. We provide a fast learning algorithm that estimates the mean operator via a manifold regularizer with guaranteed approximation bounds. Then, we present an iterative learning algorithm that uses this as initialization. We ground this algorithm in Rademacher-style generalization bounds that fit the LLP setting, introducing a generalization of Rademacher complexity and a Label Proportion Complexity measure.

artificial intelligence, classifier, machine learning, (15 more...)

Country:

Oceania > Australia > New South Wales > Sydney (0.04)
Asia > Middle East > Jordan (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.93)

arXiv.org Artificial IntelligenceOct-31-2024

EDT: An Efficient Diffusion Transformer Framework Inspired by Human-like Sketching

Chen, Xinwang, Liu, Ning, Zhu, Yichen, Feng, Feifei, Tang, Jian

Transformer-based Diffusion Probabilistic Models (DPMs) have shown more potential than CNN-based DPMs, yet their extensive computational requirements hinder widespread practical applications. To reduce the computation budget of transformer-based DPMs, this work proposes the Efficient Diffusion Transformer (EDT) framework. The framework includes a lightweight-design diffusion model architecture, and a training-free Attention Modulation Matrix and its alternation arrangement in EDT inspired by human-like sketching. Additionally, we propose a token relation-enhanced masking training strategy tailored explicitly for EDT to augment its token relation learning capability. Our extensive experiments demonstrate the efficacy of EDT. The EDT framework reduces training and inference costs and surpasses existing transformer-based diffusion models in image synthesis performance, thereby achieving a significant overall enhancement. With lower FID, EDT-S, EDT-B, and EDT-XL attained speed-ups of 3.93x, 2.84x, and 1.92x respectively in the training phase, and 2.29x, 2.29x, and 2.22x respectively in inference, compared to the corresponding sizes of MDTv2. The source code is released here.

amm, down-sampling module, transformer, (14 more...)

2410.23788

Country:

North America > United States > California > San Francisco County > San Francisco (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.75)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.75)

Agrawal, Akshaya, Mayer, Parker, Kingston, Zachary, Hollinger, Geoffrey A.

Constrained Nonlinear Kaczmarz Projection on Intersections of Manifolds for Coordinated Multi-Robot Mobile Manipulation

arXiv.org Artificial IntelligenceOct-28-2024

Cooperative manipulation tasks impose various structure-, task-, and robot-specific constraints on mobile manipulators. However, current methods struggle to model and solve these myriad constraints simultaneously. We propose a twofold solution: first, we model constraints as a family of manifolds amenable to simultaneous solving. Second, we introduce the constrained nonlinear Kaczmarz (cNKZ) projection technique to produce constraint-satisfying solutions. Experiments show that cNKZ dramatically outperforms baseline approaches, which cannot find solutions at all. We integrate cNKZ with a sampling-based motion planning algorithm to generate complex, coordinated motions for 3 to 6 mobile manipulators (18--36 DoF), with cNKZ solving up to 80 nonlinear constraints simultaneously and achieving up to a 92% success rate in cluttered environments. We also demonstrate our approach on hardware using three Turtlebot3 Waffle Pi robots with OpenMANIPULATOR-X arms.

artificial intelligence, constraint-based reasoning, manifold, (15 more...)

2410.2163

Country: North America > United States > Oregon (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Robots > Robot Planning & Action (0.89)
Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (0.66)