AITopics | memory usage

Collaborating Authors

memory usage

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Multi-head Temporal Latent Attention

Neural Information Processing SystemsJun-23-2026, 01:57:06 GMT

While Transformer self-attention offers strong parallelism, the Key-Value (KV) cache grows linearly with sequence length and becomes a bottleneck for inference efficiency. Multi-head latent attention was recently developed to compress the KV cache into a low-rank latent space. This paper proposes Multi-head Temporal Latent Attention (MTLA), which further reduces the KV cache size along the temporal dimension, greatly lowering the memory footprint of self-attention inference. MTLA employs a hyper-network to dynamically merge temporally adjacent KV cache vectors. To address the mismatch between the compressed KV cache and processed sequence lengths, a stride-aware causal mask is proposed to ensure efficient parallel training and consistency with inference behaviour. Experiments across tasks, including speech translation, speech recognition, speech understanding and text summarisation, demonstrate that MTLA achieves competitive performance compared to standard Multi-Head Attention (MHA), while greatly improving inference speed and GPU memory usage. For example, on a English-German speech translation task, MTLA achieves a 5.3 speedup and a reduction in GPU memory usage by a factor of 8.3 compared to MHA, while maintaining translation quality.

artificial intelligence, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Country:

North America > United States > Minnesota (0.28)
Europe > Austria (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Reasoning Path Compression: Compressing Generation Trajectories for Efficient LLMReasoning

Neural Information Processing SystemsJun-22-2026, 19:12:49 GMT

Recent reasoning-focused language models achieve high accuracy by generating lengthy intermediate reasoning paths before producing final answers. While this approach is effective in solving problems that require logical thinking, long reasoning paths significantly increase memory usage and reduce throughput of token generation, limiting the practical deployment of such models. We propose Reasoning Path Compression (RPC), a training-free method that accelerates inference by leveraging the semantic sparsity of reasoning paths. RPC periodically compresses the KV cache by retaining cache entries that receive high importance score, which are computed using a selector window composed of recently generated queries. Experiments show that RPC improves generation throughput of QwQ-32B by up to 1.60 compared to the inference with full KV cache, with an accuracy drop of 1.2% on the AIME 2024 benchmark. Our findings demonstrate that semantic sparsity in reasoning traces can be effectively exploited for compression, offering a practical path toward efficient deployment of reasoning LLMs.

arxiv preprint arxiv, large language model, machine learning, (17 more...)

Neural Information Processing Systems

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)

Add feedback

InfiniPot-V: Memory-Constrained KVCache Compression for Streaming Video Understanding

Neural Information Processing SystemsJun-22-2026, 18:40:24 GMT

Modern multimodal large language models (MLLMs) can reason over hour-long video, yet their key-value (KV) cache grows linearly with time--quickly exceeding the fixed memory of phones, AR glasses, and edge robots. Prior compression schemes either assume the whole video and user query are available offline or must first build the full cache, so memory still scales with stream length. InfiniPot-V is the first training-free, query-agnostic framework that enforces a hard, lengthindependent memory cap for streaming video understanding. During video encoding it monitors the cache and, once a user-set threshold is reached, runs a lightweight compression pass that (i) removes temporally redundant tokens via Temporal-axis Redundancy (TaR) metric and (ii) keeps semantically significant tokens via Value-Norm (VaN) ranking. Across four open-source MLLMs and four long-video and streaming-video benchmarks, InfiniPot-V cuts peak GPU memory by up to 94%, sustains real-time generation, and matches or surpasses full-cache accuracy--even in multi-turn dialogues. By dissolving the KV cache bottleneck without retraining or query knowledge, InfiniPot-V closes the gap for on-device streaming video assistants.

large language model, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Human Computer Interaction > Interfaces (1.00)
Information Technology > Hardware (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

Add feedback

Memory-Efficient Training with In-Place FFT Implementation

Neural Information Processing SystemsJun-21-2026, 22:03:27 GMT

Fast Fourier Transforms (FFT) are widely used to reduce memory and computational costs in deep learning. However, existing implementations, including standard FFT and real FFT (rFFT), cannot achieve true in-place computation.

data quality, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)
Information Technology > Data Science > Data Quality (0.87)

Add feedback

More Than Just Functional: LLM-as-a-Critique for Efficient Code Generation

Neural Information Processing SystemsJun-20-2026, 14:31:47 GMT

Large language models (LLMs) have demonstrated remarkable progress in generating functional code, leading to numerous AI-based coding assistant tools. However, their reliance on the perplexity objective during both training and inference primarily emphasizes functionality, often at the expense of efficiency--an essential consideration for real-world coding tasks. Interestingly, we observed that well-trained LLMs inherently possess knowledge about code efficiency, but this potential remains underutilized with standard decoding approaches. To address this, we design strategic prompts to activate the model's embedded efficiency understanding, effectively using LLMs as efficiency critiques to guide code generation toward higher efficiency without sacrificing--and sometimes even improving--functionality, all without the need for costly real code execution. Extensive experiments on benchmark datasets (EffiBench, HumanEval+, COFFE, Mercury) across multiple representative code models demonstrate up to a 70.6% reduction in average execution time and a 13.6% decrease in maximum memory usage, highlighting the computational efficiency and practicality of our approach compared to existing alternatives.

efficiency, large language model, natural language, (16 more...)

Neural Information Processing Systems

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Industry:

Information Technology (0.46)
Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

59ea33ae3d096f3bcd5026b479710cf8-Paper-Conference.pdf

Neural Information Processing SystemsJun-17-2026, 12:24:21 GMT

Recent rehearsal-free continual learning (CL) methods guided by prompts achieve strong performance on vision tasks with non-stationary data but remain resourceintensive, hindering real-world edge deployment. We introduce resource-efficient prompting (REP), which improves the computational and memory efficiency of prompt-based rehearsal-free continual learning methods while minimizing accuracy trade-offs. Our approach employs swift prompt selection to refine input data using a carefully provisioned model and introduces adaptive token merging (AToM) and adaptive layer dropping (ALD) for efficient prompt updates. AToM and ALD selectively skip data and model layers while preserving task-specific features during the learning of new tasks. Extensive experiments on multiple image classification datasets demonstrate REP's superior resource efficiency over state-of-the-art rehearsal-free CL methods.

artificial intelligence, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Country: North America (0.28)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.92)

Industry:

Education (0.67)
Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Scalable Signature Kernel Computations via Local Neumann Series Expansions

Neural Information Processing SystemsJun-16-2026, 21:19:59 GMT

The signature kernel [10] is a recent state-of-the-art tool for analyzing highdimensional sequential data, valued for its theoretical guarantees and strong empirical performance. In this paper, we present a novel method for efficiently computing the signature kernel of long, high-dimensional time series via adaptively truncated recursive local power series expansions. Building on the characterization of the signature kernel as the solution of a Goursat PDE [17], our approach employs tilewise Neumann-series expansions to derive rapidly converging power series approximations of the signature kernel that are locally defined on subdomains and propagated iteratively across the entire domain of the Goursat solution by exploiting the geometry of the time series. Algorithmically, this involves solving a system of interdependent Goursat PDEs via adaptively truncated local power series expansions and recursive propagation of boundary conditions along a directed graph in a topological ordering.

artificial intelligence, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Country: Europe (0.46)

Genre: Research Report > Experimental Study (1.00)

Industry: Banking & Finance (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)

Add feedback

Private Training Large-scale Models with Efficient DP-SGD

Neural Information Processing SystemsJun-15-2026, 18:58:14 GMT

As large language models (LLMs) increasingly underpin technological advancements, the privacy of their training data emerges as a critical concern. Differential Privacy (DP) serves as a rigorous mechanism to protect this data, yet its integration via Differentially Private Stochastic Gradient Descent (DP-SGD) introduces substantial challenges, primarily due to the complexities of per-sample gradient clipping.

large language model, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

1 2 a t2) v0 = q v2x + v2y + a t v0x = v0cos (θ0) v0y = v0sin (θ0). (2)

Neural Information Processing SystemsApr-25-2026, 09:20:37 GMT

Define an agent's current state information as s = (x,y,θ,vx,vy), which includes the x,y positions in the coordinate space, and the yaw angle θ, and the velocities in the X and Y directions. The inverse kinematics can be used to calculate the actions for behavior cloning purpose. With the Bicycle action space, we propose a model to approximate the vehicle dynamics with the goal of minimizing the discrepancy between the predicted vehicle states and the recorded vehicle states. More specifically, define the vehicle's coordinates as x,y in the global coordinate system, and the predicted coordinates as ˆx,ˆy, the goal is to minimize (x ˆx)2 + (y ˆy)2. Define the current vehicle's state information as s, which includes the coordinates of the vehicle in the global coordinate system (x,y), the vehicle's yaw angle θ, the vehicle's speed in the x and y direction vx,vy.

artificial intelligence, machine learning, runtime, (13 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.51)

Add feedback

Filters

Collaborating Authors

memory usage

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

Multi-head Temporal Latent Attention

Reasoning Path Compression: Compressing Generation Trajectories for Efficient LLMReasoning

InfiniPot-V: Memory-Constrained KVCache Compression for Streaming Video Understanding

Memory-Efficient Training with In-Place FFT Implementation

More Than Just Functional: LLM-as-a-Critique for Efficient Code Generation

59ea33ae3d096f3bcd5026b479710cf8-Paper-Conference.pdf

Scalable Signature Kernel Computations via Local Neumann Series Expansions

Private Training Large-scale Models with Efficient DP-SGD

1 2 a t2) v0 = q v2x + v2y + a t v0x = v0cos (θ0) v0y = v0sin (θ0). (2)

1395b425d06a50e42fafe91cf04f3a98-Supplemental-Conference.pdf