AITopics | computational cost

Collaborating Authors

computational cost

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Structured Sparse Transition Matrices to Enable State Tracking in State-Space Models

Neural Information Processing SystemsJun-30-2026, 08:10:46 GMT

Modern state-space models (SSMs) often utilize structured transition matrices which enable efficient computation but pose restrictions on the model's expressivity, as measured in terms of the ability to emulate finite-state automata (FSA). While unstructured transition matrices are optimal in terms of expressivity, they come at a prohibitively high compute and memory cost, even for moderate state sizes. We propose a structured sparse parametrization of transition matrices in SSMs that enables FSA state tracking with provably optimal state size and depth, while keeping the computational cost of the recurrence comparable to that of diagonal SSMs.

artificial intelligence, proceedings, transition matrix, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.60)

Add feedback

Improving Energy Natural Gradient Descent through Woodbury, Momentum, and Randomization

Neural Information Processing SystemsJun-23-2026, 11:02:49 GMT

Natural gradient methods significantly accelerate the training of Physics-Informed Neural Networks (PINNs), but are often prohibitively costly. We introduce a suite of techniques to improve the accuracy and efficiency of energy natural gradient descent (ENGD) for PINNs. First, we leverage the Woodbury formula to dramatically reduce the computational complexity of ENGD. Second, we adapt the Subsampled Projected-Increment Natural Gradient Descent algorithm from the variational Monte Carlo literature to accelerate the convergence. Third, we explore the use of randomized algorithms to further reduce the computational cost in the case of large batch sizes. We find that randomization accelerates progress in the early stages of training for low-dimensional problems, and we identify key barriers to attaining acceleration in other scenarios. Our numerical experiments demonstrate that our methods outperform previous approaches, achieving the same L2 error as the original ENGD up to 75 faster.

artificial intelligence, engd, machine learning, (19 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Add feedback

QSVD: Efficient Low-rank Approximation for Unified Query-Key-Value Weight Compression in Low-Precision Vision-Language Models

Neural Information Processing SystemsJun-23-2026, 05:06:55 GMT

Vision-Language Models (VLMs) are integral to tasks such as image captioning and visual question answering, but their high computational cost, driven by large memory footprints and processing time, limits their scalability and real-time applicability. In this work, we propose leveraging Singular-Value Decomposition (SVD) over the joint query (Q), key (K), and value (V) weight matrices to reduce KV cache size and computational overhead. We in addition introduce an efficient rank allocation strategy that dynamically adjusts the SVD rank based on its impact on VLM accuracy, achieving a significant reduction in both memory usage and computational cost. Finally, we extend this approach by applying quantization to both VLM weights and activations, resulting in a highly efficient VLM.

arxiv preprint arxiv, large language model, machine learning, (16 more...)

Neural Information Processing Systems

Country: North America > United States > Louisiana (0.28)

Genre: Research Report > Experimental Study (1.00)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)

Add feedback

Scaling Up Active Testing to Large Language Models

Neural Information Processing SystemsJun-23-2026, 04:20:17 GMT

Active testing enables label-efficient evaluation of predictive models through careful data acquisition, but it can pose a significant computational cost. We identify cost-saving measures that enable active testing to be scaled up to large language models (LLMs). In particular we show that the surrogate model used to guide data acquisition can be constructed cheaply using in-context learning, does not require updating within an active-testing loop, and can be smaller than the target model. We even find we can make good data-acquisition decisions without making predictions with the target model. As a result we are able to achieve much more accurate evaluations of LLM performance relative to using randomly acquired data. We additionally introduce a bootstrap estimator of evaluation error, which we show to be a useful indicator of how well active testing is working within a single run.

large language model, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.33)

Add feedback

Structured Linear CDEs: Maximally Expressive and Parallel-in-Time Sequence Models

Neural Information Processing SystemsJun-23-2026, 03:37:43 GMT

This work introduces Structured Linear Controlled Differential Equations (SLiCEs), a unifying framework for sequence models with structured, input-dependent statetransition matrices that retain the maximal expressivity of dense matrices whilst being cheaper to compute. The framework encompasses existing architectures, such as input-dependent block-diagonal linear recurrent neural networks and DeltaNet's diagonal-plus-low-rank structure, as well as two novel variants based on sparsity and the Walsh-Hadamard transform. We prove that, unlike the diagonal statetransition matrices of S4D and Mamba, SLiCEs employing block-diagonal, sparse, or Walsh-Hadamard matrices match the maximal expressivity of dense matrices. Empirically, SLiCEs solve the A5 state-tracking benchmark with a single layer, achieve best-in-class length generalisation on regular language tasks among parallelin-time models, and match the performance of log neural controlled differential equations on six multivariate time-series classification datasets while cutting the average time per training step by a factor of twenty.

artificial intelligence, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Country: North America > Mexico (0.28)

Genre: Research Report > Experimental Study (1.00)

Industry:

Health & Medicine (1.00)
Information Technology > Security & Privacy (0.92)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

ed45d6a03de84cc650cae0655f699356-Paper-Conference.pdf

Neural Information Processing SystemsJun-23-2026, 01:48:10 GMT

T impro method o address vements, that this av g oids none ap, we fully hav p e ro generating addressed pose Self-T both all runcation N challenges samples Best-ofat and once. N eliminates (ST-BoN) the, a decoding need for re states of latenc w cost, ard y to models. ST by identify -BoN 50%.

large language model, machine learning, prime number, (20 more...)

Neural Information Processing Systems

Country: Asia > China (0.28)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Industry: Education (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

EDBench: Large-Scale Electron Density Data for Molecular Modeling

Neural Information Processing SystemsJun-22-2026, 21:25:15 GMT

Existing molecular machine learning force fields (MLFFs) generally focus on the learning of atoms, molecules, and simple quantum chemical properties (such as energy and force), but ignore the importance of electron density (ED) ρ(r) in accurately understanding molecular force fields (MFFs). ED describes the probability of finding electrons at specific locations around atoms or molecules, which uniquely determines all ground state properties (such as energy, molecular structure, etc.) of interactive multi-particle systems according to the HohenbergKohn theorem. However, the calculation of ED relies on the time-consuming first-principles density functional theory (DFT), which leads to the lack of largescale ED data and limits its application in MLFFs. In this paper, we introduce EDBench, a large-scale, high-quality dataset of ED designed to advance learningbased research at the electronic scale. Built upon the PCQM4Mv2, EDBench provides accurate ED data, covering 3.3 million molecules. To comprehensively evaluate the ability of models to understand and utilize electronic information, we design a suite of ED-centric benchmark tasks spanning prediction, retrieval, and generation. Our evaluation of several state-of-the-art methods demonstrates that learning from EDBench is not only feasible but also achieves high accuracy. Moreover, we show that learning-based methods can efficiently calculate ED with comparable precision while significantly reducing the computational cost relative to traditional DFT calculations. All data and benchmarks from EDBench will be freely available, laying a robust foundation for ED-driven drug discovery and materials science.

bioinformatics, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country: Asia > China (0.67)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(3 more...)

Add feedback

The Best Instruction-Tuning Data are Those That Fit

Neural Information Processing SystemsJun-22-2026, 19:55:18 GMT

High-quality supervised finetuning (SFT) data are essential for unlocking pretrained LLMs' capabilities. Typically, instructions are paired with responses from various sources by humans annotators or other LMs, which are often out of the distribution of the target model to be finetuned. This, at scale, can lead to diminishing returns and even hurt the models' performance and robustness. We hypothesize that SFT is most effective with data aligned to the model's pretrained distribution and propose GRAPE-- a novel SFT framework that tailors supervision to the target model. For each instruction, it gathers responses from various sources, and selects the one that aligns most closely to the target model's pretrained distribution, as measured by the normalized probability. We then proceed with standard SFT with these selected responses. We first evaluate GRAPE with a controlled experiment, where we sample various solutions for each question in UltraInteract from multiple models and finetune on GRAPE-selected data using LMs from different families including LLaMA.1-8B,

large language model, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Country:

Asia (0.92)
North America > United States > Illinois (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Degrees of Freedom for Linear Attention: Distilling Softmax Attention with Optimal Feature Efficiency

Neural Information Processing SystemsJun-22-2026, 18:23:03 GMT

Linear attention has attracted interest as a computationally efficient approximation to softmax attention, especially for long sequences. Recent studies have explored distilling softmax attention in pre-trained Transformers into linear attention. However, a critical challenge remains: how to choose the feature dimension that governs the approximation quality. Existing methods fix this dimension uniformly across all attention layers, overlooking the diverse roles and complexities of them. In this paper, we propose a principled method to automatically determine the feature dimension in linear attention using the concept of statistical degrees of freedom, which represent the effective dimensionality of the inputs. We provide a theoretical bound on the approximation error and show that the dimension chosen by our method achieves smaller errors under a fixed computational budget. Furthermore, we introduce an efficient layerwise training strategy to learn nonlinear features tailored to each layer. Experiments on multiple pre-trained transformers demonstrate that our method improves the performance of distilled models compared to baselines without increasing the inference cost. Our findings also provide insight into how the complexity of the attention mechanism evolves across layers.

artificial intelligence, deep learning, machine learning, (18 more...)

Neural Information Processing Systems

Country: Asia (0.14)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

MobileODE: An Extra Lightweight Network

Neural Information Processing SystemsJun-21-2026, 20:38:18 GMT

Depthwise-separable convolution has emerged as a significant milestone in the lightweight development of Convolutional Neural Networks (CNNs) over the past decade. This technique consists of two key components: depthwise convolution, which captures spatial information, and pointwise convolution, which enhances channel interactions. In this paper, we propose a novel method to lightweight CNNs through the discretization of Ordinary Differential Equations (ODEs). Specifically, we optimize depthwise-separable convolution by replacing the pointwise convolution with a discrete ODE module, termed the Channelwise ODESolver (COS). The COS module is constructed by a simple yet efficient direct differentiation Euler algorithm, using learnable increment parameters. This replacement reduces parameters by over 98.36% compared to conventional pointwise convolution. By integrating COS into MobileNet, we develop a new extra lightweight network called MobileODE. With carefully designed basic and inverse residual blocks, the resulting MobileODEV1 and MobileODEV2 reduce channel interaction parameters by 71.0% and 69.2%, respectively, compared to MobileNetV1, while achieving higher accuracy across various tasks, including image classification, object detection, and semantic segmentation.

convolution, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Add feedback