AITopics | Neural Information Processing Systems

Neural Information Processing Systems

Stepping Forward on the Last Mile

Chen Feng, Qualcomm AI Research

Neural Information Processing Systems, May-31-2025, 23:08:31 GMT

Continuously adapting pre-trained models to local data on resource-constrained edge devices is the last mile for model deployment. However, as models grow in size and depth, backpropagation requires a large amount of memory, which becomes prohibitive for edge devices. In addition, most existing low-power neural processing engines (e.g., NPUs, DSPs, and MCUs) are designed as fixed-point inference accelerators without training capabilities. Forward gradients, computed solely from directional derivatives obtained with two forward calls, have recently been used for model training, with substantial savings in computation and memory. However, the performance of quantized training with fixed-point forward gradients remains unclear. In this paper, we investigate the feasibility of on-device training using fixed-point forward gradients by conducting comprehensive experiments across a variety of deep learning benchmark tasks in both the vision and audio domains. We propose a series of algorithmic enhancements that further reduce the memory footprint and narrow the accuracy gap relative to backpropagation. We further present an empirical study of how training with forward gradients navigates the loss landscape. Our results demonstrate that, on the last mile of model customization on edge devices, training with fixed-point forward gradients is a feasible and practical approach.
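The abstract describes forward gradients built from a directional derivative measured with two forward calls. The following is a minimal floating-point sketch under our own assumptions: a toy linear-regression loss stands in for a network, the direction v is drawn from N(0, I), and the directional derivative is taken as a central finite difference. It does not model the fixed-point quantization or the algorithmic enhancements the paper proposes; all names here are illustrative.

```python
import numpy as np

def loss(w, X, y):
    # Mean squared error of a linear model (toy stand-in for a network's loss).
    r = X @ w - y
    return float(np.mean(r ** 2))

def forward_gradient(w, X, y, rng, eps=1e-3):
    # Random tangent direction v ~ N(0, I).
    v = rng.standard_normal(w.shape)
    # Directional derivative (grad . v) from two forward calls; no backward pass.
    dd = (loss(w + eps * v, X, y) - loss(w - eps * v, X, y)) / (2 * eps)
    # (grad . v) v is an unbiased estimate of grad when v ~ N(0, I),
    # up to the finite-difference error.
    return dd * v

def train(steps=500, lr=0.05, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((64, 4))
    w_true = np.array([1.0, -2.0, 0.5, 3.0])
    y = X @ w_true                      # noiseless targets for a clean illustration
    w = np.zeros(4)
    history = [loss(w, X, y)]
    for _ in range(steps):
        w = w - lr * forward_gradient(w, X, y, rng)
        history.append(loss(w, X, y))
    return w, history

w, history = train()
```

Because each update uses only a scalar difference of two losses, no activations need to be stored for a backward pass; the price is a noisier gradient estimate, which is why the step size here is kept small.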

artificial intelligence, deep learning, machine learning

Country: North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report > New Finding (0.86)

Industry:

Energy > Oil & Gas (0.47)
Telecommunications (0.42)
Semiconductors & Electronics (0.42)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

7716d0fc31636914783865d34f6cdfd5-AuthorFeedback.pdf

Neural Information Processing Systems, May-31-2025, 23:07:51 GMT

We appreciate the reviewers' valuable comments. We will correct the typos and respond to the comments below. Our analysis can be extended to more general cases, and we will add more discussion in the revision. Extension to SGD: our analysis can be extended to mini-batch SGD when the batch size is large.

artificial intelligence, machine learning, resnet

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.57)

Neural Networks with Cheap Differential Operators

Tian Qi Chen, David K. Duvenaud

Neural Information Processing Systems, May-31-2025, 23:07:16 GMT

Gradients of neural networks can be computed efficiently for any architecture, but some applications require differential operators with higher time complexity. We describe a family of restricted neural network architectures that allow efficient computation of a family of differential operators involving dimension-wise derivatives, used in cases such as computing the divergence. Our proposed architecture has a Jacobian matrix composed of diagonal and hollow (non-diagonal) components. We can then modify the backward computation graph to extract dimension-wise derivatives efficiently with automatic differentiation. We demonstrate these cheap differential operators for solving root-finding subproblems in implicit ODE solvers, exact density evaluation for continuous normalizing flows, and evaluating the Fokker-Planck equation for training stochastic differential equation models.
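The key idea is that if each output f_i depends on x_i directly only through a "diagonal" path, while every other input reaches it through a conditioner h_i that excludes x_i (the hollow part), then the dimension-wise derivatives ∂f_i/∂x_i can be read off by treating h as constant. The sketch below uses a hypothetical toy layer, f_i = tanh(a_i x_i + (W x)_i) with a zero-diagonal W, and hand-derived derivatives in NumPy rather than the modified autodiff graph the paper describes; it checks the cheap divergence against a finite-difference trace that needs 2d forward calls.

```python
import numpy as np

def make_params(d, rng):
    a = rng.standard_normal(d)           # per-dimension "diagonal" weights
    W = rng.standard_normal((d, d))
    np.fill_diagonal(W, 0.0)             # hollow: h_i = (W @ x)_i never depends on x_i
    return a, W

def forward(x, a, W):
    h = W @ x                            # conditioner: h_i uses only x_{-i}
    return np.tanh(a * x + h)            # f_i depends on x_i directly only via a_i * x_i

def cheap_divergence(x, a, W):
    # Dimension-wise derivatives df_i/dx_i, computed while treating h as a constant.
    # Since W is hollow, these are exactly the Jacobian's diagonal entries.
    s = 1.0 - np.tanh(a * x + W @ x) ** 2
    return float(np.sum(s * a))

def finite_difference_divergence(x, a, W, eps=1e-6):
    # Reference: Jacobian trace from d pairs of forward calls.
    total = 0.0
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = eps
        total += (forward(x + e, a, W)[i] - forward(x - e, a, W)[i]) / (2 * eps)
    return total

rng = np.random.default_rng(0)
d = 6
a, W = make_params(d, rng)
x = rng.standard_normal(d)
cheap = cheap_divergence(x, a, W)
reference = finite_difference_divergence(x, a, W)
```

Beyond the forward pass, the cheap divergence costs one extra O(d) elementwise pass, whereas the reference needs d separate derivative evaluations.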

artificial intelligence, deep learning, machine learning

Country: North America > Canada > Ontario > Toronto (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

770f8e448d07586afbf77bb59f698587-AuthorFeedback.pdf

Neural Information Processing Systems, May-31-2025, 23:07:01 GMT

Our main contribution lies in reducing the computational cost by a factor of d.

artificial intelligence, dataset, reviewer

Technology: Information Technology > Artificial Intelligence (0.31)

Online Composite Optimization Between Stochastic and Adversarial Environments

Neural Information Processing Systems, May-31-2025, 23:06:46 GMT

We study online composite optimization under the Stochastically Extended Adversarial (SEA) model. Specifically, each loss function consists of two parts: a fixed non-smooth and convex regularizer, and a time-varying function which can be chosen either stochastically, adversarially, or in a manner that interpolates between the two extremes.
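A standard way to handle a composite loss f_t(x) + r(x) online is to take a gradient step on the time-varying part f_t and a proximal step on the fixed regularizer r. The sketch below is generic proximal online gradient descent, not the algorithm analyzed in the paper, under our own assumptions: r(x) = λ‖x‖₁ (so the prox is soft-thresholding), f_t(x) = ½‖x − c_t‖² with c_t drawn stochastically around a hypothetical mean c, and a constant step size. An adversarial or interpolating environment would change only how c_t is generated.

```python
import numpy as np

def soft_threshold(z, tau):
    # Proximal operator of tau * ||x||_1 (handles the non-smooth regularizer exactly).
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def proximal_ogd(T=2000, lam=0.1, eta=0.05, sigma=0.1, seed=0):
    rng = np.random.default_rng(seed)
    c = np.array([1.0, 0.0, -1.0, 0.05])               # hypothetical mean target
    x = np.zeros_like(c)
    avg_loss = 0.0
    for t in range(1, T + 1):
        c_t = c + sigma * rng.standard_normal(c.shape)  # stochastic draw of f_t
        loss_t = 0.5 * np.sum((x - c_t) ** 2) + lam * np.sum(np.abs(x))
        avg_loss += (loss_t - avg_loss) / t             # running average of composite loss
        grad = x - c_t                                  # gradient of the smooth part f_t only
        x = soft_threshold(x - eta * grad, eta * lam)   # prox step for the fixed regularizer
    return x, avg_loss

x_final, avg_loss = proximal_ogd()
```

In this stochastic setting the iterate settles near soft_threshold(c, λ) = [0.9, 0, −0.9, 0], the minimizer of the expected composite loss; the coordinate with a small mean (0.05) is driven to zero by the regularizer.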

artificial intelligence, machine learning, optimization

Country: Asia > China (0.28)

Genre: Research Report > Experimental Study (1.00)

Industry: Education > Educational Setting > Online (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Soft-Label Integration for Robust Toxicity Classification

Xian Wu, Northwestern University, Evanston, USA; Shuo Han, Northwestern University, Evanston, USA

Neural Information Processing Systems, May-31-2025, 23:06:29 GMT

This paper contains uncensored toxic content that might be offensive. Toxicity classification in textual content remains a significant problem.

large language model, machine learning, natural language

Country: North America > United States (0.86)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.68)
Government (0.67)

Technology:

Information Technology > Security & Privacy (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Communications > Social Media (0.93)

Group Retention when Using Machine Learning in Sequential Decision Making: the Interplay between User Dynamics and Fairness

Mohammad Mahdi Khalili

Neural Information Processing Systems, May-31-2025, 22:57:41 GMT

Machine Learning (ML) models trained on data from multiple demographic groups can inherit the representation disparity [7] that may exist in the data: the model may be less favorable to groups that contribute less to the training process; this in turn can degrade population retention in these groups over time and exacerbate representation disparity in the long run. In this study, we seek to understand the interplay between ML decisions and the underlying group representation, how they evolve in a sequential framework, and what role fairness criteria play in this process. We show that representation disparity can easily worsen over time under a natural user dynamics (arrival and departure) model when decisions are made based on a commonly used objective and fairness criteria, resulting in some groups diminishing entirely from the sample pool in the long run. This highlights that fairness criteria have to be defined while taking into consideration the impact of decisions on user dynamics. Toward this end, we explain how a proper fairness criterion can be selected based on a general user dynamics model.
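The feedback loop described above can be illustrated with a deliberately simple toy model, which is our own assumption and not the paper's user dynamics model: two groups prefer decisions μ = 0 and μ = 1, the deployed decision θ minimizes the size-weighted squared loss (no fairness constraint), each group retains a fraction 1 − loss of its users, and a fixed number of new users arrives per group each round. All parameter values are hypothetical.

```python
import numpy as np

def simulate_retention(steps=50, mu=(0.0, 1.0), n0=(60.0, 40.0), arrivals=10.0):
    mu = np.asarray(mu)
    n = np.asarray(n0, dtype=float)
    minority_share = []
    for _ in range(steps):
        alpha = n / n.sum()                    # current group proportions
        minority_share.append(float(alpha[1]))
        theta = float(alpha @ mu)              # minimizer of the size-weighted squared loss
        loss = (theta - mu) ** 2               # per-group loss of the deployed decision
        retention = 1.0 - loss                 # hypothetical: higher loss -> more departures
        n = n * retention + arrivals           # departures, then a fixed number of arrivals
    return minority_share

share = simulate_retention()
```

Starting from a 60/40 split, the smaller group suffers the larger loss, loses more users, and therefore carries even less weight in the next decision, so its share shrinks every round.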

artificial intelligence, machine learning, representation disparity

Country:

North America > United States (0.14)
North America > Canada (0.14)
Asia > Middle East > Republic of Türkiye (0.14)

Genre: Research Report > New Finding (0.66)

Industry: Banking & Finance (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.95)

ac4106bcfff33140de7799d03daeb8a4-Paper-Conference.pdf

Neural Information Processing Systems, May-31-2025, 22:57:16 GMT

Cross-validation (CV) is the default choice for estimating the out-of-sample performance of machine learning models. Despite its wide usage, its statistical benefits have remained only partially understood, especially in challenging nonparametric regimes. In this paper we fill this gap and show that, in terms of estimating out-of-sample performance, for a wide spectrum of models, CV does not statistically outperform the simple "plug-in" approach, where one reuses the training data for test evaluation. Specifically, in terms of both the asymptotic bias and the coverage accuracy of the associated interval for out-of-sample evaluation, K-fold CV provably cannot outperform plug-in regardless of the rate at which the parametric or nonparametric models converge. Leave-one-out CV can have a smaller bias than plug-in; however, this bias improvement is negligible compared to the variability of the evaluation, and in some important cases leave-one-out again does not outperform plug-in once this variability is taken into account. We obtain our theoretical comparisons via a novel higher-order Taylor analysis that dissects the limit theorems of test evaluations, which applies to model classes that are not amenable to previously known sufficient conditions. Our numerical results demonstrate that plug-in indeed performs no worse than CV in estimating model performance across a wide range of examples.
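To make the two estimators concrete, the sketch below computes the plug-in estimate (training error of the model fitted on all data) and the K-fold CV estimate for ordinary least squares on synthetic data. The model, data, and function names are our own illustrative assumptions; the sketch shows only how the estimators are computed and does not reproduce the paper's bias and coverage comparisons.

```python
import numpy as np

def fit_ols(X, y):
    # Least-squares coefficients.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def plug_in_estimate(X, y):
    # Reuse the training data to evaluate the fitted model.
    beta = fit_ols(X, y)
    return float(np.mean((y - X @ beta) ** 2))

def kfold_cv_estimate(X, y, K=5, seed=0):
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), K)
    sq_errors = np.empty(len(y))
    for idx in folds:
        train = np.setdiff1d(np.arange(len(y)), idx)
        beta = fit_ols(X[train], y[train])      # refit without the held-out fold
        sq_errors[idx] = (y[idx] - X[idx] @ beta) ** 2
    return float(np.mean(sq_errors))

rng = np.random.default_rng(1)
n, p = 200, 5
X = rng.standard_normal((n, p))
beta_true = rng.standard_normal(p)
y = X @ beta_true + rng.standard_normal(n)      # unit-variance noise
plug_in = plug_in_estimate(X, y)
cv = kfold_cv_estimate(X, y, K=5)
```

For least squares specifically, the K-fold estimate can never fall below plug-in on the same data: each fold's held-out residuals equal (I − H_kk)⁻¹ times the full-fit residuals on that fold, where H_kk is the corresponding block of the hat matrix, and that inverse only enlarges vectors. Plug-in needs one fit instead of K.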

artificial intelligence, machine learning, nonparametric model

Country: North America > United States (0.14)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.92)

Industry:

Banking & Finance (0.67)
Government (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.92)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)

A Nearly Optimal and Low-Switching Algorithm for Reinforcement Learning with General Function Approximation

Neural Information Processing Systems, May-31-2025, 22:56:57 GMT

The exploration-exploitation dilemma has been a central challenge in reinforcement learning (RL) with complex model classes. In this paper, we propose a new algorithm, Monotonic Q-Learning with Upper Confidence Bound (MQL-UCB), for RL with general function approximation. Our key algorithmic designs include (1) a general deterministic policy-switching strategy that achieves low switching cost, (2) a monotonic value function structure with carefully controlled function class complexity, and (3) a variance-weighted regression scheme that exploits historical trajectories with high data efficiency. MQL-UCB achieves a minimax optimal regret of Õ(d√(HK)) when K is sufficiently large and a near-optimal policy switching cost of Õ(dH), where d is the eluder dimension of the function class, H is the planning horizon, and K is the number of episodes.
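Low switching cost typically comes from recomputing the policy only when the collected data has become substantially more informative. The sketch below shows the classic rare-switching rule from the linear setting, switching only when the determinant of the regularized Gram matrix has doubled since the last switch; it is a simpler linear analogue, not MQL-UCB's criterion for general function classes, and the features, λ, and K are hypothetical.

```python
import numpy as np

def count_policy_switches(features, lam=1.0):
    d = features.shape[1]
    cov = lam * np.eye(d)                       # regularized Gram matrix
    _, logdet_at_switch = np.linalg.slogdet(cov)
    switches = 0
    for x in features:                          # one feature vector per episode
        cov += np.outer(x, x)
        _, logdet = np.linalg.slogdet(cov)
        # Recompute (switch) the policy only when det(cov) has more than doubled
        # since the last switch; otherwise keep the current deterministic policy.
        if logdet > logdet_at_switch + np.log(2.0):
            switches += 1
            logdet_at_switch = logdet
    return switches

rng = np.random.default_rng(0)
K, d = 10_000, 5
feats = rng.standard_normal((K, d))
feats /= np.linalg.norm(feats, axis=1, keepdims=True)   # unit-norm features
switches = count_policy_switches(feats)
# With lam = 1: each switch doubles det(cov), and det(cov) <= (1 + K/d)^d.
bound = d * np.log2(1 + K / d)
```

The bound grows only logarithmically in the number of episodes K, which is the sense in which such rules keep the switching cost low.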

inequality hold, machine learning, reinforcement learning

Country: North America > United States > California > Los Angeles County > Los Angeles (0.14)

Genre: Research Report > Experimental Study (0.92)

Industry:

Health & Medicine (0.54)
Energy > Oil & Gas > Upstream (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.61)

Pre-training Differentially Private Models with Limited Public Data

Neural Information Processing Systems, May-31-2025, 22:55:38 GMT

While differential privacy (DP) is a prominent method for gauging the degree of security provided to models, its application is commonly limited to the model fine-tuning stage, due to the performance degradation when DP is applied during pre-training. Consequently, DP is not yet capable of protecting a substantial portion of the data used during the initial pre-training process. In this work, we provide a theoretical understanding of the efficacy of DP training by analyzing the per-iteration loss improvement through the lens of the Hessian matrix for large neural networks. We make a key observation that the performance degradation of DP optimizers can be significantly mitigated by the use of limited public data, which leads to a novel DP continual pre-training strategy. Empirically, using only 10% public data and 90% private data, our strategy can achieve a DP accuracy of 41.5% on ImageNet-21k (with ϵ = 8), as well as non-DP accuracies of 55.7% and 60.0% on the downstream tasks Places365 and iNaturalist-2021, respectively, on par with state-of-the-art standard pre-training and substantially outperforming existing DP pre-trained models. Our DP pre-trained models are released in the fastDP library (https://github.com/
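The performance degradation of DP optimizers stems from two operations applied at every step: clipping each example's gradient and adding Gaussian noise. The sketch below is a plain NumPy version of a standard DP-SGD update (per-example clipping plus calibrated Gaussian noise). It is not the paper's continual pre-training strategy and not the fastDP library's API, and it omits privacy accounting entirely.

```python
import numpy as np

def dp_sgd_step(w, per_example_grads, lr, clip_norm, noise_multiplier, rng):
    """One DP-SGD update: per-example clipping plus Gaussian noise (no accounting)."""
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    # Rescale each example's gradient so its L2 norm is at most clip_norm.
    factors = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * factors
    # Gaussian noise scaled to the clipping bound (the per-example sensitivity).
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=w.shape)
    noisy_mean = (clipped.sum(axis=0) + noise) / per_example_grads.shape[0]
    return w - lr * noisy_mean

rng = np.random.default_rng(0)
w = np.zeros(2)
grads = np.array([[3.0, 4.0], [0.0, 0.5]])
# Noise disabled to isolate clipping: [3, 4] (norm 5) is scaled to [0.6, 0.8].
w_new = dp_sgd_step(w, grads, lr=1.0, clip_norm=1.0, noise_multiplier=0.0, rng=rng)
```

With a nonzero noise_multiplier, the noise added to the summed gradient does not shrink with the batch, so small batches or small clipping bounds make the update noisier; this is one intuition for why DP training hurts most during data-hungry pre-training.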

large language model, machine learning, natural language

Country: North America > United States > California (0.14)

Genre: Research Report > Experimental Study (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology: