AITopics | Education

In many learning problems, ranging from clustering to ranking through metric learning, empirical estimates of the risk functional consist of an average over tuples (e.g., pairs or triplets) of observations, rather than over individual observations. In this paper, we focus on how to best implement a stochastic approximation approach to solve such risk minimization problems. We argue that in the large-scale setting, gradient estimates should be obtained by sampling tuples of data points with replacement (incomplete U-statistics) instead of sampling data points without replacement (complete U-statistics based on subsamples). We develop a theoretical framework accounting for the substantial impact of this strategy on the generalization ability of the prediction model returned by the Stochastic Gradient Descent (SGD) algorithm. It reveals that the method we promote achieves a much better trade-off between statistical accuracy and computational cost. Beyond the rate bound analysis, experiments on AUC maximization and metric learning provide strong empirical evidence of the superiority of the proposed approach.
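The two sampling schemes contrasted in this abstract can be illustrated with a small sketch. It is not the paper's code: the scalar pairwise kernel `kernel` (whose U-statistic is the unbiased sample variance) stands in for a pairwise loss gradient, and the function names, data distribution, subsample size, and repetition count are all assumptions chosen for illustration. Both estimators evaluate the same number of pairs per call.

```python
import itertools
import random


def kernel(a, b):
    # Hypothetical pairwise kernel; its U-statistic is the unbiased sample variance.
    return 0.5 * (a - b) ** 2


def complete_u_on_subsample(data, m, rng):
    # Draw m points without replacement, then average the kernel over all their pairs.
    sub = rng.sample(data, m)
    pairs = list(itertools.combinations(sub, 2))
    return sum(kernel(a, b) for a, b in pairs) / len(pairs)


def incomplete_u(data, n_pairs, rng):
    # Draw n_pairs pairs independently (with replacement across pairs) from the
    # full data set; the two indices inside one pair are distinct.
    total = 0.0
    for _ in range(n_pairs):
        i, j = rng.sample(range(len(data)), 2)
        total += kernel(data[i], data[j])
    return total / n_pairs


def spread(estimates):
    # Empirical variance of a list of repeated estimates.
    mean = sum(estimates) / len(estimates)
    return sum((e - mean) ** 2 for e in estimates) / len(estimates)


rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(2000)]  # assumed toy data

m = 10                      # subsample size for the complete U-statistic
budget = m * (m - 1) // 2   # same number of kernel evaluations for both schemes

complete = [complete_u_on_subsample(data, m, rng) for _ in range(500)]
incomplete = [incomplete_u(data, budget, rng) for _ in range(500)]

print("variance of complete U on subsample:", spread(complete))
print("variance of incomplete U:           ", spread(incomplete))
```

In this toy setting the pairs of a small subsample share data points and are therefore correlated, while pairs drawn from the full data set are nearly independent, which is one intuition for why the incomplete estimator tends to have lower variance at equal cost.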

artificial intelligence, machine learning, variance, (15 more...)

Neural Information Processing Systems

Country: Europe > France > Île-de-France > Paris > Paris (0.04)

Industry: Education (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.70)

Ensemble Distillation for Robust Model Fusion in Federated Learning

Tao Lin

Neural Information Processing Systems, Oct-2-2025, 06:58:20 GMT

Federated Learning (FL) is a machine learning setting where many devices collaboratively train a machine learning model while keeping the training data decentralized.

artificial intelligence, communication round, machine learning, (13 more...)

Neural Information Processing Systems

Country: North America > United States (0.46)

Genre: Research Report (0.46)

Industry:

Education (0.95)
Information Technology > Security & Privacy (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Ensemble Distillation for Robust Model Fusion in Federated Learning

Tao Lin

Neural Information Processing Systems, Oct-2-2025, 06:58:12 GMT

Federated Learning (FL) is a machine learning setting where many devices collaboratively train a machine learning model while keeping the training data decentralized.

artificial intelligence, arxiv preprint arxiv, machine learning, (14 more...)

Neural Information Processing Systems

Country: North America > United States (0.46)

Industry:

Education (0.96)
Information Technology > Security & Privacy (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.96)

Bandit Smooth Convex Optimization: Improving the Bias-Variance Tradeoff

Ofer Dekel, Ronen Eldan, Tomer Koren

Neural Information Processing Systems, Oct-2-2025, 06:47:01 GMT

Bandit convex optimization is one of the fundamental problems in the field of online learning.

algorithm, artificial intelligence, machine learning, (15 more...)

Neural Information Processing Systems

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.47)

A Appendix

Neural Information Processing Systems, Oct-2-2025, 06:11:56 GMT

We first give a derivation of the equivalence of label smoothing regularization and Eq. 7. Evidently, the objective does not regularize confidence diversity. "Scale both" corresponds to the originally proposed distillation objective in which both teacher and [...] Plots of test accuracy and ECE against the amount of temperature scaling applied are shown in Figure 1. Firstly, we observe that models trained with student scaling have ECE almost identical to that of the teacher models. In direct contrast, the student models trained without student scaling generally achieve much lower calibration error than their teachers. This coupled effect could be the reason for the observed conflict between ECE and accuracy.

accuracy, artificial intelligence, machine learning, (17 more...)

Neural Information Processing Systems

Genre: Research Report (0.48)

Industry: Education (0.54)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

1731592aca5fb4d789c4119c65c10b4b-Paper.pdf

Neural Information Processing Systems, Oct-2-2025, 06:11:49 GMT

artificial intelligence, machine learning, prediction, (18 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.68)

Industry: Education (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

RUDDER: Return Decomposition for Delayed Rewards

Jose A. Arjona-Medina, Michael Gillhofer, Michael Widrich, Thomas Unterthiner, Johannes Brandstetter, Sepp Hochreiter

Neural Information Processing Systems, Oct-2-2025, 05:30:57 GMT

We propose RUDDER, a novel reinforcement learning approach for delayed rewards in finite Markov decision processes (MDPs).

artificial intelligence, machine learning, reinforcement learning, (17 more...)

Neural Information Processing Systems

Country:

North America > United States (0.46)
Europe (0.46)

Industry:

Education (0.46)
Leisure & Entertainment > Games (0.30)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Convergence rates of sub-sampled Newton methods

Murat A. Erdogdu, Andrea Montanari

Neural Information Processing Systems, Oct-2-2025, 05:02:32 GMT

In this regime, algorithms which utilize sub-sampling techniques are known to be effective. In this paper, we use sub-sampling techniques together with low-rank approximation to design a new randomized batch algorithm whose convergence rate is comparable to that of Newton's method, yet whose per-iteration cost is much smaller. The proposed algorithm is robust in terms of starting point and step size, and enjoys a composite convergence rate, namely, quadratic convergence at the start and linear convergence when the iterate is close to the minimizer. We develop its theoretical analysis, which also allows us to select near-optimal algorithm parameters. Our theoretical results can be used to obtain convergence rates of previously proposed sub-sampling based algorithms as well. We demonstrate how our results apply to well-known machine learning problems. Lastly, we evaluate the performance of our algorithm on several datasets under various scenarios.

algorithm, convergence rate, newsamp, (15 more...)

Neural Information Processing Systems

Country: