AITopics | Education

Meta-Learning Requires Meta-Augmentation

Neural Information Processing Systems, Oct-2-2025, 17:56:50 GMT

Meta-learning algorithms aim to learn two components: a model that predicts targets for a task, and a base learner that updates that model when given examples from a new task. This additional level of learning can be powerful, but it also creates another potential source of overfitting, since we can now overfit in either the model or the base learner.

artificial intelligence, augmentation, machine learning, (15 more...)

Industry: Education (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Cold Case: the Lost MNIST Digits

Neural Information Processing Systems, Oct-2-2025, 17:43:00 GMT

Although the popular MNIST dataset [LeCun et al., 1994] is derived from the NIST database [Grother and Hanaoka, 1995], the precise processing steps for this derivation have been lost to time. We propose a reconstruction that is accurate enough to serve as a replacement for the MNIST dataset, with insignificant changes in accuracy. We trace each MNIST digit to its NIST source and its rich metadata such as writer identifier, partition identifier, etc. We also reconstruct the complete MNIST test set with 60,000 samples instead of the usual 10,000. Since the remaining 50,000 samples were never distributed, they can be used to investigate the impact of twenty-five years of MNIST experiments on the reported testing performances. Our limited results unambiguously confirm the trends observed by Recht et al. [2018, 2019]: although the misclassification rates are slightly off, classifier ordering and model selection remain broadly reliable. We attribute this phenomenon to the pairing benefits of comparing classifiers on the same digits.
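The pairing benefit mentioned in this abstract can be illustrated with a small sketch. The per-digit error indicators below are made-up placeholders, not results from the paper, and the two standard-error formulas are textbook ones chosen here for illustration only.

```python
from math import sqrt
from statistics import pvariance

# Hypothetical per-example error indicators (1 = misclassified) for two
# classifiers evaluated on the SAME 1000 test digits. Placeholder data.
n = 1000
errors_a = [1 if i < 50 else 0 for i in range(n)]        # 5.0% error
errors_b = [1 if 10 <= i < 50 else 0 for i in range(n)]  # 4.0% error

# Paired view: look at the per-digit difference in errors.
diffs = [a - b for a, b in zip(errors_a, errors_b)]
mean_diff = sum(diffs) / n
paired_se = sqrt(pvariance(diffs) / n)

# Unpaired view: treat the two error rates as if they had been measured
# on independent test sets, so their variances simply add.
unpaired_se = sqrt((pvariance(errors_a) + pvariance(errors_b)) / n)

print(f"difference in error rate: {mean_diff:.3f}")
print(f"paired standard error:    {paired_se:.4f}")
print(f"unpaired standard error:  {unpaired_se:.4f}")
```

Because the two hypothetical classifiers mostly fail on the same digits, the paired standard error comes out smaller than the unpaired one, which is one way to see why classifier ordering can stay reliable even when absolute error rates shift.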

digit, error rate, reconstruction, (17 more...)

Country:

North America > Canada (0.04)
Europe > France (0.04)
Asia > Middle East > Israel > Jerusalem District > Jerusalem (0.04)

Genre:

Research Report > New Finding (0.94)
Research Report > Experimental Study (0.69)

Industry:

Education (0.48)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.38)

From Stochastic Mixability to Fast Rates

Nishant A. Mehta, Robert C. Williamson

Neural Information Processing Systems, Oct-2-2025, 17:03:44 GMT

Empirical risk minimization (ERM) is a fundamental learning rule for statistical learning problems where the data is generated according to some unknown distribution P and returns a hypothesis f chosen from a fixed class F with small loss ℓ. In the parametric setting, depending upon (ℓ, F, P), ERM can have slow (1/√n) or fast (1/n) rates of convergence of the excess risk as a function of the sample size n. There exist several results that give sufficient conditions for fast rates in terms of joint properties of ℓ, F, and P, such as the margin condition and the Bernstein condition. In the non-statistical prediction with expert advice setting, there is an analogous slow and fast rate phenomenon, and it is entirely characterized in terms of the mixability of the loss ℓ (there being no role there for F or P). The notion of stochastic mixability builds a bridge between these two models of learning, reducing to classical mixability in a special case. The present paper presents a direct proof of fast rates for ERM in terms of stochastic mixability of (ℓ, F, P), and in so doing provides new insight into the fast-rates phenomenon.
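As a rough illustration of the ERM rule described in this abstract, the sketch below picks, from a small finite class, the hypothesis with the lowest average loss on a sample. The threshold classifiers, the 0-1 loss, and the noisy data distribution are all assumptions made for the example and do not come from the paper.

```python
import random

def empirical_risk(threshold, sample):
    """Average 0-1 loss of the rule x -> (x >= threshold) on the sample."""
    return sum((x >= threshold) != y for x, y in sample) / len(sample)

def erm(hypothesis_class, sample):
    """Return the hypothesis in the class with the smallest empirical risk."""
    return min(hypothesis_class, key=lambda t: empirical_risk(t, sample))

def draw_sample(n, rng, true_threshold=0.5, noise=0.1):
    """Draw n points from a hypothetical P: uniform x, label flipped w.p. noise."""
    sample = []
    for _ in range(n):
        x = rng.random()
        y = x >= true_threshold
        if rng.random() < noise:
            y = not y
        sample.append((x, y))
    return sample

rng = random.Random(0)
F = [i / 20 for i in range(21)]  # finite class of thresholds 0.0, 0.05, ..., 1.0
sample = draw_sample(2000, rng)
f_hat = erm(F, sample)
print("ERM threshold:", f_hat, "empirical risk:", empirical_risk(f_hat, sample))
```

The excess risk studied in the paper is the gap between the true risk of the ERM output and that of the best hypothesis in the class; this toy example only computes the empirical side of that comparison.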

bernstein condition, fast rate, stochastic mixability, (15 more...)

Country:

Oceania > Australia > Australian Capital Territory > Canberra (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Industry: Education (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.36)

3a30be93eb45566a90f4e95ee72a089a-Paper.pdf

Neural Information Processing Systems, Oct-2-2025, 17:02:06 GMT

artificial intelligence, machine learning, optimization problem, (16 more...)

Country:

North America (0.46)
Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)

Genre: Research Report (0.46)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)

Preference-Based Batch and Sequential Teaching: Towards a Unified View of Models

Farnam Mansouri, Yuxin Chen, Ara Vartanian, Jerry Zhu, Adish Singla

Neural Information Processing Systems, Oct-2-2025, 16:51:27 GMT

Algorithmic machine teaching studies the interaction between a teacher and a learner where the teacher selects labeled examples aiming at teaching a target hypothesis.

artificial intelligence, machine learning, preference function, (15 more...)

Country: North America > United States (0.28)

Industry: Education (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.48)

Beating SGD Saturation with Tail-Averaging and Minibatching

Neural Information Processing Systems, Oct-2-2025, 16:47:30 GMT

Stochastic gradient descent (SGD) provides a simple and yet stunningly efficient way to solve a broad range of machine learning problems.
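The two ingredients named in this entry's title can be sketched on a toy problem. The example below is a minimal illustration, not the paper's algorithm or analysis: it assumes a one-dimensional least-squares objective, synthetic data, and hand-picked values for the step size, batch size, and the point at which averaging starts.

```python
import random

def tail_averaged_sgd(data, step, batch_size, n_steps, tail_start, rng):
    """Minibatch SGD for 1-D least squares y ~ w * x.

    Returns the last iterate and the average of the iterates produced
    from step `tail_start` onward (the "tail" average).
    """
    w = 0.0
    tail_sum, tail_count = 0.0, 0
    for t in range(n_steps):
        batch = rng.sample(data, batch_size)  # minibatch without replacement
        grad = sum((w * x - y) * x for x, y in batch) / batch_size
        w -= step * grad
        if t >= tail_start:  # only the late iterates enter the average
            tail_sum += w
            tail_count += 1
    return w, tail_sum / tail_count

# Synthetic data (an assumption): y = 2x + Gaussian noise.
rng = random.Random(0)
data = []
for _ in range(500):
    x = rng.uniform(-1.0, 1.0)
    data.append((x, 2.0 * x + rng.gauss(0.0, 0.5)))

w_last, w_tail = tail_averaged_sgd(
    data, step=0.5, batch_size=8, n_steps=2000, tail_start=1000, rng=rng
)
print("last iterate:", w_last, "tail average:", w_tail)
```

Averaging only the later iterates discards the early ones that still depend on the starting point, while the minibatch reduces the noise in each gradient step.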

artificial intelligence, gradient descent, machine learning, (16 more...)

Country: North America (0.28)

Industry: Education (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.58)

Superposition of many models into one

Brian Cheung, Alexander Terekhov, Yubei Chen, Pulkit Agrawal, Bruno Olshausen

Neural Information Processing Systems, Oct-2-2025, 16:42:59 GMT

We present a method for storing multiple models within a single set of parameters. Models can coexist in superposition and still be retrieved individually.
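This listing does not say how the superposition is built. As one illustrative assumption, the sketch below binds each model's weights to a random ±1 key vector, sums the results into a single parameter vector, and unbinds with the same key to get back an approximate copy. The random weight vectors stand in for trained models, and every name here is hypothetical.

```python
import random

def bind(weights, context):
    """Elementwise product with a +/-1 context vector (its own inverse)."""
    return [w * c for w, c in zip(weights, context)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

rng = random.Random(0)
dim, n_models = 1000, 3

# Hypothetical "models": random weight vectors standing in for trained parameters.
models = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_models)]
# One random sign vector per model acts as its retrieval key.
contexts = [[rng.choice((-1.0, 1.0)) for _ in range(dim)] for _ in range(n_models)]

# Superpose: sum the bound weight vectors into a single parameter vector.
bound = [bind(m, c) for m, c in zip(models, contexts)]
superposed = [sum(vals) for vals in zip(*bound)]

def retrieve(k):
    """Unbind with key k: returns model k plus interference from the others."""
    return bind(superposed, contexts[k])

def best_match(k):
    """Index of the stored model most similar to what key k retrieves."""
    retrieved = retrieve(k)
    return max(range(n_models), key=lambda j: dot(retrieved, models[j]))

print([best_match(k) for k in range(n_models)])
```

Retrieval in this sketch is approximate: the other stored models show up as zero-mean interference, which stays small relative to the retrieved model only when the dimension is large compared with the number of stored models.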

artificial intelligence, machine learning, superposition, (15 more...)

Country: North America (0.14)

Industry: Education > Educational Setting > Online (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

35464c848f410e55a13bb9d78e7fddd0-Paper.pdf

Neural Information Processing Systems, Oct-2-2025, 16:03:20 GMT

artificial intelligence, machine learning, neighbor, (16 more...)

Country: North America > United States (0.14)

Industry:

Education (0.68)
Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

3472ab80b6dff70c54758fd6dfc800c2-Paper.pdf

Neural Information Processing Systems, Oct-2-2025, 15:53:43 GMT

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Industry:

Education (0.68)
Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Robots (0.70)

Sum-of-Squares Lower Bounds for Sparse PCA

Tengyu Ma, Avi Wigderson

Neural Information Processing Systems, Oct-2-2025, 15:43:47 GMT

This paper establishes a statistical versus computational trade-off for solving a basic high-dimensional machine learning problem via a basic convex relaxation method. Specifically, we consider the Sparse Principal Component Analysis (Sparse PCA) problem, and the family of Sum-of-Squares (SoS, aka Lasserre/Parrilo) convex relaxations.

algorithm, constraint, relaxation, (13 more...)

Country: