AITopics | Statistical Learning

Collaborating Authors

Statistical Learning

News Overviews Instructional Materials AI-Alerts Classics

Density Estimation via Discrepancy Based Adaptive Sequential Partition

Neural Information Processing SystemsMar-23-2026, 03:02:34 GMT

Given iidobservations from an unknown absolute continuous distribution defined on some domain Ω, we propose a nonparametric method to learn a piecewise constant function to approximate the underlying probability density function. Our density estimate is a piecewise constant function defined on a binary partition of Ω. The key ingredient of the algorithm is to use discrepancy, a concept originates from Quasi Monte Carlo analysis, to control the partition process. The resulting algorithm is simple, efficient, and has a provable convergence rate. We empirically demonstrate its efficiency as a density estimation method. We also show how it can be utilized to find good initializations for k-means.

artificial intelligence, machine learning, partition, (13 more...)

Neural Information Processing Systems

Country: North America > United States > California > Santa Clara County (0.28)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)

Add feedback

Higher-Order Factorization Machines

Mathieu Blondel, Akinori Fujino, Naonori Ueda, Masakazu Ishihata

Neural Information Processing SystemsMar-23-2026, 02:32:40 GMT

Factorization machines (FMs) are a supervised learning approach that can use second-order feature combinations even when the data is very high-dimensional. Unfortunately, despite increasing interest in FMs, there exists to date no efficient training algorithm for higher-order FMs (HOFMs). In this paper, we present the first generic yet efficient algorithms for training arbitrary-order HOFMs. We also present new variants of HOFMs with shared parameters, which greatly reduce model size and prediction times while maintaining similar accuracy. We demonstrate the proposed approaches on four different link prediction tasks.

algorithm, artificial intelligence, machine learning, (19 more...)

Neural Information Processing Systems

Country: Europe (0.28)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

MetaGrad: Multiple Learning Rates in Online Learning

Tim van Erven, Wouter M. Koolen

Neural Information Processing SystemsMar-23-2026, 02:31:49 GMT

In online convex optimization it is well known that certain subclasses of objective functions are much easier than arbitrary convex functions. We are interested in designing adaptive methods that can automatically get fast rates in as many such subclasses as possible, without any manual tuning. Previous adaptive methods are able to interpolate between strongly convex and general convex functions. We present a new method, MetaGrad, that adapts to a much broader class of functions, including exp-concave and strongly convex functions, but also various types of stochastic and non-stochastic functions without any curvature. For instance, MetaGrad can achieve logarithmic regret on the unregularized hinge loss, even though it has no curvature, if the data come from a favourable probability distribution. MetaGrad's main feature is that it simultaneously considers multiple learning rates. Unlike previous methods with provable regret guarantees, however, its learning rates are not monotonically decreasing over time and are not tuned based on a theoretically derived bound on the regret. Instead, they are weighted directly proportional to their empirical performance on the data using a tilted exponential weights master algorithm.

algorithm, artificial intelligence, machine learning, (15 more...)

Neural Information Processing Systems

Country: Europe (0.46)

Industry: Education > Educational Setting > Online (0.41)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)

Add feedback

Combining Low-Density Separators with CNNs

Yu-Xiong Wang, Martial Hebert

Neural Information Processing SystemsMar-23-2026, 01:59:38 GMT

This work explores CNNs for the recognition of novel categories from few examples. Inspired by the transferability properties of CNNs, we introduce an additional unsupervised meta-training stage that exposes multiple top layer units to a large amount of unlabeled real-world images. By encouraging these units to learn diverse sets of low-density separators across the unlabeled data, we capture a more generic, richer description of the visual world, which decouples these units from ties to a specific set of categories. We propose an unsupervised margin maximization that jointly estimates compact high-density regions and infers low-density separators. The low-density separator (LDS) modules can be plugged into any or all of the top layers of a standard CNN architecture. The resulting CNNs significantly improve the performance in scene classification, fine-grained recognition, and action recognition with small training samples.

artificial intelligence, machine learning, recognition, (17 more...)

Neural Information Processing Systems

Country: Europe (0.28)

Industry: Information Technology (0.47)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Add feedback

Active Nearest-Neighbor Learning in Metric Spaces

Aryeh Kontorovich, Sivan Sabato, Ruth Urner

Neural Information Processing SystemsMar-23-2026, 01:59:22 GMT

We propose a pool-based non-parametric active learning algorithm for general metric spaces, called MArgin Regularized Metric Active Nearest Neighbor (MARMANN), which outputs a nearest-neighbor classifier. We give prediction error guarantees that depend on the noisy-margin properties of the input sample, and are competitive with those obtained by previously proposed passive learners. We prove that the label complexity of MARMANN is significantly lower than that of any passive learner with similar error guarantees. Our algorithm is based on a generalized sample compression scheme and a new label-efficient active model-selection procedure.

artificial intelligence, compression, machine learning, (16 more...)

Neural Information Processing Systems

Country:

Europe (0.46)
North America > United States (0.28)
Asia > Middle East > Israel (0.14)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.62)

Add feedback

Learning brain regions via large-scale online structured sparse dictionary learning

Elvis DOHMATOB, Arthur Mensch, Gael Varoquaux, Bertrand Thirion

Neural Information Processing SystemsMar-23-2026, 01:58:45 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, dictionary atom, machine learning, (17 more...)

Neural Information Processing Systems

Country: Europe (0.28)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Health Care Technology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Add feedback

Supervised Word Mover's Distance

Gao Huang, Chuan Guo, Matt J. Kusner, Yu Sun, Fei Sha, Kilian Q. Weinberger

Neural Information Processing SystemsMar-23-2026, 01:58:09 GMT

Recently, a new document metric called the word mover's distance (WMD) has been proposed with unprecedented results on kNN-based document classification. The WMD elevates high-quality word embeddings to a document metric by formulating the distance between two documents as an optimal transport problem between the embedded words. However, the document distances are entirely unsupervised and lack a mechanism to incorporate supervision when available. In this paper we propose an efficient technique to learn a supervised metric, which we call the Supervised-WMD (S-WMD) metric.

machine learning, natural language, text classification, (19 more...)

Neural Information Processing Systems

Country: North America > United States > California (0.28)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.66)

Add feedback

Multi-view Anomaly Detection via Robust Probabilistic Latent Variable Models

Tomoharu Iwata, Makoto Yamada

Neural Information Processing SystemsMar-23-2026, 01:25:53 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, data mining, machine learning, (18 more...)

Neural Information Processing Systems

Industry:

Health & Medicine (0.68)
Information Technology > Security & Privacy (0.68)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Add feedback

On Valid Optimal Assignment Kernels and Applications to Graph Classification

Nils M. Kriege, Pierre-Louis Giscard, Richard Wilson

Neural Information Processing SystemsMar-23-2026, 01:25:38 GMT

The success of kernel methods has initiated the design of novel positive semidefinite functions, in particular for structured data. A leading design paradigm for this is the convolution kernel, which decomposes structured objects into their parts and sums over all pairs of parts. Assignment kernels, in contrast, are obtained from an optimal bijection between parts, which can provide a more valid notion of similarity. In general however, optimal assignments yield indefinite functions, which complicates their use in kernel methods. We characterize a class of base kernels used to compare parts that guarantees positive semidefinite optimal assignment kernels. These base kernels give rise to hierarchies from which the optimal assignment kernels are computed in linear time by histogram intersection. We apply these results by developing the Weisfeiler-Lehman optimal assignment kernel for graphs. It provides high classification accuracy on widely-used benchmark data sets improving over the original Weisfeiler-Lehman kernel.

artificial intelligence, kernel, machine learning, (15 more...)

Neural Information Processing Systems

Country:

Europe (0.46)
North America > United States > California (0.28)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.34)

Add feedback

Measuring the reliability of MCMC inference with bidirectional Monte Carlo

Roger B. Grosse, Siddharth Ancha, Daniel M. Roy

Neural Information Processing SystemsMar-23-2026, 01:25:06 GMT

Markov chain Monte Carlo (MCMC) is one of the main workhorses of probabilistic inference, but it is notoriously hard to measure the quality of approximate posterior samples. This challenge is particularly salient in black box inference methods, which can hide details and obscure inference failures. In this work, we extend the recently introduced bidirectional Monte Carlo [GGA15] technique to evaluate MCMC-based posterior inference algorithms. By running annealed importance sampling (AIS) chains both from prior to posterior and vice versa on simulated data, we upper bound in expectation the symmetrized KL divergence between the true posterior distribution and the distribution of approximate samples. We integrate our method into two probabilistic programming languages, WebPPL [GS] and Stan [CGHL+ p], and validate it on several models and datasets. As an example of how our method be used to guide the design of inference algorithms, we apply it to study the effectiveness of different model representations in WebPPL and Stan.

artificial intelligence, machine learning, simulated data, (16 more...)

Neural Information Processing Systems

Country: North America > Canada > Ontario > Toronto (0.14)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback