AITopics | Statistical Learning

Collaborating Authors

Statistical Learning

News Overviews Instructional Materials AI-Alerts Classics

Learning values across many orders of magnitude

Hado P. van Hasselt, Arthur Guez, Arthur Guez, Matteo Hessel, Volodymyr Mnih, David Silver

Neural Information Processing SystemsMar-23-2026, 09:19:16 GMT

Most learning algorithms are not invariant to the scale of the signal that is being approximated. We propose to adaptively normalize the targets used in the learning updates. This is important in value-based reinforcement learning, where the magnitude of appropriate value approximations can change over time when we update the policy of behavior. Our main motivation is prior work on learning to play Atari games, where the rewards were clipped to a predetermined range. This clipping facilitates learning across many different games with a single learning algorithm, but a clipped reward function can result in qualitatively different behavior. Using adaptive normalization we can remove this domain-specific heuristic without diminishing overall performance.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Country:

North America > United States (0.28)
Europe > United Kingdom > England (0.28)

Industry: Leisure & Entertainment > Games > Computer Games (0.87)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Add feedback

A Simple Practical Accelerated Method for Finite Sums

Aaron Defazio

Neural Information Processing SystemsMar-23-2026, 08:46:18 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, machine learning, proximal operator, (13 more...)

Neural Information Processing Systems

Country: Europe (0.28)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

SDP Relaxation with Randomized Rounding for Energy Disaggregation

Kiarash Shaloudegi, András György, Csaba Szepesvari, Wilsun Xu

Neural Information Processing SystemsMar-23-2026, 08:45:23 GMT

We develop a scalable, computationally efficient method for the task of energy disaggregation for home appliance monitoring. In this problem the goal is to estimate the energy consumption of each appliance over time based on the total energy-consumption signal of a household. The current state of the art is to model the problem as inference in factorial HMMs, and use quadratic programming to find an approximate solution to the resulting quadratic integer program. Here we take a more principled approach, better suited to integer programming problems, and find an approximate optimum by combining convex semidefinite relaxations randomized rounding, as well as a scalable ADMM method that exploits the special structure of the resulting semidefinite program. Simulation results both in synthetic and real-world datasets demonstrate the superiority of our method.

appliance, artificial intelligence, machine learning, (18 more...)

Neural Information Processing Systems

Country: North America > Canada > Alberta (0.14)

Industry: Energy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)

Add feedback

Hierarchical Clustering via Spreading Metrics

Aurko Roy, Sebastian Pokutta

Neural Information Processing SystemsMar-23-2026, 08:44:45 GMT

We study the cost function for hierarchical clusterings introduced by [16] where hierarchies are treated as first-class objects rather than deriving their cost from projections into flat clusters. It was also shown in [16] that a top-down algorithm returns a hierarchical clustering of cost at most O(αnlog n) times the cost of the optimal hierarchical clustering, where αn is the approximation ratio of the Sparsest Cut subroutine used. Thus using the best known approximation algorithm for Sparsest Cut due to Arora-Rao-Vazirani, the top-down algorithm returns a hierarchical clustering of cost at most O log3/2 ntimes the cost of the optimal solution. We improve this by giving an O(log n)-approximation algorithm for this problem. Our main technical ingredients are a combinatorial characterization of ultrametrics induced by this cost function, deriving an Integer Linear Programming (ILP) formulation for this family of ultrametrics, and showing how to iteratively round an LP relaxation of this formulation by using the idea of sphere growing which has been extensively used in the context of graph partitioning. We also prove that our algorithm returns an O(log n)-approximate hierarchical clustering for a generalization of this cost function also studied in [16]. We also give constant factor inapproximability results for this problem.

artificial intelligence, cost function, machine learning, (16 more...)

Neural Information Processing Systems

Country:

North America > United States (0.28)
Europe (0.28)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Exploiting the Structure: Stochastic Gradient Methods Using Raw Clusters

Zeyuan Allen-Zhu, Yang Yuan, Karthik Sridharan

Neural Information Processing SystemsMar-23-2026, 08:18:05 GMT

The amount of data available in the world is growing faster than our ability to deal with it. However, if we take advantage of the internal structure, data may become much smaller for machine learning purposes. In this paper we focus on one of the fundamental machine learning tasks, empirical risk minimization (ERM), and provide faster algorithms with the help from the clustering structure of the data. We introduce a simple notion of raw clustering that can be efficiently computed from the data, and propose two algorithms based on clustering information. Our accelerated algorithm ClusterACDM is built on a novel Haar transformation applied to the dual space of the ERM problem, and our variance-reduction based algorithm ClusterSVRG introduces a new gradient estimator using clustering. Our algorithms outperform their classical counterparts ACDM and SVRG respectively.

artificial intelligence, machine learning, vector, (14 more...)

Neural Information Processing Systems

Country: Europe (0.28)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)

Add feedback

Identification and Overidentification of Linear Structural Equation Models

Bryant Chen

Neural Information Processing SystemsMar-23-2026, 08:17:28 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, coefficient, machine learning, (15 more...)

Neural Information Processing Systems

Country: North America > United States > California > Los Angeles County > Los Angeles (0.29)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.30)

Add feedback

Probing the Compositionality of Intuitive Functions

Eric Schulz, Josh Tenenbaum, David K. Duvenaud, Maarten Speekenbrink, Samuel J. Gershman

Neural Information Processing SystemsMar-23-2026, 08:17:09 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, kernel, machine learning, (17 more...)

Neural Information Processing Systems

Country: North America > Canada > Ontario > Toronto (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)

Add feedback

NESTT: A Nonconvex Primal-Dual Splitting Method for Distributed and Stochastic Optimization

Davood Hajinezhad, Mingyi Hong, Tuo Zhao, Zhaoran Wang

Neural Information Processing SystemsMar-23-2026, 08:16:50 GMT

Neural Information Processing Systems http://nips.cc/

algorithm, artificial intelligence, machine learning, (14 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

An algorithm for L1 nearest neighbor search via monotonic embedding

Xinan Wang, Sanjoy Dasgupta

Neural Information Processing SystemsMar-23-2026, 08:16:12 GMT

Fast algorithms for nearest neighbor (NN) search have in large part focused on 2 distance. Here we develop an approach for 1 distance that begins with an explicit and exactly distance-preserving embedding of the points into 22. We show how this can efficiently be combined with random-projection based methods for 2 NN search, such as locality-sensitive hashing (LSH) or random projection trees. We rigorously establish the correctness of the methodology and show by experimentation using LSH that it is competitive in practice with available alternatives.

information retrieval, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Country:

Europe (0.28)
North America > United States (0.14)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.42)

Add feedback

A Probabilistic Programming Approach To Probabilistic Data Analysis

Feras Saad, Vikash K. Mansinghka

Neural Information Processing SystemsMar-23-2026, 08:15:33 GMT

Probabilistic techniques are central to data analysis, but different approaches can be challenging to apply, combine, and compare. This paper introduces composable generative population models (CGPMs), a computational abstraction that extends directed graphical models and can be used to describe and compose a broad class of probabilistic data analysis techniques. Examples include discriminative machine learning, hierarchical Bayesian models, multivariate kernel methods, clustering algorithms, and arbitrary probabilistic programs. We demonstrate the integration of CGPMs into BayesDB, a probabilistic programming platform that can express data analysis tasks using a modeling definition language and structured query language. The practical value is illustrated in two ways. First, the paper describes an analysis on a database of Earth satellites, which identifies records that probably violate Kepler's Third Law by composing causal probabilistic programs with nonparametric Bayes in 50 lines of probabilistic code. Second, it reports the lines of code and accuracy of CGPMs compared with baseline solutions from standard machine learning libraries.

artificial intelligence, cgpm, machine learning, (15 more...)

Neural Information Processing Systems

Country:

Europe (0.94)
North America > United States (0.68)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)

Add feedback