Mathematical & Statistical Methods
Reviews: Adaptive Newton Method for Empirical Risk Minimization to Statistical Accuracy
The paper contains a novel and interesting idea, and is well written. I think it should be accepted. This is one of very few papers that attempt to combine optimization and statistical considerations. Most papers proposing algorithms for the ERM problem focus solely on solving the deterministic ERM problem and ignore the fact that the ERM objective is itself an approximation to the true risk, defined as an expectation of the loss over an unknown sample distribution, and is hence subject to an approximation error. This is of course known in the literature, but it is notoriously difficult to address both the optimization aspect and the approximation aspect in a meaningful way in a single work.
Adaptive Newton Method for Empirical Risk Minimization to Statistical Accuracy
Aryan Mokhtari, Hadi Daneshmand, Aurelien Lucchi, Thomas Hofmann, Alejandro Ribeiro
We consider empirical risk minimization for large-scale datasets. We introduce Ada Newton as an adaptive algorithm that uses Newton's method with adaptive sample sizes. The main idea of Ada Newton is to increase the size of the training set by a factor larger than one in a way that ensures the minimization variable for the current training set lies in the local neighborhood of the optimal argument of the next training set. This allows us to exploit the quadratic convergence property of Newton's method and reach the statistical accuracy of each training set with only one iteration of Newton's method. We show theoretically that we can iteratively increase the sample size while applying single Newton iterations without line search and stay within the statistical accuracy of the regularized empirical risk. In particular, we can double the size of the training set in each iteration when the number of samples is sufficiently large. Numerical experiments on various datasets confirm that the sample size can be doubled at each iteration, which implies that Ada Newton achieves the statistical accuracy of the full training set with about two passes over the dataset.
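A minimal sketch of the adaptive-sample-size idea for L2-regularized logistic regression is given below: the subsample is grown geometrically and a single Newton step (no line search) is taken at each stage, warm-started from the previous solution. The initial sample size, growth factor, and regularization constant here are arbitrary illustrative choices, not the paper's prescribed schedule, and the function names are hypothetical.

```python
import numpy as np

def grad_hess(w, X, y, lam):
    """Gradient and Hessian of the L2-regularized logistic loss on (X, y); labels y in {-1, +1}."""
    margins = y * (X @ w)
    p = 1.0 / (1.0 + np.exp(-margins))            # probability of the correct label
    g = -(X.T @ (y * (1.0 - p))) / len(y) + lam * w
    s = p * (1.0 - p)
    H = (X.T * s) @ X / len(y) + lam * np.eye(X.shape[1])
    return g, H

def ada_newton_sketch(X, y, n0=128, growth=2, lam=1e-3):
    """Grow the training subsample geometrically; take one Newton step per stage.

    Illustrative sketch only: the hope is that the minimizer for the current
    subsample already lies in the quadratic-convergence region of Newton's
    method for the next, larger subsample, so a single step without line
    search suffices. The paper derives conditions on the growth factor and
    regularization; n0, growth, and lam here are arbitrary choices.
    """
    n_total, d = X.shape
    w = np.zeros(d)
    n = min(n0, n_total)
    while True:
        g, H = grad_hess(w, X[:n], y[:n], lam)
        w = w - np.linalg.solve(H, g)             # single Newton step, no line search
        if n == n_total:
            return w
        n = min(growth * n, n_total)
```

With growth=2 the loop touches roughly 2 * n_total samples in total, which is the "about two passes over the dataset" intuition from the abstract.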
Estimating the Size of a Large Network and its Communities from a Random Sample
Lin Chen, Amin Karbasi, Forrest W. Crawford
Most real-world networks are too large to be measured or studied directly and there is substantial interest in estimating global network properties from smaller sub-samples. One of the most important global properties is the number of vertices/nodes in the network. Estimating the number of vertices in a large network is a major challenge in computer science, epidemiology, demography, and intelligence analysis. In this paper we consider a population random graph G = (V, E) from the stochastic block model (SBM) with K communities/blocks. A sample is obtained by randomly choosing a subset W ⊆ V and letting G(W) be the induced subgraph of G on the vertices in W. In addition to G(W), we observe the total degree of each sampled vertex and its block membership.
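The observation model is easy to simulate. The sketch below (using networkx, with hypothetical helper names) generates a population SBM graph, samples W uniformly at random, and records the induced subgraph together with each sampled vertex's total degree and block membership. It illustrates the data that the paper's estimators consume, not the estimators themselves.

```python
import numpy as np
import networkx as nx

def sample_observation(sizes, P, n_sample, seed=0):
    """Simulate the observation model: SBM population graph, random vertex sample W.

    sizes: list of K block sizes; P: K x K edge-probability matrix; n_sample:
    number of sampled vertices. Returns the induced subgraph G(W) plus, for
    each sampled vertex, its total degree in the full graph G and its block.
    """
    rng = np.random.default_rng(seed)
    G = nx.stochastic_block_model(sizes, P, seed=seed)      # population graph G = (V, E)
    W = [int(v) for v in rng.choice(G.number_of_nodes(), size=n_sample, replace=False)]
    GW = G.subgraph(W).copy()                               # induced subgraph G(W)
    cum = np.cumsum(sizes)                                  # nodes are labeled in block order
    total_degree = {v: G.degree(v) for v in W}              # degree in the *full* graph
    block = {v: int(np.searchsorted(cum, v, side="right")) for v in W}
    return GW, total_degree, block
```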
Reviews: Sub-sampled Newton Methods with Non-uniform Sampling
Pros: This paper is well written and clear. The authors do a good job analyzing their method from a theoretical standpoint, and the theory is solid. I like the kinds of experiments the authors chose and how they are presented. All in all I think this paper is good and is a solid contribution to the literature on approximate Newton methods.
Reviews: Exploiting the Structure: Stochastic Gradient Methods Using Raw Clusters
The initial motivation seems to be the work of Hofmann et al. on using clustering to speed up stochastic methods for ERM. Their method was not proved to converge to the optimum because of its use of biased stochastic gradients. Moreover, that approach seemed to be effective only for small clusters. This paper goes a long way toward developing the basic idea into a satisfying theoretical framework that also gives rise to efficient implementations. This paper is truly a pleasure to read – a very fine example of academic exposition.
Reviews: Stochastic Gradient Methods for Distributionally Robust Optimization with f-divergences
However, this paper is not carefully written. For example, references are missing on page 6, line 192 and on page 7, line 206. The legend for the red lines is missing in Figures 2c and 2d. The paper states only the bare minimum of information, which is not sufficient for readers to follow easily. I think the clarity of this paper could be greatly improved, especially since the authors did not use the full 8 pages.
Incremental Variational Sparse Gaussian Process Regression
Recent work on scaling up Gaussian process regression (GPR) to large datasets has primarily focused on sparse GPR, which leverages a small set of basis functions to approximate the full Gaussian process during inference. However, the majority of these approaches are batch methods that operate on the entire training dataset at once, precluding the use of datasets that are streaming or too large to fit into memory. Although previous work has considered incrementally solving variational sparse GPR, most algorithms fail to update the basis functions and therefore perform suboptimally. We propose a novel incremental learning algorithm for variational sparse GPR based on stochastic mirror ascent of probability densities in reproducing kernel Hilbert space. This new formulation allows our algorithm to update basis functions online in accordance with the manifold structure of probability densities for fast convergence. We conduct several experiments and show that our proposed approach achieves better empirical performance in terms of prediction error than the recent state-of-the-art incremental solutions to variational sparse GPR.
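For intuition about the streaming setting, the following minimal sketch maintains the standard sparse-GPR (subset-of-regressors) sufficient statistics incrementally over mini-batches, so the full dataset never has to be held in memory. Unlike the paper's method, it keeps the inducing inputs fixed rather than updating the basis functions via stochastic mirror ascent; the class and kernel settings are illustrative assumptions.

```python
import numpy as np

def rbf(A, B, lengthscale=1.0):
    """Squared-exponential kernel matrix between rows of A and rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

class StreamingSparseGPR:
    """Minimal incremental sparse GPR with *fixed* inducing inputs Z.

    Maintains A = Kzz + sigma^-2 * sum_b Kzb Kbz and b = sigma^-2 * sum_b Kzb y_b,
    updated one mini-batch at a time. The paper's algorithm additionally adapts
    the basis (inducing) functions online; this sketch only shows the streaming
    structure of the sufficient statistics.
    """
    def __init__(self, Z, noise=0.1, lengthscale=1.0):
        self.Z, self.noise, self.ls = Z, noise, lengthscale
        m = Z.shape[0]
        self.A = rbf(Z, Z, lengthscale) + 1e-8 * np.eye(m)  # jitter for stability
        self.b = np.zeros(m)

    def update(self, X_batch, y_batch):
        Kzb = rbf(self.Z, X_batch, self.ls)
        self.A += Kzb @ Kzb.T / self.noise**2
        self.b += Kzb @ y_batch / self.noise**2

    def predict_mean(self, X_star):
        Ksz = rbf(X_star, self.Z, self.ls)
        return Ksz @ np.linalg.solve(self.A, self.b)
```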
Reviews: Byzantine Stochastic Gradient Descent
The paper studies stochastic convex optimization in a distributed master/worker framework, where in each round each of the m machines produces a stochastic gradient and sends it to the master, which aggregates these into a mini-batch. The authors allow an alpha fraction of the machines to be Byzantine, i.e., they need not report valid stochastic gradients and may produce arbitrary vectors, even in an adversarial manner. The goal is to aggregate the machines' reports and converge to an optimal solution of the convex objective despite the malicious Byzantine machines. The authors present a novel variant of minibatch SGD that tackles the difficulty of dealing with Byzantine machines. They prove upper bounds on the convergence rate and nearly matching lower bounds for any algorithm operating in this framework, and in this sense the results are quite satisfactory.
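As a concrete (if simplified) illustration of the setting, the sketch below runs a master loop that collects m reported gradients per round and aggregates them with a coordinate-wise trimmed mean, a generic Byzantine-robust aggregator. This is not the aggregation rule analyzed in the paper (whose algorithm attains the nearly matching upper and lower bounds mentioned above); the function names and the grad_fn interface are hypothetical.

```python
import numpy as np

def robust_aggregate(gradients, alpha):
    """Coordinate-wise trimmed mean of the reported gradients.

    gradients: (m, d) array of per-machine reports, an alpha fraction of which
    may be arbitrary (Byzantine). Generic robust aggregator, not the paper's rule.
    """
    m = gradients.shape[0]
    k = int(np.ceil(alpha * m))                  # number to trim from each side
    sorted_g = np.sort(gradients, axis=0)        # sort each coordinate independently
    trimmed = sorted_g[k:m - k] if m - 2 * k > 0 else sorted_g
    return trimmed.mean(axis=0)

def byzantine_sgd(grad_fn, w0, rounds, m, alpha, lr=0.1):
    """Master loop: collect m stochastic gradients per round, aggregate robustly, step."""
    w = w0.copy()
    for _ in range(rounds):
        reports = np.stack([grad_fn(w, i) for i in range(m)])  # honest + Byzantine reports
        w -= lr * robust_aggregate(reports, alpha)
    return w
```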
Factoring nonnegative matrices with linear programs
This paper describes a new approach for computing nonnegative matrix factorizations (NMFs) with linear programming. The key idea is a data-driven model for the factorization, in which the most salient features in the data are used to express the remaining features. More precisely, given a data matrix X, the algorithm identifies a matrix C that satisfies X ≈ CX and some linear constraints. The matrix C selects features, which are then used to compute a low-rank NMF of X. A theoretical analysis demonstrates that this approach has the same type of guarantees as the recent NMF algorithm of Arora et al. (2012).
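A rough sketch of the core idea, in the spirit of (but not identical to) the paper's linear program: find a nonnegative C with X ≈ CX whose large diagonal entries mark the salient rows, then read off the selected features. The cvxpy formulation and constraint set below are an illustrative simplification, and the function name is hypothetical.

```python
import numpy as np
import cvxpy as cp

def select_anchor_rows(X, r):
    """Pick r 'anchor' rows of X via a simplified X ~= CX linear program.

    Sketch only: the constraints here (trace(C) = r, diag(C) <= 1, off-diagonal
    entries in column j bounded by C[j, j]) approximate the idea that a row may
    be used to express others only if it selects itself; the paper's exact LP
    and its guarantees differ. Note the LP has n^2 variables, so this is only
    practical for small n.
    """
    n = X.shape[0]
    C = cp.Variable((n, n), nonneg=True)
    objective = cp.Minimize(cp.sum(cp.abs(X - C @ X)))   # elementwise L1 fit, keeps it an LP
    constraints = [cp.trace(C) == r, cp.diag(C) <= 1]
    for j in range(n):
        constraints.append(C[:, j] <= C[j, j])
    cp.Problem(objective, constraints).solve()
    # rows with the largest diagonal entries are the selected features
    return np.argsort(-np.diag(C.value))[:r]
```

The selected rows can then serve as the fixed factor in a standard nonnegative least-squares step to obtain the low-rank NMF of X.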