AITopics | Mathematical & Statistical Methods

Collaborating Authors

Mathematical & Statistical Methods

News Overviews Instructional Materials AI-Alerts Classics

Reviews: Learning Erdos-Renyi Random Graphs via Edge Detecting Queries

Neural Information Processing SystemsJan-21-2025, 06:17:43 GMT

The main contributions are threefold: - An algorithm-independent probabilistic lower bound of the number of non-adaptive tests is given via Fano's inequality (Theorem 1). Under the Bernoulli testing, upper bounds are provided for COMP (Theorem 2) and DD (Theorem 3), and a lower bound is provided for SSS (Theorem 4). These in particular show that DD is asymptotically optimal under the Bernoulli testing when \theta 1/2. Although the three review scores exhibited a relatively large split in the initial round of review, after the authors' rebuttal as well as discussion among the reviewers, all the three reviewers have become positive ratings. I would thus like to recommend acceptance of this paper.

edge detecting query, learning erdo-renyi random graph, theorem, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.45)

Add feedback

Quantitative Error Bounds for Scaling Limits of Stochastic Iterative Algorithms

Wang, Xiaoyu, Kasprzak, Mikolaj J., Negrea, Jeffrey, Bourguin, Solesne, Huggins, Jonathan H.

arXiv.org Machine LearningJan-21-2025

Stochastic iterative algorithms, including stochastic gradient descent (SGD) and stochastic gradient Langevin dynamics (SGLD), are widely utilized for optimization and sampling in large-scale and high-dimensional problems in machine learning, statistics, and engineering. Numerous works have bounded the parameter error in, and characterized the uncertainty of, these approximations. One common approach has been to use scaling limit analyses to relate the distribution of algorithm sample paths to a continuous-time stochastic process approximation, particularly in asymptotic setups. Focusing on the univariate setting, in this paper, we build on previous work to derive non-asymptotic functional approximation error bounds between the algorithm sample paths and the Ornstein-Uhlenbeck approximation using an infinite-dimensional version of Stein's method of exchangeable pairs. We show that this bound implies weak convergence under modest additional assumptions and leads to a bound on the error of the variance of the iterate averages of the algorithm. Furthermore, we use our main result to construct error bounds in terms of two common metrics: the L\'{e}vy-Prokhorov and bounded Wasserstein distances. Our results provide a foundation for developing similar error bounds for the multivariate setting and for more sophisticated stochastic approximation algorithms.

artificial intelligence, machine learning, quantitative scaling limit, (17 more...)

arXiv.org Machine Learning

2501.12212

Country: North America > United States (0.45)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.74)

Add feedback

Reviews: Learning from Rational Behavior: Predicting Solutions to Unknown Linear Programs

Neural Information Processing SystemsJan-20-2025, 20:01:10 GMT

The paper provides learning algorithm for predicting the solution of linear program with partial observation and provides some mistake bounds for these algorithms. Two different problems are considered: 1) The objective function of LP is known but the constraint sets are not known. At each time a new constraint is revealed and the goal is to predict the optimal solution of the LP including all known and unknown constraints. In the first problem, the mistake bounds are obtained for two cases: when the new constraints are selected adversarially, in which case the authors show that given a uniform bound on the precision of the constraints and some other assumptions such as non-degeneracy of the constraints, there exists an algorithm with mistake bound and running time linear in terms of the number of edges of the underlying unknown polytope and the number of precision bits. It is shown that uniform upper bound on the precision bits is essential for dimensions higher than 2. To relax this issue, the authors consider the known objective LP problem in stochastic scenario where the revealed constraints at each time are independently chosen from an underlying unknown distribution, in which case a learning algorithm by expanding convex hulls is provided which attains in expectation at most linear mistakes in terms of the number of edges and grows only logarithmic in time.

algorithm, constraint, unknown linear program, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.63)

Add feedback

Reviews: Without-Replacement Sampling for Stochastic Gradient Methods

Neural Information Processing SystemsJan-20-2025, 19:36:11 GMT

The paper studies the problem of minimizing the average of a finite sum of convex functions over a convex domain using stochastic algorithms that, opposed to most popular methods, apply WITHOUT-replacement sampling to the data. More specifically, the paper considers methods that first randomly permute the functions, and then process the functions via incremental updates one at a time, making at most a single pass over the data (hence the data is only shuffled once). There are three main motivations for considering stochastic methods with without-replacement sampling: 1. it is observed many times empirically (and to the best of my knowledge this is what ML practitioners really do) that applying random shuffles to the data and then apply incremental updates works better than with-replacement SGD. 2. After the data is randomly permuted, these algorithms require only sequential access to the memory, which is much more efficient than standard with-replacement SGD methods that require random access to perform the updates. Since the main setting under consideration here is when making at most a single pass, it sounds plausible to assume that the data is already stored in some random permutation, and hence the algorithm is fully sequential, and there is no need to even artificially permute the data. In a distributed setting in which data is partitioned to several machine (not assuming the data is sampled i.i.d from a distribution), it is more natural and efficient to analyze without-replacement algorithms.

algorithm, stochastic gradient method, without-replacement sampling, (9 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.40)

Add feedback

Reviews: Interpretable Nonlinear Dynamic Modeling of Neural Trajectories

Neural Information Processing SystemsJan-20-2025, 18:06:06 GMT

Overall I found the paper to be solid and rather enjoyable, and I would qualify it as a strong candidate for a poster. The authors' method of plotting velocity fields by decomposing the velocity into direction and speed, which they've apparently introduced, is especially effective. It made their arguments and conclusions much easier to follow, and will hopefully be picked up by others. In my opinion stating that this approach leads to "interpretable models" might be somewhat overselling the results – the interpretability of the results is still hampered by the fact that models are composed by 10-100 more or less arbitrary basis functions. That being said, their capacity to reproduce salient features of the phase diagram certainly makes them more interpretable than, say, recurrent neural networks.

expression, interpretable nonlinear dynamic modeling, neural trajectory, (8 more...)

Neural Information Processing Systems

Genre: Personal (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.55)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.40)

Add feedback

Reviews: Adaptive Newton Method for Empirical Risk Minimization to Statistical Accuracy

Neural Information Processing SystemsJan-20-2025, 16:46:14 GMT

The paper contains a novel and interesting idea, and is well written. I think it should be accepted. This is one of a very few papers which attempt to combine optimization and statistical considerations. Most papers suggesting algorithms for solving the ERM problem simply focus on solving the deterministic ERM problem, and ignore the fact that ERM objective is an approximation to the true loss which arises as an expectation of the loss over an unknown sample distribution; and hence is subject to an approximation error. This is of course known in the literature, but it is notoriously difficult to address both the optimization aspect and the approximation aspect in a meaningful way in a single work.

empirical risk minimization, international conference, proceedings, (9 more...)

Neural Information Processing Systems

Genre: Summary/Review (0.36)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.40)

Add feedback

Reviews: Sub-sampled Newton Methods with Non-uniform Sampling

Neural Information Processing SystemsJan-20-2025, 11:27:15 GMT

Pros: This paper is well written and clear. The authors do a good job analyzing their method from a theoretical standpoint. I like that this paper has good theory. I like the kinds of experiments the authors chose, and how they are presented. All in all I think this paper is good, and is a solid contribution to the literature on approximate Newton methods.

iteration, non-uniform sampling, sub-sampled newton method, (8 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.65)

Add feedback

Reviews: Exploiting the Structure: Stochastic Gradient Methods Using Raw Clusters

Neural Information Processing SystemsJan-20-2025, 10:46:56 GMT

The initial motivation seems to be the work of Hoffman et al on the use of clustering to speedup stochastic methods for ERM. Their method was not proved to converge to the optimal due to the use of biased stochastic gradients. Also, that work seemed to work only for small clusters due to the approach chosen. This papers goes a long way to develop the basic idea into a satisfying theoretical framework which also gives rise to efficient implementations. This paper is truly a pleasure to read – a very fine example of academic exposition.

exploiting, raw cluster, stochastic gradient method, (2 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.64)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.64)

Add feedback

Reviews: Stochastic Gradient Methods for Distributionally Robust Optimization with f-divergences

Neural Information Processing SystemsJan-20-2025, 10:27:20 GMT

However, this paper is not carefully written. For example, the references are missing on page 6, line 192 and page 7, line 206. The legend of red lines are missing for Figure 2c,d. The paper states only the necessary information but not sufficient for the readers to follow easily. I think the clarity of this paper could be greatly improved especially the authors did not use the full 8 pages.

distributionally robust optimization, sao nasa astrophysic data system, stochastic gradient method, (10 more...)

Neural Information Processing Systems

Genre: Research Report (0.42)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.40)

Add feedback

Reviews: Byzantine Stochastic Gradient Descent

Neural Information Processing SystemsJan-20-2025, 04:25:27 GMT

The paper studies stochastic convex optimization in a distributed master/workers framework, where on each round each machine out of m produces a stochastic gradient and sends it to the master, which aggregates these into a mini-batch. In this paper the authors allow a fraction of alpha of the machines to be Byzantine, i.e., they do not need to report valid stochastic gradients but may produce arbitrary vectors, even in an adversarial manner. The goal is to aggregate the reports of the machines and to converge to an optimal solution of the convex objective despite the malicious Byzantine machines. The authors present a novel variant of minibatch-SGD which tackles the difficulty the dealing with Byzantine machines. They prove upper-bounds on the convergence and nearly optimal matching lower-bounds on any algorithm working in such framework, and in this sense the results are quite satisfactory.

byzantine machine, byzantine stochastic gradient descent, current paper, (3 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.85)

Add feedback