Mathematical & Statistical Methods
Review for NeurIPS paper: SVGD as a kernelized Wasserstein gradient flow of the chi-squared divergence
The reviewers had some concerns about the empirical evaluation and the lack of discrete-time results, but agreed that this paper would be a useful addition to NeurIPS. Please see the reviews (and your response) for ways to improve the final manuscript (especially in terms of clarifying the context and scope of this work).
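For context, the paper reinterprets the standard SVGD update of Liu and Wang as a kernelized gradient flow of the chi-squared divergence. A minimal sketch of that standard update (not the paper's new analysis) on a 1D standard-normal target, with a fixed-bandwidth RBF kernel, might look like:

```python
import math

def rbf(x, y, h=1.0):
    # RBF kernel and its derivative with respect to the first argument
    k = math.exp(-(x - y) ** 2 / (2 * h * h))
    return k, -(x - y) / (h * h) * k

def svgd_step(particles, score, eps=0.05, h=1.0):
    """One SVGD update: each particle moves along the kernel-averaged
    score (drift toward high density) plus a repulsive kernel-gradient term."""
    n = len(particles)
    new = []
    for xi in particles:
        phi = 0.0
        for xj in particles:
            k, dk = rbf(xj, xi, h)
            phi += k * score(xj) + dk
        new.append(xi + eps * phi / n)
    return new

def run_svgd(particles, score, steps=1000):
    for _ in range(steps):
        particles = svgd_step(particles, score)
    return particles
```

Running this with `score = lambda x: -x` (the score of a standard normal) drives an initially clustered set of particles toward an approximation of the target; the step size `eps` and bandwidth `h` here are illustrative choices, not values from the paper.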
Sequential Neural Processes
Neural Processes combine the strengths of neural networks and Gaussian processes to achieve both flexible learning and fast prediction in stochastic processes. However, a large class of problems involves underlying temporal dependency structures in a sequence of stochastic processes that Neural Processes (NPs) do not explicitly consider. In this paper, we propose Sequential Neural Processes (SNPs), which incorporate a temporal state-transition model of stochastic processes and thus extend the modeling capabilities of NPs to dynamic stochastic processes. In applying SNPs to dynamic 3D scene modeling, we introduce Temporal Generative Query Networks. To our knowledge, this is the first 4D model that can deal with the temporal dynamics of 3D scenes. In experiments, we evaluate the proposed methods on dynamic (non-stationary) regression and on 4D scene inference and rendering.
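The core idea (a latent state threaded through a sequence of context sets) can be illustrated with a deliberately tiny toy. This is not the authors' architecture: the "encoder" here is a plain average, the "transition" a fixed convex combination, and the prediction a constant function, purely to show the shape of the computation.

```python
def encode(context):
    # Toy stand-in for an NP encoder: aggregate the context targets
    # into a single representation (here, just their mean).
    ys = [y for _, y in context]
    return sum(ys) / len(ys) if ys else 0.0

def transition(z_prev, r, alpha=0.5):
    # Toy state-transition model: blend the previous latent state
    # with the new evidence (alpha is a hypothetical mixing weight).
    return (1 - alpha) * z_prev + alpha * r

def snp_predict(context_seq, z0=0.0):
    """Roll a latent state through a sequence of context sets and
    emit one (constant-function) prediction per time step."""
    z, preds = z0, []
    for context in context_seq:
        z = transition(z, encode(context))
        preds.append(z)
    return preds
```

The point of the sketch is only that each step's prediction depends on the whole history through the state `z`, which is what distinguishes this setup from applying an NP independently at each time step.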
Neural Lyapunov Control for Discrete-Time Systems
While ensuring stability for linear systems is well understood, it remains a major challenge for nonlinear systems. A general approach in such cases is to compute a Lyapunov function together with an associated control policy. However, finding Lyapunov functions for general nonlinear systems is a challenging task. To address this challenge, several methods have been proposed that represent Lyapunov functions using neural networks. However, such approaches either focus on continuous-time systems or on highly restricted classes of nonlinear dynamics.
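The discrete-time Lyapunov conditions behind such approaches are V(x) > 0 for x ≠ 0 and V(f(x)) − V(x) < 0 along trajectories. A minimal sketch of how these conditions can be scored as a hinge-style penalty on samples (with a hand-picked V and a toy linear f standing in for a trained network and the true dynamics) might be:

```python
def f(x):
    # Example discrete-time dynamics (stable, for illustration only)
    return 0.5 * x

def V(x):
    # Candidate Lyapunov function; in neural approaches this would be
    # a small network trained to satisfy the conditions below
    return x * x

def lyapunov_violation(V, f, samples, margin=1e-6):
    """Hinge-style penalty: strictly positive iff some sample violates
    positivity V(x) > 0 or the discrete-time decrease V(f(x)) < V(x)."""
    loss = 0.0
    for x in samples:
        if x != 0:
            loss += max(0.0, margin - V(x))        # positivity condition
        loss += max(0.0, V(f(x)) - V(x) + margin)  # decrease condition
    return loss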
Reviews: Learning Erdos-Renyi Random Graphs via Edge Detecting Queries
This paper studies the task of learning an Erdos-Renyi graph from edge detecting queries. Each query, or test, is given by an input X (a set of nodes) and an output Y(X) in {0, 1} indicating the absence or presence of at least one edge within X. The authors derive theoretical and algorithmic bounds on the number of tests required to achieve perfect reconstruction of the graph in the asymptotic limit where the number of nodes n tends to infinity, using three known algorithms: COMP, DD, and SSS. The derived analytic form is tight up to the asymptotic constant. A new algorithm is also introduced, based on the GROTESQUE algorithm, achieving a sublinear computational time in the decoding step that is near optimal compared to the derived bounds.
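As a simplified illustration of the flavor of these decoders, the COMP rule adapted to this setting rules out any candidate edge whose two endpoints appear together in a negative test, and declares every surviving pair an edge (so it can only over-report, never miss an edge). A hedged sketch, not the paper's exact formulation:

```python
from itertools import combinations

def comp_decode(n, tests):
    """COMP-style decoder for edge-detecting queries.

    tests: list of (X, y) pairs, where X is a set of nodes and y = 1
    iff the induced subgraph on X contains at least one edge.
    A pair {i, j} is ruled out if it lies inside any negative test;
    every pair never ruled out is declared an edge (may over-report).
    """
    candidates = set(combinations(range(n), 2))
    for X, y in tests:
        if y == 0:
            for pair in combinations(sorted(X), 2):
                candidates.discard(pair)
    return candidates
```

With enough random tests, the negative queries eliminate all non-edges with high probability; the paper's analysis quantifies exactly how many tests this takes.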
The main contributions are threefold: an algorithm-independent probabilistic lower bound on the number of non-adaptive tests is given via Fano's inequality (Theorem 1); under Bernoulli testing, upper bounds are provided for COMP (Theorem 2) and DD (Theorem 3), and a lower bound is provided for SSS (Theorem 4). These in particular show that DD is asymptotically optimal under Bernoulli testing when \theta >= 1/2. Although the three review scores exhibited a relatively large split in the initial round of review, after the authors' rebuttal and discussion among the reviewers, all three reviewers converged on positive ratings. I would thus like to recommend acceptance of this paper.
Quantitative Error Bounds for Scaling Limits of Stochastic Iterative Algorithms
Xiaoyu Wang, Mikolaj J. Kasprzak, Jeffrey Negrea, Solesne Bourguin, Jonathan H. Huggins
Stochastic iterative algorithms, including stochastic gradient descent (SGD) and stochastic gradient Langevin dynamics (SGLD), are widely utilized for optimization and sampling in large-scale and high-dimensional problems in machine learning, statistics, and engineering. Numerous works have bounded the parameter error in, and characterized the uncertainty of, these approximations. One common approach has been to use scaling limit analyses to relate the distribution of algorithm sample paths to a continuous-time stochastic process approximation, particularly in asymptotic setups. Focusing on the univariate setting, in this paper, we build on previous work to derive non-asymptotic functional approximation error bounds between the algorithm sample paths and the Ornstein-Uhlenbeck approximation using an infinite-dimensional version of Stein's method of exchangeable pairs. We show that this bound implies weak convergence under modest additional assumptions and leads to a bound on the error of the variance of the iterate averages of the algorithm. Furthermore, we use our main result to construct error bounds in terms of two common metrics: the Lévy-Prokhorov and bounded Wasserstein distances. Our results provide a foundation for developing similar error bounds for the multivariate setting and for more sophisticated stochastic approximation algorithms.
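The univariate setting the abstract describes can be made concrete with a toy example (my own, not from the paper): running SGD on f(theta) = theta^2/2 with additive Gaussian gradient noise produces a path whose stationary behavior is close to an Ornstein-Uhlenbeck process, which predicts a stationary variance of roughly eta*sigma^2/2 for small step size eta.

```python
import random

def sgd_path(eta=0.01, sigma=1.0, steps=200_000, seed=0):
    """Univariate SGD on f(theta) = theta^2 / 2 with Gaussian gradient
    noise: theta_{k+1} = theta_k - eta * (theta_k + sigma * z_k).

    The OU scaling limit predicts a stationary variance of about
    eta * sigma**2 / 2 (here 0.005); the exact value for this linear
    recursion is eta * sigma**2 / (2 - eta), which agrees to first order.
    """
    rng = random.Random(seed)
    theta, path = 1.0, []
    for _ in range(steps):
        theta -= eta * (theta + sigma * rng.gauss(0.0, 1.0))
        path.append(theta)
    return path
```

Comparing the empirical variance of the tail of this path against the OU prediction is the zeroth-order version of what the paper does rigorously at the level of whole sample paths and in stronger metrics.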
Reviews: Learning from Rational Behavior: Predicting Solutions to Unknown Linear Programs
The paper provides learning algorithms for predicting the solution of a linear program (LP) under partial observation, and proves mistake bounds for these algorithms. Two different problems are considered: 1) the objective function of the LP is known but the constraint set is not; at each time a new constraint is revealed, and the goal is to predict the optimal solution of the LP including all known and unknown constraints; 2) the objective is unknown (and changing in a controlled way) while the constraints are known. For the first problem, mistake bounds are obtained in two cases. When the new constraints are selected adversarially, the authors show that, given a uniform bound on the precision of the constraints and some further assumptions such as non-degeneracy, there exists an algorithm whose mistake bound and running time are linear in the number of edges of the underlying unknown polytope and in the number of precision bits. It is also shown that a uniform upper bound on the precision bits is essential in dimensions higher than 2. To relax this requirement, the authors consider the known-objective problem in a stochastic scenario, where the constraint revealed at each time is drawn independently from an underlying unknown distribution; here a learning algorithm based on expanding convex hulls is provided, which in expectation makes at most a number of mistakes linear in the number of edges and growing only logarithmically in time.
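To fix intuition for the known-objective setting, here is a small 2D helper (my own illustration, not the paper's algorithm): given the objective and the constraints revealed so far, one can compute the optimum of the partially known LP by enumerating pairwise intersections of constraint boundaries and keeping the feasible vertex with the best objective value. A predictor would output this point and be charged a mistake whenever a still-unknown constraint cuts it off.

```python
from itertools import combinations

def solve_lp_2d(c, constraints, tol=1e-9):
    """Maximize c . x over {x : a . x <= b for (a, b) in constraints}
    by enumerating pairwise intersections of constraint boundaries
    (assumes the 2D feasible polytope is bounded and nonempty)."""
    best, best_val = None, float("-inf")
    for (a1, b1), (a2, b2) in combinations(constraints, 2):
        det = a1[0] * a2[1] - a1[1] * a2[0]
        if abs(det) < tol:
            continue  # parallel boundaries: no unique intersection
        x = (b1 * a2[1] - b2 * a1[1]) / det
        y = (a1[0] * b2 - a2[0] * b1) / det
        if all(a[0] * x + a[1] * y <= b + tol for a, b in constraints):
            val = c[0] * x + c[1] * y
            if val > best_val:
                best, best_val = (x, y), val
    return best
```

This brute-force vertex enumeration is only sensible in 2D; the point is that each revealed constraint can change the predicted vertex, and the paper bounds how often that can happen.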
Learning from Rational Behavior: Predicting Solutions to Unknown Linear Programs
Shahin Jabbari, Ryan M. Rogers, Aaron Roth, Steven Z. Wu
We define and study the problem of predicting the solution to a linear program (LP) given only partial information about its objective and constraints. This generalizes the problem of learning to predict the purchasing behavior of a rational agent who has an unknown objective function, a problem that has been studied under the name "Learning from Revealed Preferences". We give mistake-bound learning algorithms in two settings: in the first, the objective of the LP is known to the learner but there is an arbitrary, fixed set of constraints which are unknown. Each example is defined by an additional known constraint, and the goal of the learner is to predict the optimal solution of the LP given the union of the known and unknown constraints. This models the problem of predicting the behavior of a rational agent whose goals are known, but whose resources are unknown. In the second setting, the objective of the LP is unknown and changing in a controlled way. The constraints of the LP may also change every day, but are known. An example is given by a set of constraints and partial information about the objective, and the task of the learner is again to predict the optimal solution of the partially known LP.
Reviews: Without-Replacement Sampling for Stochastic Gradient Methods
The paper studies the problem of minimizing the average of a finite sum of convex functions over a convex domain using stochastic algorithms that, as opposed to most popular methods, apply WITHOUT-replacement sampling to the data. More specifically, the paper considers methods that first randomly permute the functions and then process them one at a time via incremental updates, making at most a single pass over the data (hence the data is only shuffled once). There are three main motivations for considering stochastic methods with without-replacement sampling: 1. It has been observed empirically many times (and to the best of my knowledge this is what ML practitioners really do) that randomly shuffling the data and then applying incremental updates works better than with-replacement SGD. 2. After the data is randomly permuted, these algorithms require only sequential memory access, which is much more efficient than standard with-replacement SGD methods that require random access to perform the updates. Since the main setting under consideration is making at most a single pass, it is plausible to assume that the data is already stored in some random permutation, so the algorithm is fully sequential and there is no need to even artificially permute the data. 3. In a distributed setting in which the data is partitioned across several machines (without assuming the data is sampled i.i.d. from a distribution), it is more natural and efficient to analyze without-replacement algorithms.
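The shuffle-once, single-pass scheme the paper analyzes has a very simple skeleton. A minimal sketch on the toy objective f_i(x) = (x - a_i)^2 / 2 (my choice of loss and step size, purely for illustration):

```python
import random

def shuffle_once_sgd(data, x0=0.0, seed=0):
    """Single pass of incremental updates after one random shuffle:
    each f_i(x) = (x - data[i])**2 / 2 is visited exactly once."""
    rng = random.Random(seed)
    order = list(range(len(data)))
    rng.shuffle(order)  # the only randomness: one permutation up front
    x = x0
    for t, i in enumerate(order, start=1):
        x -= (1.0 / t) * (x - data[i])  # gradient step with rate 1/t
    return x
```

With this particular loss and 1/t step size the final iterate is exactly the sample mean regardless of the permutation, which makes the sketch easy to check; the interesting question the paper addresses is what guarantees survive for general convex losses when sampling is without replacement.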
Reviews: Interpretable Nonlinear Dynamic Modeling of Neural Trajectories
Overall I found the paper to be solid and rather enjoyable, and I would qualify it as a strong candidate for a poster. The authors' method of plotting velocity fields by decomposing the velocity into direction and speed, which they have apparently introduced, is especially effective. It made their arguments and conclusions much easier to follow, and will hopefully be picked up by others. In my opinion, stating that this approach leads to "interpretable models" might be somewhat overselling the results: the interpretability is still hampered by the fact that the models are composed of 10-100 more or less arbitrary basis functions. That being said, their capacity to reproduce salient features of the phase diagram certainly makes them more interpretable than, say, recurrent neural networks.
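The decomposition being praised is mathematically trivial but worth spelling out: a velocity vector v is split into a unit direction v/||v|| and a scalar speed ||v||, which are then plotted as two separate fields. A minimal helper (my own sketch, not the authors' plotting code):

```python
import math

def decompose_velocity(v):
    """Split a velocity vector into (unit direction, scalar speed),
    the decomposition used to plot direction and speed as separate
    fields over the state space."""
    speed = math.sqrt(sum(c * c for c in v))
    if speed == 0.0:
        return [0.0] * len(v), 0.0  # fixed point: direction undefined
    return [c / speed for c in v], speed
```

Applying this at each grid point of a learned dynamics model yields the direction and speed fields whose plots the review found so effective.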