

Review for NeurIPS paper: Impossibility Results for Grammar-Compressed Linear Algebra

Neural Information Processing Systems

Summary and Contributions: The paper considers the possibility of running algorithms directly on compressed data to obtain significant time savings. In particular, it considers a restricted form of grammar-compressed strings that captures modern compression tools like Lempel-Ziv. Let N be the input size and n the compressed size. The goal would be to create algorithms whose running time depends on n in the same way that standard algorithms depend on N. The authors consider the dot product, matrix-vector product, and matrix-matrix product, and show conditional lower bounds by reduction from problems assumed to be hard (3SUM, k-SUM). For the matrix-matrix product, the authors show that even when the input matrices can be greatly compressed, the output (in compressed form) still requires essentially N^2 bits, which means that any algorithm working on compressed data would need at least this much time. For the dot product of two vectors, the authors show several results under different assumptions.


Impossibility Results for Grammar-Compressed Linear Algebra

Neural Information Processing Systems

To handle vast amounts of data, it is natural and popular to compress vectors and matrices. When we compress a vector from size N down to size n ≪ N, it certainly makes it easier to store and transmit efficiently, but does it also make it easier to process? In this paper we consider lossless compression schemes, and ask if we can run our computations on the compressed data as efficiently as if the original data was that small. That is, if an operation has time complexity T(input size), can we perform it on the compressed representation in time T(n) rather than T(N)? We consider the most basic linear algebra operations: inner product, matrix-vector multiplication, and matrix multiplication. In particular, given two compressed vectors, can we compute their inner product in time O(n)? Or perhaps we must decompress first and then multiply, spending Ω(N) time? The answer depends on the compression scheme.
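To make the question concrete, here is a minimal toy sketch (not a construction from the paper): for run-length encoding, a very restricted compression scheme, the inner product can indeed be computed in time proportional to the compressed size rather than the decompressed length N. The paper's impossibility results concern the much more powerful grammar compressions, where such savings cannot in general be expected under standard hardness assumptions.

```python
# Toy illustration only: inner product computed directly on run-length-encoded
# vectors. Runtime is proportional to the number of runs, not the length N.

def rle_inner_product(a_runs, b_runs):
    """a_runs, b_runs: lists of (value, run_length) pairs encoding equal-length vectors."""
    i = j = 0
    rem_a, rem_b = a_runs[0][1], b_runs[0][1]
    total = 0
    while i < len(a_runs) and j < len(b_runs):
        overlap = min(rem_a, rem_b)          # length on which both runs are constant
        total += a_runs[i][0] * b_runs[j][0] * overlap
        rem_a -= overlap
        rem_b -= overlap
        if rem_a == 0:
            i += 1
            if i < len(a_runs):
                rem_a = a_runs[i][1]
        if rem_b == 0:
            j += 1
            if j < len(b_runs):
                rem_b = b_runs[j][1]
    return total

# Example: a = [2, 2, 2, 5], b = [1, 1, 3, 3]  ->  2 + 2 + 6 + 15 = 25
assert rle_inner_product([(2, 3), (5, 1)], [(1, 2), (3, 2)]) == 25
```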


Review for NeurIPS paper: Impossibility Results for Grammar-Compressed Linear Algebra

Neural Information Processing Systems

This paper received overall good reviews and is considered novel and of interest. In terms of technical contribution, the improvement over previous work ([2]) seems somewhat incremental. Another issue that was raised is relevance to the audience: the authors should better explain and justify the connection between their work and current research in ML. It would also help to discuss the relevant ML literature on learning algorithms that work over losslessly compressed data, and how the lower bounds shown here relate to existing techniques.


Review for NeurIPS paper: Faster Randomized Infeasible Interior Point Methods for Tall/Wide Linear Programs

Neural Information Processing Systems

Weaknesses: This paper provides a nice advance in the theory of infeasible-start long-step IPMs; however, the novelty of the approach and its relation to prior work could use further clarity. First, solving regression problems in a matrix A in nearly linear time, when A has many more rows than columns, has been the subject of a line of research. These results, including ones based on the subspace embedding result used in this paper, readily extend to solving linear systems in A^T A, and this has been used in the Theoretical Computer Science papers mentioned for implementing short-step IPMs. Consequently, I think it would have been beneficial to state earlier that the paper uses the known linear-system-solving machinery of subspace embeddings to build preconditioners (rather than just saying that "Randomized Linear Algebra" is used) and to put this in the context of prior work. There may be novelty in the particular way in which the paper combines conjugate gradient and subspace embeddings; however, the paper would be strengthened if it articulated how this differs from the previous literature. As the appendix points out, conjugate gradient can be replaced with other iterative methods, which possibly puts the approach closer to the ones from the literature. In light of the previous paragraph, I think more of the novelty may lie in exactly how the authors handle the error from approximate linear system solves in a way that is sensitive to the design of the preconditioner.


Faster Randomized Infeasible Interior Point Methods for Tall/Wide Linear Programs

Neural Information Processing Systems

The underlying complexity of interior point methods (IPMs) is dominated by the cost of solving a system of linear equations at each iteration. In this paper, we consider infeasible IPMs for the special case where the number of variables is much larger than the number of constraints (i.e., wide), or vice versa (i.e., tall) by taking the dual. Using tools from Randomized Linear Algebra, we present a preconditioning technique that, when combined with the Conjugate Gradient iterative solver, provably guarantees that infeasible IPM algorithms (suitably modified to account for the error incurred by the approximate solver) converge to a feasible, approximately optimal solution, without increasing their iteration complexity. Our empirical evaluations verify our theoretical results on both real and synthetic data.
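For intuition, the following is a minimal sketch of the generic sketch-and-precondition idea for the short-and-wide normal-equation-type systems that arise per IPM iteration. It is not the paper's exact algorithm: a dense Gaussian sketch, the sketch size 4m, and the plain system A A^T x = b (ignoring the IPM's diagonal scaling and its error analysis) are all stand-in assumptions for illustration.

```python
# Hedged sketch: precondition CG for (A A^T) x = b when A is m x n with m << n.
import numpy as np
from scipy.linalg import qr, solve_triangular
from scipy.sparse.linalg import cg, LinearOperator

rng = np.random.default_rng(0)
m, n = 100, 20000                               # few constraints, many variables (wide)
A = rng.standard_normal((m, n)) * rng.uniform(0.1, 10.0, size=(1, n))  # ill-scaled columns
b = rng.standard_normal(m)

# Sketch the long axis down to s = O(m) columns; B B^T then approximates A A^T.
s = 4 * m
S = rng.standard_normal((n, s)) / np.sqrt(s)    # dense Gaussian sketch, illustration only
B = A @ S                                        # m x s

# Economic QR of B^T gives an upper-triangular R with R^T R = B B^T; use R as preconditioner.
_, R = qr(B.T, mode="economic")

def apply_preconditioner(v):
    # Approximates (A A^T)^{-1} v via two triangular solves: R^{-1} (R^{-T} v).
    return solve_triangular(R, solve_triangular(R, v, trans="T"), trans="N")

M = LinearOperator((m, m), matvec=apply_preconditioner)
AAt = LinearOperator((m, m), matvec=lambda v: A @ (A.T @ v))

x, info = cg(AAt, b, M=M)
print("CG flag:", info, "residual:", np.linalg.norm(A @ (A.T @ x) - b))
```

With the preconditioner, CG converges in a few iterations despite the poor column scaling; without it, many more iterations are typically needed on the same system.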


Review for NeurIPS paper: Faster Randomized Infeasible Interior Point Methods for Tall/Wide Linear Programs

Neural Information Processing Systems

The paper was overall well-received, and R4 in particular liked the combination of randomized linear algebra with IPMs and the solid technical analysis. R3 brought up some major points and viewed this as a borderline paper, in part because of a narrow scope of applicability. However, overall, the AC and SAC agree this is an interesting paper (as well as well-written and technically solid), and it is enough to be over the bar for NeurIPS. R3 raises a concern that some of the presentation relative to past methods is a bit misleading, and this should be addressed in the minor revisions. Please see R3's review for full details.


Reviews: Stochastic Runge-Kutta Accelerates Langevin Monte Carlo and Beyond

Neural Information Processing Systems

After Rebuttal: Thank you for the responses. I believe that the paper will be even stronger with the inclusion of the stochastic-gradient variant. This is a very valuable theorem, which will be useful for other theoreticians working in this field. To the best of my knowledge, this is the first paper that uses a stochastic Runge-Kutta integrator for sampling from strongly log-concave densities with explicit guarantees. The authors further show that their proposed numerical scheme improves upon the existing guarantees when applied to the overdamped Langevin dynamics.
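For context, the baseline that the paper improves upon is the first-order Euler-Maruyama discretization of overdamped Langevin dynamics (the unadjusted Langevin algorithm). The sketch below shows only that baseline, not the paper's stochastic Runge-Kutta scheme; the target, step size, and step count are arbitrary illustrative choices.

```python
# Baseline only: Euler-Maruyama discretization of overdamped Langevin dynamics (ULA).
import numpy as np

def ula_sample(grad_log_density, x0, step_size, n_steps, rng):
    """ULA update: x_{k+1} = x_k + h * grad log p(x_k) + sqrt(2h) * xi_k, xi_k ~ N(0, I)."""
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        x = x + step_size * grad_log_density(x) + np.sqrt(2.0 * step_size) * noise
    return x

# Toy strongly log-concave target: standard Gaussian, so grad log p(x) = -x.
rng = np.random.default_rng(0)
samples = np.array([ula_sample(lambda x: -x, np.zeros(2), 0.05, 500, rng)
                    for _ in range(1000)])
print("sample covariance (should be close to identity):")
print(np.cov(samples.T).round(2))
```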


Estimation of Skill Distribution from a Tournament

Neural Information Processing Systems

In this paper, we study the problem of learning the skill distribution of a population of agents from observations of pairwise games in a tournament. These games are played among randomly drawn agents from the population. The agents in our model can be individuals, sports teams, or Wall Street fund managers. Formally, we postulate that the likelihoods of outcomes of games are governed by the parametric Bradley-Terry-Luce (or multinomial logit) model, where the probability of an agent beating another is the ratio between its skill level and the pairwise sum of skill levels, and the skill parameters are drawn from an unknown, non-parametric skill density of interest. The problem is, in essence, to learn a distribution from noisy, quantized observations.
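The generative model described above can be written as a short simulation sketch. This is only an illustration of the likelihood, not the paper's estimator; the gamma skill density and the number of agents and games are arbitrary stand-ins.

```python
# Illustrative BTL data-generating process: P(i beats j) = w_i / (w_i + w_j).
import numpy as np

rng = np.random.default_rng(0)
n_agents, n_games = 1000, 5000

skills = rng.gamma(shape=2.0, scale=1.0, size=n_agents)   # stand-in for the unknown skill density

def play_game(i, j):
    """Return 1 if agent i beats agent j under the BTL model, else 0."""
    p_i_wins = skills[i] / (skills[i] + skills[j])
    return int(rng.random() < p_i_wins)

pairs = rng.choice(n_agents, size=(n_games, 2))
pairs = pairs[pairs[:, 0] != pairs[:, 1]]                  # discard self-matches
outcomes = [(i, j, play_game(i, j)) for i, j in pairs]
# The estimation problem studied in the paper: recover the density of `skills`
# from `outcomes` alone, without observing the skill parameters themselves.
```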


Supplementary Material: Meta-Learning Stationary Stochastic Process Prediction with Convolutional Neural Processes

Neural Information Processing Systems

We first review the notation introduced in the main body for convenience, where a context set and a target set are defined. Later, as is common in recent meta-learning approaches, we will consider predicting the target set from the context set, following Garnelo et al. [3, 4]. The measurable sets of Σ are those which can be specified by the values of the function at a countable subset I_X of its input locations. Since in practice we only ever observe data at a finite number of points, this is sufficient for our purposes. Hence we may think of these stochastic processes as defined by their finite-dimensional marginals. We now define what it means to condition on observations of the stochastic process P. Let p(y|X) denote the density with respect to the Lebesgue measure of the finite marginal of P with index set X (we assume these densities always exist). Strictly speaking, this is non-standard terminology, since P is the law of a stochastic process.
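As a hedged illustration of conditioning via finite-dimensional marginals: for a Gaussian process (one stationary process covered by this setup), the density p(y|X) of any finite marginal is an explicit multivariate Gaussian, so conditioning a target set on a context set reduces to standard Gaussian conditioning. The RBF kernel, lengthscale, and noise level below are arbitrary choices, not quantities from the paper.

```python
# Gaussian-process example of conditioning a target marginal on a context set.
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0):
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-0.5 * d2 / lengthscale ** 2)

def gp_posterior(x_ctx, y_ctx, x_tgt, noise=1e-4):
    """Mean and covariance of the target marginal conditioned on the context set."""
    K_cc = rbf_kernel(x_ctx, x_ctx) + noise * np.eye(len(x_ctx))
    K_tc = rbf_kernel(x_tgt, x_ctx)
    K_tt = rbf_kernel(x_tgt, x_tgt)
    mean = K_tc @ np.linalg.solve(K_cc, y_ctx)
    cov = K_tt - K_tc @ np.linalg.solve(K_cc, K_tc.T)
    return mean, cov

x_ctx = np.array([-1.0, 0.0, 1.5])
y_ctx = np.sin(x_ctx)
mean, cov = gp_posterior(x_ctx, y_ctx, np.linspace(-2, 2, 5))
```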


Review for NeurIPS paper: Meta-Learning Stationary Stochastic Process Prediction with Convolutional Neural Processes

Neural Information Processing Systems

The authors say that they use a ConvCNP as an encoder. Looking at the pseudo-code in Algorithm 1 in the appendix, it is unclear to me whether the ConvCNP is actually run all the way through and given some discretized grid as targets, or whether the discretization at the level of the t_i is used. I would assume the latter, but this is not stated in the text. If it is the former, I do not understand why lines 6 and 7 (in Algorithm 1) are needed in the encoder. The same goes for the pseudo-code in the appendix.