Mathematical & Statistical Methods
Reviews: Stochastic Runge-Kutta Accelerates Langevin Monte Carlo and Beyond
After Rebuttal: Thank you for the responses. I believe that the paper will be even stronger with the inclusion of the stochastic-gradient variant. This is a very valuable theorem, which will be useful for other theoreticians working in this field. On the other hand, to the best of my knowledge, this is the first paper that uses a stochastic Runge-Kutta integrator for sampling from strongly log-concave densities with explicit guarantees. The authors further show that their proposed numerical scheme improves upon the existing guarantees when applied to the overdamped Langevin dynamics.
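For context on the last point, the overdamped Langevin dynamics dX_t = -grad f(X_t) dt + sqrt(2) dB_t is most commonly discretized with the Euler-Maruyama scheme (the unadjusted Langevin algorithm). The sketch below shows only that baseline, not the stochastic Runge-Kutta scheme from the paper; the function name `ula_sample`, the step size, and the Gaussian target are illustrative assumptions.

```python
import numpy as np

def ula_sample(grad_f, x0, step, n_steps, rng):
    """Euler-Maruyama discretization of dX_t = -grad f(X_t) dt + sqrt(2) dB_t.

    The continuous-time process has stationary density proportional to exp(-f);
    the discretized chain only approximates it (no Metropolis correction).
    """
    x = np.array(x0, dtype=float)
    samples = np.empty((n_steps, x.size))
    for k in range(n_steps):
        noise = rng.standard_normal(x.size)
        # Gradient step plus Gaussian noise scaled by sqrt(2 * step).
        x = x - step * grad_f(x) + np.sqrt(2.0 * step) * noise
        samples[k] = x
    return samples

# Assumed example target: standard Gaussian, f(x) = ||x||^2 / 2, so grad f(x) = x.
rng = np.random.default_rng(0)
samples = ula_sample(lambda x: x, np.zeros(2), step=0.05, n_steps=20000, rng=rng)
```

A higher-order integrator such as the one studied in the paper replaces the single gradient evaluation per step with several intermediate evaluations, which is where the improved guarantees come from.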
Review for NeurIPS paper: Meta-Learning Stationary Stochastic Process Prediction with Convolutional Neural Processes
The authors say that they use a convCNP as an encoder. Looking at the pseudo-code in Algorithm 1 in the appendix, it is unclear to me whether the convCNP is actually run all the way and given some discretized grid as targets, or whether the discretization at the level of t_i is used. I would assume the latter, but this is not stated in the text. If it is the former, I don't understand why lines 6 and 7 (in Algorithm 1) are needed in the encoder. The same goes for the pseudo-code in the appendix.
Review for NeurIPS paper: Inverse Rational Control with Partially Observable Continuous Nonlinear Dynamics
Weaknesses: The specific empirical evaluation chosen is the primary weakness of the paper. From a neuroscience perspective, the validation of parameter recovery on synthetic data is a necessary first step, but not a sufficient one. Given that [a] the task is primarily of neuroscientific interest and [b] a simpler (though also Bayesian belief-updating) fit model is given in the cited prior work, the lack of comparison of cross-validated performance against that prior model is surprising. We should either see better cross-validation performance than the models in prior work, or similar performance but more insight into / explanation of the underlying mental computation. This would show us a real payoff of the new insights here.
Review for NeurIPS paper: Inverse Rational Control with Partially Observable Continuous Nonlinear Dynamics
The paper describes a novel technique for inverse rational control. The reviewers all agree that this is great work that makes an important contribution. There is one important weakness though: the experiments. More comprehensive experiments would be desirable to increase the impact of the work. Nevertheless, this is still good work.
Review for NeurIPS paper: Modeling Continuous Stochastic Processes with Dynamic Normalizing Flows
Weaknesses: No Explanation of Transformations of Stochastic Processes: I was under the impression that transforming / reparameterizing a stochastic process is non-trivial. Thus, I was expecting Equation 7 to include a second derivative term. I'm not saying that Equation 7 is wrong, per se; transforming just the increments agrees with intuition. However, the problem is that the paper provides no explanation or mathematical references for stochastic processes and their transformations. There are *zero* citations in both Section 2.2 and Section 3.1.
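For reference, the second derivative term mentioned above presumably refers to the standard Itô formula. The statement below is the textbook result under the assumption of a scalar Itô process with drift \mu_t and diffusion \sigma_t and a twice-differentiable map f; the symbols are generic and are not taken from the paper's Equation 7, which is not reproduced here.

```latex
% Ito's formula for Y_t = f(t, X_t), where dX_t = \mu_t \, dt + \sigma_t \, dW_t
\[
  dY_t
  = \left(
      \frac{\partial f}{\partial t}
      + \mu_t \frac{\partial f}{\partial x}
      + \frac{1}{2} \sigma_t^{2} \frac{\partial^{2} f}{\partial x^{2}}
    \right) dt
  + \sigma_t \frac{\partial f}{\partial x} \, dW_t .
\]
```

The \frac{1}{2} \sigma_t^{2} \, \partial^{2} f / \partial x^{2} term is the correction that is absent from the ordinary chain rule.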
Review for NeurIPS paper: Modeling Continuous Stochastic Processes with Dynamic Normalizing Flows
One reviewer recommended borderline rejection, but in my opinion the authors successfully addressed that reviewer's concerns in the rebuttal. Recommendations: The authors are encouraged to clearly address the reviewers' concern about potential similarities of the approach with the Kalman filter with nonlinear outputs. They should also address the issues related to background, related work, and the motivation for continuity.
Reviews: Globally Convergent Newton Methods for Ill-conditioned Generalized Self-concordant Losses
The paper studies large-scale convex optimization algorithms based on the Newton method applied to regularized generalized self-concordant losses, in particular in ill-conditioned settings, providing new optimal generalization bounds and proofs of convergence. The reviewers found the contributions of high quality and were satisfied with the clarifications provided by the author response.
Reviews: Minimal Variance Sampling in Stochastic Gradient Boosting
Update: I read the authors' response. Regarding the point that the sampling rate does not tell the whole story: I was suggesting adding information about how many instances were used, on average, for each of the splits (because it is not equal to sampling rate * total dataset size). I am keeping my accept rating, hoping that the authors make the changes to improve the derivations and clarity in the final submission. Summary: This paper is concerned with a common trick that many GBDT implementations apply: subsampling instances in order to speed up the calculations for finding the best split. The authors formulate the problem of choosing the instances to sample as an optimization problem and derive a modified sampling scheme that aims to mimic the gain that would be assigned to a split on all of the data by using a gain calculated only on the subsampled instances. The experiments demonstrate good results. The paper is well written and easy to follow, apart from a couple of places in the derivations (see my questions).
Reviews: Minimal Variance Sampling in Stochastic Gradient Boosting
The authors propose a non-uniform sampling strategy for stochastic gradient boosted decision trees. In particular, sampling probability of the training data is optimized towards maximizing the estimation accuracy of the splitting score of decision trees. The optimization problem allows an approximate closed-form solution. Experiment results demonstrate superior performance of the proposed strategy. The reviewers agree that the paper can not only help understand sampling within GBDT from a more rigorous perspective but also improve GBDT implementations in practice.
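As a rough illustration of the general idea of non-uniform sampling with reweighting, the sketch below keeps each instance independently with a probability proportional to a per-instance score and weights kept instances by the inverse of that probability, so that weighted sums remain unbiased. It is a generic sketch under stated assumptions, not the sampling scheme derived in the paper: the function name `sample_with_weights`, the use of absolute gradients as scores, and the capping rule are all illustrative choices.

```python
import numpy as np

def sample_with_weights(scores, rate, rng):
    """Keep instance i independently with probability p_i proportional to scores[i].

    Probabilities are scaled so they would sum to rate * n, then capped at 1,
    so the expected sample size can be smaller than rate * n.
    Returns the kept indices and their inverse-probability weights 1 / p_i.
    """
    scores = np.asarray(scores, dtype=float)
    p = np.minimum(1.0, rate * scores.size * scores / scores.sum())
    keep = rng.random(scores.size) < p
    idx = np.flatnonzero(keep)
    return idx, 1.0 / p[idx]

# Assumed toy data: per-instance gradients drawn from a standard normal.
rng = np.random.default_rng(0)
gradients = rng.standard_normal(1000)
idx, weights = sample_with_weights(np.abs(gradients), rate=0.2, rng=rng)

# Weighted sum over the subsample estimates the full gradient sum.
estimate = np.sum(weights * gradients[idx])
exact = gradients.sum()
```

Because instances are kept independently, the number of sampled instances varies from draw to draw, which is one reason the nominal sampling rate alone does not determine how many instances a split actually sees.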