Optimization
Export Reviews, Discussions, Author Feedback and Meta-Reviews
First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. Summary: The paper introduces a simple strategy to reduce the variance of the gradients in stochastic variational inference methods. Variance reduction is achieved by storing the contributions of the last L data points to the approximate (stochastic) gradient and averaging these values. There is a bias-variance trade-off: variance reduction comes at the cost of increased bias in the gradient estimates. The trade-off can be controlled by varying the sliding window size L. The strategy also requires storing the gradient contributions of the last L data points, which can incur a significant memory cost.
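To make the windowed averaging concrete, here is a minimal sketch of how I read the strategy. It is not the authors' implementation: the class name `WindowedGradient`, the `update` method, and the use of NumPy arrays for the per-data-point contributions are all assumptions made for illustration.

```python
from collections import deque

import numpy as np


class WindowedGradient:
    """Running average of the last L per-data-point gradient contributions.

    Hypothetical helper for illustration only; not taken from the paper.
    """

    def __init__(self, window_size):
        if window_size < 1:
            raise ValueError("window_size must be at least 1")
        # The deque holds at most L contributions; this is the memory cost.
        self.buffer = deque(maxlen=window_size)
        self.total = None

    def update(self, contribution):
        contribution = np.asarray(contribution, dtype=float)
        if len(self.buffer) == self.buffer.maxlen:
            # The oldest contribution is about to be evicted, so remove it
            # from the running sum first.
            self.total = self.total - self.buffer[0]
        self.buffer.append(contribution)
        if self.total is None:
            self.total = contribution.copy()
        else:
            self.total = self.total + contribution
        # Averaging over stale contributions lowers variance but adds bias.
        return self.total / len(self.buffer)


averager = WindowedGradient(window_size=3)
averages = [averager.update([g]) for g in [1.0, 2.0, 3.0, 4.0]]
```

With L = 1 the sketch reduces to the plain stochastic gradient (no bias, full variance); larger L smooths the estimate at the price of reusing contributions computed under older variational parameters.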
First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. The proposed approach, while straightforward, handles the problem at hand quite elegantly. What prevents this paper from being a clear-cut acceptance is the lack of adequate experimental validation. Typos: line 47: draw -> drawn. A more thorough discussion of the noise in the exploration step of Algorithm 1 (step 8) would be appreciated. This issue is also not discussed in the experiments section (how much noise was used?). I also had a few issues with some of the advantages claimed in the paper. Specifically: (1) The claim that PDDP has an advantage over PILCO because it does not have to solve non-convex optimization problems seems suspect, given the non-convexity of the optimization problem solved in the hyper-parameter tuning step.
First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. This is a very interesting and substantially novel paper that introduces an approach to solving continuous Markov random field energies with polynomial potentials. An insightful and well-motivated approach towards this end (ADMM-Poly) was published at CVPR 2013 [20] and is the obvious baseline to compare against. The present approach is convincingly shown to be preferable, as it is both elegant and computationally efficient. The main idea underlying the approach is to decompose the polynomials into a difference of convex functions.
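As a toy illustration of the difference-of-convex idea, the sketch below splits a univariate polynomial into a convex part and a subtracted convex part and minimizes it with the concave-convex procedure (CCCP). The polynomial, the particular split, and the choice of CCCP as the solver are my own assumptions for illustration; the text above does not state which solver the paper applies to the decomposition.

```python
import numpy as np


def f(x):
    # Toy non-convex polynomial: f(x) = g(x) - h(x), with
    # g(x) = x**4 + x (convex) and h(x) = 3 * x**2 (convex).
    return x**4 - 3.0 * x**2 + x


def cccp_step(x_t):
    # Linearize h at x_t and minimize the convex surrogate
    # g(x) - h'(x_t) * x. Its stationarity condition is
    # 4 * x**3 + 1 = 6 * x_t, which has a closed-form solution.
    return float(np.cbrt((6.0 * x_t - 1.0) / 4.0))


def cccp(x0, iters=50):
    # Return the full iterate sequence so the descent can be inspected.
    xs = [float(x0)]
    for _ in range(iters):
        xs.append(cccp_step(xs[-1]))
    return xs


iterates = cccp(2.0)
values = [f(x) for x in iterates]
x_final = iterates[-1]
```

Each step solves a convex problem, so the objective never increases along the iterates, but the procedure only reaches a stationary point of f, which depends on the starting value.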
"NIPS Neural Information Processing Systems 8-11th December 2014, Montreal, Canada",,, "Paper ID:","157" "Title:","Object Localization based on Structural SVM using Privileged Information" Current Reviews First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. The method is effective for the object localization task and results in good improvements in localization accuracy. It looks like the authors' formulation of SSVM+ contains separate slack variables \xi_i for each example x_i and there are extra degrees of freedom. How many alternating iterations are required? When the parameter vectors w and w^* are far from the optimal solution, could this alternating inference procedure get stuck in bad local minima?