Supervised Learning
Review for NeurIPS paper: Structured Prediction for Conditional Meta-Learning
The reviewers agreed that this paper brings an important and relevant contribution to the NeurIPS community, and presents comprehensive experiments to validate the proposed approach. The authors are strongly encouraged to revise the submitted paper according to the feedback in the reviews, including a discussion of multi-task learning, adding the requested clarifications, and fixing typos.
Reviews: Localized Structured Prediction
The model is learned by breaking the structure into parts and performing kernel ridge regression on the parts. The authors present an elaborate convergence rate analysis of the estimator. The theoretical analysis is the strong part of this paper. In a lot of computer vision and NLP applications the latest research is about capturing long-range dependencies. The correlation in Figure 1 is highly concentrated at the central patch because it is the average over many different images, but on individual images the correlation pattern can be very different.
Reviews: Localized Structured Prediction
The authors propose a general theoretical framework for structured prediction that deals with cases where the data exhibits a local structure, so that the inputs and outputs can be decomposed into parts. The reviewers deemed the theoretical contributions to be original and of high quality. The author response addressed the perceived weaknesses, in particular in the empirical evaluation, in a satisfactory way.
Reviews: Linear Relaxations for Finding Diverse Elements in Metric Spaces
Although the provided novel algorithm looks impressive both from the theoretical perspective and in the experimental comparison, its substantiation has quite some room for improvement. The major point is the proof of Theorem 1: - it is unclear how the proof of the theorem follows from Lemmas 3 and 4, since neither of these lemmas is related to the optimal solution of the considered diversity problem. I assume that the missing proposition is the one that would establish the connection between the linear program in lines 153-154 (by the way, it is very inconvenient that the main formulation is not numbered and therefore cannot be easily referenced) and the diversity problem. I believe that this connection may have the following form: if the linear program is equipped with integrality constraints (that is, all variables x_{ir} \in \{0,1\}), the resulting ILP is equivalent to the considered diversity problem. Indeed, the proof of such a proposition is not obvious to me either.
Reviews: A Consistent Regularization Approach for Structured Prediction
In my view, this is a beautiful paper that will advance the field of structured prediction significantly and provides a platform for further development. Nevertheless, the paper should be better related to existing work on vector-valued regression for structured output. A recent related work, though there are others, is by Céline Brouard, Florence d'Alché-Buc, and Marie Szafranski. The paper is generally well written; I have only a few remarks: - lines 70-72: you might note already here that this amounts to a ridge regression problem in the output Hilbert space.
Reviews: Stochastic Structured Prediction under Bandit Feedback
Summary: This paper proposes a stochastic online learning method for the task of structured prediction. In this setting, the learner does not get the correct structured output during training. Instead, it only gets bandit feedback from the labeler. The paper first proposes an online learning algorithm that learns model parameters via stochastic gradient descent; generalizes the learning method to pairwise comparison of structured outputs; provides an optimization approach with cross-entropy minimization; and theoretically analyzes the convergence properties of the optimization approach. Pros: The paper proposes an online stochastic learning algorithm for minimizing the expected loss of structured predictions; gives a method of learning from pairwise comparisons; and theoretically analyzes the convergence rate.
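To make the bandit-feedback setting described in this summary concrete, here is a minimal sketch of one stochastic gradient step on the expected loss. It is not taken from the reviewed paper: the log-linear (Gibbs) model over a small candidate set, and the names `gibbs_probs` and `bandit_update`, are assumptions made for illustration. The learner only sees the loss of the single output it proposed.

```python
import math


def gibbs_probs(w, feats):
    """Softmax over candidate outputs under a log-linear model (an assumption)."""
    scores = [sum(wi * fi for wi, fi in zip(w, f)) for f in feats]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def bandit_update(w, feats, sampled, loss, lr=0.1):
    """One stochastic step on the expected loss from feedback on one output.

    `feats[i]` is the feature vector of candidate output i, `sampled` is the
    index of the output that was shown to the labeler, and `loss` is the
    bandit feedback for that output only. The score-function gradient
    estimate is loss * (phi(sampled) - E[phi]).
    """
    probs = gibbs_probs(w, feats)
    dim = len(w)
    expected = [sum(p * f[k] for p, f in zip(probs, feats)) for k in range(dim)]
    grad = [loss * (feats[sampled][k] - expected[k]) for k in range(dim)]
    return [wk - lr * gk for wk, gk in zip(w, grad)]
```

In this sketch, an output that receives a high loss has its probability pushed down, without the learner ever observing the correct structured output.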
Reviews: Structured Prediction Theory Based on Factor Graph Complexity
The paper is well written and motivated. In particular, the problem considered is relevant. On the downside, there are some issues related to the interpretability of the presented results: - In Theorem 1 the generalization error is bounded in terms of the additive or multiplicative empirical margin losses. However, their formulation in Eqs. (5) and (6) is hard to interpret and would benefit from a comment. This is problematic since it is not clear how these quantities are related to the algorithmic approaches discussed in Sec. 5.
Reviews: Reward Augmented Maximum Likelihood for Neural Structured Prediction
The paper is a superbly written account of a simple idea that appears to work very well. The approach can straightforwardly be applied to existing maximum-likelihood (ML) trained models so that, in principle, the task reward is taken into account during training, and it is computationally much more efficient than alternative non-ML-based approaches. This work risks being underappreciated as proposing merely a simple addition of artificial structured-label noise, but I think the specific link with structured output task reward is sufficiently original, and the paper also uncovers important theoretical insight by revealing the formal relationship between the proposed reward augmented ML and RL-based regularized expected reward objectives. So while it works surprisingly well, you haven't yet clearly demonstrated empirically that using a truly *task-reward derived* payoff distribution is beneficial. One way to convincingly demonstrate that would be if you did your envisioned BLEU importance reweighted sampling, and were able to show that it improves the BLEU test score over your current simpler edit-distance based label noise.
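As an illustration of the edit-distance based payoff distribution discussed in this review, the sketch below computes an exponentiated payoff distribution over a small candidate set, with the reward taken to be the negative edit distance to the target. The restriction to an explicit candidate list, the temperature default, and the function names are assumptions for illustration, not the paper's sampling procedure.

```python
import math


def edit_distance(a, b):
    """Levenshtein distance between two sequences (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def payoff_distribution(target, candidates, temperature=1.0):
    """Normalize exp(reward / temperature) over an explicit candidate list.

    The reward is the negative edit distance to `target`, so candidates
    closer to the target receive more probability mass.
    """
    rewards = [-edit_distance(c, target) for c in candidates]
    m = max(rewards)  # subtract the max for numerical stability
    weights = [math.exp((r - m) / temperature) for r in rewards]
    z = sum(weights)
    return [w / z for w in weights]
```

Training targets would then be drawn from this distribution instead of always using the ground-truth output; a lower temperature concentrates the mass on the target itself.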
Reviews: Active Nearest-Neighbor Learning in Metric Spaces
I am not qualified to evaluate this work in terms of its relevance within the literature. Therefore my judgment is only about the paper content itself. Also, I have only reviewed the proofs contained in the main paper and the one of Lemma A.1. Theorem 3.2 guarantees a significant improvement upon the passive learner characterized by Theorem 3.1. I find the example in lines 141-143 about the 1/sqrt(m) order very helpful, and I suggest that the authors include it in the introduction as well.