Supervised Learning
Reviews: PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space
As clearly indicated in the title, this paper submission is an extension of the PointNet work of [19], to appear at CVPR 2017. The goal is to classify and segment (3D) point clouds. Novel contributions over [19] are the use of a hierarchical network, leveraging neighbourhoods at different scales, and a mechanism to deal with varying sampling densities, effectively generating receptive fields that vary in a data-dependent manner. All this leads to state-of-the-art results. PointNet++ seems an important extension over PointNet, in that it allows local spatial information to be properly exploited.
Reviews: Learning latent variable structured prediction models with Gaussian perturbations
UPDATE: Upon reading the author response, I have decided to leave my review unchanged. The topic of this work is of interest to the community, and it can make a decent publication. I am still concerned about novelty. While the author response argued that similar increments have been done in past works, I feel that there are some significant differences between the current case and the mentioned works. Accepting the paper is still a reasonable decision in my humble opinion.
Reviews: A Structured Prediction Approach for Label Ranking
This paper presents an interesting approach to the label ranking problem, by first casting it as a structured prediction problem that can be optimized using a surrogate least squares methodology, and then demonstrating an embedding representation that captures a couple of common ranking loss functions, most notably the Kendall tau distance. Overall I liked the paper and found a decent mix of method, theory and experiments (though I would have liked to see more convincing experimentation, as further detailed below). In particular I liked the demonstration that the Kendall tau and Hamming distances are representable in this embedding formulation. That said, I had a few concerns with this work as well: - Specifically, the empirical results were not very convincing. While this may not have been a problem for a theory-first paper, part of the appeal of an approach like this is that it is supposed to work in practice. Unfortunately, with the current (somewhat limited) set of experiments I am not entirely convinced. For example: this only looked at a couple of very specific (and not particularly common) loss functions, with the evals only measuring Kendall tau.
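To make the embedding claim in the review above concrete, here is a minimal sketch of one standard pairwise-sign embedding of rankings, under which the Kendall tau distance becomes a squared Euclidean distance divided by four. The function names and the convention that `ranking[i]` is the rank position of label `i` are assumptions made for illustration; the reviewed paper's exact construction may differ.

```python
from itertools import combinations


def pairwise_embedding(ranking):
    """Map a ranking to a vector of +1/-1 signs, one entry per label pair.

    ranking[i] is the rank position of label i (illustrative convention).
    """
    return [1 if ranking[i] < ranking[j] else -1
            for i, j in combinations(range(len(ranking)), 2)]


def kendall_tau_distance(r1, r2):
    """Count the label pairs that the two rankings order differently."""
    return sum(1 for i, j in combinations(range(len(r1)), 2)
               if (r1[i] < r1[j]) != (r2[i] < r2[j]))


def embedded_distance(r1, r2):
    """Squared Euclidean distance between embeddings, divided by 4.

    Each discordant pair contributes (1 - (-1))**2 = 4, so the result
    equals the number of discordant pairs.
    """
    e1, e2 = pairwise_embedding(r1), pairwise_embedding(r2)
    return sum((a - b) ** 2 for a, b in zip(e1, e2)) // 4


r1 = [0, 1, 2, 3]
r2 = [1, 0, 3, 2]  # swaps labels 0/1 and labels 2/3
print(kendall_tau_distance(r1, r2), embedded_distance(r1, r2))
```

Because the loss is a Euclidean distance in the embedded space, a least squares regressor onto the embedding is a natural surrogate, which is the connection the review alludes to.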
Conformal Structured Prediction
Zhang, Botong, Li, Shuo, Bastani, Osbert
Conformal prediction has recently emerged as a promising strategy for quantifying the uncertainty of a predictive model; these algorithms modify the model to output sets of labels that are guaranteed to contain the true label with high probability. However, existing conformal prediction algorithms have largely targeted classification and regression settings, where the structure of the prediction set has a simple form as a level set of the scoring function. For complex structured outputs such as text generation, these prediction sets might include a large number of labels and therefore be hard for users to interpret. In this paper, we propose a general framework for conformal prediction in the structured prediction setting that modifies existing conformal prediction algorithms to output structured prediction sets that implicitly represent sets of labels. In addition, we demonstrate how our approach can be applied in domains where the prediction sets can be represented as a set of nodes in a directed acyclic graph; for instance, for hierarchical labels such as image classification, a prediction set might be a small subset of coarse labels implicitly representing the prediction set of all their finer-grained descendants. We demonstrate how our algorithm can be used to construct prediction sets that satisfy a desired coverage guarantee in several domains.
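For readers unfamiliar with the baseline the abstract builds on, the following is a minimal sketch of standard split conformal prediction for flat classification, where the prediction set is a level set of the scoring function. It is not the structured algorithm proposed in the paper. The nonconformity score (one minus the predicted probability), the calibration scores, and the label names are all assumptions chosen for illustration.

```python
import math


def conformal_threshold(cal_scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score.

    cal_scores are nonconformity scores of the TRUE labels on a held-out
    calibration set. If the required rank exceeds n, every label must be
    included, so the threshold is infinite.
    """
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf
    return sorted(cal_scores)[k - 1]


def prediction_set(probs, threshold):
    """Keep every label whose nonconformity score 1 - p is within the threshold."""
    return {label for label, p in probs.items() if 1.0 - p <= threshold}


# Hypothetical calibration scores and model output, for illustration only.
cal_scores = [0.05, 0.1, 0.2, 0.3, 0.4, 0.6, 0.9]
threshold = conformal_threshold(cal_scores, alpha=0.25)
labels = prediction_set({"cat": 0.5, "dog": 0.45, "fox": 0.05}, threshold)
print(threshold, sorted(labels))
```

With many fine-grained labels, a set built this way can grow large, which is the interpretability problem that motivates returning a few coarse nodes of a label hierarchy instead.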
Reviews: Deep Structured Prediction with Nonlinear Output Transformations
This paper studies the problem of training deep structured models (models where the dependencies between the output variables are explicitly modelled and some components are modelled via neural networks). The key idea of this paper is to give up the standard modelling assumption of structured prediction, namely that the score (or energy) function is a sum of summands (potentials). Instead of the sum, the paper puts an arbitrary non-linear transformation (a neural network) on top of the potentials. The paper develops an inference (MAP prediction) technique for such models which is based on Lagrangian decomposition (often referred to as dual decomposition, see details below). The training of the model is done by combining this inference technique with the standard Structured SVM (SSVM) objective.
Reviews: On Structured Prediction Theory with Calibrated Convex Surrogate Losses
The paper examines consistency of surrogate losses for multiclass prediction. The authors present their results using the formalism of structured prediction. Alas, there is no direct connection or exploitation of the separability of structured prediction losses. The paper is heavy on notation and variables are frequently overloaded. I got the feeling that the authors are trying to look mathematically fancy at the expense of readability.
Reviews: A Smoother Way to Train Structured Prediction Models
Overview: This paper proposes an accelerated variance-reduction algorithm for training structured predictors. In this approach the training objective is augmented with a proximal term anchored with a momentum point (eq (3)), the loss is smoothed using Nesterov's smoothing method (adding entropy or L2 to the dual), and a linear-rate solver (SVRG) is applied to the resulting objective in the inner loop. This achieves accelerated convergence rates for training. Comments: * I think that the connection to structured prediction is somewhat weak. In particular, the analysis uses the finite sum and smoothability of the training objective.
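As a rough illustration of the smoothing step mentioned in the review above, the sketch below shows the entropy-smoothed version of a max, which is the log-sum-exp function scaled by a smoothing parameter (up to an additive constant that depends on how the entropy term is normalized). In structured prediction the max ranges over exponentially many outputs; a short explicit list of scores stands in for them here, and the name `smoothed_max` and the parameter `mu` are assumptions for illustration.

```python
import math


def smoothed_max(scores, mu):
    """Entropy-smoothed max: mu * log(sum_i exp(s_i / mu)).

    The max is subtracted before exponentiating for numerical stability.
    The value always lies in [max(scores), max(scores) + mu * log(n)].
    """
    m = max(scores)
    return m + mu * math.log(sum(math.exp((s - m) / mu) for s in scores))


scores = [1.0, 2.0, 0.5]
for mu in (1.0, 0.1, 0.01):
    # The approximation tightens towards max(scores) as mu shrinks.
    print(mu, smoothed_max(scores, mu))
```

The smoothed function is differentiable with a gradient whose Lipschitz constant scales like 1/mu, which is what allows a linear-rate solver such as SVRG to be applied to the smoothed objective.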
Reviews: Mapping Images to Scene Graphs with Permutation-Invariant Structured Prediction
This paper studies the property of permutation invariance in the context of structured prediction. The paper argues that in many applications permutation invariance is a desirable property of a solution and it makes sense to design the model such that it is satisfied by construction rather than to rely on learning to get this property. The paper proposes a model to represent permutation invariant functions and claims that this model is a universal approximator within this family. The proposed method is evaluated on a synthetic and a real task (labelling of scene graphs). 1) Most importantly, I think that in the current form the proof of the main theoretical result (Theorem 1) is wrong. The problem is with the reverse direction proving that any permutation invariant function can be represented in the form of Theorem 1. Specifically, Lines 142-159 construct matrix M which aggregates information about the graph edges.
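To illustrate what "invariant by construction" means in the review above, here is a minimal sketch of a sum-pooling model over a set of scalars: each element is mapped through a feature function, the features are summed, and a readout is applied to the sum. This is the simplest textbook form of a permutation-invariant function and is not the graph architecture or the characterization in the paper's Theorem 1. The functions `phi` and `rho` and their coefficients are hypothetical.

```python
import math
from itertools import permutations


def phi(x):
    """Per-element feature map (hypothetical choice)."""
    return (x, x * x, math.tanh(x))


def rho(pooled):
    """Readout applied to the pooled features (hypothetical coefficients)."""
    a, b, c = pooled
    return 0.5 * a - 0.25 * b + c


def invariant_model(elements):
    """Sum-pool the per-element features, then apply the readout.

    Summation ignores element order, so the output is the same for every
    ordering of the input, up to floating-point rounding.
    """
    pooled = [0.0, 0.0, 0.0]
    for x in elements:
        for k, v in enumerate(phi(x)):
            pooled[k] += v
    return rho(pooled)


inputs = [0.3, -1.2, 2.0, 0.7]
reference = invariant_model(inputs)
all_equal = all(math.isclose(invariant_model(list(p)), reference, abs_tol=1e-9)
                for p in permutations(inputs))
print(all_equal)
```

Checking every ordering is only feasible for tiny inputs; the point of building invariance into the architecture is that no such check, and no learning of the symmetry from data, is needed.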