Inductive Learning
Review for NeurIPS paper: Fairness constraints can help exact inference in structured prediction
My biggest concern is how \epsilon_1 behaves as a function of n. First, the choice \rho = -n seems arbitrary (and this choice propagates into the definition of \epsilon), which should be discussed more clearly in the text. Second, it is not clear to me why \rho = -n is the best choice. Does it optimize \epsilon_1 in some way? Furthermore, it seems that \epsilon_1 does NOT tend to infinity as n tends to \infty.
Review for NeurIPS paper: Fairness constraints can help exact inference in structured prediction
This paper analyzes structured output prediction under "fairness" constraints and shows that fairness-related constraints can help increase accuracy. Fairness is a notion of rising importance in our community, and this paper gives interesting insights about it. One of the main issues raised by the reviewer who argued against acceptance is the "vagueness" of the definition of fairness used here. I personally think this issue should not be weighed too heavily, since there is still some "vagueness" in our community about what the right definition should be.
Reviews: Using Self-Supervised Learning Can Improve Model Robustness and Uncertainty
The authors present a form of self-supervised auxiliary learning in which each training image is rotated by one of 4 rotations and the neural network has to predict which rotation was applied. Through various experiments, the authors show that this type of SSL increases robustness against a wide range of perturbations, from adversarial attacks to motion blur and fog. In addition, the rotation-prediction outputs can be used to detect outliers. The article makes a good case for both contributions. My main remark is that the title of the article talks about uncertainty estimation, while the experiments focus on outlier detection; these two tasks are related but not identical.
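To make the pretext task concrete, here is a minimal sketch of how a rotation-prediction batch could be built, assuming square images stored as a NumPy array. The function name `make_rotation_batch` is hypothetical and not taken from the paper; label k stands for a counter-clockwise rotation by k * 90 degrees.

```python
import numpy as np


def make_rotation_batch(images: np.ndarray):
    """Return every image rotated by 0, 90, 180 and 270 degrees, plus labels.

    images: array of shape (N, H, W) or (N, H, W, C) with H == W, so that all
    rotations share one shape (an assumption of this sketch).
    Returns (rotated, labels) with shapes (4N, H, W[, C]) and (4N,).
    """
    rotated = []
    labels = []
    for k in range(4):
        # Rotate each image in the (height, width) plane by k quarter turns.
        rotated.append(np.rot90(images, k=k, axes=(1, 2)))
        labels.append(np.full(len(images), k, dtype=np.int64))
    return np.concatenate(rotated, axis=0), np.concatenate(labels, axis=0)
```

In an auxiliary-learning setup, a rotation head would be trained with cross-entropy on `labels` alongside the main classification loss. How the two losses are weighted is a design choice this sketch does not model.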
Reviews: Using Self-Supervised Learning Can Improve Model Robustness and Uncertainty
This paper received mixed reviews. All reviewers found the empirical findings in the paper very interesting. The main concern from reviewers was the lack of theoretical justification for the findings. However, many empirical results precede theoretical ones, and this paper's empirical results are interesting in their own right. The area chair has read the paper in detail. The paper is well written and provides important empirical analysis of two timely questions in the field today: model robustness and self-supervised learning.
Reviews: Fast and Accurate Stochastic Gradient Estimation
Summary: This paper develops a new method for adaptively sampling training examples during stochastic optimization. It is known that the distribution minimizing the nuclear norm of the covariance of the gradient estimate is the one in which the probability of sampling an example is proportional to the magnitude of the gradient of the loss on that example. Sampling from this distribution is of course impractical, because computing it is as expensive as computing the full gradient and requires O(N) time per iteration. To get around this, prior work either keeps a fixed distribution across all iterations or makes strong assumptions on the distribution of gradients of different training examples (e.g., that gradients of training examples from the same class are similar). This paper proposes a method that can sample adaptively from a different distribution at every iteration, makes few assumptions on the distribution of gradients, and yet has the same per-iteration cost as SGD.
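The claim about the optimal distribution can be checked numerically. The sketch below is a minimal illustration, not the paper's method. It draws example i with probability p_i proportional to ||g_i||, reweights by 1 / (N p_i) to keep the estimate unbiased, and computes the trace of the estimator's covariance exactly so that uniform and norm-proportional sampling can be compared. Note that it computes all N norms explicitly, which is exactly the O(N) cost the paper avoids. The function names are hypothetical, and every per-example gradient is assumed to have nonzero norm.

```python
import numpy as np


def importance_sampled_gradient(grads, rng):
    """One-sample unbiased estimate of the mean gradient.

    grads: (N, d) array; row i is the gradient of example i's loss.
    Example i is drawn with probability proportional to ||g_i|| and its
    gradient is reweighted by 1 / (N * p_i). Assumes all norms are > 0.
    """
    n = grads.shape[0]
    norms = np.linalg.norm(grads, axis=1)  # O(N) work per call
    probs = norms / norms.sum()
    i = rng.choice(n, p=probs)
    return grads[i] / (n * probs[i])


def estimator_total_variance(grads, probs):
    """Trace of the covariance of g_i / (N p_i) with i drawn from probs."""
    n = grads.shape[0]
    mean = grads.mean(axis=0)
    # E||g_i / (N p_i)||^2 = sum_i ||g_i||^2 / (N^2 p_i)
    second_moment = np.sum(np.sum(grads**2, axis=1) / (n**2 * probs))
    return second_moment - mean @ mean
```

For gradients (3, 4), (0, 1), (1, 0), uniform sampling gives a total variance of 40/9, whereas norm-proportional sampling gives 8/9.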
Reviews: Fast and Accurate Stochastic Gradient Estimation
This paper received extensive discussion among the reviewers, the meta-reviewer, the SPC, etc. Here is a meta-review summary. The paper considers the problem of adaptively sampling training examples in stochastic optimization, and it shows that this is possible without a per-iteration cost of O(N). This is of interest in itself, since one typically assumes that such sampling requires maintaining a distribution over training examples, which costs O(N) in every iteration, i.e., as much as full-batch gradient descent. A second aspect of this paper is the mechanism the authors use to accomplish this: locality-sensitive hashing (LSH), a sketching method usually used for nearest-neighbor search.
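For readers unfamiliar with LSH, the sketch below shows one SimHash (signed random projection) table. Vectors at a small angle tend to receive the same bit code, so a query can retrieve a bucket of similar items in O(k * d) time without scanning all N items. This illustrates only the hashing primitive. How the paper maps the current model parameters to a query, and how it corrects the resulting sampling probabilities, is not described here, and the class and method names are hypothetical.

```python
from collections import defaultdict

import numpy as np


class SimHashTable:
    """A single SimHash table over d-dimensional vectors (illustrative only)."""

    def __init__(self, dim, num_bits, rng):
        # Random hyperplanes; each contributes one bit to a vector's code.
        self.planes = rng.standard_normal((num_bits, dim))
        self.buckets = defaultdict(list)

    def code(self, v):
        # Bit b records which side of hyperplane b the vector falls on.
        return tuple(int(b) for b in (self.planes @ v >= 0))

    def insert(self, index, v):
        self.buckets[self.code(v)].append(index)

    def sample(self, query, rng):
        """Random index from the query's bucket, or None if the bucket is empty."""
        bucket = self.buckets.get(self.code(query), [])
        if not bucket:
            return None
        return bucket[int(rng.integers(len(bucket)))]
```

Building the table costs O(N) once, but each query afterward touches only one bucket. This is the property that lets an LSH-based sampler keep its per-iteration cost independent of N.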
Review for NeurIPS paper: VIME: Extending the Success of Self- and Semi-supervised Learning to Tabular Domain
Weaknesses: My central concern for this paper is the misalignment between the motivation and the methodology. As motivation, the authors argue that self-supervised CV and **NLP** "algorithms are not effective for tabular data." The proposed model, though, is effectively the binary masked language model whose variants pervade self-supervised NLP research (e.g. ...). Granted, instead of masking words, the proposed model masks tabular values, but this is a very similar pretext task. In fact, there is concurrent work that learns tabular representations using a BERT model [1].
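The reviewer's analogy becomes clearer once the tabular pretext inputs are written out. The sketch below is a minimal version under stated assumptions. Each entry is masked independently with probability `p_mask`. Masked entries are replaced with a value taken from the same column of a randomly permuted row, which approximates a draw from that feature's empirical marginal. This is our reading of the VIME-style corruption, not code from the paper, and `mask_and_corrupt` is a hypothetical name. As in masked language modeling, the pretext targets would be the mask itself and the original values.

```python
import numpy as np


def mask_and_corrupt(x, p_mask, rng):
    """Build (mask, corrupted table) for a masked tabular pretext task.

    x: (N, D) array. mask[i, j] == 1 means entry (i, j) was replaced by a
    value from column j of another (randomly permuted) row.
    """
    n, d = x.shape
    mask = (rng.random((n, d)) < p_mask).astype(x.dtype)
    # Permute each column independently: approximate per-feature marginals.
    shuffled = np.stack([x[rng.permutation(n), j] for j in range(d)], axis=1)
    x_corrupt = mask * shuffled + (1 - mask) * x
    return mask, x_corrupt
```

Only the "replacement token" differs from BERT-style masking: a plausible value from the same column takes the place of a `[MASK]` symbol.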
Reviews: Bridging Machine Learning and Logical Reasoning by Abductive Learning
Still, if you could include some version of the Mayan hieroglyphics example, or work that example into the introduction, it would improve the paper even more. The authors restrict themselves to classification problems, i.e., a mapping from perceptual input to {0,1}; the discrete symbols output by the perception model act as latent variables between the input and the binary decision. Their approach alternates between (1) inferring a logic program consistent with the training examples, conditioned on the output of the perception model, and (2) training the perception model to predict the latent discrete symbols. Because the perception model may be unreliable, particularly early in training, the logic program is allowed to revise, or abduce, the outputs of perception. The problem they pose, integrating learned perception with learned symbolic reasoning, is eminently important.
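The "revise or abduce" step can be illustrated with a toy. The sketch fixes a hand-written rule, whereas the paper also learns the logic program. Given per-position symbol probabilities from perception and an observed binary label, it returns the most probable symbol assignment that the rule maps to that label. The brute-force search, the rule "label is 1 iff all symbols are equal", and the name `abduce` are all assumptions made for illustration.

```python
from itertools import product


def abduce(probs, label, rule):
    """Most probable symbol assignment consistent with the observed label.

    probs: list of dicts {symbol: probability}, one per latent position.
    rule: function mapping a tuple of symbols to a label in {0, 1}.
    Exhaustive search, so suitable only for tiny examples.
    Returns None if no assignment is consistent with the label.
    """
    best, best_score = None, -1.0
    for assignment in product(*[sorted(p) for p in probs]):
        if rule(assignment) != label:
            continue  # inconsistent with the observed label
        score = 1.0
        for p, s in zip(probs, assignment):
            score *= p[s]  # joint probability under perception
        if score > best_score:
            best, best_score = assignment, score
    return best


def all_equal(symbols):
    return int(len(set(symbols)) == 1)
```

With perception probabilities `[{"a": 0.9, "b": 0.1}, {"a": 0.4, "b": 0.6}, {"a": 0.8, "b": 0.2}]`, the perceptual argmax is `("a", "b", "a")`. If the observed label is 1, abduction revises the uncertain middle symbol and returns `("a", "a", "a")`. The revised symbols would then serve as training targets for the perception model in step (2).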
Reviews: Generalized Matrix Means for Semi-Supervised Learning with Multilayer Graphs
The paper discusses how to solve semi-supervised learning with multilayer graphs. For single-layer graphs, this is achieved by label regression regularized by the graph Laplacian. For multilayer graphs, the paper argues that one should use the power mean Laplacian instead of the plain sum of the per-layer Laplacians. This generalizes prior work, including approaches based on the harmonic mean. A theoretical analysis follows under the Multilayer Stochastic Block Model (MSBM), showing that the trade-off between specificity and robustness can be controlled by adjusting the power parameter.
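A minimal sketch of the pipeline summarized above follows. It assumes symmetric normalized Laplacians, a small diagonal shift so that negative powers exist, and the label-regression objective ||f - y||^2 + lam * f^T L f, whose minimizer solves (I + lam * L) f = y. The power mean is ((1/T) * sum_t L_t^p)^(1/p), computed with eigendecompositions. The normalization, shift value, and function names are assumptions of this sketch and are not taken from the paper.

```python
import numpy as np


def normalized_laplacian(w):
    """I - D^{-1/2} W D^{-1/2}; assumes no isolated nodes."""
    d_inv_sqrt = 1.0 / np.sqrt(w.sum(axis=1))
    return np.eye(len(w)) - d_inv_sqrt[:, None] * w * d_inv_sqrt[None, :]


def pd_power(a, p):
    """Real power of a symmetric positive definite matrix via eigh."""
    vals, vecs = np.linalg.eigh(a)
    return (vecs * vals**p) @ vecs.T


def power_mean_laplacian(weights, p, shift=1e-3):
    """((1/T) * sum_t (L_t + shift * I)^p)^(1/p), for p != 0.

    p = 1 is the arithmetic mean, p = -1 the harmonic mean.
    """
    n = weights[0].shape[0]
    mats = [normalized_laplacian(w) + shift * np.eye(n) for w in weights]
    mean = sum(pd_power(m, p) for m in mats) / len(mats)
    return pd_power(mean, 1.0 / p)


def label_regression(lap, y, lam=1.0):
    """Minimize ||f - y||^2 + lam * f^T L f by solving (I + lam * L) f = y."""
    return np.linalg.solve(np.eye(len(y)) + lam * lap, y)
```

Here `y` would hold +1/-1 for labeled nodes of two classes and 0 for unlabeled nodes, and the sign of `f` gives the predicted class. Sweeping `p` in this sketch is the knob the reviewed analysis ties to the specificity/robustness trade-off.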
Reviews: Generalized Matrix Means for Semi-Supervised Learning with Multilayer Graphs
This paper makes a contribution toward the theory of semi-supervised learning for graph classification, and provides an efficient algorithm for computing the proposed classifier. This is an interesting problem, and the reviewers agree the contribution is at least incremental. I suggest the authors carefully revise the paper to address the reviewers' concerns and maximize its impact.