 Inductive Learning


Reviews: Exact inference in structured prediction

Neural Information Processing Systems

The paper gives a theoretical analysis of Markov random fields. The authors answer the question of when exact inference can be done in polynomial time. This is a generalization of a result of Globerson et al. (2015) from grid graphs to general connected graphs, which is, in my opinion, a non-trivial generalization. The paper is self-contained and readable for the machine learning community, although quite technical. Indeed, I consider it a theoretical paper that has all the qualities needed for acceptance at NeurIPS.


Review for NeurIPS paper: Demystifying Contrastive Self-Supervised Learning: Invariances, Augmentations and Dataset Biases

Neural Information Processing Systems

While it is interesting that self-supervised methods are more invariant to occlusion, it is unclear why they wouldn't also be more invariant to the other augmentations used during training. For example, supervised learning appears more invariant to "Illumination Color" (Top-25 category) despite self-supervised learning methods using aggressive color augmentation techniques. This discrepancy is not discussed and we are left wondering what it means. Next, while the analysis of transfer performance as a function of cropped vs. original training and test datasets is interesting, it is unclear whether the results really support the authors' interpretation. They find that training and testing on the same type of images (i.e. This is to be expected, as this minimizes the domain gap between training and testing.


Review for NeurIPS paper: Demystifying Contrastive Self-Supervised Learning: Invariances, Augmentations and Dataset Biases

Neural Information Processing Systems

The topic of the paper is very relevant to the NeurIPS community, given the increased interest in understanding self-supervised learning. Reviewers appreciated the direction the paper takes, i.e., studying the invariances learned by self-supervised learning methods and comparing them with supervised representations. There were some concerns about the interpretation of the empirical results, which have been addressed in the author response. This paper takes a first and important step towards understanding the invariances in self-supervised representations and their implications for downstream tasks, and would be of interest to the NeurIPS community.


Review for NeurIPS paper: Adversarial Self-Supervised Contrastive Learning

Neural Information Processing Systems

Weaknesses: I have several major concerns about the presentation of this paper: (1) The proposed transformation smoothed inference may cause gradient obfuscation; therefore, Expectation over Transformation (EOT) [1] should be used to properly attack this model. Also, the details of transformation smoothed inference are missing, e.g., what transformations are used? Nonetheless, I am pretty confused about the discussion there. First of all, what comparisons are conducted there? The only useful information I found is "Compared to the semi-supervised learning methods, RoCL takes about 1/4 times faster with the same computation resources", but what about comparisons on other metrics, e.g., robustness and accuracy?
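
To illustrate the reviewer's point about attacking a model whose inference averages over random transformations, here is a minimal sketch of an Expectation over Transformation (EOT) style gradient estimate. Everything in it is an assumption made for illustration: the linear logistic classifier, the random-scaling transformation t(x) = s * x, and the helper names `loss_and_grad`, `eot_gradient` and `eot_attack_step` are hypothetical and are not taken from the reviewed paper.

```python
import numpy as np

def loss_and_grad(w, x, y):
    """Logistic loss of a linear classifier and its gradient w.r.t. the input x."""
    margin = y * np.dot(w, x)
    loss = np.log1p(np.exp(-margin))
    grad_x = -y * w / (1.0 + np.exp(margin))
    return loss, grad_x

def eot_gradient(w, x, y, rng, n_samples=64, scale_range=(0.5, 1.5)):
    """Average the input gradient over sampled transformations t(x) = s * x."""
    grads = []
    for _ in range(n_samples):
        s = rng.uniform(*scale_range)        # sample one (hypothetical) transformation
        _, g = loss_and_grad(w, s * x, y)    # gradient at the transformed input
        grads.append(s * g)                  # chain rule: d(s * x)/dx = s
    return np.mean(grads, axis=0)

def eot_attack_step(w, x, y, rng, eps=0.1):
    """One signed-gradient ascent step on the estimated expected loss."""
    return x + eps * np.sign(eot_gradient(w, x, y, rng))
```

The attacker differentiates the loss averaged over sampled transformations rather than the loss under a single random draw, so the randomness at inference time no longer hides the useful gradient direction.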


Review for NeurIPS paper: Structured Prediction for Conditional Meta-Learning

Neural Information Processing Systems

Especially, more task conditioning methods (e.g., MMAML) are considered in this paper. However, my major concern has not been addressed. The authors still omit a discussion of multi-task learning. From my perspective, the goal of meta-learning is to generalize knowledge from previous tasks, which in turn benefits the training of a new task. The setting in this paper allows a new meta-testing task to access all meta-training tasks.


Review for NeurIPS paper: Structured Prediction for Conditional Meta-Learning

Neural Information Processing Systems

The reviewers agreed that this paper brings an important and relevant contribution to the NeurIPS community, and presents comprehensive experiments to validate the proposed approach. The authors are strongly encouraged to revise the submitted paper according to the feedback in the reviews, including a discussion of multi-task learning, adding the requested clarifications, and fixing typos.


Reviews: MixMatch: A Holistic Approach to Semi-Supervised Learning

Neural Information Processing Systems

Originality: 7 Quality: 8 Clarity: 4 Significance: 7 MixMatch combines many excellent classical methods used for semi-supervised learning and achieves state-of-the-art results by a large margin across many datasets and labeled data amounts. Compared to previous methods, it is not merely a simple combination of data augmentation and other techniques, such as exponential moving average (EMA); it also explores a way to fully combine the advantages of the different methods. In short, this method is a big step for semi-supervised learning on image classification. However, the experiments in this paper still need to be revised to give a fair comparison with previous work, such as Mean Teacher. Also, some small problems need to be fixed before final publication.


Reviews: MixMatch: A Holistic Approach to Semi-Supervised Learning

Neural Information Processing Systems

The reviewers are in consensus that this is a well-written paper that combines a number of well-studied SSL methods. The results include good performance over a number of datasets. Hence the recommendation to accept this paper.


A Probabilistic Model for Self-Supervised Learning

arXiv.org Artificial Intelligence

Self-supervised learning (SSL) aims to find meaningful representations from unlabeled data by encoding semantic similarities through data augmentations. Despite its current popularity, theoretical insights about SSL are still scarce. For example, it is not yet known whether commonly used SSL loss functions can be related to a statistical model, much in the same way as OLS, generalized linear models or PCA naturally emerge as maximum likelihood estimates of an underlying generative process. In this short paper, we consider a latent variable statistical model for SSL that exhibits an interesting property: depending on the informativeness of the data augmentations, the MLE of the model either reduces to PCA or approaches a simple non-contrastive loss. We analyze the model and also empirically illustrate our findings.


Reviews: Joint-task Self-supervised Learning for Temporal Correspondence

Neural Information Processing Systems

The work does not include original ideas. It is exclusively a collection of previous ideas combined together in a rather classical way. Major remarks: Equation (6) makes the loss non-smooth and non-differentiable. The authors do not discuss how they handle this. I assume they use the typical approach of selecting the right 'case' in the forward step and then doing back-prop on the fixed smooth function.
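
The "typical approach" the reviewer assumes can be sketched as follows. Equation (6) is not reproduced in this excerpt, so the piecewise function below (x**2 for x < 1, x otherwise) is a purely hypothetical stand-in with a kink at x = 1, and `piecewise_loss` is an illustrative name, not the paper's loss.

```python
import numpy as np

def piecewise_loss(x):
    """Evaluate a hypothetical piecewise loss and a branch-wise gradient.

    Forward step: decide which case applies to each element.
    Backward step: differentiate only the selected smooth branch,
    treating the case selection as fixed.
    """
    x = np.asarray(x, dtype=float)
    use_quadratic = x < 1.0                       # case selection (forward)
    value = np.where(use_quadratic, x ** 2, x)    # value of the chosen branch
    grad = np.where(use_quadratic, 2.0 * x, 1.0)  # derivative of the chosen branch
    return value, grad
```

At the kink itself the result is just the derivative of whichever branch the comparison selects, which is a valid choice in practice but is not a true derivative there.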