

Reviews: Realistic Evaluation of Deep Semi-Supervised Learning Algorithms

Neural Information Processing Systems

This paper proposes a systematic evaluation of SSL methods, studies the pitfalls of current approaches to evaluation, and conducts experiments to show the impact of rigorous validation on the kinds of conclusions we can draw from these methods. I really like the paper and read it when it appeared on arXiv back in April. In many places we lack this kind of systematic approach to robust evaluation, and it is refreshing to see more papers emerging that question the foundations of our validation methodologies and provide a coherent evaluation. Suggestions for improvement: - The paper mainly deals with two image categorisation datasets. While these datasets have been studied in many recent SSL papers, they also have their own limitations, some of which are mentioned in the paper. But the main problem is that this restricts the evaluation to a single domain, image categorisation.


Reviews: The Pessimistic Limits and Possibilities of Margin-based Losses in Semi-supervised Learning

Neural Information Processing Systems

Overview and Recommendation: Many popular binary classifiers, such as SVMs and logistic regression, are defined by convex margin-based surrogate losses. Designing a semi-supervised learning algorithm for these classifiers that is guaranteed to improve upon the "lazy" approach of throwing away the unlabeled data and training on the labeled data alone is of considerable interest, because of the time-consuming experimentation that the use of SSL currently requires. This paper analyzes this problem, and its results are primarily of theoretical interest. I had great difficulty rating the significance of this work, so my own confidence rating is only 3. The proofs of the theorems use elementary steps. I checked them in detail and they are correct, but the significance of the theorems themselves was hard to measure.
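For readers unfamiliar with the term, the margin-based surrogate losses mentioned in this review can be sketched as follows. This is a minimal illustration, not code from the reviewed paper: the function names, the linear scoring rule, and the toy data layout are all assumptions made for the example. It shows the hinge loss (SVM), the logistic loss, and the supervised-only ("lazy") empirical risk that a semi-supervised method would aim to improve upon.

```python
import math


def hinge_loss(margin: float) -> float:
    """SVM surrogate loss: max(0, 1 - margin), where margin = y * f(x), y in {-1, +1}."""
    return max(0.0, 1.0 - margin)


def logistic_loss(margin: float) -> float:
    """Logistic-regression surrogate loss: log(1 + exp(-margin)), computed stably."""
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def supervised_risk(weights, labeled, loss):
    """Average surrogate loss over labeled pairs (x, y) only.

    Assumes a linear scorer f(x) = <weights, x>; unlabeled data is ignored,
    which is the baseline the review calls the "lazy" approach.
    """
    total = 0.0
    for x, y in labeled:
        score = sum(w * xi for w, xi in zip(weights, x))
        total += loss(y * score)
    return total / len(labeled)
```

Both losses are convex, decreasing functions of the margin, which is the property the review refers to.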


Reviews: Good Semi-supervised Learning That Requires a Bad GAN

Neural Information Processing Systems

After reading the rebuttal I changed my score to 7. Overall it is an interesting paper with an interesting idea. Although the theoretical contributions are emphasized, I find the empirical findings more appealing. The theory presented in the paper is not convincing (input versus feature, convexity, etc.). I think the link to classical semi-supervised learning and the cluster assumption should be emphasized, along with the *low density assumption on the boundary*, as explained in this paper: Semi-Supervised Classification by Low Density Separation, Olivier Chapelle, Alexander Zien (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.76.5826&rep=rep1&type=pdf). I am changing my review to 7, and I hope that the authors will put their contribution in the context of known work in semi-supervised learning, namely that the boundary of separation should lie in the low density regions. This will put the paper better in context.


Reviews: A Structured Prediction Approach for Label Ranking

Neural Information Processing Systems

This paper presents an interesting approach to the label ranking problem, by first casting it as a Structured Prediction problem that can be optimized using a surrogate least squares methodology, and then demonstrating an embedding representation that captures a couple of common ranking loss functions -- most notably the Kendall tau distance. Overall I liked the paper and found a decent mix of method, theory and experiments (though I would have liked to see more convincing experimentation, as detailed below). In particular I liked the demonstration that the Kendall tau and Hamming distances are representable in this embedding formulation. That said, I had a few concerns with this work as well: - Specifically, the empirical results were not very convincing. While this may not have been a problem for a theory-first paper, part of the appeal of an approach like this is that it is supposed to work in practice. Unfortunately, with the current (somewhat limited) set of experiments I am not entirely convinced. For example: the experiments only looked at a couple of very specific (and not particularly common) loss functions, with the evals only measuring Kendall tau.
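To make the embedding remark concrete, here is a small sketch of how the Kendall tau distance between two rankings can be recovered from a vector embedding. It uses the standard pairwise-sign embedding, which may differ in detail from the construction in the reviewed paper; the function names are hypothetical and rankings are assumed to be tie-free (permutations of ranks).

```python
from itertools import combinations


def kendall_embedding(ranking):
    """Map a ranking (ranking[i] = rank of label i) to a vector of +/-1 entries.

    One entry per pair i < j: +1 if label i is ranked ahead of label j, else -1.
    """
    return [1 if ranking[i] < ranking[j] else -1
            for i, j in combinations(range(len(ranking)), 2)]


def kendall_tau_distance(r1, r2):
    """Count the label pairs that the two rankings order differently."""
    return sum(1 for i, j in combinations(range(len(r1)), 2)
               if (r1[i] - r1[j]) * (r2[i] - r2[j]) < 0)


def distance_from_embedding(r1, r2):
    """Recover the same count from squared Euclidean distance in embedding space.

    Each discordant pair contributes (1 - (-1))**2 = 4, so we divide by 4.
    """
    e1, e2 = kendall_embedding(r1), kendall_embedding(r2)
    return sum((a - b) ** 2 for a, b in zip(e1, e2)) // 4
```

Because the distance becomes a squared Euclidean distance between embedded rankings, a least squares surrogate in the embedding space is a natural fit, which is the connection the review highlights.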


Self-supervised inter-intra period-aware ECG representation learning for detecting atrial fibrillation

arXiv.org Artificial Intelligence

Atrial fibrillation is a commonly encountered clinical arrhythmia associated with stroke and increased mortality. Since professional medical knowledge is required for annotation, exploiting a large corpus of ECGs to develop accurate supervised learning-based atrial fibrillation algorithms remains challenging. Self-supervised learning (SSL) is a promising recipe for generalized ECG representation learning, eliminating the dependence on expensive labeling. However, without well-designed incorporation of knowledge related to atrial fibrillation, existing SSL approaches typically fail to capture robust ECG representations. In this paper, we propose an inter-intra period-aware ECG representation learning approach. Considering that ECGs of atrial fibrillation patients exhibit irregular RR intervals and absent P-waves, we develop specific pre-training tasks for interperiod and intraperiod representations, aiming to learn the single-period stable morphology representation while retaining crucial interperiod features. After further fine-tuning, our approach demonstrates remarkable AUC performance on the BTCH dataset, i.e., 0.953/0.996 for paroxysmal/persistent atrial fibrillation detection. On the commonly used benchmarks CinC2017 and CPSC2021, the generalization capability and effectiveness of our methodology are substantiated with competitive results.


Tackling the Abstraction and Reasoning Corpus with Vision Transformers: the Importance of 2D Representation, Positions, and Objects

arXiv.org Artificial Intelligence

The Abstraction and Reasoning Corpus (ARC) is a popular benchmark focused on visual reasoning in the evaluation of Artificial Intelligence systems. In its original framing, an ARC task requires solving a program synthesis problem over small 2D images using a few input-output training pairs. In this work, we adopt the recently popular data-driven approach to the ARC and ask whether a Vision Transformer (ViT) can learn the implicit mapping, from input image to output image, that underlies the task. We show that a ViT -- otherwise a state-of-the-art model for images -- fails dramatically on most ARC tasks even when trained on one million examples per task. This points to an inherent representational deficiency of the ViT architecture that makes it incapable of uncovering the simple structured mappings underlying the ARC tasks. Building on these insights, we propose ViTARC, a ViT-style architecture that unlocks some of the visual reasoning capabilities required by the ARC. Specifically, we use a pixel-level input representation, design a spatially-aware tokenization scheme, and introduce a novel object-based positional encoding that leverages automatic segmentation, among other enhancements. Our task-specific ViTARC models achieve a test solve rate close to 100% on more than half of the 400 public ARC tasks strictly through supervised learning from input-output grids. This calls attention to the importance of imbuing the powerful (Vision) Transformer with the correct inductive biases for abstract visual reasoning that are critical even when the training data is plentiful and the mapping is noise-free. Hence, ViTARC provides a strong foundation for future research in visual reasoning using transformer-based architectures.


Conformal Structured Prediction

arXiv.org Artificial Intelligence

Conformal prediction has recently emerged as a promising strategy for quantifying the uncertainty of a predictive model; these algorithms modify the model to output sets of labels that are guaranteed to contain the true label with high probability. However, existing conformal prediction algorithms have largely targeted classification and regression settings, where the structure of the prediction set has a simple form as a level set of the scoring function. For complex structured outputs such as text generation, these prediction sets might include a large number of labels and therefore be hard for users to interpret. In this paper, we propose a general framework for conformal prediction in the structured prediction setting that modifies existing conformal prediction algorithms to output structured prediction sets that implicitly represent sets of labels. In addition, we demonstrate how our approach can be applied in domains where the prediction sets can be represented as a set of nodes in a directed acyclic graph; for instance, for hierarchical labels such as image classification, a prediction set might be a small subset of coarse labels implicitly representing the prediction set of all their finer-grained descendants. We demonstrate how our algorithm can be used to construct prediction sets that satisfy a desired coverage guarantee in several domains.
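As background for the abstract above, the classification-style baseline it describes (a prediction set formed as a level set of a scoring function) can be sketched with standard split conformal prediction. This is not the structured algorithm proposed in the paper; the function names and the use of a generic nonconformity score (lower means more plausible) are assumptions for illustration.

```python
import math


def conformal_threshold(calibration_scores, alpha):
    """Return the split-conformal threshold for miscoverage level alpha.

    calibration_scores holds the nonconformity score of the TRUE label for each
    held-out calibration example. The threshold is the k-th smallest score with
    k = ceil((n + 1) * (1 - alpha)); if k exceeds n, every label must be kept.
    """
    n = len(calibration_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf
    return sorted(calibration_scores)[k - 1]


def prediction_set(label_scores, threshold):
    """Level set of the scoring function: all labels scoring at or below the threshold."""
    return {label for label, score in label_scores.items() if score <= threshold}
```

With many candidate labels, such a flat set can grow large, which is the interpretability problem the abstract raises for structured outputs.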


Failure-Proof Non-Contrastive Self-Supervised Learning

arXiv.org Machine Learning

We identify sufficient conditions to avoid known failure modes, including representation, dimensional, cluster and intracluster collapses, occurring in non-contrastive self-supervised learning. Based on these findings, we propose a principled design for the projector and loss function. We theoretically demonstrate that this design introduces an inductive bias that promotes learning representations that are both decorrelated and clustered without explicitly enforcing these properties, leading to improved generalization. To the best of our knowledge, this is the first solution that achieves robust training with respect to these failure modes while guaranteeing enhanced generalization performance in downstream tasks. We validate our theoretical findings on image datasets including SVHN, CIFAR10, CIFAR100 and ImageNet-100, and show that our solution, dubbed FALCON, outperforms existing feature decorrelation and cluster-based self-supervised learning methods in terms of generalization to clustering and linear classification tasks.


Enhancing Graph Self-Supervised Learning with Graph Interplay

arXiv.org Machine Learning

Graph self-supervised learning (GSSL) has emerged as a compelling framework for extracting informative representations from graph-structured data without extensive reliance on labeled inputs. In this study, we introduce Graph Interplay (GIP), an innovative and versatile approach that significantly enhances the performance of various existing GSSL methods when they are equipped with it. To this end, GIP advocates direct graph-level communications by introducing random inter-graph edges within standard batches. Despite GIP's simplicity, we further theoretically show that GIP essentially performs a principled manifold separation by combining inter-graph message passing and GSSL, bringing about more structured embedding manifolds and thus benefiting a series of downstream tasks. Our empirical study demonstrates that GIP surpasses the performance of prevailing GSSL methods across multiple benchmarks by significant margins, highlighting its potential as a breakthrough approach. Besides, GIP can be readily integrated into a series of GSSL methods and consistently offers additional performance gain. This advancement not only amplifies the capability of GSSL but also potentially sets the stage for a novel graph learning paradigm in a broader sense.
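The idea of "random inter-graph edges within standard batches" can be illustrated with a rough sketch. The abstract does not say how the edges are sampled, so uniform sampling of node pairs, the `num_new_edges` parameter, the batch layout, and the function name below are all assumptions for illustration rather than the authors' procedure.

```python
import random


def add_inter_graph_edges(graph_sizes, edges, num_new_edges, seed=0):
    """Add randomly sampled edges that connect nodes of DIFFERENT graphs in a batch.

    graph_sizes: node count of each graph; nodes are numbered consecutively
                 across the batch (graph 0 first, then graph 1, and so on).
    edges:       existing intra-graph edges as (u, v) pairs.
    Returns the augmented edge list and the node-to-graph assignment.
    Duplicate inter-graph edges are allowed in this simplified version.
    """
    if sum(1 for size in graph_sizes if size > 0) < 2:
        raise ValueError("need at least two non-empty graphs in the batch")
    rng = random.Random(seed)
    # owner[node] = index of the graph that the node belongs to
    owner = [g for g, size in enumerate(graph_sizes) for _ in range(size)]
    num_nodes = len(owner)
    new_edges = []
    while len(new_edges) < num_new_edges:
        u, v = rng.randrange(num_nodes), rng.randrange(num_nodes)
        if owner[u] != owner[v]:  # keep only edges that cross graph boundaries
            new_edges.append((u, v))
    return list(edges) + new_edges, owner
```

A message-passing encoder run on the augmented edge list would then exchange information between graphs in the same batch, which is the "graph-level communication" the abstract describes.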


Reviews: On the Fine-Grained Complexity of Empirical Risk Minimization: Kernel Methods and Neural Networks

Neural Information Processing Systems

The paper makes use of (relatively) recent advances in complexity theory to show that many common learning problems do not allow subquadratic-time learning algorithms (assuming the "Strong Exponential Time Hypothesis" holds). I appreciate that the authors do not oversell their results: they clearly state that they provide a worst-case analysis. Also, the results are not surprising. For instance, finding the exact solution of any kernel method requires the computation of the full kernel matrix, which is already quadratic in the number of training examples. Reducing this computation time would imply that one can compute an approximation of the exact solution without computing the full kernel matrix, which is intuitively unlikely unless one makes extra assumptions on the problem structure (i.e., the nature of the data-generating distribution).
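The reviewer's point about the kernel matrix can be seen in a few lines. The sketch below builds the full Gram matrix for an RBF kernel and counts kernel evaluations; the choice of RBF kernel, the `gamma` parameter, and the function names are assumptions made for the example, not details from the paper.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel: exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def gram_matrix(points, gamma=1.0):
    """Compute the full n x n kernel matrix and count kernel evaluations.

    Symmetry halves the work, but n * (n + 1) / 2 evaluations is still
    quadratic in the number of training examples n.
    """
    n = len(points)
    evaluations = 0
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            K[i][j] = K[j][i] = rbf_kernel(points[i], points[j], gamma)
            evaluations += 1
    return K, evaluations
```

Doubling the number of points roughly quadruples the evaluation count, which is the quadratic cost the review refers to.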