Inductive Learning
Reviews: Reward Augmented Maximum Likelihood for Neural Structured Prediction
The paper is a superbly written account of a simple idea that appears to work very well. The approach can straightforwardly be applied to existing max-likelihood (ML) trained models in order to in principle take into account the task reward during training and is computationally much more efficient than alternative non ML based approaches. This work risks being underappreciated as proposing but a simple addition of artificial structured-label noise, but I think the specific link with structured output task reward is sufficiently original, and the paper also uncovers important theoretical insight by revealing the formal relationship between the proposed reward augmented ML and RL-based regularized expected reward objectives. So while it works surprisingly well, you haven't yet clearly demonstrated empirically that using a truly *task-reward derived* payoff distribution is beneficial. One way to convincingly demonstrate that would be if you did your envisioned BLEU importance reweighted sampling, and were able to show that it improves the BLEU test score over your current simpler edit-distance based label noise.
Reviews: Protein contact prediction from amino acid co-evolution using convolutional networks for graph-valued images
Overall the paper presents a novel application of CNNs to co-evolution Protein Contact Prediction (PCP) with state-of-the-art results. The paper is a strong contribution to the field of coevolution PCP. The method has the potential to improve ab initio structure prediction accuracy, which could have wide ranging impacts on studying proteins without solved structural complexes. The paper itself contains flaws in the experimental setup, how the methods are described, and how the results are presented. Key issues: - Experimental setup: not enough unique training examples are used to compare to existing methods. More than 100K examples exist for this specific task, but only 307 are used.
Bounding training data reconstruction in DP-SGD
Differentially private training offers a protection which is usually interpreted as a guarantee against membership inference attacks. By proxy, this guarantee extends to other threats like reconstruction attacks attempting to extract complete training examples. Recent works provide evidence that if one does not need to protect against membership attacks but instead only wants to protect against a training data reconstruction, then utility of private models can be improved because less noise is required to protect against these more ambitious attacks. We investigate this question further in the context of DP-SGD, a standard algorithm for private deep learning, and provide an upper bound on the success of any reconstruction attack against DP-SGD together with an attack that empirically matches the predictions of our bound. Together, these two results open the door to fine-grained investigations on how to set the privacy parameters of DP-SGD in practice to protect against reconstruction attacks.
CWCL: Cross-Modal Transfer with Continuously Weighted Contrastive Loss
This paper considers contrastive training for cross-modal 0-shot transfer wherein a pre-trained model in one modality is used for representation learning in another domain using pairwise data. The learnt models in the latter domain can then be used for a diverse set of tasks in a 0-shot way, similar to Contrastive Language-Image Pre-training (CLIP) and Locked-image Tuning (LiT) that have recently gained considerable attention. Classical contrastive training employs sets of positive and negative examples to align similar and repel dissimilar training data samples. However, similarity amongst training examples has a more continuous nature, thus calling for a more non-binary' treatment. To address this, we propose a new contrastive loss function called Continuously Weighted Contrastive Loss (CWCL) that employs a continuous measure of similarity. With CWCL, we seek to transfer the structure of the embedding space from one modality to another.
Suggesting Variable Order for Cylindrical Algebraic Decomposition via Reinforcement Learning
Cylindrical Algebraic Decomposition (CAD) is one of the pillar algorithms of symbolic computation, and its worst-case complexity is double exponential to the number of variables. Researchers found that variable order dramatically affects efficiency and proposed various heuristics. The existing learning-based methods are all supervised learning methods that cannot cope with diverse polynomial sets.This paper proposes two Reinforcement Learning (RL) approaches combined with Graph Neural Networks (GNN) for Suggesting Variable Order (SVO). One is GRL-SVO(UP), a branching heuristic integrated with CAD. The other is GRL-SVO(NUP), a fast heuristic providing a total order directly.
Jaccard Metric Losses: Optimizing the Jaccard Index with Soft Labels
Intersection over Union (IoU) losses are surrogates that directly optimize the Jaccard index. Leveraging IoU losses as part of the loss function have demonstrated superior performance in semantic segmentation tasks compared to optimizing pixel-wise losses such as the cross-entropy loss alone. However, we identify a lack of flexibility in these losses to support vital training techniques like label smoothing, knowledge distillation, and semi-supervised learning, mainly due to their inability to process soft labels. To address this, we introduce Jaccard Metric Losses (JMLs), which are identical to the soft Jaccard loss in standard settings with hard labels but are fully compatible with soft labels. We apply JMLs to three prominent use cases of soft labels: label smoothing, knowledge distillation and semi-supervised learning, and demonstrate their potential to enhance model accuracy and calibration. Our experiments show consistent improvements over the cross-entropy loss across 4 semantic segmentation datasets (Cityscapes, PASCAL VOC, ADE20K, DeepGlobe Land) and 13 architectures, including classic CNNs and recent vision transformers.
Language-based Action Concept Spaces Improve Video Self-Supervised Learning
Recent contrastive language image pre-training has led to learning highly transferable and robust image representations. However, adapting these models to video domain with minimal supervision remains an open problem. We explore a simple step in that direction, using language tied self-supervised learning to adapt an image CLIP model to the video domain. A backbone modified for temporal modeling is trained under self-distillation settings with train objectives operating in an action concept space. Feature vectors of various action concepts extracted from a language encoder using relevant textual prompts construct this space.
Anchor Data Augmentation
We propose a novel algorithm for data augmentation in nonlinear over-parametrized regression. Our data augmentation algorithm borrows from the literature on causality. Contrary to the current state-of-the-art solutions that rely on modifications of Mixup algorithm, we extend the recently proposed distributionally robust Anchor regression (AR) method for data augmentation. Our Anchor Data Augmentation (ADA) uses several replicas of the modified samples in AR to provide more training examples, leading to more robust regression predictions. We apply ADA to linear and nonlinear regression problems using neural networks.
Modality-Agnostic Self-Supervised Learning with Meta-Learned Masked Auto-Encoder
Despite its practical importance across a wide range of modalities, recent advances in self-supervised learning (SSL) have been primarily focused on a few well-curated domains, e.g., vision and language, often relying on their domain-specific knowledge. For example, Masked Auto-Encoder (MAE) has become one of the popular architectures in these domains, but less has explored its potential in other modalities. In this paper, we develop MAE as a unified, modality-agnostic SSL framework. In turn, we argue meta-learning as a key to interpreting MAE as a modality-agnostic learner, and propose enhancements to MAE from the motivation to jointly improve its SSL across diverse modalities, coined MetaMAE as a result. Our key idea is to view the mask reconstruction of MAE as a meta-learning task: masked tokens are predicted by adapting the Transformer meta-learner through the amortization of unmasked tokens.
Optimal Block-wise Asymmetric Graph Construction for Graph-based Semi-supervised Learning
Graph-based semi-supervised learning (GSSL) serves as a powerful tool to model the underlying manifold structures of samples in high-dimensional spaces. It involves two phases: constructing an affinity graph from available data and inferring labels for unlabeled nodes on this graph. While numerous algorithms have been developed for label inference, the crucial graph construction phase has received comparatively less attention, despite its significant influence on the subsequent phase. In this paper, we present an optimal asymmetric graph structure for the label inference phase with theoretical motivations. Unlike existing graph construction methods, we differentiate the distinct roles that labeled nodes and unlabeled nodes could play.