Goto

Collaborating Authors

 mixup



Overleaf Example

Neural Information Processing Systems

Such a lack of alignment and uniformity might restrict the transferability and robustness of embeddings. To this end, we devise a new fine-tuning method for robust representation equipping better alignment and uniformity. First, we propose a Geodesic Multi-Modal Mixup that mixes the embeddings of image and text to generate hard negative samples on the hypersphere.




Towards Calibrated Model for Long-Tailed Visual Recognition from Prior Perspective - Supplementary Material - Zhengzhuo Xu

Neural Information Processing Systems

One the one hand, the classification performance will be promoted. F ollow the settings of Proof A.2 and Eq.6, we can get the following relationship of the Hence we can generalize Eq.A.9 to: y According to Eq.A.11, it's easy to find a derivative zero point in range [1,C]. F ollow the Basic Setting of Proof.1, suppose It has limited contribution for the tail class' feature LT scenarios, the likelihood is consistent in the train and test set, but the prior is different. Hence, the actual optimization direction is not described as Eq.B.2 because the bias incurred by According to LemmaB.1, we immediately deduce that Bayias-compensated cross-entropy loss ensures All above re-weight methods are proven effective empirically, more or less. They propose the LDAM loss to encourage the tail classes to enjoy larger margins.


IntraMix: Intra-Class Mixup Generation for Accurate Labels and Neighbors

Neural Information Processing Systems

Graph Neural Networks (GNNs) have shown great performance in various tasks, with the core idea of learning from data labels and aggregating messages within the neighborhood of nodes. However, the common challenges in graphs are twofold: insufficient accurate (high-quality) labels and limited neighbors for nodes, resulting in weak GNNs. Existing graph augmentation methods typically address only one of these challenges, often adding training costs or relying on oversimplified or knowledge-intensive strategies, limiting their generalization.To simultaneously address both challenges faced by graphs in a generalized way, we propose an elegant method called IntraMix. Considering the incompatibility of vanilla Mixup with the complex topology of graphs, IntraMix innovatively employs Mixup among inaccurate labeled data of the same class, generating high-quality labeled data at minimal cost. Additionally, it finds data with high confidence of being clustered into the same group as the generated data to serve as their neighbors, thereby enriching the neighborhoods of graphs. IntraMix efficiently tackles both issues faced by graphs and challenges the prior notion of the limited effectiveness of Mixup in node classification. IntraMix is a theoretically grounded plug-in-play method that can be readily applied to all GNNs. Extensive experiments demonstrate the effectiveness of IntraMix across various GNNs and datasets.


FeLMi : Few shot Learning with hard Mixup

Neural Information Processing Systems

Learning from a few examples is a challenging computer vision task. Traditionally,meta-learning-based methods have shown promise towards solving this problem.Recent approaches show benefits by learning a feature extractor on the abundantbase examples and transferring these to the fewer novel examples. However, thefinetuning stage is often prone to overfitting due to the small size of the noveldataset. To this end, we propose Few shot Learning with hard Mixup (FeLMi)using manifold mixup to synthetically generate samples that helps in mitigatingthe data scarcity issue. Different from a naïve mixup, our approach selects the hardmixup samples using an uncertainty-based criteria. To the best of our knowledge,we are the first to use hard-mixup for the few-shot learning problem.


SageMix: Saliency-Guided Mixup for Point Clouds

Neural Information Processing Systems

Data augmentation is key to improving the generalization ability of deep learning models. Mixup is a simple and widely-used data augmentation technique that has proven effective in alleviating the problems of overfitting and data scarcity. Also, recent studies of saliency-aware Mixup in the image domain show that preserving discriminative parts is beneficial to improving the generalization performance. However, these Mixup-based data augmentations are underexplored in 3D vision, especially in point clouds. In this paper, we propose SageMix, a saliency-guided Mixup for point clouds to preserve salient local structures.


SelecMix: Debiased Learning by Contradicting-pair Sampling

Neural Information Processing Systems

Neural networks trained with ERM (empirical risk minimization) sometimes learn unintended decision rules, in particular when their training data is biased, i.e., when training labels are strongly correlated with undesirable features. To prevent a network from learning such features, recent methods augment training data such that examples displaying spurious correlations (i.e., bias-aligned examples) become a minority, whereas the other, bias-conflicting examples become prevalent. However, these approaches are sometimes difficult to train and scale to real-world data because they rely on generative models or disentangled representations. We propose an alternative based on mixup, a popular augmentation that creates convex combinations of training examples. Our method, coined SelecMix, applies mixup to contradicting pairs of examples, defined as showing either (i) the same label but dissimilar biased features, or (ii) different labels but similar biased features. Identifying such pairs requires comparing examples with respect to unknown biased features. For this, we utilize an auxiliary contrastive model with the popular heuristic that biased features are learned preferentially during training. Experiments on standard benchmarks demonstrate the effectiveness of the method, in particular when label noise complicates the identification of bias-conflicting examples.


Multi-Loss Learning for Speech Emotion Recognition with Energy-Adaptive Mixup and Frame-Level Attention

Wang, Cong, Geng, Yizhong, Wen, Yuhua, Li, Qifei, Gao, Yingming, Wang, Ruimin, Wang, Chunfeng, Li, Hao, Li, Ya, Chen, Wei

arXiv.org Artificial Intelligence

Speech emotion recognition (SER) is an important technology in human-computer interaction. However, achieving high performance is challenging due to emotional complexity and scarce annotated data. To tackle these challenges, we propose a multi-loss learning (MLL) framework integrating an energy-adaptive mixup (EAM) method and a frame-level attention module (FLAM). The EAM method leverages SNR-based augmentation to generate diverse speech samples capturing subtle emotional variations. FLAM enhances frame-level feature extraction for multi-frame emotional cues. Our MLL strategy combines Kullback-Leibler divergence, focal, center, and supervised contrastive loss to optimize learning, address class imbalance, and improve feature separability. We evaluate our method on four widely used SER datasets: IEMOCAP, MSP-IMPROV, RAVDESS, and SAVEE. The results demonstrate our method achieves state-of-the-art performance, suggesting its effectiveness and robustness.