 consistency training


Unsupervised Data Augmentation for Consistency Training

Neural Information Processing Systems

Semi-supervised learning lately has shown much promise in improving deep learning models when labeled data is scarce. Common among recent approaches is the use of consistency training on a large amount of unlabeled data to constrain model predictions to be invariant to input noise. In this work, we present a new perspective on how to effectively noise unlabeled examples and argue that the quality of the noise, specifically that produced by advanced data augmentation methods, plays a crucial role in semi-supervised learning. By substituting simple noising operations with advanced data augmentation methods such as RandAugment and back-translation, our method brings substantial improvements across six language and three vision tasks under the same consistency training framework. On the IMDb text classification dataset, with only 20 labeled examples, our method achieves an error rate of 4.20, outperforming the state-of-the-art model trained on 25,000 labeled examples. On a standard semi-supervised learning benchmark, CIFAR-10, our method outperforms all previous approaches and achieves an error rate of 5.43 with only 250 examples. Our method also combines well with transfer learning, e.g., when fine-tuning from BERT, and yields improvements in the high-data regime, such as ImageNet, both when only 10% of the data is labeled and when the full labeled set is used together with 1.3M extra unlabeled examples.
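The consistency objective described in this abstract can be illustrated with a minimal sketch. It assumes a KL divergence between the prediction on an unlabeled example and the prediction on its augmented version, added to a supervised cross-entropy term. The abstract does not state the exact loss, so the function names, the choice of KL, and the `weight` parameter are illustrative assumptions, not the authors' implementation.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) for two discrete distributions of equal length.
    return sum(pi * (math.log(pi + eps) - math.log(qi + eps))
               for pi, qi in zip(p, q))

def consistency_loss(logits_original, logits_augmented):
    # Assumption: the prediction on the original input acts as a fixed
    # target, and the prediction on the augmented input is pulled toward it.
    p = softmax(logits_original)
    q = softmax(logits_augmented)
    return kl_divergence(p, q)

def cross_entropy(logits, label):
    # Supervised loss on a single labeled example.
    return -math.log(softmax(logits)[label])

def total_loss(labeled_logits, label, unlabeled_logits, augmented_logits,
               weight=1.0):
    # Hypothetical combined objective: supervised term plus a weighted
    # consistency term computed on unlabeled data.
    return (cross_entropy(labeled_logits, label)
            + weight * consistency_loss(unlabeled_logits, augmented_logits))
```

In this sketch the consistency term is zero when the model predicts the same distribution for both views and grows as the two predictions diverge, which is the invariance to input noise that the abstract refers to.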


ACE-ICD: Acronym Expansion As Data Augmentation For Automated ICD Coding

Le, Tuan-Dung, Haddadan, Shohreh, Thieu, Thanh Q.

arXiv.org Artificial Intelligence

Automatic ICD coding, the task of assigning disease and procedure codes to electronic medical records, is crucial for clinical documentation and billing. While existing methods primarily enhance model understanding of code hierarchies and synonyms, they often overlook the pervasive use of medical acronyms in clinical notes, a key factor in ICD code inference. To address this gap, we propose a novel and effective data augmentation technique that leverages large language models to expand medical acronyms, allowing models to be trained on their full-form representations. Moreover, we incorporate consistency training to regularize predictions by enforcing agreement between the original and augmented documents. Extensive experiments on the MIMIC-III dataset demonstrate that our approach, ACE-ICD, establishes new state-of-the-art performance across multiple settings, including common codes, rare codes, and full-code assignments. Our code is publicly available.


CEED-VLA: Consistency Vision-Language-Action Model with Early-Exit Decoding

Song, Wenxuan, Chen, Jiayi, Ding, Pengxiang, Huang, Yuxin, Zhao, Han, Wang, Donglin, Li, Haoang

arXiv.org Artificial Intelligence

In recent years, Vision-Language-Action (VLA) models have become a vital research direction in robotics due to their impressive multimodal understanding and generalization capabilities. Despite this progress, their practical deployment is severely constrained by inference speed bottlenecks, particularly in high-frequency and dexterous manipulation tasks. While recent studies have explored Jacobi decoding as a more efficient alternative to traditional autoregressive decoding, its practical benefits are marginal because of the large number of iterations it requires. To address this, we introduce consistency distillation training to predict multiple correct action tokens in each iteration, thereby achieving acceleration. In addition, we design mixed-label supervision to mitigate error accumulation during distillation. Although distillation brings acceptable speedup, we identify that certain inefficient iterations remain a critical bottleneck. To tackle this, we propose an early-exit decoding strategy that moderately relaxes convergence conditions, which further improves average inference efficiency. Experimental results show that the proposed method achieves more than 4 times inference acceleration across different baselines while maintaining high task success rates in both simulated and real-world robot tasks. These experiments validate that our approach provides an efficient and general paradigm for accelerating multimodal decision-making in robotics. Our project page is available at https://irpn-eai.github.io/CEED-VLA/.
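As a rough illustration of the decoding ideas named in this abstract, the toy sketch below contrasts autoregressive decoding with Jacobi (parallel fixed-point) decoding. Everything here is hypothetical: `next_token` is a made-up deterministic stand-in for a model, and the `max_iters` cap is only one simple way to imitate an early exit. The abstract does not specify the paper's actual convergence criterion.

```python
def next_token(prefix):
    # Hypothetical deterministic "model": maps a prefix to the next token id.
    return (sum(prefix) * 31 + len(prefix)) % 7

def autoregressive_decode(prompt, n):
    # Baseline: generate n tokens one at a time, one model call per token.
    seq = list(prompt)
    for _ in range(n):
        seq.append(next_token(seq))
    return seq[len(prompt):]

def jacobi_decode(prompt, n, max_iters=None):
    # Jacobi decoding: start from an arbitrary guess for all n tokens and
    # refresh every position in parallel from the previous guess until the
    # guess stops changing (a fixed point) or the iteration cap is reached.
    guess = [0] * n
    limit = n if max_iters is None else max_iters
    iters = 0
    while iters < limit:
        new = [next_token(list(prompt) + guess[:i]) for i in range(n)]
        iters += 1
        if new == guess:
            break  # fixed point reached: the guess matches greedy decoding
        guess = new
    return guess, iters

if __name__ == "__main__":
    prompt = [1, 2]
    print(autoregressive_decode(prompt, 6))
    print(jacobi_decode(prompt, 6))               # full convergence
    print(jacobi_decode(prompt, 6, max_iters=2))  # capped, possibly inexact
```

After k iterations the first k tokens are guaranteed to match the autoregressive output, so n iterations always suffice. The speedup comes from cases where several tokens become correct in a single iteration, which is what the distillation step in the abstract aims to make more frequent.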


Training Consistency Models with Variational Noise Coupling

Silvestri, Gianluigi, Ambrogioni, Luca, Lai, Chieh-Hsin, Takida, Yuhta, Mitsufuji, Yuki

arXiv.org Artificial Intelligence

Consistency Training (CT) has recently emerged as a promising alternative to diffusion models, achieving competitive performance in image generation tasks. However, non-distillation consistency training often suffers from high variance and instability, and analyzing and improving its training dynamics is an active area of research. In this work, we propose a novel CT training approach based on the Flow Matching framework. Our main contribution is a trained noise-coupling scheme inspired by the architecture of Variational Autoencoders (VAE). By training a data-dependent noise emission model implemented as an encoder architecture, our method can indirectly learn the geometry of the noise-to-data mapping, which is instead fixed by the choice of the forward process in classical CT. Empirical results across diverse image datasets show significant generative improvements, with our model outperforming baselines and achieving the state-of-the-art (SoTA) non-distillation CT FID on CIFAR-10, and attaining FID on par with SoTA on ImageNet at $64 \times 64$ resolution in 2-step generation. Our code is available at https://github.com/sony/vct.


Multi-Scale and Multi-Objective Optimization for Cross-Lingual Aspect-Based Sentiment Analysis

Wu, Chengyan, Ma, Bolei, Deng, Ningyuan, He, Yanqing, Xue, Yun

arXiv.org Artificial Intelligence

Aspect-based sentiment analysis (ABSA) is a sequence labeling task that has garnered growing research interest in multilingual contexts. However, recent studies lack robust feature alignment and fine-grained aspect-level alignment. In this paper, we propose a novel framework, Multi-Scale and Multi-Objective optimization (MSMO), for cross-lingual ABSA. During multi-scale alignment, we achieve cross-lingual sentence-level and aspect-level alignment, aligning features of aspect terms in different contextual environments. Specifically, we introduce code-switched bilingual sentences into the language discriminator and consistency training modules to enhance the model's robustness. During multi-objective optimization, we design two optimization objectives, supervised training and consistency training, aiming to enhance cross-lingual semantic alignment. To further improve model performance, we incorporate distilled knowledge of the target language into the model. Results show that MSMO significantly enhances cross-lingual ABSA by achieving state-of-the-art performance across multiple languages and models.


Consistency Training with Physical Constraints

Chang, Che-Chia, Dai, Chen-Yang, Lin, Te-Sheng, Lai, Ming-Chih, Lai, Chieh-Hsin

arXiv.org Artificial Intelligence

We propose a physics-aware Consistency Training (CT) (Song et al., 2023) method that accelerates sampling in Diffusion Models with physical constraints. Experiments on toy examples show that our method generates samples in a single step while adhering to the imposed constraints. This approach has the potential to efficiently solve partial differential equations (PDEs) using deep generative modeling. Diffusion models (Sohl-Dickstein et al., 2015; Song & Ermon, 2019; Ho et al., 2020; Song et al., 2021b) have achieved significant success in high-dimensional data generation. Recent efforts have focused on adapting diffusion models to generate samples that satisfy physical constraints (Yuan et al., 2023; Mazé & Ahmed, 2023; Shu et al., 2023; Jacobsen et al., 2024; Bastek et al., 2024).


Improved Training Technique for Latent Consistency Models

Dao, Quan, Doan, Khanh, Liu, Di, Le, Trung, Metaxas, Dimitris

arXiv.org Artificial Intelligence

Consistency models are a new family of generative models capable of producing high-quality samples in either a single step or multiple steps. Recently, consistency models have demonstrated impressive performance, achieving results on par with diffusion models in the pixel space. However, the success of scaling consistency training to large-scale datasets, particularly for text-to-image and video generation tasks, is determined by performance in the latent space. In this work, we analyze the statistical differences between pixel and latent spaces, discovering that latent data often contains highly impulsive outliers, which significantly degrade the performance of iCT in the latent space. To address this, we replace Pseudo-Huber losses with Cauchy losses, effectively mitigating the impact of outliers. Additionally, we introduce a diffusion loss at early timesteps and employ optimal transport (OT) coupling to further enhance performance. Lastly, we introduce the adaptive scaling-$c$ scheduler to manage the robust training process and adopt Non-scaling LayerNorm in the architecture to better capture the statistics of the features and reduce outlier impact. With these strategies, we successfully train latent consistency models capable of high-quality sampling with one or two steps, significantly narrowing the performance gap between latent consistency and diffusion models. The implementation is released here: https://github.com/quandao10/sLCT/
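To make the loss substitution mentioned above concrete, here is a small sketch comparing a Pseudo-Huber loss with a Cauchy loss on scalar residuals. The formulas are common textbook forms (Pseudo-Huber as sqrt(r^2 + c^2) - c, Cauchy as log(1 + r^2 / (2 c^2))). The exact parameterization used in the paper is not given in the abstract, so treat the scale parameter `c` and both function definitions as assumptions.

```python
import math

def pseudo_huber(residual, c=1.0):
    # Quadratic near zero, approximately linear for large |residual|.
    return math.sqrt(residual ** 2 + c ** 2) - c

def cauchy(residual, c=1.0):
    # Quadratic near zero, only logarithmic growth for large |residual|,
    # so extreme outliers contribute far less to the total loss.
    return math.log1p(0.5 * (residual / c) ** 2)

if __name__ == "__main__":
    for r in (0.1, 1.0, 10.0, 100.0):
        print(r, pseudo_huber(r), cauchy(r))
```

Both losses vanish at zero residual, but for a large residual the Cauchy value stays small while the Pseudo-Huber value grows roughly in proportion to the residual, which is one way to see why a Cauchy loss can be less sensitive to impulsive outliers.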


Review for NeurIPS paper: Unsupervised Data Augmentation for Consistency Training

Neural Information Processing Systems

Additional Feedback: The main comment I have regarding the paper is that the authors do not provide adequate justification as to why advanced data augmentation methods work better than simple ones, or when to apply them. This same intuition can be applied to other semi-supervised methods such as nearest neighbor and label propagation. These methods assign the same label to all unlabeled examples within the same component of a graph. This is intuitive but does not explain why the noise from advanced data augmentation methods is better for semi-supervised learning, nor does it provide guarantees for when they work. I acknowledge that I read the rebuttal and thank the authors for providing explanations to the questions and concerns I had.


Stable Consistency Tuning: Understanding and Improving Consistency Models

Wang, Fu-Yun, Geng, Zhengyang, Li, Hongsheng

arXiv.org Artificial Intelligence

Diffusion models achieve superior generation quality but suffer from slow generation speed due to the iterative nature of denoising. In contrast, consistency models, a new generative family, achieve competitive performance with significantly faster sampling. These models are trained either through consistency distillation, which leverages pretrained diffusion models, or consistency training/tuning directly from raw data. In this work, we propose a novel framework for understanding consistency models by modeling the denoising process of the diffusion model as a Markov Decision Process (MDP) and framing consistency model training as value estimation through Temporal Difference (TD) Learning. More importantly, this framework allows us to analyze the limitations of current consistency training/tuning strategies. Built upon Easy Consistency Tuning (ECT), we propose Stable Consistency Tuning (SCT), which incorporates variance-reduced learning using the score identity. SCT leads to significant performance improvements on benchmarks such as CIFAR-10 and ImageNet-64. On ImageNet-64, SCT achieves 1-step FID 2.42 and 2-step FID 1.55, a new SoTA for consistency models.