Collaborating Authors

Hong, Feng


Diversified Batch Selection for Training Acceleration

arXiv.org Artificial Intelligence

The remarkable success of modern machine learning models on large datasets often demands extensive training time and resource consumption. To save cost, a prevalent research line, known as online batch selection, explores selecting informative subsets during the training process. Although recent efforts achieve advancements by measuring the impact of each sample on generalization, their reliance on additional reference models inherently limits their practical application when no such ideal models are available. On the other hand, vanilla reference-model-free methods score and select data independently in a sample-wise manner, which sacrifices diversity and induces redundancy. To tackle this dilemma, we propose Diversified Batch Selection (DivBS), which is reference-model-free and can efficiently select diverse and representative samples. Specifically, we define a novel selection objective that measures group-wise orthogonalized representativeness to combat the redundancy issue of previous sample-wise criteria, and provide a principled, selection-efficient realization. Extensive experiments across various tasks demonstrate the significant superiority of DivBS in the performance-speedup trade-off. The code is publicly available.
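The redundancy problem the abstract describes can be pictured with a toy greedy procedure: pick the sample that best represents the pool, project its direction out of the remaining candidates, and repeat, so near-duplicates stop scoring highly. The following NumPy sketch is only a hedged illustration of that general idea; the function name `diverse_batch_select` and the mean-direction representativeness score are assumptions for exposition, not the paper's actual objective or algorithm.

```python
import numpy as np

def diverse_batch_select(feats, k):
    """Toy diversity-aware selection: greedily pick the sample whose
    (orthogonalized) feature best represents the pool mean, then project
    that direction out so later picks cover new directions instead of
    repeating it."""
    pool = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    mean = pool.mean(axis=0)          # direction the selected set should represent
    residual = pool.copy()
    selected = []
    for _ in range(k):
        scores = residual @ mean      # representativeness of what is left
        scores[selected] = -np.inf    # never pick the same sample twice
        i = int(np.argmax(scores))
        selected.append(i)
        d = residual[i] / (np.linalg.norm(residual[i]) + 1e-12)
        # remove the covered direction from every remaining candidate
        residual = residual - np.outer(residual @ d, d)
    return selected
```

With two near-duplicate samples and one orthogonal sample, selecting two items picks one duplicate and the orthogonal sample rather than both duplicates, which is the diversity behavior a sample-wise score would miss.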


UniChest: Conquer-and-Divide Pre-training for Multi-Source Chest X-Ray Classification

arXiv.org Artificial Intelligence

Vision-Language Pre-training (VLP), which utilizes multi-modal information to promote training efficiency and effectiveness, has achieved great success in visual recognition of natural domains and shown promise in medical imaging diagnosis for Chest X-Rays (CXRs). However, current works mainly explore single CXR datasets, which leaves the potential of this powerful paradigm on larger hybrids of multi-source CXR datasets untapped. We identify that although blending samples from diverse sources can improve model generalization, it is still challenging to maintain consistent superiority on each source's task due to the heterogeneity among sources. To handle this dilemma, we design a Conquer-and-Divide pre-training framework, termed UniChest, which aims to make full use of the collaboration benefit of multiple sources of CXRs while reducing the negative influence of source heterogeneity. Specifically, the "Conquer" stage in UniChest encourages the model to sufficiently capture multi-source common patterns, and the "Divide" stage helps squeeze personalized patterns into different small experts (query networks). We conduct thorough experiments on many benchmarks, e.g., ChestX-ray14, CheXpert, Vindr-CXR, Shenzhen, Open-I and SIIM-ACR Pneumothorax, verifying the effectiveness of UniChest over a range of baselines, and release our code and pre-trained models at https://github.com/Elfenreigen/UniChest.
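The "Conquer" and "Divide" stages can be read as a shared encoder plus a softly gated set of small experts. The sketch below is a hedged, generic mixture-of-experts forward pass, not UniChest's actual architecture; `shared_encoder`, `experts`, and `gate` are placeholder callables introduced for illustration.

```python
import numpy as np

def conquer_and_divide_forward(x, shared_encoder, experts, gate):
    """Toy conquer-and-divide forward pass: a shared encoder captures
    multi-source common patterns ("Conquer"); a softmax gate routes the
    shared feature to small per-source experts holding personalized
    patterns ("Divide")."""
    h = shared_encoder(x)                  # common representation
    w = np.exp(gate(h))
    w = w / w.sum()                        # softmax weights over experts
    return sum(wi * e(h) for wi, e in zip(w, experts))
```

With a uniform gate, the output is simply the average of the expert outputs; a peaked gate specializes the prediction to one source's expert.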


Optimization-based Motion Planning for Autonomous Parking Considering Dynamic Obstacle: A Hierarchical Framework

arXiv.org Artificial Intelligence

This paper introduces a hierarchical framework that integrates graph search algorithms and model predictive control to facilitate efficient parking maneuvers for Autonomous Vehicles (AVs) in constrained environments. In the high-level planning phase, the framework incorporates scenario-based hybrid A* (SHA*), an optimized variant of traditional Hybrid A*, to generate an initial path while considering static obstacles. This global path serves as an initial guess for the low-level nonlinear programming (NLP) problem. In the low-level optimization phase, a nonlinear model predictive control (NMPC)-based framework is deployed to circumvent dynamic obstacles. The performance of SHA* is empirically validated through 148 simulation scenarios, and the efficacy of the proposed hierarchical framework is demonstrated via a real-time parallel parking simulation.


Combating Representation Learning Disparity with Geometric Harmonization

arXiv.org Artificial Intelligence

Self-supervised learning (SSL), an effective paradigm of representation learning, has achieved tremendous success on various curated datasets in diverse scenarios. Nevertheless, when facing the long-tailed distributions of real-world applications, it is still hard for existing methods to capture transferable and robust representations. Conventional SSL methods, pursuing sample-level uniformity, easily lead to representation learning disparity, where head classes dominate the feature regime but tail classes passively collapse. To address this problem, we propose a novel Geometric Harmonization (GH) method that encourages category-level uniformity in representation learning, which is more benign to the minority and barely hurts the majority under long-tailed distributions. Specifically, GH measures the population statistics of the embedding space on top of self-supervised learning and then infers a fine-grained instance-wise calibration to constrain the space expansion of head classes and avoid the passive collapse of tail classes. Our proposal does not alter the setting of SSL and can be easily integrated into existing methods in a low-cost manner. Extensive results on a range of benchmark datasets show the effectiveness of GH, with high tolerance to distribution skewness.


Bag of Tricks for Long-Tailed Multi-Label Classification on Chest X-Rays

arXiv.org Artificial Intelligence

Clinical classification of chest radiography is particularly challenging for standard machine learning algorithms due to its inherent long-tailed and multi-label nature. However, few attempts account for the coupled challenges posed by both class imbalance and label co-occurrence, which limits their value for improving diagnosis on chest X-rays (CXRs) in real-world scenarios. Besides, with the prevalence of pre-training techniques, how to incorporate these new paradigms into the current framework lacks systematic study. This technical report presents a brief description of our solution in the ICCV CVAMD 2023 CXR-LT Competition. We empirically explored the effectiveness of integrating several advanced designs for CXR diagnosis, covering data augmentation, feature extractors, classifier design, loss-function reweighting, exogenous data replenishment, etc. In addition, we improve performance through simple test-time data augmentation and ensembling. Our framework finally achieves 0.349 mAP on the competition test set, ranking in the top five.
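The test-time augmentation and ensembling step mentioned above amounts to averaging per-class probabilities over augmented views and over models. The sketch below is a minimal, hedged illustration under the assumption of a single horizontal-flip augmentation; the function name and the choice of views are not from the report.

```python
import numpy as np

def tta_ensemble_predict(models, image):
    """Toy TTA + ensemble for multi-label classification: each model is a
    callable returning per-class probabilities; average predictions over
    the original image, its horizontal flip, and all models."""
    views = [image, image[:, ::-1]]                 # original + horizontal flip
    probs = [m(v) for m in models for v in views]   # one prediction per (model, view)
    return np.mean(probs, axis=0)                   # averaged per-class probabilities
```

Averaging probabilities (rather than hard labels) keeps the output usable for mAP-style ranking metrics such as the one reported.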


Cluster-Guided Unsupervised Domain Adaptation for Deep Speaker Embedding

arXiv.org Artificial Intelligence

Recent studies have shown that pseudo labels can contribute to unsupervised domain adaptation (UDA) for speaker verification. Inspired by self-training strategies that use an existing classifier to label the unlabeled data for retraining, we propose a cluster-guided UDA framework that labels the target domain data by clustering and combines the labeled source domain data and pseudo-labeled target domain data to train a speaker embedding network. To improve cluster quality, we train a speaker embedding network dedicated to clustering by minimizing the contrastive center loss. The goal is to reduce the distance between an embedding and its assigned cluster center while enlarging the distance between the embedding and the other cluster centers. Using VoxCeleb2 as the source domain and CN-Celeb1 as the target domain, we demonstrate that the proposed method can achieve an equal error rate (EER) of 8.10% on the CN-Celeb1 evaluation set without using any labels from the target domain. This result outperforms the supervised baseline by 39.6% and is the state-of-the-art UDA performance on this corpus.
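The stated goal, pulling an embedding toward its assigned center while pushing it away from the other centers, can be written as a cross-entropy over embedding-to-center similarities. The NumPy sketch below is a hedged illustration of that idea; the cosine normalization and temperature `tau` are assumptions for exposition, not necessarily the paper's exact formulation.

```python
import numpy as np

def contrastive_center_loss(embeddings, labels, centers, tau=0.1):
    """Toy contrastive center loss: softmax over scaled embedding-center
    cosine similarities, with the assigned center as the positive class.
    Minimizing it shrinks the distance to the assigned center and grows
    the distance to the other centers."""
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    ctr = centers / np.linalg.norm(centers, axis=1, keepdims=True)
    logits = emb @ ctr.T / tau                       # (N, K) scaled similarities
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(len(labels)), labels].mean()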


Long-Tailed Partial Label Learning via Dynamic Rebalancing

arXiv.org Artificial Intelligence

The remarkable success of deep learning is built on large amounts of labeled data. However, data annotation in real-world scenarios often suffers from ambiguity. To address annotation ambiguity, partial label learning allows multiple candidate labels to be annotated for each training instance, which is widely applicable in web mining (Luo & Orabona, 2010), automatic image annotation (Zeng et al., 2013; Chen et al., 2018), ecoinformatics (Liu & Dietterich, 2012), and crowdsourcing (Gong et al., 2018). For example, a movie clip may contain several characters talking to each other, with some of them appearing in a screenshot. Although we can obtain scripts and dialogues that indicate the names of the characters, we cannot directly confirm the real name of each face in the screenshot (see Figure 7(a)). A similar scenario arises when recognizing faces in news images, where we can obtain the names of the people from the news descriptions but cannot establish a one-to-one correspondence with the face images (see Figure 7(b)). The partial label learning problem also appears in crowdsourcing, where each instance may be given multiple labels by different annotators. However, some labels may be incorrect or biased due to differences in the expertise or cultural background of annotators, so it is necessary to find the most appropriate label for each instance from the candidate labels (see Figure 7(c)).


Personalized Deep Learning for Ventricular Arrhythmias Detection on Medical IoT Systems

arXiv.org Machine Learning

Life-threatening ventricular arrhythmias (VA) are the leading cause of sudden cardiac death (SCD), which is the most significant cause of natural death in the US. The implantable cardioverter defibrillator (ICD) is a small device implanted in patients at high risk of SCD as a preventive treatment. The ICD continuously monitors the intracardiac rhythm and delivers a shock when life-threatening VA is detected. Traditional methods detect VA by setting criteria on the detected rhythm. However, those methods suffer from a high inappropriate shock rate and require regular follow-ups to optimize the criteria parameters for each ICD recipient. To address these challenges, we propose a personalized computing framework for deep-learning-based VA detection on medical IoT systems. The system consists of intracardiac and surface rhythm monitors and a cloud platform for data uploading, diagnosis, and CNN model personalization. We equip the system with real-time inference on both intracardiac and surface rhythm monitors. To improve detection accuracy, we enable the monitors to detect VA collaboratively by proposing cooperative inference. We also introduce per-patient CNN personalization on top of the computing framework to tackle the problem of unlabeled and limited rhythm data. Compared with the traditional detection algorithm, the proposed method achieves comparable accuracy on VA rhythm detection and a 6.6% reduction in inappropriate shock rate, while keeping the average inference latency at 71 ms.
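Cooperative inference between the two monitors can be illustrated as fusing their per-class probabilities before making a decision. The snippet below is a hedged sketch only; the weighted-average fusion and the weight `w` are illustrative assumptions, not the paper's actual cooperative inference rule.

```python
import numpy as np

def cooperative_inference(p_intracardiac, p_surface, w=0.5):
    """Toy cooperative VA detection: fuse the intracardiac and surface
    monitors' per-class probabilities with a weighted average, then take
    the most likely class. The weight w is a placeholder assumption."""
    p = w * np.asarray(p_intracardiac) + (1 - w) * np.asarray(p_surface)
    return p, int(p.argmax())
```

In practice such a fusion rule lets a confident reading from one monitor veto a borderline reading from the other, which is one way collaborative detection can lower the inappropriate shock rate.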