long-tailed recognition
Simultaneous Long-tailed Recognition and Multi-modal Fusion for Highly Imbalanced Multi-modal Data
As datasets continue to expand in size and complexity, deep neural networks (DNNs) have become increasingly sophisticated, with deeper architectures and greater expressive power. Despite these advances, DNNs trained on imbalanced class distributions tend to favor majority classes, which degrades performance on underrepresented classes [18, 39, 27, 17]. Many real-world datasets follow long-tailed distributions in which minority classes can contain critical and informative patterns. Methods that let DNNs learn effectively from imbalanced data are therefore essential to avoid losing valuable information from these rare classes [26, 34, 16]. Moreover, data encountered in real-world applications are frequently multi-modal, meaning that observations originate from heterogeneous sources [6, 29, 7, 35]. To make effective use of such heterogeneous inputs, a wide range of multi-modal learning approaches has been proposed that exploit complementary information across modalities to enhance predictive performance [10, 5]. Common strategies integrate multiple modalities into a unified representation, using techniques that range from straightforward feature-level concatenation [19, 11, 12] to more sophisticated neural architectures that learn joint representations end to end [20, 32]. Although prior research has extensively studied class imbalance and multi-modal data separately, relatively little attention has been given to settings where both challenges arise simultaneously. Developing methods that can effectively handle long-tailed class distributions in conjunction with multi-modal inputs is therefore essential in many real-world applications.
In the medical domain, for instance, datasets often contain far more samples from healthy individuals than from patients with specific conditions, while also encompassing diverse data types such as imaging data (e.g., X-rays) alongside auxiliary information including demographics and clinical histories.
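To make the combined setting concrete, here is a generic illustration that is not the method of any cited work. It fuses an image embedding and a tabular feature vector by feature-level concatenation, then scores the fused vector with a class-weighted cross-entropy in which each class is weighted by its inverse frequency. The toy data, the linear classifier, and the helper names (`concat_fusion`, `inverse_frequency_weights`, `weighted_cross_entropy`) are hypothetical.

```python
import math
from collections import Counter
from typing import List

def concat_fusion(image_feat: List[float], tabular_feat: List[float]) -> List[float]:
    """Feature-level fusion: concatenate the per-modality feature vectors."""
    return list(image_feat) + list(tabular_feat)

def inverse_frequency_weights(labels: List[int], num_classes: int) -> List[float]:
    """Weight class c by N / (C * n_c), so rare classes contribute more to the loss."""
    counts = Counter(labels)
    n = len(labels)
    return [n / (num_classes * counts[c]) if counts[c] else 0.0 for c in range(num_classes)]

def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def linear_logits(weights: List[List[float]], x: List[float]) -> List[float]:
    """A linear classifier on the fused vector: one weight row per class."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def weighted_cross_entropy(logits: List[float], label: int, class_weights: List[float]) -> float:
    """Cross-entropy for one sample, scaled by the weight of its true class."""
    return -class_weights[label] * math.log(softmax(logits)[label])

# Toy data: 9 "healthy" samples (class 0) and 1 "patient" sample (class 1).
labels = [0] * 9 + [1]
class_weights = inverse_frequency_weights(labels, num_classes=2)  # [10/18, 5.0]

# One sample: a 2-d image embedding fused with a single demographic feature.
x = concat_fusion([0.2, 0.4], [1.0])
W = [[0.1, 0.1, 0.1], [0.0, 0.0, 0.0]]
logits = linear_logits(W, x)
loss_rare = weighted_cross_entropy(logits, 1, class_weights)
loss_common = weighted_cross_entropy(logits, 0, class_weights)
```

The weight N / (C · n_c) is the common "balanced" heuristic. When every class is present, the per-sample weights sum to N, so the average weight stays at 1 and only the balance between classes changes.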
Breaking Long-Tailed Learning Bottlenecks: A Controllable Paradigm with Hypernetwork-Generated Diverse Experts
We generate a set of diverse expert models via hypernetworks to cover all possible distribution scenarios, and optimize the model ensemble to adapt to any test distribution. Crucially, in any distribution scenario, we can flexibly output a dedicated model solution that matches the user's preference.
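The abstract does not give the hypernetwork's form. One simple reading is that a hypernetwork maps a preference vector to expert parameters, so sweeping the preference generates a family of diverse experts, and a single query yields a dedicated model for one user. The sketch below assumes a linear hypernetwork that produces per-class logit offsets from a preference vector on the 2-simplex (head emphasis, tail emphasis). The parameterization, the uniform ensemble, and all names are assumptions, and learning the basis vectors (the ensemble optimization) is omitted.

```python
import math
from typing import List

Vector = List[float]

def hypernetwork(pref: Vector, basis: List[Vector]) -> Vector:
    """Hypothetical linear hypernetwork: map a K-dim preference vector to
    C per-class logit offsets as a weighted sum of K basis vectors."""
    num_classes = len(basis[0])
    return [sum(p * b[c] for p, b in zip(pref, basis)) for c in range(num_classes)]

def expert_logits(base_logits: Vector, pref: Vector, basis: List[Vector]) -> Vector:
    """One generated expert: shared backbone logits plus preference-dependent offsets."""
    offsets = hypernetwork(pref, basis)
    return [z + o for z, o in zip(base_logits, offsets)]

def preference_grid(steps: int) -> List[Vector]:
    """Evenly spaced points (1 - t, t) on the 2-simplex, from head-focused to tail-focused."""
    return [[1 - i / (steps - 1), i / (steps - 1)] for i in range(steps)]

def softmax(z: Vector) -> Vector:
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def ensemble_probs(experts: List[Vector]) -> Vector:
    """Uniform average of the experts' class probabilities."""
    probs = [softmax(z) for z in experts]
    return [sum(p[c] for p in probs) / len(probs) for c in range(len(probs[0]))]

# Toy setup: 3 classes ordered from head (0) to tail (2).
basis = [
    [1.0, 0.0, -1.0],   # offsets favoring head classes
    [-1.0, 0.0, 1.0],   # offsets favoring tail classes
]
base = [2.0, 1.0, 0.0]  # backbone logits that already lean toward the head class

experts = [expert_logits(base, p, basis) for p in preference_grid(5)]
ensemble = ensemble_probs(experts)

# A dedicated model for a user who puts weight 0.8 on tail classes.
user_model = expert_logits(base, [0.2, 0.8], basis)
```

Because every expert shares one set of hypernetwork parameters, the family covers a continuum of head/tail trade-offs without storing a separate model for each point.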
Divide, Weight, and Route: Difficulty-Aware Optimization with Dynamic Expert Fusion for Long-tailed Recognition
Wei, Xiaolei; Ouyang, Yi; Ye, Haibo
Long-tailed visual recognition is challenging not only because of class imbalance but also because classification difficulty varies across categories. Simply reweighting classes by frequency often overlooks classes that are intrinsically hard to learn. To address this, we propose DQRoute, a modular framework that combines difficulty-aware optimization with dynamic expert collaboration. DQRoute first estimates class-wise difficulty from prediction uncertainty and historical performance, and uses this signal to guide training with adaptive loss weighting. On the architectural side, DQRoute employs a mixture-of-experts design in which each expert specializes in a different region of the class distribution. At inference time, expert predictions are weighted by confidence scores derived from expert-specific out-of-distribution (OOD) detectors, enabling input-adaptive routing without a centralized router. All components are trained jointly in an end-to-end manner. Experiments on standard long-tailed benchmarks show that DQRoute significantly improves performance, particularly on rare and difficult classes, highlighting the benefit of integrating difficulty modeling with decentralized expert routing.
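The abstract names two mechanisms without giving formulas. The sketch below is one plausible reading under stated assumptions. Class difficulty is taken as a mix, weighted by a hypothetical `alpha`, of two terms: mean predictive entropy normalized by log C, and one minus an exponential moving average of per-class accuracy. Loss weights are proportional to difficulty and rescaled to mean 1. At inference, each expert's probabilities are weighted by a confidence score. Here the maximum softmax probability, a common simple OOD baseline, stands in for the paper's expert-specific OOD detectors. `DifficultyTracker`, `fuse_experts`, `momentum`, and `alpha` are hypothetical names, and expert specialization and joint training are not shown.

```python
import math
from typing import List

def argmax(v: List[float]) -> int:
    return max(range(len(v)), key=lambda k: v[k])

def entropy(p: List[float]) -> float:
    return -sum(x * math.log(x) for x in p if x > 0)

class DifficultyTracker:
    """Hypothetical per-class difficulty: mix of normalized predictive entropy
    (uncertainty) and 1 - EMA accuracy (historical performance)."""

    def __init__(self, num_classes: int, momentum: float = 0.9, alpha: float = 0.5):
        self.num_classes = num_classes
        self.momentum = momentum          # EMA factor for the history
        self.alpha = alpha                # weight on uncertainty vs. error rate
        self.acc_ema = [0.0] * num_classes
        self.unc_ema = [1.0] * num_classes

    def update(self, probs_batch: List[List[float]], labels: List[int]) -> None:
        max_ent = math.log(self.num_classes)  # entropy of the uniform distribution
        m = self.momentum
        for c in range(self.num_classes):
            rows = [p for p, y in zip(probs_batch, labels) if y == c]
            if not rows:
                continue  # class absent from this batch: keep its history
            acc = sum(argmax(p) == c for p in rows) / len(rows)
            unc = sum(entropy(p) for p in rows) / (len(rows) * max_ent)
            self.acc_ema[c] = m * self.acc_ema[c] + (1 - m) * acc
            self.unc_ema[c] = m * self.unc_ema[c] + (1 - m) * unc

    def difficulty(self) -> List[float]:
        return [self.alpha * u + (1 - self.alpha) * (1 - a)
                for u, a in zip(self.unc_ema, self.acc_ema)]

    def loss_weights(self) -> List[float]:
        """Per-class loss weights proportional to difficulty, rescaled to mean 1."""
        d = self.difficulty()
        mean_d = sum(d) / len(d)
        return [x / mean_d for x in d] if mean_d > 0 else [1.0] * len(d)

def fuse_experts(expert_probs: List[List[float]]) -> List[float]:
    """Confidence-weighted fusion with no central router; max softmax probability
    stands in for an expert-specific OOD score (higher = more in-distribution)."""
    conf = [max(p) for p in expert_probs]
    total = sum(conf)
    num_classes = len(expert_probs[0])
    return [sum(w * p[c] for w, p in zip(conf, expert_probs)) / total
            for c in range(num_classes)]

# Toy batch: class 0 is always predicted correctly, class 1 never is.
tracker = DifficultyTracker(num_classes=2, momentum=0.0)  # momentum 0: use this batch only
probs = [[0.9, 0.1], [0.8, 0.2], [0.6, 0.4], [0.7, 0.3]]
labels = [0, 0, 1, 1]
tracker.update(probs, labels)
weights = tracker.loss_weights()   # class 1 gets the larger weight

# Two experts: a confident one and an uncertain one.
fused = fuse_experts([[0.9, 0.1], [0.5, 0.5]])
```

Because the fused output is a convex combination of probability vectors, it remains a valid distribution. A confident expert dominates only on inputs it considers in-distribution.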