Calibrating Biased Distribution in VFM-derived Latent Space via Cross-Domain Geometric Consistency
Ma, Yanbiao, Dai, Wei, Liu, Bowei, Chen, Jiayi, Huang, Wenke, Wan, Guancheng, Lu, Zhiwu, Yan, Junchi
–arXiv.org Artificial Intelligence
Abstract--Despite the fast progress of deep learning, one long-standing challenge is the gap between the observed training samples and the underlying true distribution, a gap that can arise for multiple reasons. In the era of foundation models, we show that when leveraging off-the-shelf (vision) foundation models (e.g., CLIP, DINOv2) for feature extraction, the geometric shapes of the resulting feature distributions exhibit remarkable transferability across domains and datasets. To verify its practical usefulness, we embody our geometric knowledge-guided distribution calibration framework in two popular and challenging settings: federated learning and long-tailed recognition. In the federated setting, we devise a technique for acquiring the global geometric shape under privacy constraints, then leverage this knowledge to generate new samples for clients, with the aim of bridging the gap between local and global observations. In long-tailed learning, our framework utilizes the geometric knowledge transferred from sample-rich categories to recover the true distribution for sample-scarce tail classes. Comprehensive experiments show that the proposed geometric knowledge-guided distribution calibration effectively overcomes information deficits caused by data heterogeneity and sample imbalance, boosting performance across benchmarks.

The training data relied upon by models is often only a local [6], sparse [7], and biased [8] observation of the underlying ideal global data distribution. This distribution-missing phenomenon manifests in various forms: in federated learning, it appears as label skew and domain skew due to data silos among clients [9], [10], [11], causing a severe misalignment between local data distributions and the global ideal distribution and thereby leading to divergent or even conflicting local optimization directions [12], [13], [14].
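The federated idea described above can be illustrated with a minimal numpy sketch. The paper's exact protocol is not given here, so the code below makes an explicit assumption: each client shares only summary statistics (mean, covariance, count) of its local features rather than raw samples, the server pools them into a global geometric shape via the law of total covariance, and a client then draws synthetic features from that shape. All function names are hypothetical.

```python
import numpy as np

def client_stats(feats):
    """Hypothetical: a client reports only summary statistics of its
    local features (not raw samples) as a privacy-aware proxy."""
    return feats.mean(axis=0), np.cov(feats, rowvar=False), len(feats)

def aggregate_global_shape(stats):
    """Server pools per-client statistics into a global geometric shape:
    count-weighted mean, then within- plus between-client covariance
    (law of total covariance)."""
    means, covs, ns = zip(*stats)
    n = sum(ns)
    g_mean = sum(ni * mi for ni, mi in zip(ns, means)) / n
    within = sum(ni * ci for ni, ci in zip(ns, covs)) / n
    between = sum(ni * np.outer(mi - g_mean, mi - g_mean)
                  for ni, mi in zip(ns, means)) / n
    return g_mean, within + between

def resample_for_client(g_mean, g_cov, n_new=100, seed=0):
    """A client draws synthetic features from the global shape to bridge
    the gap between its local and the global observation."""
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(g_mean, g_cov, size=n_new)
```

Only the aggregated mean and covariance ever leave a client in this sketch; whether that level of disclosure satisfies the paper's privacy constraints is an assumption, not a claim from the text.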
In long-tailed recognition, it is characterized by the extreme scarcity of samples in tail classes, preventing the model from capturing the true and complete shape of their distributions [15], [16]. Despite the differing scenarios, the essence is highly unified--models learn from incomplete information and lack a comprehensive understanding of the overall structure of the real world. Conventional solutions, such as weighting loss functions [7], [17], [18], designing complex regularization terms [9], [14], [19], or aggregation strategies [20], [21], [22], primarily focus on post-hoc compensation at the optimization level.

Yanbiao Ma and Zhiwu Lu are with the Gaoling School of Artificial Intelligence, Renmin University of China. Bowen Liu is with Tsinghua University.
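The long-tailed transfer described in the text can likewise be sketched under a simplifying assumption: treat a class's "geometric shape" as its feature covariance, borrow that shape from a sample-rich head class, blend it with the tail class's own unreliable estimate, and sample synthetic tail features from the calibrated Gaussian. This is a generic distribution-calibration sketch, not the paper's exact method; `alpha` and the function name are illustrative.

```python
import numpy as np

def calibrate_tail_class(head_feats, tail_feats, n_new=100, alpha=0.5, seed=0):
    """Hypothetical sketch: transfer the geometric shape (covariance) of a
    head class to a tail class, then sample synthetic tail features."""
    head_cov = np.cov(head_feats, rowvar=False)          # reliable shape
    tail_mean = tail_feats.mean(axis=0)                  # keep tail location
    tail_cov = (np.cov(tail_feats, rowvar=False)
                if len(tail_feats) > 1 else np.zeros_like(head_cov))
    # Blend: lean on the head-class shape where tail statistics are sparse.
    calibrated_cov = alpha * head_cov + (1 - alpha) * tail_cov
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(tail_mean, calibrated_cov, size=n_new)
```

In practice one would pick the donor head class by some similarity criterion (e.g., nearest class mean in the VFM feature space); that choice is left out of this sketch.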
Aug-20-2025