Text-based Aerial-Ground Person Retrieval

Zhou, Xinyu; Wu, Yu; Ma, Jiayao; Wang, Wenhao; Cao, Min; Ye, Mang

Nov-12-2025–arXiv.org Artificial Intelligence

This work introduces Text-based Aerial-Ground Person Retrieval (TAG-PR), which aims to retrieve person images from heterogeneous aerial and ground views using textual descriptions. Unlike traditional Text-based Person Retrieval (T-PR), which focuses solely on ground-view images, TAG-PR is of greater practical significance and presents unique challenges due to the large viewpoint discrepancy across images. To support this task, we contribute: (1) the TAG-PEDES dataset, constructed from public benchmarks with automatically generated textual descriptions and enhanced by a diversified text generation paradigm to ensure robustness under view heterogeneity; and (2) TAG-CLIP, a novel retrieval framework that addresses view heterogeneity through a hierarchically-routed mixture-of-experts module, which learns view-specific and view-agnostic features, and a viewpoint decoupling strategy, which decouples view-specific features for better cross-modal alignment. We evaluate the effectiveness of TAG-CLIP on both the proposed TAG-PEDES dataset and existing T-PR benchmarks. The dataset and code are available at https://github.com/Flame-Chasers/T
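The abstract names a hierarchically-routed mixture of experts and CLIP-style text-to-image matching without specifying either. The sketch below is a hypothetical toy in plain Python, not the TAG-CLIP implementation. It assumes a two-level gate: a top gate weighs a shared (view-agnostic) expert against a group of view-specific experts (for example, one aerial and one ground expert), and a second gate mixes those view-specific experts. Gallery images are then ranked by cosine similarity to a text embedding. All helper names (`hierarchical_moe`, `rank_gallery`, and so on) are illustrative assumptions, and every expert and gate is a plain linear map for brevity.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = Sequence[Sequence[float]]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights: Matrix, x: Sequence[float]) -> Vector:
    """Matrix-vector product W @ x; every expert and gate in this toy is linear."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def hierarchical_moe(
    x: Sequence[float],
    shared_expert: Matrix,
    view_experts: Sequence[Matrix],
    top_gate: Matrix,
    view_gate: Matrix,
) -> Tuple[Vector, Vector, Vector]:
    """Hypothetical two-level mixture of experts (not the paper's module).

    top_gate must have exactly 2 rows: shared branch vs. view-specific branch.
    view_gate has one row per view expert (e.g. aerial, ground).
    Returns (combined, view_agnostic, view_specific) feature vectors.
    """
    # Level 1: split weight between the shared and view-specific branches.
    w_shared, w_specific = softmax(linear(top_gate, x))
    view_agnostic = linear(shared_expert, x)

    # Level 2: mix the view-specific experts with their own gate.
    view_weights = softmax(linear(view_gate, x))
    view_specific = [0.0] * len(view_agnostic)
    for g, expert in zip(view_weights, view_experts):
        out = linear(expert, x)
        view_specific = [s + g * o for s, o in zip(view_specific, out)]

    combined = [w_shared * a + w_specific * b
                for a, b in zip(view_agnostic, view_specific)]
    return combined, view_agnostic, view_specific


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity, the usual CLIP-style matching score."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def rank_gallery(text_emb: Sequence[float],
                 gallery: Sequence[Sequence[float]]) -> List[int]:
    """Gallery indices sorted by descending similarity to the text embedding."""
    scores = [cosine(text_emb, g) for g in gallery]
    return sorted(range(len(gallery)), key=lambda i: scores[i], reverse=True)
```

Returning the shared and view-specific branches separately is one simple way to keep the two kinds of features apart, so that, for instance, text could be matched against the view-agnostic branch alone. The abstract does not say whether TAG-CLIP's viewpoint decoupling works this way; the toy only illustrates the general idea of routing features through shared and per-view experts.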

artificial intelligence, machine learning, natural language
