STAMP: Scalable Task And Model-agnostic Collaborative Perception

Gao, Xiangbo; Xu, Runsheng; Li, Jiachen; Wang, Ziran; Fan, Zhiwen; Tu, Zhengzhong

arXiv.org Artificial Intelligence 

Perception is a crucial component of autonomous driving systems. However, single-agent setups often face limitations due to sensor constraints, especially under challenging conditions such as severe occlusion, adverse weather, and long-range object detection. Multi-agent collaborative perception (CP) offers a promising solution by enabling communication and information sharing between connected vehicles. Yet the heterogeneity among agents in sensors, models, and tasks significantly hinders effective and efficient cross-agent collaboration. To address these challenges, we propose STAMP, a scalable task- and model-agnostic collaborative perception framework tailored for heterogeneous agents. STAMP uses lightweight adapter-reverter pairs to transform Bird's Eye View (BEV) features between agent-specific domains and a shared protocol domain, facilitating efficient feature sharing and fusion while minimizing computational overhead. Moreover, our approach enhances scalability, preserves model security, and accommodates a diverse range of agents. Extensive experiments on both simulated (OPV2V) and real-world (V2V4Real) datasets demonstrate that STAMP achieves comparable or superior accuracy to state-of-the-art models with significantly reduced computational costs. As the first-of-its-kind task- and model-agnostic collaborative perception framework, STAMP aims to advance research in scalable and secure mobility systems, bringing us closer to Level 5 autonomy. Our project page is at https://xiangbogaobarry.github.io/STAMP

Multi-agent collaborative perception (CP) (Bai et al., 2022b; Han et al., 2023; Liu et al., 2023) has emerged as a promising solution for autonomous systems by leveraging communication among multiple connected and automated agents.
It enables agents (such as vehicles, infrastructure, or even pedestrians) to share sensory and perceptual information, giving each agent a more comprehensive view of the surrounding environment and enhancing overall perception capabilities. Despite its potential, CP faces significant challenges, particularly when dealing with heterogeneous agents that differ in input modalities, model parameters, architectures, or learning objectives.
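
To make the adapter-reverter idea from the abstract concrete, the following is a minimal sketch of the data flow only, not the STAMP implementation. It assumes that each agent's adapter and reverter can be modeled as a per-cell linear map over BEV channels, that the weights are fixed by hand rather than learned, and that fusion is an element-wise maximum performed in the receiving agent's own feature domain. The names `Agent`, `apply_channel_map`, and `fuse_max` are hypothetical and do not come from the paper.

```python
from typing import List

Cell = List[float]            # channel values at one BEV grid cell
BEV = List[List[Cell]]        # H x W grid of cells
Matrix = List[List[float]]    # out_channels x in_channels


def apply_channel_map(bev: BEV, weight: Matrix) -> BEV:
    """Apply the same linear map over channels at every grid cell."""
    return [
        [[sum(w * x for w, x in zip(row_w, cell)) for row_w in weight]
         for cell in row]
        for row in bev
    ]


class Agent:
    """Hypothetical agent holding an adapter-reverter pair (assumed linear)."""

    def __init__(self, name: str, adapter: Matrix, reverter: Matrix) -> None:
        self.name = name
        self.adapter = adapter      # local channels -> protocol channels
        self.reverter = reverter    # protocol channels -> local channels

    def to_protocol(self, local_bev: BEV) -> BEV:
        return apply_channel_map(local_bev, self.adapter)

    def from_protocol(self, protocol_bev: BEV) -> BEV:
        return apply_channel_map(protocol_bev, self.reverter)


def fuse_max(bevs: List[BEV]) -> BEV:
    """Element-wise max over same-shaped BEV maps (placeholder fusion rule)."""
    first = bevs[0]
    return [
        [[max(b[i][j][k] for b in bevs) for k in range(len(first[i][j]))]
         for j in range(len(first[i]))]
        for i in range(len(first))
    ]


# The shared protocol domain has 2 channels in this toy setup.
# Agent A uses 1 local channel; agent B uses 2 local channels in swapped order.
agent_a = Agent("A", adapter=[[1.0], [0.0]], reverter=[[1.0, 0.0]])
agent_b = Agent("B", adapter=[[0.0, 1.0], [1.0, 0.0]],
                reverter=[[0.0, 1.0], [1.0, 0.0]])

bev_a: BEV = [[[1.0], [2.0]]]              # 1 x 2 grid, 1 channel
bev_b: BEV = [[[0.5, 3.0], [0.0, 1.0]]]    # 1 x 2 grid, 2 channels

# B shares its features in the protocol domain; A reverts them to its own
# domain and fuses them with its local features.
shared_b = agent_b.to_protocol(bev_b)
received_by_a = agent_a.from_protocol(shared_b)
fused_a = fuse_max([bev_a, received_by_a])
print(fused_a)  # [[[3.0], [2.0]]]
```

In this sketch, neither agent needs to know the other's channel layout or model: each one only maps to and from the shared protocol domain, which is the property the abstract attributes to adapter-reverter pairs. Adding a third agent would require only one new adapter-reverter pair rather than a pairwise translation for every existing agent.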