BlabberSeg: Real-Time Embedded Open-Vocabulary Aerial Segmentation
Bong, Haechan Mark, de Azambuja, Ricardo, Beltrame, Giovanni
arXiv.org Artificial Intelligence
Real-time aerial image segmentation plays an important role in the environmental perception of Uncrewed Aerial Vehicles (UAVs). We introduce BlabberSeg, an optimized Vision-Language Model built on CLIPSeg for on-board, real-time processing of aerial images by UAVs. BlabberSeg improves the efficiency of CLIPSeg by reusing prompt and model features, which reduces computational overhead while achieving real-time open-vocabulary aerial segmentation. We validated BlabberSeg in a safe landing scenario using the Dynamic Open-Vocabulary Enhanced SafE-Landing with Intelligence (DOVESEI) framework, which uses visual servoing and open-vocabulary segmentation. BlabberSeg reduces computational costs significantly, with a speed increase of 927.41% (16.78 Hz) on an NVIDIA Jetson Orin AGX (64 GB) compared with the original CLIPSeg (1.81 Hz), achieving real-time aerial segmentation with negligible loss in accuracy (2.1%, measured as the ratio of the correctly segmented area with respect to CLIPSeg). BlabberSeg's source code is open and available online.
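The idea of reusing prompt features can be illustrated with a small memoization sketch. This is not BlabberSeg's implementation: `PromptFeatureCache` and `toy_text_encoder` are hypothetical names, and the toy encoder merely stands in for an expensive text encoder. The sketch only shows why a fixed prompt set needs to be encoded once rather than on every frame.

```python
from typing import Callable, Dict, Tuple

Embedding = Tuple[float, ...]


class PromptFeatureCache:
    """Memoize prompt embeddings so each distinct prompt is encoded only once."""

    def __init__(self, encode: Callable[[str], Embedding]) -> None:
        self._encode = encode
        self._cache: Dict[str, Embedding] = {}
        self.encoder_calls = 0  # counts how often the encoder actually runs

    def get(self, prompt: str) -> Embedding:
        if prompt not in self._cache:
            self.encoder_calls += 1
            self._cache[prompt] = self._encode(prompt)
        return self._cache[prompt]


def toy_text_encoder(prompt: str) -> Embedding:
    # Hypothetical stand-in for an expensive text encoder (assumption).
    return (float(len(prompt)), float(sum(map(ord, prompt)) % 97))


cache = PromptFeatureCache(toy_text_encoder)
prompts = ["grass", "water", "building"]

# Simulate 100 camera frames that all use the same prompt set.
for _frame in range(100):
    features = [cache.get(p) for p in prompts]

print(cache.encoder_calls)
```

With the prompt set unchanged across frames, the encoder runs once per distinct prompt regardless of how many frames are processed; only the image-dependent part of the pipeline would remain per-frame work.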
Oct-16-2024
- Country:
  - Europe (0.68)
  - North America > United States (0.28)
  - South America > Brazil
    - Rio Grande do Sul (0.14)
- Genre:
  - Research Report (0.64)
- Industry:
  - Information Technology > Hardware (0.38)
- Technology:
  - Information Technology
    - Architecture > Real Time Systems (1.00)
    - Artificial Intelligence
      - Machine Learning (1.00)
      - Natural Language > Large Language Model (0.69)
      - Representation & Reasoning > Optimization (0.46)
      - Robots (1.00)
    - Sensing and Signal Processing > Image Processing (0.90)