rsis
When Large Vision-Language Model Meets Large Remote Sensing Imagery: Coarse-to-Fine Text-Guided Token Pruning
Luo, Junwei, Zhang, Yingying, Yang, Xue, Wu, Kang, Zhu, Qi, Liang, Lei, Chen, Jingdong, Li, Yansheng
Efficient vision-language understanding of large Remote Sensing Images (RSIs) is meaningful but challenging. Current Large Vision-Language Models (LVLMs) typically employ limited pre-defined grids to process images, leading to information loss when handling gigapixel RSIs. Conversely, using unlimited grids significantly increases computational costs. To preserve image details while reducing computational complexity, we propose a text-guided token pruning method with Dynamic Image Pyramid (DIP) integration. Our method introduces: (i) a Region Focus Module (RFM) that leverages text-aware region localization capability to identify critical vision tokens, and (ii) a coarse-to-fine image tile selection and vision token pruning strategy based on DIP, which is guided by RFM outputs and avoids directly processing the entire large imagery. Additionally, existing benchmarks for evaluating LVLMs' perception ability on large RSIs suffer from limited question diversity and constrained image sizes. We construct a new benchmark named LRS-VQA, which contains 7,333 QA pairs across 8 categories, with image lengths of up to 27,328 pixels. Our method outperforms existing high-resolution strategies on four datasets using the same data. Moreover, compared to existing token reduction methods, our approach demonstrates higher efficiency under high-resolution settings. The dataset and code are available at https://github.com/VisionXLab/LRS-VQA.
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
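As a rough illustration of the text-guided pruning and coarse-to-fine tile selection described in the LRS-VQA abstract above, the following minimal sketch scores vision tokens against a text embedding and keeps the top fraction, then expands a selected coarse tile into its children at the next pyramid level. It is not the authors' RFM or DIP implementation: cosine similarity as the relevance score, the `keep_ratio` parameter, the pyramid factor of 2, and the function names `prune_tokens` and `child_tiles` are all assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def prune_tokens(vision_tokens, text_embedding, keep_ratio=0.5):
    """Hypothetical helper: keep the vision tokens most similar to the text query.

    Returns the indices of the kept tokens in their original order.
    Assumption: cosine similarity stands in for a learned relevance score.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    scores = [cosine(tok, text_embedding) for tok in vision_tokens]
    k = max(1, math.ceil(len(vision_tokens) * keep_ratio))
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)


def child_tiles(tile, factor=2):
    """Hypothetical helper: map a (row, col) tile at a coarse pyramid level
    to the tiles covering the same area at the next finer level.

    Assumption: each level multiplies the resolution by `factor` per axis.
    """
    row, col = tile
    return [
        (row * factor + dr, col * factor + dc)
        for dr in range(factor)
        for dc in range(factor)
    ]


if __name__ == "__main__":
    tokens = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
    query = [1.0, 0.0]
    print(prune_tokens(tokens, query, keep_ratio=0.5))  # indices of kept tokens
    print(child_tiles((1, 2)))  # finer-level tiles under coarse tile (1, 2)
```

The point of the sketch is that only tiles selected at a coarse level are expanded at the finer level, so the full-resolution image is never tokenized in one pass.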
Panoptic Perception: A Novel Task and Fine-grained Dataset for Universal Remote Sensing Image Interpretation
Zhao, Danpei, Yuan, Bo, Chen, Ziqiang, Li, Tian, Liu, Zhuoran, Li, Wentao, Gao, Yue
Current remote-sensing interpretation models often focus on a single task such as detection, segmentation, or captioning. However, task-specific models cannot achieve comprehensive multi-level interpretation of images. The field also lacks datasets that support multi-task joint interpretation. In this paper, we propose Panoptic Perception, a novel task and a new fine-grained dataset (FineGrip) to achieve a more thorough and universal interpretation for RSIs. The new task 1) integrates pixel-level, instance-level, and image-level information for universal image perception, 2) captures image information from coarse to fine granularity, achieving deeper scene understanding and description, and 3) enables various independent tasks to complement and enhance each other through multi-task learning. By emphasizing multi-task interactions and the consistency of perception results, this task enables the simultaneous processing of fine-grained foreground instance segmentation, background semantic segmentation, and global fine-grained image captioning. Concretely, the FineGrip dataset includes 2,649 remote sensing images, 12,054 fine-grained instance segmentation masks belonging to 20 foreground things categories, 7,599 background semantic masks for 5 stuff classes and 13,245 captioning sentences. Furthermore, we propose a joint optimization-based panoptic perception model. Experimental results on FineGrip demonstrate the feasibility of the panoptic perception task and the beneficial effect of multi-task joint optimization on individual tasks. The dataset will be publicly available.
- Europe > Switzerland > Zürich > Zürich (0.14)
- Asia > China > Beijing > Beijing (0.04)
- North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
- Government > Military (1.00)
- Energy > Renewable > Geothermal > Geothermal Energy Exploration and Development > Geophysical Analysis & Survey (0.84)
- Aerospace & Defense (0.68)
Learning to Holistically Detect Bridges from Large-Size VHR Remote Sensing Imagery
Li, Yansheng, Luo, Junwei, Zhang, Yongjun, Tan, Yihua, Yu, Jin-Gang, Bai, Song
Bridge detection in remote sensing images (RSIs) plays a crucial role in various applications, but it poses unique challenges compared to the detection of other objects. In RSIs, bridges exhibit considerable variations in terms of their spatial scales and aspect ratios. Therefore, to ensure the visibility and integrity of bridges, it is essential to perform holistic bridge detection in large-size very-high-resolution (VHR) RSIs. However, the lack of datasets with large-size VHR RSIs limits the deep learning algorithms' performance on bridge detection. Due to the limitation of GPU memory in tackling large-size images, deep learning-based object detection methods commonly adopt the cropping strategy, which inevitably results in label fragmentation and discontinuous prediction. To ameliorate the scarcity of datasets, this paper proposes a large-scale dataset named GLH-Bridge comprising 6,000 VHR RSIs sampled from diverse geographic locations across the globe. These images encompass a wide range of sizes, varying from 2,048*2,048 to 16,384*16,384 pixels, and collectively feature 59,737 bridges. Furthermore, we present an efficient network for holistic bridge detection (HBD-Net) in large-size RSIs. The HBD-Net presents a separate detector-based feature fusion (SDFF) architecture and is optimized via a shape-sensitive sample re-weighting (SSRW) strategy. Based on the proposed GLH-Bridge dataset, we establish a bridge detection benchmark including the OBB and HBB tasks, and validate the effectiveness of the proposed HBD-Net. Additionally, cross-dataset generalization experiments on two publicly available datasets illustrate the strong generalization capability of the GLH-Bridge dataset.
- North America > United States (0.14)
- Asia > China > Hubei Province > Wuhan (0.04)
- South America (0.04)
- Energy > Renewable > Geothermal > Geothermal Energy Exploration and Development > Geophysical Analysis & Survey (0.62)
- Health & Medicine (0.46)
- Transportation > Ground (0.46)
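The GLH-Bridge abstract above mentions OBB and HBB tasks, which in remote sensing detection usually mean oriented and horizontal bounding boxes. The sketch below shows one way to convert an oriented box into its enclosing horizontal box, which makes clear why long, thin, rotated objects such as bridges are poorly described by horizontal boxes. The `(cx, cy, w, h, angle)` parameterization and the function names are assumptions for illustration, not the dataset's annotation format.

```python
import math


def obb_corners(cx, cy, w, h, angle_rad):
    """Corner points of an oriented box given center, size, and rotation.

    Assumption: the angle is counter-clockwise, in radians, about the center.
    """
    cos_a, sin_a = math.cos(angle_rad), math.sin(angle_rad)
    half = ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2))
    return [
        (cx + dx * cos_a - dy * sin_a, cy + dx * sin_a + dy * cos_a)
        for dx, dy in half
    ]


def obb_to_hbb(cx, cy, w, h, angle_rad):
    """Smallest axis-aligned box (xmin, ymin, xmax, ymax) enclosing the oriented box."""
    xs, ys = zip(*obb_corners(cx, cy, w, h, angle_rad))
    return min(xs), min(ys), max(xs), max(ys)


def hbb_area(box):
    """Area of an (xmin, ymin, xmax, ymax) box."""
    xmin, ymin, xmax, ymax = box
    return (xmax - xmin) * (ymax - ymin)


if __name__ == "__main__":
    # An unrotated box maps to itself.
    print(obb_to_hbb(10, 5, 8, 2, 0.0))  # (6.0, 4.0, 14.0, 6.0)
    # A thin 100 x 4 "bridge" rotated by 45 degrees: the enclosing
    # horizontal box covers far more area than the oriented box itself.
    bridge = obb_to_hbb(0, 0, 100, 4, math.pi / 4)
    print(hbb_area(bridge) / (100 * 4))
```

For elongated targets the horizontal box is mostly background, which is one common motivation for benchmarking both box types.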
Whatever Happened to Carpal Tunnel Syndrome?
This article was featured in One Story to Read Today, a newsletter in which our editors recommend a single must-read from The Atlantic, Monday through Friday. Diana Henriques was first stricken in late 1996. A business reporter for The New York Times, she was in the midst of a punishing effort to bring a reporting project to fruition. Then one morning she awoke to find herself incapable of pinching her contact lens between her thumb and forefinger. Henriques's hands were soon cursed with numbness, frailty, and a gnawing ache she found similar to menstrual cramps.
- Oceania > Australia (0.06)
- North America > United States > Iowa (0.05)
- North America > United States > California > San Francisco County > San Francisco (0.04)
Robust Stability of Neural Network-controlled Nonlinear Systems with Parametric Variability
Talukder, Soumyabrata, Kumar, Ratnesh
Stability certification and identifying a safe and stabilizing initial set are two important concerns in ensuring operational safety, stability, and robustness of dynamical systems. With the advent of machine-learning tools, these issues need to be addressed for the systems with machine-learned components in the feedback loop. To develop a general theory for stability and stabilizability of a neural network (NN)-controlled nonlinear system subject to bounded parametric variation, a Lyapunov-based stability certificate is proposed and is further used to devise a maximal Lipschitz bound for the NN controller, and also a corresponding maximal region-of-attraction (RoA) inside a given safe operating domain. To compute such a robustly stabilizing NN controller that also maximizes the system's long-run utility, a stability-guaranteed training (SGT) algorithm is proposed. The effectiveness of the proposed framework is validated through an illustrative example.
- North America > United States > Iowa > Story County > Ames (0.04)
- Asia > Middle East > Jordan (0.04)
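The abstract above derives a maximal Lipschitz bound that the NN controller must respect. How that bound is computed is not stated here, but checking a given network against such a bound can be sketched as follows. This is a generic, conservative estimate, not the authors' certificate or SGT algorithm: it assumes a feed-forward network with 1-Lipschitz activations (for example ReLU or tanh) and bounds the Lipschitz constant in the infinity norm by the product of the layers' induced infinity norms. The function names and the example weights are hypothetical.

```python
def inf_norm(matrix):
    """Induced infinity norm of a matrix: the maximum absolute row sum."""
    return max(sum(abs(v) for v in row) for row in matrix)


def lipschitz_upper_bound(weights):
    """Upper bound on the network's Lipschitz constant (infinity norm).

    Assumptions: `weights` is a list of layer weight matrices (lists of rows),
    biases are ignored because they do not affect the Lipschitz constant, and
    every activation is 1-Lipschitz. The bound is often loose.
    """
    bound = 1.0
    for layer in weights:
        bound *= inf_norm(layer)
    return bound


def satisfies_bound(weights, max_lipschitz):
    """True if the conservative estimate does not exceed the required bound."""
    return lipschitz_upper_bound(weights) <= max_lipschitz


if __name__ == "__main__":
    # Hypothetical two-layer controller: 2 inputs -> 2 hidden units -> 1 output.
    controller = [
        [[1.0, -2.0], [0.5, 0.5]],
        [[1.0, 1.0]],
    ]
    print(lipschitz_upper_bound(controller))  # 6.0
    print(satisfies_bound(controller, max_lipschitz=6.0))  # True
```

Because the estimate is an upper bound, passing the check is sufficient for the controller to meet the bound, while failing it is inconclusive.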