AITopics | Geophysical Analysis & Survey

Geophysical Analysis & Survey

RoMA: Scaling up Mamba-based Foundation Models for Remote Sensing

Wang, Fengxiang, Wang, Hongzhen, Wang, Yulin, Wang, Di, Chen, Mingshuo, Zhao, Haiyan, Sun, Yangang, Wang, Shuo, Lan, Long, Yang, Wenjing, Zhang, Jing

arXiv.org Artificial Intelligence, Mar-13-2025

Recent advances in self-supervised learning for Vision Transformers (ViTs) have fueled breakthroughs in remote sensing (RS) foundation models. However, the quadratic complexity of self-attention poses a significant barrier to scalability, particularly for large models and high-resolution images. While the linear-complexity Mamba architecture offers a promising alternative, existing RS applications of Mamba remain limited to supervised tasks on small, domain-specific datasets. To address these challenges, we propose RoMA, a framework that enables scalable self-supervised pretraining of Mamba-based RS foundation models using large-scale, diverse, unlabeled data. RoMA enhances scalability for high-resolution images through a tailored auto-regressive learning strategy, incorporating two key innovations: 1) a rotation-aware pretraining mechanism combining adaptive cropping with angular embeddings to handle sparsely distributed objects with arbitrary orientations, and 2) multi-scale token prediction objectives that address the extreme variations in object scales inherent to RS imagery. Systematic empirical studies validate that Mamba adheres to RS data and parameter scaling laws, with performance scaling reliably as model and data size increase. Furthermore, experiments across scene classification, object detection, and semantic segmentation tasks demonstrate that RoMA-pretrained Mamba models consistently outperform ViT-based counterparts in both accuracy and computational efficiency. The source code and pretrained models will be released at https://github.com/MiliLab/RoMA.
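
The abstract does not say how RoMA's angular embeddings are computed. As a minimal sketch, assuming a generic sinusoidal encoding of the rotation angle (the helper name `angle_embedding` and its `dim` parameter are hypothetical, not taken from the RoMA code), a rotation applied to a crop during pretraining could be turned into a feature vector like this:

```python
import math


def angle_embedding(theta: float, dim: int = 8) -> list[float]:
    """Encode a rotation angle in radians as sin/cos pairs at integer frequencies.

    Hypothetical helper: a generic sinusoidal encoding, not RoMA's published design.
    """
    if dim <= 0 or dim % 2 != 0:
        raise ValueError("dim must be a positive even number")
    features = []
    for k in range(1, dim // 2 + 1):
        # Integer frequencies make the encoding 2*pi-periodic, so 0 and 360
        # degrees map to the same vector, as they should for a rotation.
        features.append(math.sin(k * theta))
        features.append(math.cos(k * theta))
    return features
```

Using several frequencies lets nearby angles stay distinguishable while keeping the encoding periodic; how RoMA actually injects orientation information into its tokens is not described in the abstract.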

classification, mamba, remote sensing

arXiv.org Artificial Intelligence

2503.10392

Country:

North America > United States > Colorado (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Asia > China > Hubei Province > Wuhan (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (0.64)

Industry: Energy > Renewable > Geothermal > Geothermal Energy Exploration and Development > Geophysical Analysis & Survey (0.77)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Urban Region Representation Learning: A Flexible Approach

Sun, Fengze, Chang, Yanchuan, Tanin, Egemen, Karunasekera, Shanika, Qi, Jianzhong

arXiv.org Artificial Intelligence, Mar-12-2025

The increasing availability of urban data offers new opportunities for learning region representations, which can be used as input to machine learning models for downstream tasks such as check-in or crime prediction. While existing solutions have produced promising results, they rely on a fixed formation of regions and fixed input region features, which may not suit the needs of different downstream tasks. To address this limitation, we propose FlexiReg, an urban region representation learning model that is flexible in both the formation of urban regions and the input region features. FlexiReg is based on a spatial grid partitioning over the area of interest. It learns representations for the grid cells, leveraging publicly accessible data, including POI, land use, satellite imagery, and street view imagery. We propose adaptive aggregation to fuse the cell representations and prompt learning techniques to tailor the representations to different tasks, addressing the needs of varying region formations and downstream tasks. Extensive experiments on five real-world datasets demonstrate that FlexiReg outperforms state-of-the-art models by up to 202% in terms of accuracy on four diverse downstream tasks using the produced urban region representations.
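
FlexiReg builds a region representation from the representations of the grid cells the region covers. The abstract does not give the adaptive aggregation formula; a minimal sketch, assuming softmax-weighted pooling over per-cell relevance scores (the function names and the score input are hypothetical), shows how any set of cells can be fused into one region vector, which is what allows regions of arbitrary shape:

```python
import math


def softmax(scores: list[float]) -> list[float]:
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_cells(cell_vectors: list[list[float]], cell_scores: list[float]) -> list[float]:
    """Fuse the embeddings of the grid cells in one region into a single vector.

    Hypothetical: weights are a softmax over per-cell relevance scores, which in a
    trained model would come from a learned scoring layer.
    """
    if not cell_vectors or len(cell_vectors) != len(cell_scores):
        raise ValueError("need one score per cell and at least one cell")
    weights = softmax(cell_scores)
    region = [0.0] * len(cell_vectors[0])
    for weight, vector in zip(weights, cell_vectors):
        for i, value in enumerate(vector):
            region[i] += weight * value
    return region
```

With equal scores this reduces to a plain mean of the cell embeddings; larger scores let informative cells dominate the region vector.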

downstream task, grid cell, representation

arXiv.org Artificial Intelligence

2503.09128

Country:

Europe > Portugal > Lisbon > Lisbon (0.04)
Asia > Singapore (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)

Genre: Research Report > Promising Solution (0.48)

Industry:

Information Technology (0.46)
Transportation > Ground > Road (0.46)
Energy > Renewable > Geothermal > Geothermal Energy Exploration and Development > Geophysical Analysis & Survey (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.67)

ChromaFormer: A Scalable and Accurate Transformer Architecture for Land Cover Classification

Li, Mingshi, Grujicic, Dusan, Somers, Ben, Heremans, Stien, De Saeger, Steven, Blaschko, Matthew B.

arXiv.org Artificial Intelligence, Mar-11-2025

Remote sensing imagery from systems such as Sentinel provides full coverage of the Earth's surface at around 10-meter resolution. The remote sensing community has transitioned to extensive use of deep learning models due to their high performance on benchmarks such as the UCMerced and ISPRS Vaihingen datasets. Convolutional models such as UNet and ResNet variants are commonly employed for remote sensing but typically accept only three channels, as they were developed for RGB imagery, while satellite systems provide more than ten. Recently, several transformer architectures have been proposed for remote sensing, but they have not been extensively benchmarked and are typically used on small datasets such as Salinas Valley. Meanwhile, it is becoming feasible to obtain dense spatial land-use labels for entire first-level administrative divisions of some countries. Scaling law observations suggest that substantially larger multi-spectral transformer models could provide a significant leap in remote sensing performance in these settings. In this work, we propose ChromaFormer, a family of multi-spectral transformer models, which we evaluate across model sizes spanning orders of magnitude to assess their performance and scaling effectiveness on a densely labeled imagery dataset of Flanders, Belgium, covering more than 13,500 km² and containing 15 classes. We propose a novel multi-spectral attention strategy and demonstrate its effectiveness through ablations. Furthermore, we show that models more than an order of magnitude larger than conventional architectures such as UNet lead to substantial accuracy improvements: a UNet++ model with 23M parameters achieves less than 65% accuracy, while a multi-spectral transformer with 655M parameters achieves over 95% accuracy on the Biological Valuation Map of Flanders.
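
The abstract names a multi-spectral attention strategy without defining it. As an illustration of the general idea only, assuming each spectral band of a patch is embedded as its own token (this tokenization, and the absence of learned query/key/value projections, are simplifications and not ChromaFormer's design), plain scaled dot-product attention across bands looks like this:

```python
import math


def scaled_dot_product_attention(queries, keys, values):
    """Return (outputs, weights) for softmax(Q K^T / sqrt(d)) V, one row per query."""
    d = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        logits = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        m = max(logits)  # stabilize the softmax
        exps = [math.exp(x - m) for x in logits]
        total = sum(exps)
        weights = [e / total for e in exps]
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


def spectral_self_attention(band_tokens):
    """Let every band token attend to every band of the same patch.

    Simplification: the tokens serve directly as queries, keys, and values;
    a real model would apply learned linear projections first.
    """
    return scaled_dot_product_attention(band_tokens, band_tokens, band_tokens)
```

Because the number of tokens equals the number of bands, the same code handles 3-channel RGB input or inputs with more than ten bands without any architectural change.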

classification, dataset, remote sensing

arXiv.org Artificial Intelligence

2503.08534

Country:

Europe > Germany > Brandenburg > Potsdam (0.04)
Europe > Belgium > Flanders > Flemish Brabant > Leuven (0.04)
Europe > Spain > Andalusia > Granada Province > Granada (0.04)

Genre: Research Report (1.00)

Industry: Energy > Renewable > Geothermal > Geothermal Energy Exploration and Development > Geophysical Analysis & Survey (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Visual and Text Prompt Segmentation: A Novel Multi-Model Framework for Remote Sensing

Zi, Xing, Jin, Kairui, Tao, Xian, Li, Jun, Braytee, Ali, Shah, Rajiv Ratn, Prasad, Mukesh

arXiv.org Artificial Intelligence, Mar-10-2025

Pixel-level segmentation is essential in remote sensing, where foundation vision models such as CLIP and the Segment Anything Model (SAM) have demonstrated significant capabilities in zero-shot segmentation tasks. Despite these advances, challenges specific to remote sensing remain substantial. First, without clear prompt constraints, SAM often generates redundant masks, which makes post-processing more complex. Second, CLIP, designed mainly for global feature alignment, often overlooks the local objects that are crucial in remote sensing, leading to inaccurate recognition or misplaced focus in multi-target remote sensing imagery. Third, neither model has been pre-trained on multi-scale aerial views, which increases the likelihood of detection failures. To tackle these challenges, we introduce VTPSeg, a pipeline that combines the strengths of Grounding DINO, CLIP, and SAM for open-vocabulary image segmentation. The Grounding DINO+ (GD+) module generates initial candidate bounding boxes, while the CLIP Filter++ (CLIP++) module uses a combination of visual and textual prompts to refine the boxes and filter out irrelevant ones, so that only pertinent objects are considered. These refined bounding boxes then serve as prompts for the FastSAM model, which performs precise segmentation. VTPSeg is validated by experiments and ablation studies on five popular remote sensing image segmentation datasets.
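
The flow from GD+ through CLIP++ to FastSAM can be pictured as a box-filtering step between detection and segmentation. A minimal sketch, assuming each candidate box already carries a text-prompt similarity and a visual-prompt similarity (the `Box` fields, the mixing weight `alpha`, and the `threshold` are hypothetical, not VTPSeg's actual scoring), keeps only boxes whose combined score clears a threshold and converts them to box prompts:

```python
from dataclasses import dataclass


@dataclass
class Box:
    x1: float
    y1: float
    x2: float
    y2: float
    text_score: float    # assumed: similarity of the box crop to the text prompt
    visual_score: float  # assumed: similarity of the box crop to visual exemplars


def filter_boxes(boxes: list[Box], alpha: float = 0.5, threshold: float = 0.3) -> list[Box]:
    """Keep boxes whose weighted text/visual score reaches the threshold."""
    kept = []
    for box in boxes:
        combined = alpha * box.text_score + (1.0 - alpha) * box.visual_score
        if combined >= threshold:
            kept.append(box)
    return kept


def to_box_prompts(boxes: list[Box]) -> list[tuple[float, float, float, float]]:
    """Convert surviving boxes to (x1, y1, x2, y2) tuples for a promptable segmenter."""
    return [(b.x1, b.y1, b.x2, b.y2) for b in boxes]
```

Filtering before segmentation means the segmenter only receives prompts for relevant objects, which is the stated remedy for SAM's redundant masks.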

dataset, segmentation, vtpseg

arXiv.org Artificial Intelligence

2503.07911

Country:

Oceania > Australia > New South Wales > Sydney (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)
Asia > China > Jiangsu Province > Nanjing (0.04)

Genre: Research Report (0.51)

Industry: Energy > Renewable > Geothermal > Geothermal Energy Exploration and Development > Geophysical Analysis & Survey (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)

When Large Vision-Language Model Meets Large Remote Sensing Imagery: Coarse-to-Fine Text-Guided Token Pruning

Luo, Junwei, Zhang, Yingying, Yang, Xue, Wu, Kang, Zhu, Qi, Liang, Lei, Chen, Jingdong, Li, Yansheng

arXiv.org Artificial Intelligence, Mar-10-2025

Efficient vision-language understanding of large Remote Sensing Images (RSIs) is valuable but challenging. Current Large Vision-Language Models (LVLMs) typically process images with a limited number of pre-defined grids, which leads to information loss on gigapixel RSIs; conversely, using unlimited grids significantly increases computational costs. To preserve image details while reducing computational complexity, we propose a text-guided token pruning method with Dynamic Image Pyramid (DIP) integration. Our method introduces: (i) a Region Focus Module (RFM) that leverages text-aware region localization to identify critical vision tokens, and (ii) a coarse-to-fine image tile selection and vision token pruning strategy based on DIP, which is guided by RFM outputs and avoids directly processing the entire large image. Additionally, existing benchmarks for evaluating LVLMs' perception of large RSIs suffer from limited question diversity and constrained image sizes. We construct a new benchmark named LRS-VQA, which contains 7,333 QA pairs across 8 categories, with image lengths of up to 27,328 pixels. Our method outperforms existing high-resolution strategies on four datasets using the same data. Moreover, compared to existing token reduction methods, our approach demonstrates higher efficiency under high-resolution settings. Dataset and code are available at https://github.com/VisionXLab/LRS-VQA.
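
A minimal sketch of coarse-to-fine tile selection, assuming a quadtree pyramid in which each tile splits into four children at the next level and an arbitrary `score_fn` standing in for the Region Focus Module's text-guided relevance (all names here are hypothetical): only the children of the highest-scoring tiles are examined at each finer level, so the full-resolution image is never processed in its entirety.

```python
from typing import Callable

Tile = tuple[int, int, int]  # (level, row, col); level 0 is the coarsest view


def top_k_indices(scores: list[float], k: int) -> list[int]:
    """Indices of the k highest scores, returned in their original order."""
    best = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(best)


def children(tile: Tile) -> list[Tile]:
    """The four tiles covering the same area at the next, finer level."""
    level, row, col = tile
    return [(level + 1, 2 * row + dr, 2 * col + dc) for dr in (0, 1) for dc in (0, 1)]


def coarse_to_fine_select(score_fn: Callable[[Tile], float], num_levels: int, keep_per_level: int) -> list[Tile]:
    """Descend the pyramid, expanding only the best-scoring tiles at each level."""
    frontier: list[Tile] = [(0, 0, 0)]
    for _ in range(num_levels - 1):
        candidates = [child for tile in frontier for child in children(tile)]
        scores = [score_fn(tile) for tile in candidates]
        frontier = [candidates[i] for i in top_k_indices(scores, keep_per_level)]
    return frontier


def prune_tokens(token_scores: list[float], keep_ratio: float) -> list[int]:
    """Keep the highest-scoring fraction of vision tokens (at least one)."""
    k = max(1, int(len(token_scores) * keep_ratio))
    return top_k_indices(token_scores, k)
```

The cost of the descent grows with `keep_per_level` times the number of levels rather than with the total number of tiles at the finest level.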

arxiv preprint arxiv, wang, zhang

arXiv.org Artificial Intelligence

2503.07588

Country:

Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
Asia > China > Shanghai > Shanghai (0.04)
Asia > China > Hubei Province > Wuhan (0.04)

Genre: Research Report (0.64)

Industry: Energy > Renewable > Geothermal > Geothermal Energy Exploration and Development > Geophysical Analysis & Survey (0.72)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Tackling Few-Shot Segmentation in Remote Sensing via Inpainting Diffusion Model

Immanuel, Steve Andreas, Cho, Woojin, Heo, Junhyuk, Kwon, Darongsae

arXiv.org Artificial Intelligence, Mar-4-2025

Limited data is a common problem in remote sensing due to the high cost of obtaining annotated samples. In the few-shot segmentation task, models are typically trained on base classes with abundant annotations and later adapted to novel classes with limited examples. However, this often necessitates specialized model architectures or complex training strategies. Instead, we propose a simple approach that leverages diffusion models to generate diverse variations of novel-class objects within a given scene, conditioned on the limited examples of the novel classes. By framing the problem as an image inpainting task, we synthesize plausible instances of novel classes under various environments, effectively increasing the number of samples for the novel classes and mitigating overfitting. The generated samples are then assessed using a cosine similarity metric to ensure semantic consistency with the novel classes. Additionally, we employ the Segment Anything Model (SAM) to segment the generated samples and obtain precise annotations. By using high-quality synthetic data, we can directly fine-tune off-the-shelf segmentation models. Experimental results demonstrate that our method significantly enhances segmentation performance in low-data regimes, highlighting its potential for real-world remote sensing applications.
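
The abstract says generated samples are screened with a cosine similarity metric but not what they are compared against. A minimal sketch, assuming each sample is compared with a class prototype formed by averaging the embeddings of the few labeled novel-class examples (the prototype, the `threshold` value, and the function names are assumptions; the embedding model is left unspecified):

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def class_prototype(embeddings: list[list[float]]) -> list[float]:
    """Assumed prototype: element-wise mean of the support-set embeddings."""
    n = len(embeddings)
    return [sum(column) / n for column in zip(*embeddings)]


def filter_generated(generated: list[list[float]], support: list[list[float]], threshold: float = 0.8) -> list[int]:
    """Indices of generated samples semantically close enough to the novel class."""
    prototype = class_prototype(support)
    return [i for i, g in enumerate(generated) if cosine_similarity(g, prototype) >= threshold]
```

Samples that drift away from the novel class in embedding space are discarded before the SAM annotation and fine-tuning steps.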

novel class, remote sensing, segmentation

arXiv.org Artificial Intelligence

2503.03785

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Asia > South Korea > Seoul > Seoul (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Energy > Renewable > Geothermal > Geothermal Energy Exploration and Development > Geophysical Analysis & Survey (0.96)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.96)

WalnutData: A UAV Remote Sensing Dataset of Green Walnuts and Model Evaluation

Wu, Mingjie, Yang, Chenggui, Wang, Huihua, Xue, Chen, Wang, Yibo, Wang, Haoyu, Wang, Yansong, Peng, Can, Han, Yuqi, Li, Ruoyu, Yun, Lijun, Chen, Zaiqing, Xia, Yuelong

arXiv.org Artificial Intelligence, Mar-4-2025

UAV technology is maturing and can provide strong support for smart agriculture and precise monitoring. Currently, there is no dataset of green walnuts in the field of agricultural computer vision. To promote algorithm design in this field, we used UAVs to collect remote sensing data from 8 walnut sample plots. Because green walnuts are subject to varied lighting conditions and occlusion, we constructed WalnutData, a large-scale dataset with finer-grained target features. It contains a total of 30,240 images and 706,208 instances in 4 target categories: frontally lit and unoccluded (A1), backlit and unoccluded (A2), frontally lit and occluded (B1), and backlit and occluded (B2). We then evaluated many mainstream algorithms on WalnutData and used the results as the baseline standard. The dataset and all evaluation results are available at https://github.com/1wuming/WalnutData.
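
The four WalnutData categories form a 2×2 grid over occlusion (A = unoccluded, B = occluded) and lighting (1 = frontal light, 2 = backlit). A small sketch of that mapping, using a hypothetical helper name, together with the average instance density implied by the dataset totals (about 23.4 walnuts per image):

```python
def walnut_label(backlit: bool, occluded: bool) -> str:
    """Map lighting and occlusion flags to a WalnutData category code."""
    letter = "B" if occluded else "A"  # A = unoccluded, B = occluded
    digit = "2" if backlit else "1"    # 1 = frontal light, 2 = backlit
    return letter + digit


# Average number of annotated instances per image, from the totals in the abstract.
INSTANCES_PER_IMAGE = 706_208 / 30_240
```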

dataset, detection, walnutdata

arXiv.org Artificial Intelligence

2502.20092

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Asia > China > Yunnan Province (0.05)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)

Genre: Research Report (0.50)

Industry:

Food & Agriculture > Agriculture (1.00)
Energy > Renewable > Geothermal > Geothermal Energy Exploration and Development > Geophysical Analysis & Survey (0.61)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (0.88)
Information Technology > Artificial Intelligence > Vision (0.76)

Lossy Neural Compression for Geospatial Analytics: A Review

Gomes, Carlos, Wittmann, Isabelle, Robert, Damien, Jakubik, Johannes, Reichelt, Tim, Martone, Michele, Maurogiovanni, Stefano, Vinge, Rikard, Hurst, Jonas, Scheurer, Erik, Sedona, Rocco, Brunschwiler, Thomas, Kesselheim, Stefan, Batic, Matej, Stier, Philip, Wegner, Jan Dirk, Cavallaro, Gabriele, Pebesma, Edzer, Marszalek, Michael, Belenguer-Plomer, Miguel A, Adriko, Kennedy, Fraccaro, Paolo, Kienzler, Romeo, Briq, Rania, Benassou, Sabrina, Lazzarini, Michele, Albrecht, Conrad M

arXiv.org Artificial Intelligence, Mar-3-2025

Over the past decades, there has been an explosion in the amount of available Earth Observation (EO) data. The unprecedented coverage of the Earth's surface and atmosphere by satellite imagery has resulted in large volumes of data that must be transmitted to ground stations, stored in data centers, and distributed to end users. Modern Earth System Models (ESMs) face similar challenges: operating at high spatial and temporal resolutions, they produce petabytes of data per simulated day. Data compression has gained relevance over the past decade, and neural compression (NC), which emerged from deep learning and information theory, is well suited to EO data and ESM outputs because of their abundance of unlabeled data. In this review, we outline recent developments in NC applied to geospatial data. We introduce the fundamental concepts of NC, including seminal works in its traditional applications to image and video compression, with a focus on lossy compression. We discuss the unique characteristics of EO and ESM data, contrasting them with "natural images", and explain the additional challenges and opportunities they present. Moreover, we review current applications of NC across various EO modalities and explore the limited efforts in ESM compression to date. The advent of self-supervised learning (SSL) and foundation models (FM) has advanced methods to efficiently distill representations from vast unlabeled data. We connect these developments to NC for EO, highlighting the similarities between the two fields, and elaborate on the potential of transferring compressed feature representations for machine-to-machine communication. Based on insights drawn from this review, we devise future directions relevant to applications in EO and ESM.
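
Lossy neural compression is commonly trained to minimize a rate-distortion objective of the form L = R + λD, where R is the bit rate and D the reconstruction error. The sketch below is not a neural codec: as a stand-in for a learned encoder and entropy model, it assumes uniform scalar quantization and measures the rate as the empirical entropy of the quantized symbols, which is enough to show the trade-off that the step size (or λ) controls.

```python
import math
from collections import Counter


def quantize(values: list[float], step: float) -> list[int]:
    """Uniform scalar quantization: map each value to the nearest multiple of step."""
    # Python's round() breaks exact ties to the even integer; fine for this sketch.
    return [round(v / step) for v in values]


def dequantize(symbols: list[int], step: float) -> list[float]:
    return [s * step for s in symbols]


def empirical_rate_bits(symbols: list[int]) -> float:
    """Zeroth-order entropy of the symbol stream, in bits per symbol."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mse(original: list[float], reconstructed: list[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)


def rd_loss(values: list[float], step: float, lam: float) -> tuple[float, float, float]:
    """Return (R + lam * D, R, D) for one quantization step size."""
    symbols = quantize(values, step)
    rate = empirical_rate_bits(symbols)
    distortion = mse(values, dequantize(symbols, step))
    return rate + lam * distortion, rate, distortion
```

Sweeping `step` traces out a rate-distortion curve: coarser quantization lowers the rate and raises the distortion. In a trained neural codec the same balance is set by λ, with the quantizer and entropy model learned jointly.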

artificial intelligence, information management, machine learning

arXiv.org Artificial Intelligence

2503.01505

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
Europe > Switzerland > Zürich > Zürich (0.14)
Europe > Iceland (0.04)

Genre:

Overview (1.00)
Research Report > Promising Solution (0.45)
Research Report > New Finding (0.45)

Industry:

Food & Agriculture > Agriculture (1.00)
Information Technology > Services (0.66)
Government > Regional Government > North America Government > United States Government (0.46)
Energy > Renewable > Geothermal > Geothermal Energy Exploration and Development > Geophysical Analysis & Survey (0.39)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

OpenEarthSensing: Large-Scale Fine-Grained Benchmark for Open-World Remote Sensing

Xiang, Xiang, Xu, Zhuo, Deng, Yao, Zhou, Qinhao, Liang, Yifan, Chen, Ke, Zheng, Qingfang, Wang, Yaowei, Chen, Xilin, Gao, Wen

arXiv.org Artificial Intelligence, Feb-27-2025

In open-world remote sensing, deployed models must continuously adapt to a steady influx of new data, which often exhibits various shifts relative to the data seen during training. To handle such data, models must detect semantic shifts, adapt to covariate shifts, and continuously update themselves. These challenges give rise to a variety of open-world tasks. However, existing open-world remote sensing studies typically train and test within a single dataset to simulate open-world conditions, and there is currently a lack of large-scale benchmarks capable of evaluating multiple open-world tasks. In this paper, we introduce OpenEarthSensing, a large-scale fine-grained benchmark for open-world remote sensing. OpenEarthSensing includes 189 scene and object categories, covering the vast majority of potential semantic shifts that may occur in the real world. It also encompasses five data domains with significant covariate shifts: two RGB satellite domains, one RGB aerial domain, one MS RGB domain, and one infrared domain. These domains provide a more comprehensive testbed for evaluating the generalization performance of open-world models. We conduct baseline evaluations of current mainstream open-world tasks and methods on OpenEarthSensing, demonstrating that it serves as a challenging benchmark for open-world remote sensing.
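
Detecting semantic shift is often evaluated with simple post-hoc scores. As one common baseline, not necessarily among the methods benchmarked on OpenEarthSensing, the maximum softmax probability (MSP) flags an input as unknown when the classifier's top-class probability falls below a threshold; the default threshold here is an arbitrary assumption:

```python
import math


def softmax(logits: list[float]) -> list[float]:
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def msp_score(logits: list[float]) -> float:
    """Maximum softmax probability: high for confident, in-distribution inputs."""
    return max(softmax(logits))


def is_unknown(logits: list[float], threshold: float = 0.5) -> bool:
    """Flag a possible semantic shift (an unseen class) when confidence is low."""
    return msp_score(logits) < threshold
```

A fixed threshold is the weakest part of this baseline: under covariate shift, such as moving from RGB to infrared imagery, confidence can drop even for known classes, which is one reason benchmarks that separate the two kinds of shift are useful.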

dataset, detection, learning

arXiv.org Artificial Intelligence

2502.20668

Country:

Europe > Austria > Vienna (0.14)
North America > Canada > British Columbia > Vancouver (0.04)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)

Genre: Research Report (0.50)

Industry: Energy > Renewable > Geothermal > Geothermal Energy Exploration and Development > Geophysical Analysis & Survey (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)

Multispectral to Hyperspectral using Pretrained Foundational model

Gonzalez, Ruben, Albrecht, Conrad M, Braham, Nassim Ait Ali, Lambhate, Devyani, Almeida, Joao Lucas de Sousa, Fraccaro, Paolo, Blumenstiel, Benedikt, Brunschwiler, Thomas, Bangalore, Ranjini

arXiv.org Artificial Intelligence, Feb-26-2025

Affiliations: (1) Remote Sensing Technology Institute, German Aerospace Center (DLR), Germany: Ruben Gonzalez*, Conrad M Albrecht, Nassim Ait Ali Braham. (2) IBM Research Labs, India, U.K., Zurich, Brazil: Devyani Lambhate*, Joao Lucas de Sousa Almeida, Paolo Fraccaro, Benedikt Blumenstiel, Thomas Brunschwiler, Ranjini Bangalore. Dated February 28, 2025.

Hyperspectral imaging provides detailed spectral information, offering significant potential for monitoring greenhouse gases (GHGs) such as CH₄ and NO₂. However, its application is constrained by limited spatial coverage and infrequent revisit times. In contrast, multispectral imaging delivers broader spatial and temporal coverage but lacks the spectral granularity required for precise GHG detection. To address these challenges, this study proposes Spectral and Spatial-Spectral transformer models that reconstruct hyperspectral data from multispectral inputs. The models are pretrained on the EnMAP and EMIT datasets and fine-tuned on spatio-temporally aligned Sentinel-2/EnMAP and HLS-S30/EMIT image pairs, respectively. Our model has the potential to enhance atmospheric monitoring by combining the strengths of hyperspectral and multispectral imaging systems.

1 Introduction

Satellite images are being used to create detailed maps of Earth's surface.
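
The paper reconstructs hyperspectral bands with transformer models. For contrast, a naive baseline, assumed here purely for illustration and not part of the paper, linearly interpolates a pixel's multispectral values onto the hyperspectral band centers; a learned model is only worthwhile if it recovers spectral detail that this interpolation cannot. Band centers are supplied by the caller; none are taken from the Sentinel-2, EnMAP, or EMIT specifications.

```python
from bisect import bisect_left


def interpolate_spectrum(ms_centers_nm: list[float], ms_values: list[float], hs_centers_nm: list[float]) -> list[float]:
    """Piecewise-linear interpolation of one multispectral pixel onto hyperspectral band centers.

    ms_centers_nm must be sorted ascending. Wavelengths outside the multispectral
    range are clamped to the nearest end value rather than extrapolated.
    """
    if not ms_centers_nm or len(ms_centers_nm) != len(ms_values):
        raise ValueError("need one value per multispectral band center")
    out = []
    for w in hs_centers_nm:
        if w <= ms_centers_nm[0]:
            out.append(ms_values[0])
            continue
        if w >= ms_centers_nm[-1]:
            out.append(ms_values[-1])
            continue
        j = bisect_left(ms_centers_nm, w)  # first center >= w, so 1 <= j < len
        x0, x1 = ms_centers_nm[j - 1], ms_centers_nm[j]
        y0, y1 = ms_values[j - 1], ms_values[j]
        t = (w - x0) / (x1 - x0)
        out.append(y0 + t * (y1 - y0))
    return out
```

Narrow absorption features such as those used for gas detection fall between broad multispectral bands and are smoothed away by this interpolation, which illustrates why a learned reconstruction is needed for the use case described in the abstract.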

dataset, hyperspectral data, reconstruction

arXiv.org Artificial Intelligence

2502.19451

Country:

South America > Brazil (0.24)
Europe > Switzerland > Zürich > Zürich (0.24)
Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.24)

Genre: Research Report (0.40)

Industry: Energy > Renewable > Geothermal > Geothermal Energy Exploration and Development > Geophysical Analysis & Survey (0.36)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)