imagery
Overall Counting Anomaly Detection and Interpretation
Ultra-high-resolution (UHR) remote sensing (RS) imagery offers valuable data for Earth observation but poses challenges for existing multimodal foundation models due to two key bottlenecks: (1) limited availability of UHR training data, and (2) token explosion caused by the large image size. To address data scarcity, we introduce SuperRS-VQA (avg.
bd96a50dfd2314e48787581840a07a1a-Supplemental-Datasets_and_Benchmarks_Track.pdf
We prompt LLMs to act as language tools for two types of tasks in our work. The first is to read through news articles and retrieve the relevant information to caption our image sequences (Figures 6 and 7). The second is to use our captions to generate event-specific question-answer pairs (Figures 8 and 9). We conducted human validation on 144 events sampled across 15 disaster types to assess caption quality. Human evaluators were asked to classify each event as: (1) clear alignment between images, captions, and sources, (2) mismatch, or (3) inconclusive, where imagery was insufficient to verify caption details. Overall, 65.3% of events showed clear alignment between images, captions, and sources, 18.8% had mismatches, and 16.0% were inconclusive. Excluding inconclusive cases, 77.7% of determinable events showed alignment, demonstrating reasonable caption quality for LLM-generated annotations.
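The reported percentages can be checked with a few lines of arithmetic. The sketch below is only illustrative: the per-class counts (94 aligned, 27 mismatched, 23 inconclusive) are not stated in the text; they are an assumption, chosen because they sum to 144 and reproduce the reported figures after rounding. The function name `summarize_validation` is likewise hypothetical.

```python
def summarize_validation(aligned, mismatched, inconclusive):
    """Return validation percentages, rounded to one decimal place."""
    total = aligned + mismatched + inconclusive
    # "Determinable" events are those an evaluator could actually judge.
    determinable = aligned + mismatched
    return {
        "aligned_pct": round(100 * aligned / total, 1),
        "mismatch_pct": round(100 * mismatched / total, 1),
        "inconclusive_pct": round(100 * inconclusive / total, 1),
        "aligned_of_determinable_pct": round(100 * aligned / determinable, 1),
    }


# ASSUMED counts (not given in the source); they total 144 events.
summary = summarize_validation(aligned=94, mismatched=27, inconclusive=23)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Under these assumed counts, the alignment rate among determinable events is 94 / (94 + 27), which is where the higher second figure comes from: the inconclusive events are dropped from the denominator.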
Mars-Bench: A Benchmark for Evaluating Foundation Models for Mars Science Tasks
Foundation models have enabled rapid progress across many specialized domains by leveraging large-scale pre-training on unlabeled data, demonstrating strong generalization to a variety of downstream tasks. While such models have gained significant attention in fields like Earth Observation, their application to Mars science remains limited. A key enabler of progress in other domains has been the availability of standardized benchmarks that support systematic evaluation. In contrast, Mars science lacks such benchmarks and standardized evaluation frameworks, which has limited progress toward developing foundation models for Martian tasks. To address this gap, we introduce Mars-Bench, the first benchmark designed to systematically evaluate models across a broad range of Mars-related tasks using both orbital and surface imagery.
InstructSAM: A Training-Free Framework for Instruction-Oriented Remote Sensing Object Recognition
Language-guided object recognition in remote sensing imagery is crucial for large-scale mapping and automated data annotation. However, existing open-vocabulary and visual grounding methods rely on explicit category cues, limiting their ability to handle complex or implicit queries that require advanced reasoning. To address this issue, we introduce a new suite of tasks, including Instruction-Oriented Object Counting, Detection, and Segmentation (InstructCDS), covering open-vocabulary, open-ended, and open-subclass scenarios.
SentinelKilnDB: A Large-Scale Dataset and Benchmark for OBB Brick Kiln Detection in South Asia Using Satellite Imagery Supplementary Information
The questions are presented in blue, with our corresponding responses shown in black. For what purpose was the dataset created? Was there a specific task in mind? This dataset was created for academic and research purposes to advance scientific understanding and support policy development on air quality and sustainability issues. The findings highlight important opportunities to improve regulatory compliance and encourage the adoption of cleaner technologies within the brick kiln sector, which is a significant contributor to regional air pollution. Beyond its environmental relevance, this dataset is especially valuable for the fields of object detection and computer vision. It provides a large-scale, hand-validated collection of brick kiln locations annotated with oriented bounding boxes (OBBs) on freely available Sentinel-2 satellite imagery.
SmokeViz: A Large-Scale Satellite Dataset for Wildfire Smoke Detection and Segmentation
The global rise in wildfire frequency and intensity over the past decade underscores the need for improved fire monitoring techniques. To advance deep learning research on wildfire detection and its associated human health impacts, we introduce SmokeViz, a large-scale machine learning dataset of smoke plumes in satellite imagery. The dataset is derived from expert annotations created by smoke analysts at the National Oceanic and Atmospheric Administration, which provide coarse temporal and spatial approximations of smoke presence. To enhance annotation precision, we propose pseudo-label dimension reduction (PLDR), a generalizable method that applies pseudo-labeling to refine datasets with mismatching temporal and/or spatial resolutions. Unlike typical pseudo-labeling applications that aim to increase the number of labeled samples, PLDR maintains the original labels but increases the dataset quality by solving for intermediary pseudo-labels (IPLs) that align each annotation to the most representative input data. For SmokeViz, a parent model produces IPLs to identify the single satellite image within each annotation's time window that best corresponds to the smoke plume. This refinement process produces a succinct and relevant deep learning dataset consisting of over 160,000 manual annotations. The SmokeViz dataset is expected to be a valuable resource for developing further wildfire-related machine learning models and is publicly available at https://noaa-gsl-experimental-pds.s3.amazonaws.com/index.
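The frame-selection step described for PLDR can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how "best corresponds" is scored, so intersection-over-union (IoU) between the analyst annotation and each intermediary pseudo-label is an assumption here, as are the names `iou` and `select_best_frame` and the representation of masks as sets of pixel coordinates.

```python
def iou(mask_a, mask_b):
    """Intersection-over-union of two masks given as sets of (row, col) pixels."""
    union = mask_a | mask_b
    return len(mask_a & mask_b) / len(union) if union else 0.0


def select_best_frame(annotation_mask, pseudo_labels):
    """Pick the timestamp whose pseudo-label overlaps the annotation most.

    pseudo_labels maps a timestamp to the mask a parent model predicted for
    the satellite image taken at that time (hypothetical data layout).
    The original annotation is kept as the label; only the image is chosen.
    """
    return max(pseudo_labels, key=lambda t: iou(annotation_mask, pseudo_labels[t]))


# Toy example: one coarse annotation and three candidate images in its time window.
annotation = {(0, 0), (0, 1), (1, 0), (1, 1)}
candidates = {
    "t0": {(0, 0)},                  # plume barely visible
    "t1": {(0, 0), (0, 1), (1, 0)},  # plume mostly matches the annotation
    "t2": {(5, 5)},                  # plume elsewhere or absent
}
best = select_best_frame(annotation, candidates)
print(best)
```

The point of the sketch is the direction of the refinement: the number of labels stays the same, and each label ends up paired with one image instead of a whole time window.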
'Looked so real': How AI is being weaponised against India's Muslim women
The freelance model from India-administered Kashmir was scrolling on her phone last year when a friend sent her a clip circulating on Instagram. But it was entirely fabricated. "It was proper stalking," Ayoub, 24, said. "They had followed my life from my first semester to the last at the university." The video stitched together photographs from Ayoub's time as a student at New Delhi's Jamia Millia Islamia University - images drawn from everyday moments of campus life, including group projects, farewell gatherings and selfies with classmates.
Supplementary Information Scale and Benchmark for Irrigation Mapping from Satellite Imagery and Structured Environmental Features
To enhance surface property analysis for irrigation mapping, we compute a suite of spectral indices capturing vegetation health, water presence, and soil conditions. Common vegetation indices such as NDVI, GNDVI, and CIgreen quantify canopy vigor and chlorophyll content, while EVI, SAVI, and MSAVI account for atmospheric and soil background effects [44, 68, 28].
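For readers unfamiliar with the named indices, the sketch below computes them from per-pixel surface reflectance. The formulas are the commonly used textbook definitions, with the usual default coefficients for EVI and a soil adjustment factor L = 0.5 for SAVI; this excerpt does not give the exact variants or band choices the paper uses, so treat those as assumptions. The function name `spectral_indices` is hypothetical.

```python
import math


def spectral_indices(nir, red, green, blue, soil_factor=0.5):
    """Compute common vegetation indices from reflectance values in [0, 1]."""
    ndvi = (nir - red) / (nir + red)
    gndvi = (nir - green) / (nir + green)
    ci_green = nir / green - 1.0
    # EVI with the commonly used coefficients (gain 2.5, C1 = 6, C2 = 7.5, L = 1).
    evi = 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)
    # SAVI damps soil-brightness effects through the adjustment factor L.
    savi = (1.0 + soil_factor) * (nir - red) / (nir + red + soil_factor)
    # MSAVI (the MSAVI2 closed form) replaces the fixed L with a self-adjusting term.
    msavi = (2.0 * nir + 1.0
             - math.sqrt((2.0 * nir + 1.0) ** 2 - 8.0 * (nir - red))) / 2.0
    return {"NDVI": ndvi, "GNDVI": gndvi, "CIgreen": ci_green,
            "EVI": evi, "SAVI": savi, "MSAVI": msavi}


# Illustrative reflectances for a vegetated pixel (made-up values).
indices = spectral_indices(nir=0.5, red=0.1, green=0.2, blue=0.05)
for name, value in indices.items():
    print(f"{name}: {value:.3f}")
```

In practice the same expressions would be applied band-wise to whole image arrays, with a guard against zero denominators over water or no-data pixels.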
IRRISIGHT: A Large-Scale Multimodal Dataset and Scalable Pipeline to Address Irrigation and Water Management in Agriculture
The lack of fine-grained, large-scale datasets on water availability presents a critical barrier to applying machine learning (ML) for agricultural water management. Since there are multiple natural and anthropogenic factors that influence water availability, incorporating diverse multimodal features can significantly improve modeling performance. However, integrating such heterogeneous data is challenging due to spatial misalignments, inconsistent formats, semantic label ambiguities, and class imbalances. To address these challenges, we introduce IRRISIGHT, a large-scale, multimodal dataset spanning 20 U.S. states. It consists of 1.4 million pixel-aligned 224 × 224 patches that fuse satellite imagery with rich environmental attributes. We develop a robust geospatial fusion pipeline that aligns raster, vector, and point-based data on a unified 10 m grid, and employ domain-informed structured prompts to convert tabular attributes into natural language. With irrigation type classification as a representative problem, the dataset is AI-ready, offering a spatially disjoint train/test split and extensive benchmarking with both vision and vision-language models. Our results demonstrate that multimodal representations substantially improve model performance, establishing a foundation for future research on water availability.
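The idea of converting tabular attributes into natural language through structured prompts can be sketched as simple template filling. Everything specific below is hypothetical: the attribute names (`crop`, `soil_texture`, `annual_precip_mm`), the sentence templates, and the function `attributes_to_text` are illustrative assumptions, not the dataset's actual schema or prompts.

```python
# Hypothetical templates: one sentence pattern per tabular attribute.
TEMPLATES = {
    "crop": "The dominant crop is {}.",
    "soil_texture": "The soil texture is {}.",
    "annual_precip_mm": "Annual precipitation is {} mm.",
}


def attributes_to_text(row):
    """Render the known, non-missing attributes of one patch as a short description."""
    sentences = []
    # Iterate over the templates (not the row) so the output order is fixed.
    for key, template in TEMPLATES.items():
        value = row.get(key)
        if value is not None:
            sentences.append(template.format(value))
    return " ".join(sentences)


# Made-up attribute row for a single patch; unknown or missing fields are skipped.
row = {"annual_precip_mm": 610, "crop": "corn", "soil_texture": None, "patch_id": 17}
description = attributes_to_text(row)
print(description)
```

A fixed template order and explicit handling of missing values keep the generated text consistent across patches, which matters when the descriptions are fed to a vision-language model alongside the imagery.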