
Collaborating Authors: Irvin, Jeremy Andrew


TEOChat: A Large Vision-Language Assistant for Temporal Earth Observation Data

arXiv.org Artificial Intelligence

Large vision and language assistants have enabled new capabilities for interpreting natural images. These approaches have recently been adapted to earth observation data, but they can only handle single-image inputs, limiting their use for many real-world tasks. In this work, we develop a new vision and language assistant called TEOChat that can engage in conversations about temporal sequences of earth observation data. To train TEOChat, we curate an instruction-following dataset composed of many single-image and temporal tasks, including building change and damage assessment, semantic change detection, and temporal scene classification. We show that TEOChat can perform a wide variety of spatial and temporal reasoning tasks, substantially outperforming previous vision and language assistants, and even achieving comparable or better performance than specialist models trained to perform these specific tasks. Furthermore, TEOChat achieves impressive zero-shot performance on a change detection and change question answering dataset, outperforms GPT-4o and Gemini 1.5 Pro on multiple temporal tasks, and exhibits stronger single-image capabilities than a comparable single EO image instruction-following model.

Many earth observation (EO) tasks require the ability to reason over time. For example, change detection is a widely studied task where the goal is to identify salient changes in a region using multiple EO images capturing the region at different times (Chughtai et al., 2021; Bai et al., 2023; Cheng et al., 2023). Previous methods for automatically detecting change in EO imagery have been specialist models, constraining their use to the single task or small set of tasks they were explicitly trained to perform (Bai et al., 2023; Cheng et al., 2023). Advances in multimodal modeling have enabled generalist vision-language models (VLMs) that can perform a variety of natural image interpretation tasks specified flexibly through natural language (Achiam et al., 2023; Team et al., 2023; Liu et al., 2023). However, no prior VLMs can model temporal EO data (left of Figure 1), notably including change detection tasks. We investigate the performance of Video-LLaVA (Lin et al., 2023), a strong natural image pre-trained VLM that can receive images and videos as input, and GeoChat (Kuckreja et al., 2023), a strong VLM fine-tuned on single EO image tasks (right of Figure 1). We find that Video-LLaVA generates inaccurate information, likely because it has primarily been trained on natural images and videos, whereas GeoChat can only take single images as input and cannot process information across time.

Figure 1: TEOChat is the first VLM to model temporal earth observation (EO) data. We compare a temporal VLM (Video-LLaVA; Lin et al., 2023) and an EO VLM (GeoChat; Kuckreja et al., 2023) with TEOChat.
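To make the notion of a temporal instruction-following example concrete, the sketch below shows one plausible way such an example could be represented: a chronologically ordered EO image sequence paired with a multi-turn conversation. All field names and values here are illustrative assumptions, not the actual TEOChat dataset schema.

```python
# A minimal, hypothetical sketch of one temporal instruction-following
# example for a model like TEOChat. Field names and contents are
# illustrative assumptions, not the paper's actual data format.
from dataclasses import dataclass, field


@dataclass
class TemporalEOExample:
    """One instruction-following example over a time series of EO images."""
    image_paths: list[str]   # chronologically ordered EO images of one region
    timestamps: list[str]    # acquisition dates, aligned with image_paths
    conversation: list[dict] = field(default_factory=list)


example = TemporalEOExample(
    image_paths=["region42_2019.png", "region42_2021.png"],
    timestamps=["2019-06-01", "2021-06-01"],
    conversation=[
        {"role": "user",
         "content": "Identify any buildings damaged between image 1 and image 2."},
        {"role": "assistant",
         "content": "Two buildings in the northeast quadrant show collapsed roofs in image 2."},
    ],
)

# A training pipeline would interleave the encoded image sequence with the
# tokenized conversation, so the model learns to ground its answers in
# specific images of the sequence.
print(len(example.image_paths), "images,", len(example.conversation), "turns")
```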


GEO-Bench: Toward Foundation Models for Earth Monitoring

arXiv.org Artificial Intelligence

Recent progress in self-supervision has shown that pre-training large neural networks on vast amounts of unlabeled data can lead to substantial gains in generalization to downstream tasks. Such models, recently coined foundation models, have been transformational to the field of natural language processing. Variants have also been proposed for image data, but their applicability to remote sensing tasks is limited. To stimulate the development of foundation models for Earth monitoring, we propose a benchmark comprising six classification and six segmentation tasks, carefully curated and adapted to be both relevant to the field and well-suited for model evaluation. We accompany this benchmark with a robust methodology for evaluating models and reporting aggregated results to enable a reliable assessment of progress. Finally, we report results for 20 baselines to characterize the performance of existing models. We believe that this benchmark will be a driver of progress across a variety of Earth monitoring tasks.
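Because the twelve tasks use heterogeneous metrics (e.g., accuracy for classification, IoU for segmentation), reporting one aggregated number requires normalizing per-task scores before averaging. The sketch below illustrates one common approach for such benchmarks: min-max normalization of each task's scores across models, followed by a bootstrapped mean with an error estimate. The function names and toy scores are assumptions for illustration, not necessarily the paper's exact protocol or data.

```python
# A minimal sketch of aggregating heterogeneous benchmark metrics.
# Normalization scheme and toy values are illustrative assumptions.
import random


def normalize(scores: dict[str, float]) -> dict[str, float]:
    """Min-max scale one task's per-model scores to [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0  # guard against identical scores
    return {m: (s - lo) / span for m, s in scores.items()}


def aggregate(per_task: dict[str, dict[str, float]], model: str,
              n_boot: int = 1000, seed: int = 0) -> tuple[float, float]:
    """Bootstrap a model's mean normalized score over tasks; return (mean, stderr)."""
    normed = [normalize(scores)[model] for scores in per_task.values()]
    rng = random.Random(seed)
    means = [sum(rng.choices(normed, k=len(normed))) / len(normed)
             for _ in range(n_boot)]
    mu = sum(means) / n_boot
    var = sum((m - mu) ** 2 for m in means) / (n_boot - 1)
    return mu, var ** 0.5


# Toy per-task scores (accuracy for classification, mIoU for segmentation);
# values are made up for illustration.
per_task = {
    "task_cls_1": {"resnet50": 0.81, "vit_b": 0.85},
    "task_seg_1": {"resnet50": 0.55, "vit_b": 0.61},
}
print(aggregate(per_task, "vit_b"))
```

Normalizing before averaging prevents tasks with naturally larger metric ranges from dominating the aggregate, and the bootstrap gives a simple uncertainty estimate alongside the point score.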