AITopics | spatiotemporal

Collaborating Authors

spatiotemporal

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

0d5501edb21a59a43435efa67f200828-Supplemental.pdf

Neural Information Processing SystemsFeb-7-2026, 11:23:14 GMT

anc, ensemble, model weight, (16 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.74)

Add feedback

Fine-grained Spatiotemporal Grounding on Egocentric Videos

Liang, Shuo, Zhong, Yiwu, Hu, Zi-Yuan, Tao, Yeyao, Wang, Liwei

arXiv.org Artificial IntelligenceDec-10-2025

Spatiotemporal video grounding aims to localize target entities in videos based on textual queries. While existing research has made significant progress in exocentric videos, the egocentric setting remains relatively underexplored, despite its growing importance in applications such as augmented reality and robotics. In this work, we conduct a systematic analysis of the discrepancies between egocentric and exocentric videos, revealing key challenges such as shorter object durations, sparser trajectories, smaller object sizes, and larger positional shifts. To address these challenges, we introduce EgoMask, the first pixel-level benchmark for fine-grained spatiotemporal grounding in egocentric videos. It is constructed by our proposed automatic annotation pipeline, which annotates referring expressions and object masks across short-, medium-, and long-term videos. Additionally, we create EgoMask-Train, a large-scale training dataset to facilitate model development. Experiments demonstrate that the state-of-the-art spatiotemporal grounding models perform poorly on our benchmark EgoMask, but fine-tuning on EgoMask-Train yields significant improvements, while preserving performance on exocentric datasets. Our work thus provides essential resources and insights for advancing egocentric video understanding. Our code is available at https://github.com/LaVi-Lab/EgoMask .

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2508.00518

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

OmniField: Conditioned Neural Fields for Robust Multimodal Spatiotemporal Learning

Valencia, Kevin, Balasooriya, Thilina, Luo, Xihaier, Yoo, Shinjae, Park, David Keetae

arXiv.org Artificial IntelligenceNov-5-2025

Multimodal spatiotemporal learning on real-world experimental data is constrained by two challenges: within-modality measurements are sparse, irregular, and noisy (QA/QC artifacts) but cross-modally correlated; the set of available modalities varies across space and time, shrinking the usable record unless models can adapt to arbitrary subsets at train and test time. We propose OmniField, a continuity-aware framework that learns a continuous neural field conditioned on available modalities and iteratively fuses cross-modal context. A multimodal crosstalk block architecture paired with iterative cross-modal refinement aligns signals prior to the decoder, enabling unified reconstruction, interpolation, forecasting, and cross-modal prediction without gridding or surrogate preprocessing. Extensive evaluations show that OmniField consistently outperforms eight strong multimodal spatiotemporal baselines. Under heavy simulated sensor noise, performance remains close to clean-input levels, highlighting robustness to corrupted measurements.

artificial intelligence, machine learning, modality, (19 more...)

arXiv.org Artificial Intelligence

2511.02205

Country: North America > United States (1.00)

Genre: Research Report (1.00)

Industry:

Government > Regional Government > North America Government > United States Government (0.94)
Energy (0.93)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Vision (0.67)

Add feedback

0d5501edb21a59a43435efa67f200828-Supplemental.pdf

Neural Information Processing SystemsOct-2-2025, 01:01:19 GMT

artificial intelligence, machine learning, model weight, (18 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

STTS-EAD: Improving Spatio-Temporal Learning Based Time Series Prediction via

Liang, Yuanyuan, Zhang, Tianhao, Xie, Tingyu

arXiv.org Artificial IntelligenceJan-13-2025

Handling anomalies is a critical preprocessing step in multivariate time series prediction. However, existing approaches that separate anomaly preprocessing from model training for multivariate time series prediction encounter significant limitations. Specifically, these methods fail to utilize auxiliary information crucial for identifying latent anomalies associated with spatiotemporal factors during the preprocessing stage. Instead, they rely solely on data distribution for anomaly detection, which can result in the incorrect processing of numerous samples that could otherwise contribute positively to model training. To address this, we propose STTS-EAD, an end-to-end method that seamlessly integrates anomaly detection into the training process of multivariate time series forecasting and aims to improve Spatio-Temporal learning based Time Series prediction via Embedded Anomaly Detection. Our proposed STTS-EAD leverages spatio-temporal information for forecasting and anomaly detection, with the two parts alternately executed and optimized for each other. To the best of our knowledge, STTS-EAD is the first to integrate anomaly detection and forecasting tasks in the training phase for improving the accuracy of multivariate time series forecasting. Extensive experiments on a public stock dataset and two real-world sales datasets from a renowned coffee chain enterprise show that our proposed method can effectively process detected anomalies in the training stage to improve forecasting performance in the inference stage and significantly outperform baselines.

artificial intelligence, data mining, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2501.07814

Genre: Research Report (1.00)

Industry: Consumer Products & Services > Food, Beverage, Tobacco & Cannabis > Beverages (0.67)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Kriformer: A Novel Spatiotemporal Kriging Approach Based on Graph Transformers

Pan, Renbin, Xiao, Feng, Zhang, Hegui, Shen, Minyu

arXiv.org Machine LearningSep-23-2024

Accurately estimating data in sensor-less areas is crucial for understanding system dynamics, such as traffic state estimation and environmental monitoring. This study addresses challenges posed by sparse sensor deployment and unreliable data by framing the problem as a spatiotemporal kriging task and proposing a novel graph transformer model, Kriformer. This model estimates data at locations without sensors by mining spatial and temporal correlations, even with limited resources. Kriformer utilizes transformer architecture to enhance the model's perceptual range and solve edge information aggregation challenges, capturing spatiotemporal information effectively. A carefully constructed positional encoding module embeds the spatiotemporal features of nodes, while a sophisticated spatiotemporal attention mechanism enhances estimation accuracy. The multi-head spatial interaction attention module captures subtle spatial relationships between observed and unobserved locations. During training, a random masking strategy prompts the model to learn with partial information loss, allowing the spatiotemporal embedding and multi-head attention mechanisms to synergistically capture correlations among locations. Experimental results show that Kriformer excels in representation learning for unobserved locations, validated on two real-world traffic speed datasets, demonstrating its effectiveness in spatiotemporal kriging tasks.

correlation, information, node, (14 more...)

arXiv.org Machine Learning

2409.14906

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.04)
Asia > China > Sichuan Province > Chengdu (0.04)
South America > Chile (0.04)
(3 more...)

Genre: Research Report > New Finding (0.34)

Industry: Transportation (0.47)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(2 more...)

Add feedback

Cluster-Segregate-Perturb (CSP): A Model-agnostic Explainability Pipeline for Spatiotemporal Land Surface Forecasting Models

Verma, Tushar, Saha, Sudipan

arXiv.org Artificial IntelligenceAug-12-2024

Satellite images have become increasingly valuable for modelling regional climate change effects. Earth surface forecasting represents one such task that integrates satellite images with meteorological data to capture the joint evolution of regional climate change effects. However, understanding the complex relationship between specific meteorological variables and land surface evolution poses a significant challenge. In light of this challenge, our paper introduces a pipeline that integrates principles from both perturbation-based explainability techniques like LIME and global marginal explainability techniques like PDP, besides addressing the constraints of using such techniques when applying them to high-dimensional spatiotemporal deep models. The proposed pipeline simplifies the undertaking of diverse investigative analyses, such as marginal sensitivity analysis, marginal correlation analysis, lag analysis, etc., on complex land surface forecasting models In this study we utilised Convolutional Long Short-Term Memory (ConvLSTM) as the surface forecasting model and did analyses on the Normalized Difference Vegetation Index (NDVI) of the surface forecasts, since meteorological variables like temperature, pressure, and precipitation significantly influence it. The study area encompasses various regions in Europe. Our analyses show that precipitation exhibits the highest sensitivity in the study area, followed by temperature and pressure. Pressure has little to no direct effect on NDVI. Additionally, interesting nonlinear correlations between meteorological variables and NDVI have been uncovered.

meteorological variable, precipitation, weather segment, (15 more...)

arXiv.org Artificial Intelligence

2408.05916

Country:

Europe (0.25)
Asia > India > NCT > New Delhi (0.04)
Asia > India > NCT > Delhi (0.04)
Asia > China > Tianjin Province > Tianjin (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)

Add feedback

SOK-Bench: A Situated Video Reasoning Benchmark with Aligned Open-World Knowledge

Wang, Andong, Wu, Bo, Chen, Sunli, Chen, Zhenfang, Guan, Haotian, Lee, Wei-Ning, Li, Li Erran, Gan, Chuang

arXiv.org Artificial IntelligenceMay-16-2024

Learning commonsense reasoning from visual contexts and scenes in real-world is a crucial step toward advanced artificial intelligence. However, existing video reasoning benchmarks are still inadequate since they were mainly designed for factual or situated reasoning and rarely involve broader knowledge in the real world. Our work aims to delve deeper into reasoning evaluations, specifically within dynamic, open-world, and structured context knowledge. We propose a new benchmark (SOK-Bench), consisting of 44K questions and 10K situations with instance-level annotations depicted in the videos. The reasoning process is required to understand and apply situated knowledge and general knowledge for problem-solving. To create such a dataset, we propose an automatic and scalable generation method to generate question-answer pairs, knowledge graphs, and rationales by instructing the combinations of LLMs and MLLMs. Concretely, we first extract observable situated entities, relations, and processes from videos for situated knowledge and then extend to open-world knowledge beyond the visible content. The task generation is facilitated through multiple dialogues as iterations and subsequently corrected and refined by our designed self-promptings and demonstrations. With a corpus of both explicit situated facts and implicit commonsense, we generate associated question-answer pairs and reasoning processes, finally followed by manual reviews for quality assurance. We evaluated recent mainstream large vision-language models on the benchmark and found several insightful conclusions. For more information, please refer to our benchmark at www.bobbywu.com/SOKBench.

benchmark, knowledge, reasoning, (16 more...)

arXiv.org Artificial Intelligence

2405.09713

Country:

North America > United States > California (0.04)
Asia > Singapore (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report (0.82)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
(2 more...)

Add feedback

FairSTG: Countering performance heterogeneity via collaborative sample-level optimization

Lin, Gengyu, Zhou, Zhengyang, Huang, Qihe, Yang, Kuo, Cheng, Shifen, Wang, Yang

arXiv.org Artificial IntelligenceMar-18-2024

Spatiotemporal learning plays a crucial role in mobile computing techniques to empower smart cites. While existing research has made great efforts to achieve accurate predictions on the overall dataset, they still neglect the significant performance heterogeneity across samples. In this work, we designate the performance heterogeneity as the reason for unfair spatiotemporal learning, which not only degrades the practical functions of models, but also brings serious potential risks to real-world urban applications. To fix this gap, we propose a model-independent Fairness-aware framework for SpatioTemporal Graph learning (FairSTG), which inherits the idea of exploiting advantages of well-learned samples to challenging ones with collaborative mix-up. Specifically, FairSTG consists of a spatiotemporal feature extractor for model initialization, a collaborative representation enhancement for knowledge transfer between well-learned samples and challenging ones, and fairness objectives for immediately suppressing sample-level performance heterogeneity. Experiments on four spatiotemporal datasets demonstrate that our FairSTG significantly improves the fairness quality while maintaining comparable forecasting accuracy. Case studies show FairSTG can counter both spatial and temporal performance heterogeneity by our sample-level retrieval and compensation, and our work can potentially alleviate the risks on spatiotemporal resource allocation for underrepresented urban regions.

fairstg, heterogeneity, representation, (14 more...)

arXiv.org Artificial Intelligence

2403.12391

Country:

North America > United States (0.04)
North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
Asia > China > Jiangsu Province > Nanjing (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)

Add feedback

ComS2T: A complementary spatiotemporal learning system for data-adaptive model evolution

Zhou, Zhengyang, Huang, Qihe, Wang, Binwu, Hou, Jianpeng, Yang, Kuo, Liang, Yuxuan, Wang, Yang

arXiv.org Artificial IntelligenceMar-4-2024

Spatiotemporal (ST) learning has become a crucial technique to enable smart cities and sustainable urban development. Current ST learning models capture the heterogeneity via various spatial convolution and temporal evolution blocks. However, rapid urbanization leads to fluctuating distributions in urban data and city structures over short periods, resulting in existing methods suffering generalization and data adaptation issues. Despite efforts, existing methods fail to deal with newly arrived observations and those methods with generalization capacity are limited in repeated training. Motivated by complementary learning in neuroscience, we introduce a prompt-based complementary spatiotemporal learning termed ComS2T, to empower the evolution of models for data adaptation. ComS2T partitions the neural architecture into a stable neocortex for consolidating historical memory and a dynamic hippocampus for new knowledge update. We first disentangle two disjoint structures into stable and dynamic weights, and then train spatial and temporal prompts by characterizing distribution of main observations to enable prompts adaptive to new data. This data-adaptive prompt mechanism, combined with a two-stage training process, facilitates fine-tuning of the neural architecture conditioned on prompts, thereby enabling efficient adaptation during testing. Extensive experiments validate the efficacy of ComS2T in adapting to various spatiotemporal out-of-distribution scenarios while maintaining efficient inference capabilities.

hippocampus structure, learning, wang, (16 more...)

arXiv.org Artificial Intelligence

2403.01738

Country:

Asia > China > Guangdong Province > Guangzhou (0.04)
Asia > Singapore (0.04)
Asia > China > Shanghai > Shanghai (0.04)
(6 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Therapeutic Area > Neurology (0.72)
Education > Educational Setting (0.46)

Technology:

Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback