Goto

Collaborating Authors

 Pacific Ocean


Introducing Spectral Attention for Long-Range Dependency in Time Series Forecasting

arXiv.org Machine Learning

Sequence modeling faces challenges in capturing long-range dependencies across diverse tasks. Recent linear and transformer-based forecasters have shown superior performance in time series forecasting. However, they are constrained by their inherent inability to effectively address long-range dependencies in time series data, primarily due to using fixed-size inputs for prediction. Furthermore, they typically sacrifice essential temporal correlation among consecutive training samples by shuffling them into mini-batches. To overcome these limitations, we introduce a fast and effective Spectral Attention mechanism, which preserves temporal correlations among samples and facilitates the handling of long-range information while maintaining the base model structure. Spectral Attention preserves long-period trends through a low-pass filter and facilitates gradient to flow between samples. Spectral Attention can be seamlessly integrated into most sequence models, allowing models with fixed-sized look-back windows to capture long-range dependencies over thousands of steps. Through extensive experiments on 11 real-world time series datasets using 7 recent forecasting models, we consistently demonstrate the efficacy of our Spectral Attention mechanism, achieving state-of-the-art results.


Forecasting Future International Events: A Reliable Dataset for Text-Based Event Modeling

arXiv.org Artificial Intelligence

Predicting future international events from textual information, such as news articles, has tremendous potential for applications in global policy, strategic decision-making, and geopolitics. However, existing datasets available for this task are often limited in quality, hindering the progress of related research. In this paper, we introduce WORLDREP (WORLD Relationship and Event Prediction), a novel dataset designed to address these limitations by leveraging the advanced reasoning capabilities of large-language models (LLMs). Our dataset features high-quality scoring labels generated through advanced prompt modeling and rigorously validated by domain experts in political science. We showcase the quality and utility of WORLDREP for real-world event prediction tasks, demonstrating its effectiveness through extensive experiments and analysis. Furthermore, we publicly release our dataset along with the full automation source code for data collection, labeling, and benchmarking, aiming to support and advance research in text-based event prediction.


Regional Ocean Forecasting with Hierarchical Graph Neural Networks

arXiv.org Artificial Intelligence

Accurate ocean forecasting systems are vital for understanding marine dynamics, which play a crucial role in environmental management and climate adaptation strategies. Traditional numerical solvers, while effective, are computationally expensive and time-consuming. Recent advancements in machine learning have revolutionized weather forecasting, offering fast and energy-efficient alternatives. Building on these advancements, we introduce SeaCast, a neural network designed for high-resolution, medium-range ocean forecasting. SeaCast employs a graph-based framework to effectively handle the complex geometry of ocean grids and integrates external forcing data tailored to the regional ocean context. Our approach is validated through experiments at a high spatial resolution using the operational numerical model of the Mediterranean Sea provided by the Copernicus Marine Service, along with both numerical and data-driven atmospheric forcings.


Rethinking the Power of Timestamps for Robust Time Series Forecasting: A Global-Local Fusion Perspective

arXiv.org Artificial Intelligence

Time series forecasting has played a pivotal role across various industries, including finance, transportation, energy, healthcare, and climate. Due to the abundant seasonal information they contain, timestamps possess the potential to offer robust global guidance for forecasting techniques. However, existing works primarily focus on local observations, with timestamps being treated merely as an optional supplement that remains underutilized. When data gathered from the real world is polluted, the absence of global information will damage the robust prediction capability of these algorithms. To address these problems, we propose a novel framework named GLAFF. Within this framework, the timestamps are modeled individually to capture the global dependencies. Working as a plugin, GLAFF adaptively adjusts the combined weights for global and local information, enabling seamless collaboration with any time series forecasting backbone. Extensive experiments conducted on nine real-world datasets demonstrate that GLAFF significantly enhances the average performance of widely used mainstream forecasting models by 12.5%, surpassing the previous state-of-the-art method by 5.5%.


Advancing Marine Heatwave Forecasts: An Integrated Deep Learning Approach

arXiv.org Artificial Intelligence

Marine heatwaves (MHWs), an extreme climate phenomenon, pose significant challenges to marine ecosystems and industries, with their frequency and intensity increasing due to climate change. This study introduces an integrated deep learning approach to forecast short-to-long-term MHWs on a global scale. The approach combines graph representation for modeling spatial properties in climate data, imbalanced regression to handle skewed data distributions, and temporal diffusion to enhance forecast accuracy across various lead times. To the best of our knowledge, this is the first study that synthesizes three spatiotemporal anomaly methodologies to predict MHWs. Additionally, we introduce a method for constructing graphs that avoids isolated nodes and provide a new publicly available sea surface temperature anomaly graph dataset. We examine the trade-offs in the selection of loss functions and evaluation metrics for MHWs. We analyze spatial patterns in global MHW predictability by focusing on historical hotspots, and our approach demonstrates better performance compared to traditional numerical models in regions such as the middle south Pacific, equatorial Atlantic near Africa, south Atlantic, and high-latitude Indian Ocean. We highlight the potential of temporal diffusion to replace the conventional sliding window approach for long-term forecasts, achieving improved prediction up to six months in advance. These insights not only establish benchmarks for machine learning applications in MHW forecasting but also enhance understanding of general climate forecasting methodologies.


Robust Planning with Compound LLM Architectures: An LLM-Modulo Approach

arXiv.org Artificial Intelligence

Previous work has attempted to boost Large Language Model (LLM) performance on planning and scheduling tasks through a variety of prompt engineering techniques. While these methods can work within the distributions tested, they are neither robust nor predictable. This limitation can be addressed through compound LLM architectures where LLMs work in conjunction with other components to ensure reliability. In this paper, we present a technical evaluation of a compound LLM architecture--the LLM-Modulo framework. In this framework, an LLM is paired with a complete set of sound verifiers that validate its output, re-prompting it if it fails. This approach ensures that the system can never output any fallacious output, and therefore that every output generated is guaranteed correct--something previous techniques have not been able to claim. Our results, evaluated across four scheduling domains, demonstrate significant performance gains with the LLM-Modulo framework using various models. Additionally, we explore modifications to the base configuration of the framework and assess their impact on overall system performance.


Procedural Knowledge in Pretraining Drives Reasoning in Large Language Models

arXiv.org Artificial Intelligence

The capabilities and limitations of Large Language Models have been sketched out in great detail in recent years, providing an intriguing yet conflicting picture. On the one hand, LLMs demonstrate a general ability to solve problems. On the other hand, they show surprising reasoning gaps when compared to humans, casting doubt on the robustness of their generalisation strategies. The sheer volume of data used in the design of LLMs has precluded us from applying the method traditionally used to measure generalisation: train-test set separation. To overcome this, we study what kind of generalisation strategies LLMs employ when performing reasoning tasks by investigating the pretraining data they rely on. For two models of different sizes (7B and 35B) and 2.5B of their pretraining tokens, we identify what documents influence the model outputs for three simple mathematical reasoning tasks and contrast this to the data that are influential for answering factual questions. We find that, while the models rely on mostly distinct sets of data for each factual question, a document often has a similar influence across different reasoning questions within the same task, indicating the presence of procedural knowledge. We further find that the answers to factual questions often show up in the most influential data. However, for reasoning questions the answers usually do not show up as highly influential, nor do the answers to the intermediate reasoning steps. When we characterise the top ranked documents for the reasoning questions qualitatively, we confirm that the influential documents often contain procedural knowledge, like demonstrating how to obtain a solution using formulae or code. Our findings indicate that the approach to reasoning the models use is unlike retrieval, and more like a generalisable strategy that synthesises procedural knowledge from documents doing a similar form of reasoning.


Advancing Large Language Models for Spatiotemporal and Semantic Association Mining of Similar Environmental Events

arXiv.org Artificial Intelligence

Retrieval and recommendation are two essential tasks in modern search tools. This paper introduces a novel retrieval-reranking framework leveraging Large Language Models (LLMs) to enhance the spatiotemporal and semantic associated mining and recommendation of relevant unusual climate and environmental events described in news articles and web posts. This framework uses advanced natural language processing techniques to address the limitations of traditional manual curation methods in terms of high labor cost and lack of scalability. Specifically, we explore an optimized solution to employ cutting-edge embedding models for semantically analyzing spatiotemporal events (news) and propose a Geo-Time Re-ranking (GT-R) strategy that integrates multi-faceted criteria including spatial proximity, temporal association, semantic similarity, and category-instructed similarity to rank and identify similar spatiotemporal events. We apply the proposed framework to a dataset of four thousand Local Environmental Observer (LEO) Network events, achieving top performance in recommending similar events among multiple cutting-edge dense retrieval models. The search and recommendation pipeline can be applied to a wide range of similar data search tasks dealing with geospatial and temporal data. We hope that by linking relevant events, we can better aid the general public to gain an enhanced understanding of climate change and its impact on different communities.


Underwater robot discovers a never-before-seen creature at the junction of three tectonic plates in the Pacific Ocean - as baffled viewers dub it the 'forbidden toilet scrubber'

Daily Mail - Science & tech

At first glance at this creature, you'd be forgiven for mistaking it for a sparkly pair of fake eyelashes. But the creature is very much real and was discovered at the junction of three tectonic plates in the Pacific Ocean. Researchers from the Schmidt Ocean Institute spotted the animal while using an underwater robot to scour the seabed. The animal is a polychaete - a class of marine worms, more widely known as bristle worms. 'To describe this polychaete, one simply must use jazz hands -- it is the only way to capture this deep-sea worm's dazzle!' the experts said in an Instagram post about the polychaete.


Scaling Deep Learning Research with Kubernetes on the NRP Nautilus HyperCluster

arXiv.org Artificial Intelligence

Throughout the scientific computing space, deep learning algorithms have shown excellent performance in a wide range of applications. As these deep neural networks (DNNs) continue to mature, the necessary compute required to train them has continued to grow. Today, modern DNNs require millions of FLOPs and days to weeks of training to generate a well-trained model. The training times required for DNNs are oftentimes a bottleneck in DNN research for a variety of deep learning applications, and as such, accelerating and scaling DNN training enables more robust and accelerated research. To that end, in this work, we explore utilizing the NRP Nautilus HyperCluster to automate and scale deep learning model training for three separate applications of DNNs, including overhead object detection, burned area segmentation, and deforestation detection. In total, 234 deep neural models are trained on Nautilus, for a total time of 4,040 hours. Deep convolutional neural networks (DCNNs) have been established as the state of the art in computer vision (CV) and have shown superior performance in visual tasks for many domains, including remote sensing. With billions of pixels being collected by overhead sources like satellites, remote sensing (RS) is becoming evermore a big-data problem domain, with endless amounts of data available to enable CV applications. Due in part to this data availability, the training and optimization of deep networks for RS applications has been explored to great lengths in recent years. In 2017, researchers investigated utilizing DCNNs for land-cover classification in overhead imagery along with techniques such as transfer learning and data augmentation[1]. This work was then extended into multi-network fusion research, where multiple DCNNs trained on overhead satellite imagery were fused using simple fusion techniques such as voting and arrogance [2] and then compared to more complex fusion algorithms such as the Choquet and Sugeno Fuzzy Integral [3], [4]. While these studies explored utilizing DCNNs to perform classification on overhead RS imagery, further exploration was required in broad area search, in which DCNNs are trained and used not on clean pre-processed datasets, but instead applied to large swaths of overhead imagery with the goal of finding all instances of a given object or terrain.