 weather data



A Comparative Study of Machine Learning Algorithms for Electricity Price Forecasting with LIME-Based Interpretability

Zhao, Xuanyi, Ding, Jiawen, Huang, Xueting, Zhang, Yibo

arXiv.org Artificial Intelligence

With the rapid development of electricity markets, price volatility has significantly increased, making accurate forecasting crucial for power system operations and market decisions. Traditional linear models cannot capture the complex nonlinear characteristics of electricity pricing, necessitating advanced machine learning approaches. This study compares eight machine learning models using Spanish electricity market data, integrating consumption, generation, and meteorological variables. The models evaluated include linear regression, ridge regression, decision tree, KNN, random forest, gradient boosting, SVR, and XGBoost. Results show that KNN achieves the best performance with R^2 of 0.865, MAE of 3.556, and RMSE of 5.240. To enhance interpretability, LIME analysis reveals that meteorological factors and supply-demand indicators significantly influence price fluctuations through nonlinear relationships. This work demonstrates the effectiveness of machine learning models in electricity price forecasting while improving decision transparency through interpretability analysis.
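The three scores quoted in this abstract (R^2, MAE, RMSE) follow standard definitions. The sketch below is a minimal illustration of those definitions, not the authors' evaluation code; the function name `regression_metrics` and the toy price values are assumptions made up for the example.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (R^2, MAE, RMSE) for two equal-length sequences of numbers."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / n
    # Residual and total sums of squares.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    rmse = math.sqrt(ss_res / n)
    r2 = 1.0 - ss_res / ss_tot
    return r2, mae, rmse


# Hypothetical toy prices, not data from the paper.
actual = [50.0, 60.0, 70.0, 80.0]
predicted = [52.0, 58.0, 71.0, 79.0]
r2, mae, rmse = regression_metrics(actual, predicted)
print(f"R^2={r2:.3f} MAE={mae:.3f} RMSE={rmse:.3f}")
```

Because RMSE squares each error before averaging, it is never smaller than MAE on the same predictions, which matches the ordering of the values reported above (3.556 vs. 5.240).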


Out-of-Distribution Generalization in Climate-Aware Yield Prediction with Earth Observation Data

Chakravarty, Aditya

arXiv.org Artificial Intelligence

Climate change is increasingly disrupting agricultural systems, making accurate crop yield forecasting essential for food security. While deep learning models have shown promise in yield prediction using satellite and weather data, their ability to generalize across geographic regions and years - critical for real-world deployment - remains largely untested. We benchmark two state-of-the-art models, GNN-RNN and MMST-ViT, under realistic out-of-distribution (OOD) conditions using the large-scale CropNet dataset spanning 1,200+ U.S. counties from 2017-2022. Through leave-one-cluster-out cross-validation across seven USDA Farm Resource Regions and year-ahead prediction scenarios, we identify substantial variability in cross-region transferability. GNN-RNN demonstrates superior generalization with positive correlations under geographic shifts, while MMST-ViT performs well in-domain but degrades sharply under OOD conditions. Regions like Heartland and Northern Great Plains show stable transfer dynamics (RMSE less than 10 bu/acre for soybean), whereas Prairie Gateway exhibits persistent underperformance (RMSE greater than 20 bu/acre) across both models and crops, revealing structural dissimilarities likely driven by semi-arid climate, irrigation patterns, and incomplete spectral coverage. Beyond accuracy differences, GNN-RNN achieves 135x faster training than MMST-ViT (14 minutes vs. 31.5 hours), making it more viable for sustainable deployment. Our findings underscore that spatial-temporal alignment - not merely model complexity or data scale - is key to robust generalization, and highlight the need for transparent OOD evaluation protocols to ensure equitable and reliable climate-aware agricultural forecasting.
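Leave-one-cluster-out cross-validation, as named in this abstract, holds out every sample from one cluster (here, a region) per fold and trains on the rest. The sketch below shows only the splitting logic under that reading; the helper `leave_one_cluster_out` and the assignment of samples to regions are hypothetical and do not come from the paper or the CropNet dataset.

```python
def leave_one_cluster_out(cluster_labels):
    """Yield (held_out, train_idx, test_idx), holding out one cluster per fold."""
    for held_out in sorted(set(cluster_labels)):
        train_idx = [i for i, c in enumerate(cluster_labels) if c != held_out]
        test_idx = [i for i, c in enumerate(cluster_labels) if c == held_out]
        yield held_out, train_idx, test_idx


# Hypothetical region label for each of five samples (assignment invented).
regions = [
    "Heartland",
    "Heartland",
    "Prairie Gateway",
    "Northern Great Plains",
    "Prairie Gateway",
]
folds = list(leave_one_cluster_out(regions))
for held_out, train_idx, test_idx in folds:
    print(held_out, "train:", train_idx, "test:", test_idx)
```

No sample from the held-out region appears in the training indices, which is what makes each fold an out-of-distribution test in the geographic sense.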



Road Surface Condition Detection with Machine Learning using New York State Department of Transportation Camera Images and Weather Forecast Data

Sutter, Carly, Sulia, Kara J., Bassill, Nick P., Wirz, Christopher D., Thorncroft, Christopher D., Rothenberger, Jay C., Przybylo, Vanessa, Cains, Mariana G., Radford, Jacob, Evans, David Aaron

arXiv.org Artificial Intelligence

The NYSDOT evaluates road conditions by driving on roads and observing live cameras, tasks which are labor-intensive but necessary for making critical operational decisions during winter weather events. However, machine learning models can provide additional support for the NYSDOT by automatically classifying current road conditions across the state. In this study, convolutional neural networks and random forests are trained on camera images and weather data to predict road surface conditions. Models are trained on a hand-labeled dataset of 22,000 camera images, each classified by human labelers into one of six road surface conditions: severe snow, snow, wet, dry, poor visibility, or obstructed. Model generalizability is prioritized to meet the operational needs of the NYSDOT decision makers, and the weather-related road surface condition model in this study achieves an accuracy of 81.5% on completely unseen cameras.

Keywords: Winter weather; Co-design; Artificial intelligence; Risk communication; Hand-labeled dataset

Highlights: Developed a model to classify road surface conditions using image and weather data; Achieved accuracy of 81.5% on completely unseen cameras for weather-related classes; Integrated co-design with end-users and interdisciplinary collaboration; Designed methods that prioritize model generalizability for operational applicability


There are actually 9 types of precipitation

Popular Science

Weather models still struggle to parse the millions of datapoints involved in precipitation prediction. Most of us generally think of precipitation in terms of three varieties: rain, snow, and sleet.


262 million birds forecast to take to the skies tonight

Popular Science

BirdCast can help you follow their fall migration. Open up the weather radar and you might see what looks like precipitation when it's not raining at all. Those bright green spots are often birds during their annual fall migration, and you can follow along. According to BirdCast, 262 million birds are predicted to hit the skies tonight alone.


Intrinsic Explainability of Multimodal Learning for Crop Yield Prediction

Najjar, Hiba, Pathak, Deepak, Nuske, Marlon, Dengel, Andreas

arXiv.org Artificial Intelligence

Multimodal learning enables various machine learning tasks to benefit from diverse data sources, effectively mimicking the interplay of different factors in real-world applications, particularly in agriculture. While the heterogeneous nature of involved data modalities may necessitate the design of complex architectures, the model interpretability is often overlooked. In this study, we leverage the intrinsic explainability of Transformer-based models to explain multimodal learning networks, focusing on the task of crop yield prediction at the subfield level. The large datasets used cover various crops, regions, and years, and include four different input modalities: multispectral satellite and weather time series, terrain elevation maps and soil properties. Based on the self-attention mechanism, we estimate feature attributions using two methods, namely the Attention Rollout (AR) and Generic Attention (GA), and evaluate their performance against Shapley-based model-agnostic estimations, Shapley Value Sampling (SVS). Additionally, we propose the Weighted Modality Activation (WMA) method to assess modality attributions and compare it with SVS attributions. Our findings indicate that Transformer-based models outperform other architectures, specifically convolutional and recurrent networks, achieving R^2 scores that are higher by 0.10 and 0.04 at the subfield and field levels, respectively. AR is shown to provide more robust and reliable temporal attributions, as confirmed through qualitative and quantitative evaluation, compared to GA and SVS values. Information about crop phenology stages was leveraged to interpret the explanation results in the light of established agronomic knowledge. Furthermore, modality attributions revealed varying patterns across the two methods compared.[...]


Satellite Connectivity Prediction for Fast-Moving Platforms

Yan, Chao, Mafakheri, Babak

arXiv.org Artificial Intelligence

Satellite connectivity is gaining increased attention as the demand for seamless internet access, especially in transportation and remote areas, continues to grow. For fast-moving objects such as aircraft, vehicles, or trains, satellite connectivity is critical due to their mobility and frequent presence in areas without terrestrial coverage. Maintaining reliable connectivity in these cases requires frequent switching between satellite beams, constellations, or orbits. To enhance user experience and address challenges like long switching times, Machine Learning (ML) algorithms can analyze historical connectivity data and predict network quality at specific locations. This allows for proactive measures, such as network switching before connectivity issues arise. In this paper, we analyze a real dataset of communication between a Geostationary Orbit (GEO) satellite and aircraft over multiple flights, using ML to predict signal quality. Our prediction model achieved an F1 score of 0.97 on the test data, demonstrating the accuracy of machine learning in predicting signal quality during flight. By enabling seamless broadband service, including roaming between different satellite constellations and providers, our model addresses the need for real-time predictions of signal quality. This approach can further be adapted to automate satellite and beam-switching mechanisms to improve overall communication efficiency. The model can also be retrained and applied to any moving object with satellite connectivity, using customized datasets, including connected vehicles and trains.


Data analysis using discrete cubical homology

Kapulkin, Chris, Kershaw, Nathan

arXiv.org Artificial Intelligence

Topological data analysis is a highly intuitive and powerful tool, based on a simple observation from topology that data has a shape and understanding this shape is key to analyzing the data. The main tool of topological data analysis is persistence homology, a way of quantifying n-dimensional "holes" obtained from the data in question. The essential premise behind persistence homology is known as topological inference, which is the assumption that the data is a finite sample from some large (typically infinite) topological space. One goal, therefore, is to reproduce characteristics or features of that space from the finite sample. The process typically involves several steps, namely: building a filtered graph (often called the Rips graph) from a finite metric space, taking the associated filtration of flag complexes, and finally computing homology groups of these complexes. It is worth observing that in this pipeline, (filtered) graphs are merely an intermediate step in the construction, and not an object of independent interest. In contrast, in the present paper, we work with data that is most naturally presented in the form of a graph, without an assumption that it was sampled from a topological space. For example, consider a collection of time series of stock prices of all companies traded on a given exchange.
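The first step of the classical pipeline described in this excerpt, building a filtered Rips graph from a finite metric space, can be sketched in a few lines. The example below is an illustration under stated assumptions and is not the discrete cubical homology studied in the paper: it builds the graph at several radii and counts connected components, which is the rank of degree-0 homology of the associated flag complex. The function names `rips_graph` and `count_components`, the sample points, and the radii are all hypothetical.

```python
from itertools import combinations


def rips_graph(points, dist, radius):
    """Edges (i, j), i < j, between points at distance at most `radius`."""
    return [
        (i, j)
        for i, j in combinations(range(len(points)), 2)
        if dist(points[i], points[j]) <= radius
    ]


def count_components(n, edges):
    """Number of connected components of a graph on n vertices (union-find)."""
    parent = list(range(n))

    def find(x):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in edges:
        parent[find(i)] = find(j)
    return len({find(x) for x in range(n)})


# Hypothetical finite metric space: four points on the real line.
points = [0.0, 1.0, 2.0, 10.0]


def dist(a, b):
    return abs(a - b)


# Growing the radius only adds edges, so the graphs form a filtration.
components_by_radius = {
    r: count_components(len(points), rips_graph(points, dist, r))
    for r in (0.5, 1.0, 8.0)
}
print(components_by_radius)
```

As the radius grows, components merge and never split; tracking the radii at which such merges happen is the degree-0 case of the persistence idea sketched in the text.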