Energy
Analytical Formulation of Autonomous Vehicle Freeway Merging Control with State-Dependent Discharge Rates
The core of the freeway merging control problem lies in dynamic queue propagation and dissipation linked to merging vehicle behavior. Traditionally, queuing is modeled through demand-supply interactions with time varying demand and fixed capacity. However, field observations show flow rates decrease during congestion at freeway merges due to the impact of intersecting traffic, a factor overlooked in fundamental diagrams. This manuscript introduces an analytical approach to characterize and control the dynamic multi-stage merging of autonomous vehicles, prioritizing traffic efficiency and safety. For the first time, the effective discharge rate at the merging point, reduced by the multi-stage dynamic merging process, is analytically derived using a closed form formulation. Leveraging this expression, performance metrics such as queue length and traffic delay are derived as the first objective. Additionally, a crash risk function is established to quantitatively assess potential collisions during the merging process, serving as the second objective. Finally, the problem is formulated as a dynamic programming model to jointly minimize delay and crash risk, with the merging location and speed as decision variables. Given the terminal state, the ramp vehicle merging task is formulated as a recursive optimization problem, employing backward induction to find the minimum cost solution. Numerical experiments using the NGSIM dataset validate the derived effective discharge rate. The results indicate that the proposed model outperforms two benchmark algorithms, leading to a more efficient and safer merging process.
Artificial Intelligence for Green Hydrogen Yield Prediction and Site Suitability using SHAP-Based Composite Index: Focus on Oman
Nwafor, Obumneme Zimuzor, Hooti, Mohammed Abdul Majeed Al
As nations seek sustainable alternatives to fossil fuels, green hydrogen has emerged as a promising strategic pathway toward decarbonisation, particularly in solar-rich arid regions. However, identifying optimal locations for hydrogen production requires the integration of complex environmental, atmospheric, and infrastructural factors, often compounded by limited availability of direct hydrogen yield data. This study presents a novel Artificial Intelligence (AI) framework for computing green hydrogen yield and site suitability index using mean absolute SHAP (SHapley Additive exPlanations) values. This framework consists of a multi-stage pipeline of unsupervised multi-variable clustering, supervised machine learning classifier and SHAP algorithm. The pipeline trains on an integrated meteorological, topographic and temporal dataset and the results revealed distinct spatial patterns of suitability and relative influence of the variables. With model predictive accuracy of 98%, the result also showed that water proximity, elevation and seasonal variation are the most influential factors determining green hydrogen site suitability in Oman with mean absolute shap values of 2.470891, 2.376296 and 1.273216 respectively. Given limited or absence of ground-truth yield data in many countries that have green hydrogen prospects and ambitions, this study offers an objective and reproducible alternative to subjective expert weightings, thus allowing the data to speak for itself and potentially discover novel latent groupings without pre-imposed assumptions. This study offers industry stakeholders and policymakers a replicable and scalable tool for green hydrogen infrastructure planning and other decision making in data-scarce regions.
Leveraging RAG-LLMs for Urban Mobility Simulation and Analysis
Ding, Yue, McCarthy, Conor, O'Shea, Kevin, Liu, Mingming
With the rise of smart mobility and shared e-mobility services, numerous advanced technologies have been applied to this field. Cloud-based traffic simulation solutions have flourished, offering increasingly realistic representations of the evolving mobility landscape. LLMs have emerged as pioneering tools, providing robust support for various applications, including intelligent decision-making, user interaction, and real-time traffic analysis. As user demand for e-mobility continues to grow, delivering comprehensive end-to-end solutions has become crucial. In this paper, we present a cloud-based, LLM-powered shared e-mobility platform, integrated with a mobile application for personalized route recommendations. The optimization module is evaluated based on travel time and cost across different traffic scenarios. Additionally, the LLM-powered RAG framework is evaluated at the schema level for different users, using various evaluation methods. Schema-level RAG with XiYanSQL achieves an average execution accuracy of 0.81 on system operator queries and 0.98 on user queries.
Unified Sparse-Matrix Representations for Diverse Neural Architectures
Deep neural networks employ specialized architectures for vision, sequential and language tasks, yet this proliferation obscures their underlying commonalities. We introduce a unified matrix-order framework that casts convolutional, recurrent and self-attention operations as sparse matrix multiplications. Convolution is realized via an upper-triangular weight matrix performing first-order transformations; recurrence emerges from a lower-triangular matrix encoding stepwise updates; attention arises naturally as a third-order tensor factorization. We prove algebraic isomorphism with standard CNN, RNN and Transformer layers under mild assumptions. Empirical evaluations on image classification (MNIST, CIFAR-10/100, Tiny ImageNet), time-series forecasting (ETTh1, Electricity Load Diagrams) and language modeling/classification (AG News, WikiText-2, Penn Treebank) confirm that sparse-matrix formulations match or exceed native model performance while converging in comparable or fewer epochs. By reducing architecture design to sparse pattern selection, our matrix perspective aligns with GPU parallelism and leverages mature algebraic optimization tools. This work establishes a mathematically rigorous substrate for diverse neural architectures and opens avenues for principled, hardware-aware network design.
A Physically Driven Long Short Term Memory Model for Estimating Snow Water Equivalent over the Continental United States
Saranathan, Arun M., Saeedimoghaddam, Mahmoud, Smith, Brandon, Raghunandan, Deepthi, Nearing, Grey, Pelissier, Craig
Snow is an essential input for various land surface models. Seasonal snow estimates are available as snow water equivalent (SWE) from process-based reanalysis products or locally from in situ measurements. While the reanalysis products are computationally expensive and available at only fixed spatial and temporal resolutions, the in situ measurements are highly localized and sparse. To address these issues and enable the analysis of the effect of a large suite of physical, morphological, and geological conditions on the presence and amount of snow, we build a Long Short-Term Memory (LSTM) network, which is able to estimate the SWE based on time series input of the various physical/meteorological factors as well static spatial/morphological factors. Specifically, this model breaks down the SWE estimation into two separate tasks: (i) a classification task that indicates the presence/absence of snow on a specific day and (ii) a regression task that indicates the height of the SWE on a specific day in the case of snow presence. The model is trained using physical/in situ SWE measurements from the SNOw TELemetry (SNOTEL) snow pillows in the western United States. We will show that trained LSTM models have a classification accuracy of $\geq 93\%$ for the presence of snow and a coefficient of correlation of $\sim 0.9$ concerning their SWE estimates. We will also demonstrate that the models can generalize both spatially and temporally to previously unseen data.
InstructVLA: Vision-Language-Action Instruction Tuning from Understanding to Manipulation
Yang, Shuai, Li, Hao, Chen, Yilun, Wang, Bin, Tian, Yang, Wang, Tai, Wang, Hanqing, Zhao, Feng, Liao, Yiyi, Pang, Jiangmiao
To operate effectively in the real world, robots must integrate multimodal reasoning with precise action generation. However, existing vision-language-action (VLA) models often sacrifice one for the other, narrow their abilities to task-specific manipulation data, and suffer catastrophic forgetting of pre-trained vision-language capabilities. To bridge this gap, we introduce InstructVLA, an end-to-end VLA model that preserves the flexible reasoning of large vision-language models (VLMs) while delivering leading manipulation performance. InstructVLA introduces a novel training paradigm, Vision-Language-Action Instruction Tuning (VLA-IT), which employs multimodal training with mixture-of-experts adaptation to jointly optimize textual reasoning and action generation on both standard VLM corpora and a curated 650K-sample VLA-IT dataset. On in-domain SimplerEnv tasks, InstructVLA achieves 30.5% improvement over SpatialVLA. To evaluate generalization, we introduce SimplerEnv-Instruct, an 80-task benchmark requiring closed-loop control and high-level instruction understanding, where it outperforms a fine-tuned OpenVLA by 92% and an action expert aided by GPT-4o by 29%. Additionally, InstructVLA surpasses baseline VLMs on multimodal tasks and exhibits inference-time scaling by leveraging textual reasoning to boost manipulation performance in both simulated and real-world settings. These results demonstrate InstructVLA's potential for bridging intuitive and steerable human-robot interaction with efficient policy learning.
IndoorBEV: Joint Detection and Footprint Completion of Objects via Mask-based Prediction in Indoor Scenarios for Bird's-Eye View Perception
Li, Haichuan, Tian, Changda, Trahanias, Panos, Westerlund, Tomi
The deployment of autonomous robots in indoor environments necessitates precise and real-time perception of their surroundings to ensure safe and efficient navigation. Lidar sensors have emerged as a pivotal technology in this domain, offering high-resolution 3D point cloud data that is robust to varying lighting conditions and capable of capturing intricate spatial details. However, transforming this unstructured point cloud data into actionable representations for tasks such as object detection, segmentation, and navigation remains a formidable challenge, particularly given the complexity and clutter often found indoors. Recent advancements have sought to address these challenges. For instance, MakeWay [1] system introduces object-aware costmaps derived from lidar data to enhance proactive indoor navigation. Similarly, the L V -DOT framework [2] leverages a fusion of lidar and visual data to improve dynamic obstacle detection and tracking in indoor settings. These approaches underscore the potential of integrating machine learning techniques with lidar data to enhance indoor perception. Bird's-Eye View (BEV) representations naturally handles occlusions and provides a representation directly amenable to downstream robotic tasks like navigation and planning due to their ability to provide a top-down, spatially consistent view of the environment.
Eco-Friendly AI: Unleashing Data Power for Green Federated Learning
Sabella, Mattia, Vitali, Monica
The widespread adoption of Artificial Intelligence (AI) and Machine Learning (ML) comes with a significant environmental impact, particularly in terms of energy consumption and carbon emissions. This pressing issue highlights the need for innovative solutions to mitigate AI's ecological footprint. One of the key factors influencing the energy consumption of ML model training is the size of the training dataset. ML models are often trained on vast amounts of data continuously generated by sensors and devices distributed across multiple locations. To reduce data transmission costs and enhance privacy, Federated Learning (FL) enables model training without the need to move or share raw data. While FL offers these advantages, it also introduces challenges due to the heterogeneity of data sources (related to volume and quality), computational node capabilities, and environmental impact. This paper contributes to the advancement of Green AI by proposing a data-centric approach to Green Federated Learning. Specifically, we focus on reducing FL's environmental impact by minimizing the volume of training data. Our methodology involves the analysis of the characteristics of federated datasets, the selecting of an optimal subset of data based on quality metrics, and the choice of the federated nodes with the lowest environmental impact. We develop a comprehensive methodology that examines the influence of data-centric factors, such as data quality and volume, on FL training performance and carbon emissions. Building on these insights, we introduce an interactive recommendation system that optimizes FL configurations through data reduction, minimizing environmental impact during training. Applying this methodology to time series classification has demonstrated promising results in reducing the environmental impact of FL tasks.
Dispatch-Aware Deep Neural Network for Optimal Transmission Switching: Toward Real-Time and Feasibility Guaranteed Operation
Optimal transmission switching (OTS) improves optimal power flow (OPF) by selectively opening transmission lines, but its mixed-integer formulation increases computational complexity, especially on large grids. To deal with this, we propose a dispatch-aware deep neural network (DA-DNN) that accelerates DC-OTS without relying on pre-solved labels. DA-DNN predicts line states and passes them through a differentiable DC-OPF layer, using the resulting generation cost as the loss function so that all physical network constraints are enforced throughout training and inference. In addition, we adopt a customized weight-bias initialization that keeps every forward pass feasible from the first iteration, which allows stable learning on large grids. Once trained, the proposed DA-DNN produces a provably feasible topology and dispatch pair in the same time as solving the DCOPF, whereas conventional mixed-integer solvers become intractable. As a result, the proposed method successfully captures the economic advantages of OTS while maintaining scalability.
Met$^2$Net: A Decoupled Two-Stage Spatio-Temporal Forecasting Model for Complex Meteorological Systems
Li, Shaohan, Yang, Hao, Chen, Min, Qin, Xiaolin
The increasing frequency of extreme weather events due to global climate change urges accurate weather prediction. Recently, great advances have been made by the \textbf{end-to-end methods}, thanks to deep learning techniques, but they face limitations of \textit{representation inconsistency} in multivariable integration and struggle to effectively capture the dependency between variables, which is required in complex weather systems. Treating different variables as distinct modalities and applying a \textbf{two-stage training approach} from multimodal models can partially alleviate this issue, but due to the inconformity in training tasks between the two stages, the results are often suboptimal. To address these challenges, we propose an implicit two-stage training method, configuring separate encoders and decoders for each variable. In detailed, in the first stage, the Translator is frozen while the Encoders and Decoders learn a shared latent space, in the second stage, the Encoders and Decoders are frozen, and the Translator captures inter-variable interactions for prediction. Besides, by introducing a self-attention mechanism for multivariable fusion in the latent space, the performance achieves further improvements. Empirically, extensive experiments show the state-of-the-art performance of our method. Specifically, it reduces the MSE for near-surface air temperature and relative humidity predictions by 28.82\% and 23.39\%, respectively. The source code is available at https://github.com/ShremG/Met2Net.