Not enough data to create a plot.
Try a different view from the menu above.
Modeling & Simulation
Rethinking Remaining Useful Life Prediction with Scarce Time Series Data: Regression under Indirect Supervision
Cheng, Jiaxiang, Pang, Yipeng, Hu, Guoqiang
Supervised time series prediction relies on directly measured target variables, but real-world use cases such as predicting remaining useful life (RUL) involve indirect supervision, where the target variable is labeled as a function of another dependent variable. Trending temporal regression techniques rely on sequential time series inputs to capture temporal patterns, requiring interpolation when dealing with sparsely and irregularly sampled covariates along the timeline. However, interpolation can introduce significant biases, particularly with highly scarce data. In this paper, we address the RUL prediction problem with data scarcity as time series regression under indirect supervision. We introduce a unified framework called parameterized static regression, which takes single data points as inputs for regression of target values, inherently handling data scarcity without requiring interpolation. The time dependency under indirect supervision is captured via a parametrical rectification (PR) process, approximating a parametric function during inference with historical posteriori estimates, following the same underlying distribution used for labeling during training. Additionally, we propose a novel batch training technique for tasks in indirect supervision to prevent overfitting and enhance efficiency. We evaluate our model on public benchmarks for RUL prediction with simulated data scarcity. Our method demonstrates competitive performance in prediction accuracy when dealing with highly scarce time series data.
End-to-end data-driven weather prediction
A new AI weather prediction system, developed by a team of researchers from the University of Cambridge, can deliver accurate forecasts which use less computing power than current AI and physics-based forecasting systems. The system, Aardvark Weather, has been supported by the Alan Turing Institute, Microsoft Research and the European Centre for Medium Range Weather Forecasts. It provides a blueprint for a new approach to weather forecasting with the potential to improve current practices. The results are reported in the journal Nature. "Aardvark reimagines current weather prediction methods offering the potential to make weather forecasts faster, cheaper, more flexible and more accurate than ever before, helping to transform weather prediction in both developed and developing countries," said Professor Richard Turner from Cambridge's Department of Engineering, who led the research.
Multi-resolution Score-Based Variational Graphical Diffusion for Causal Disaster System Modeling and Inference
Li, Xuechun, Gao, Shan, Xu, Susu
Complex systems with intricate causal dependencies challenge accurate prediction. Effective modeling requires precise physical process representation, integration of interdependent factors, and incorporation of multi-resolution observational data. These systems manifest in both static scenarios with instantaneous causal chains and temporal scenarios with evolving dynamics, complicating modeling efforts. Current methods struggle to simultaneously handle varying resolutions, capture physical relationships, model causal dependencies, and incorporate temporal dynamics, especially with inconsistently sampled data from diverse sources. We introduce Temporal-SVGDM: Score-based Variational Graphical Diffusion Model for Multi-resolution observations. Our framework constructs individual SDEs for each variable at its native resolution, then couples these SDEs through a causal score mechanism where parent nodes inform child nodes' evolution. This enables unified modeling of both immediate causal effects in static scenarios and evolving dependencies in temporal scenarios. In temporal models, state representations are processed through a sequence prediction model to predict future states based on historical patterns and causal relationships. Experiments on real-world datasets demonstrate improved prediction accuracy and causal understanding compared to existing methods, with robust performance under varying levels of background knowledge. Our model exhibits graceful degradation across different disaster types, successfully handling both static earthquake scenarios and temporal hurricane and wildfire scenarios, while maintaining superior performance even with limited data.
Online Multivariate Regularized Distributional Regression for High-dimensional Probabilistic Electricity Price Forecasting
Probabilistic electricity price forecasting (PEPF) is a key task for market participants in short-term electricity markets. The increasing availability of high-frequency data and the need for real-time decision-making in energy markets require online estimation methods for efficient model updating. We present an online, multivariate, regularized distributional regression model, allowing for the modeling of all distribution parameters conditional on explanatory variables. Our approach is based on the combination of the multivariate distributional regression and an efficient online learning algorithm based on online coordinate descent for LASSO-type regularization. Additionally, we propose to regularize the estimation along a path of increasingly complex dependence structures of the multivariate distribution, allowing for parsimonious estimation and early stopping. We validate our approach through one of the first forecasting studies focusing on multivariate probabilistic forecasting in the German day-ahead electricity market while using only online estimation methods. We compare our approach to online LASSO-ARX-models with adaptive marginal distribution and to online univariate distributional models combined with an adaptive Copula. We show that the multivariate distributional regression, which allows modeling all distribution parameters - including the mean and the dependence structure - conditional on explanatory variables such as renewable in-feed or past prices provide superior forecasting performance compared to modeling of the marginals only and keeping a static/unconditional dependence structure. Additionally, online estimation yields a speed-up by a factor of 80 to over 400 times compared to batch fitting.
ConfEviSurrogate: A Conformalized Evidential Surrogate Model for Uncertainty Quantification
Duan, Yuhan, Zhao, Xin, Shi, Neng, Shen, Han-Wei
Surrogate models, crucial for approximating complex simulation data across sciences, inherently carry uncertainties that range from simulation noise to model prediction errors. Without rigorous uncertainty quantification, predictions become unreliable and hence hinder analysis. While methods like Monte Carlo dropout and ensemble models exist, they are often costly, fail to isolate uncertainty types, and lack guaranteed coverage in prediction intervals. To address this, we introduce ConfEviSurrogate, a novel Conformalized Evidential Surrogate Model that can efficiently learn high-order evidential distributions, directly predict simulation outcomes, separate uncertainty sources, and provide prediction intervals. A conformal prediction-based calibration step further enhances interval reliability to ensure coverage and improve efficiency. Our ConfEviSurrogate demonstrates accurate predictions and robust uncertainty estimates in diverse simulations, including cosmology, ocean dynamics, and fluid dynamics.
Proper scoring rules for estimation and forecast evaluation
Waghmare, Kartik, Ziegel, Johanna
In recent years, proper scoring rules have emerged as a power ful general approach for estimating probability distributions. In addition to significantly ex panding the range of modeling techniques that can be applied in practice, this has also substantially broadened the conceptual understanding of estimation methods. Originally, proper scoring rules we re conceived in meteorology as summary statistics for describing the performance of probabilisti c forecasts ( Murphy and Winkler, 1984), but they also play an important role in economics as tools for bel ief elicitation ( Schotter and Trevino, 2014). A probabilistic forecast is a probability distribution ove r the space of the possible outcomes of the future event that is stated by the forecaster. The simple st and most popular case of probabilistic forecasts arises when the outcome is binary, so the probabilistic forecast reduces to issuing a predictive probability of success. Brier ( 1950) was the first to consider the problem of devising a scoring rule which could not be "played" by a dishonest fore casting agent. He introduced the quadratic scoring rule and showed that it incentivizes a for ecasting agent to state his most accurate probability estimate when faced with uncertainty.
Spatio-temporal Prediction of Fine-Grained Origin-Destination Matrices with Applications in Ridesharing
Yang, Run, Dai, Runpeng, Gao, Siran, Tang, Xiaocheng, Zhou, Fan, Zhu, Hongtu
Accurate spatial-temporal prediction of network-based travelers' requests is crucial for the effective policy design of ridesharing platforms. Having knowledge of the total demand between various locations in the upcoming time slots enables platforms to proactively prepare adequate supplies, thereby increasing the likelihood of fulfilling travelers' requests and redistributing idle drivers to areas with high potential demand to optimize the global supply-demand equilibrium. This paper delves into the prediction of Origin-Destination (OD) demands at a fine-grained spatial level, especially when confronted with an expansive set of local regions. While this task holds immense practical value, it remains relatively unexplored within the research community. To fill this gap, we introduce a novel prediction model called OD-CED, which comprises an unsupervised space coarsening technique to alleviate data sparsity and an encoder-decoder architecture to capture both semantic and geographic dependencies. Through practical experimentation, OD-CED has demonstrated remarkable results. It achieved an impressive reduction of up to 45% reduction in root-mean-square error and 60% in weighted mean absolute percentage error over traditional statistical methods when dealing with OD matrices exhibiting a sparsity exceeding 90%.
Concorde: Fast and Accurate CPU Performance Modeling with Compositional Analytical-ML Fusion
Nasr-Esfahany, Arash, Alizadeh, Mohammad, Lee, Victor, Alam, Hanna, Coon, Brett W., Culler, David, Dadu, Vidushi, Dixon, Martin, Levy, Henry M., Pandey, Santosh, Ranganathan, Parthasarathy, Yazdanbakhsh, Amir
Cycle-level simulators such as gem5 are widely used in microarchitecture design, but they are prohibitively slow for large-scale design space explorations. We present Concorde, a new methodology for learning fast and accurate performance models of microarchitectures. Unlike existing simulators and learning approaches that emulate each instruction, Concorde predicts the behavior of a program based on compact performance distributions that capture the impact of different microarchitectural components. It derives these performance distributions using simple analytical models that estimate bounds on performance induced by each microarchitectural component, providing a simple yet rich representation of a program's performance characteristics across a large space of microarchitectural parameters. Experiments show that Concorde is more than five orders of magnitude faster than a reference cycle-level simulator, with about 2% average Cycles-Per-Instruction (CPI) prediction error across a range of SPEC, open-source, and proprietary benchmarks. This enables rapid design-space exploration and performance sensitivity analyses that are currently infeasible, e.g., in about an hour, we conducted a first-of-its-kind fine-grained performance attribution to different microarchitectural components across a diverse set of programs, requiring nearly 150 million CPI evaluations.
SafeCast: Risk-Responsive Motion Forecasting for Autonomous Vehicles
Liao, Haicheng, Kong, Hanlin, Rao, Bin, Wang, Bonan, Wang, Chengyue, Yu, Guyang, Huang, Yuming, Tang, Ruru, Xu, Chengzhong, Li, Zhenning
Accurate motion forecasting is essential for the safety and reliability of autonomous driving (AD) systems. While existing methods have made significant progress, they often overlook explicit safety constraints and struggle to capture the complex interactions among traffic agents, environmental factors, and motion dynamics. To address these challenges, we present SafeCast, a risk-responsive motion forecasting model that integrates safety-aware decision-making with uncertainty-aware adaptability. SafeCast is the first to incorporate the Responsibility-Sensitive Safety (RSS) framework into motion forecasting, encoding interpretable safety rules--such as safe distances and collision avoidance--based on traffic norms and physical principles. To further enhance robustness, we introduce the Graph Uncertainty Feature (GUF), a graph-based module that injects learnable noise into Graph Attention Networks, capturing real-world uncertainties and enhancing generalization across diverse scenarios. We evaluate SafeCast on four real-world benchmark datasets--Next Generation Simulation (NGSIM), Highway Drone (HighD), ApolloScape, and the Macao Connected Autonomous Driving (MoCAD)--covering highway, urban, and mixed-autonomy traffic environments. Our model achieves state-of-the-art (SOTA) accuracy while maintaining a lightweight architecture and low inference latency, underscoring its potential for real-time deployment in safety-critical AD systems.
FNP: Fourier Neural Processes for Arbitrary-Resolution Data Assimilation
Data assimilation is a vital component in modern global medium-range weather forecasting systems to obtain the best estimation of the atmospheric state by combining the short-term forecast and observations. Recently, AI-based data assimilation approaches have attracted increasing attention for their significant advantages over traditional techniques in terms of computational consumption. However, existing AI-based data assimilation methods can only handle observations with a specific resolution, lacking the compatibility and generalization ability to assimilate observations with other resolutions. Considering that complex real-world observations often have different resolutions, we propose the Fourier Neural Processes (FNP) for arbitrary-resolution data assimilation in this paper. Leveraging the efficiency of the designed modules and flexible structure of neural processes, FNP achieves state-of-the-art results in assimilating observations with varying resolutions, and also exhibits increasing advantages over the counterparts as the resolution and the amount of observations increase. Moreover, our FNP trained on a fixed resolution can directly handle the assimilation of observations with out-of-distribution resolutions and the observational information reconstruction task without additional fine-tuning, demonstrating its excellent generalization ability across data resolutions as well as across tasks.