evaluation period
Distillation and Interpretability of Ensemble Forecasts of ENSO Phase using Entropic Learning
Groom, Michael, Bassetti, Davide, Horenko, Illia, O'Kane, Terence J.
This paper introduces a distillation framework for an ensemble of entropy-optimal Sparse Probabilistic Approximation (eSPA) models, trained exclusively on satellite-era observational and reanalysis data to predict ENSO phase up to 24 months in advance. While eSPA ensembles yield state-of-the-art forecast skill, they are harder to interpret than individual eSPA models. We show how to compress the ensemble into a compact set of "distilled" models by aggregating the structure of only those ensemble members that make correct predictions. This process yields a single, diagnostically tractable model for each forecast lead time that preserves forecast performance while also enabling diagnostics that are impractical to implement on the full ensemble. An analysis of the regime persistence of the distilled model "superclusters", as well as cross-lead clustering consistency, shows that the discretised system accurately captures the spatiotemporal dynamics of ENSO. By considering the effective dimension of the feature importance vectors, the complexity of the input space required for correct ENSO phase prediction is shown to peak when forecasts must cross the boreal spring predictability barrier. Spatial importance maps derived from the feature importance vectors are introduced to identify where predictive information resides in each field and are shown to include known physical precursors at certain lead times. Case studies of key events are also presented, showing how fields reconstructed from distilled model centroids trace the evolution from extratropical and inter-basin precursors to the mature ENSO state. Overall, the distillation framework enables a rigorous investigation of long-range ENSO predictability that complements real-time data-driven operational forecasts.
- Indian Ocean (0.04)
- South America (0.04)
- Europe > Germany > Rhineland-Palatinate > Kaiserslautern (0.04)
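The abstract does not say how the "effective dimension" of a feature importance vector is computed. One common choice, assumed here and not confirmed by the source, is the exponential of the Shannon entropy of the normalised importance vector. It equals 1 when a single feature carries all the importance and equals the number of features when importance is spread evenly. A higher value therefore means a more complex input space is needed. The function name `effective_dimension` is hypothetical.

```python
import math
from typing import Sequence


def effective_dimension(importance: Sequence[float]) -> float:
    """Exponential of the Shannon entropy of a normalised, non-negative
    feature-importance vector (an assumed definition, not necessarily the paper's).

    Returns a value between 1 (all weight on one feature) and
    len(importance) (weight spread evenly over all features).
    """
    total = sum(importance)
    if total <= 0:
        raise ValueError("importance vector must have positive total mass")
    # Zero entries contribute nothing to the entropy (0 * log 0 := 0).
    probs = [w / total for w in importance if w > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return math.exp(entropy)
```

For example, `effective_dimension([1, 0, 0, 0])` gives 1.0 and `effective_dimension([1, 1, 1, 1])` gives 4.0 (up to rounding). Under this definition, a peak in effective dimension at leads that cross the spring barrier would mean importance is spread over more features there.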
Sandbagging in a Simple Survival Bandit Problem
Dyer, Joel, Ornia, Daniel Jarne, Bishop, Nicholas, Calinescu, Anisoara, Wooldridge, Michael
Evaluating the safety of frontier AI systems is an increasingly important concern, helping to measure the capabilities of such models and identify risks before deployment. However, it has been recognised that if AI agents are aware that they are being evaluated, they may deliberately hide dangerous capabilities or intentionally perform suboptimally on safety-related tasks in order to be released and to avoid being deactivated or retrained. Such strategic deception, often known as "sandbagging", threatens to undermine the integrity of safety evaluations. It is therefore valuable to identify methods that distinguish behavioural patterns reflecting a genuine lack of capability from behavioural patterns consistent with sandbagging. In this paper, we develop a simple model of strategic deception in sequential decision-making tasks, inspired by the recently developed survival bandit framework. We demonstrate theoretically that this problem induces sandbagging behaviour in optimal rational agents, and we construct a statistical test to distinguish between sandbagging and incompetence from sequences of test scores. In simulation experiments, we investigate how reliably this test distinguishes between these behaviours in bandit models. This work aims to establish a potential avenue for developing robust statistical procedures for use in the science of frontier model evaluations.
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.05)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > Canada (0.14)
- North America > United States > Alabama (0.05)
- North America > United States > Texas (0.04)
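To make "sequences of test scores" concrete, here is a minimal sketch of a survival-bandit episode. It is not the paper's model or its statistical test. The Gaussian rewards, the ruin rule (the episode stops once the running budget reaches zero), and both example policies are assumptions for illustration. All names here are hypothetical.

```python
import random
from typing import Callable, List, Sequence


def run_survival_bandit(
    arm_means: Sequence[float],
    policy: Callable[[int], int],
    initial_budget: float,
    horizon: int,
    noise_sd: float = 1.0,
    seed: int = 0,
) -> List[float]:
    """Simulate one episode of a hypothetical survival bandit.

    Each round the policy picks an arm and receives a Gaussian reward, which
    is added to its budget. The episode ends early ("ruin") as soon as the
    budget falls to zero or below. Returns the observed per-round scores.
    """
    rng = random.Random(seed)
    budget = initial_budget
    scores: List[float] = []
    for t in range(horizon):
        arm = policy(t)
        reward = rng.gauss(arm_means[arm], noise_sd)
        scores.append(reward)
        budget += reward
        if budget <= 0:
            break  # ruin: no further rounds are played
    return scores


def best_arm_policy(t: int) -> int:
    # Always plays arm 0, assumed here to be the best arm.
    return 0


def make_sandbagging_policy(k: int) -> Callable[[int], int]:
    # Plays the weaker arm 1 for the first k rounds (the rounds the agent
    # believes are an evaluation), then switches to the best arm 0.
    return lambda t: 1 if t < k else 0
```

A test of the kind the abstract describes would take score sequences like these and ask whether a low-scoring prefix is more plausibly explained by a deliberate policy switch than by a uniformly weak agent.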
Integrating Spatiotemporal Features in LSTM for Spatially Informed COVID-19 Hospitalization Forecasting
Wang, Zhongying, Ngo, Thoai D., Zoraghein, Hamidreza, Lucas, Benjamin, Karimzadeh, Morteza
Despite the end of the pandemic phase and declining mortality rates, COVID-19 remains a significant global health concern. According to the Centers for Disease Control and Prevention (CDC) COVID-19 Dashboard, the disease reached a peak weekly test positivity of 18% in the U.S. in 2024. Although the recorded hospitalization rate of 4.8 per 10,000 population on August 10, 2024, may appear comparatively low, it underscores the continuing impact of the disease. According to communications received from the CDC, hospitals are again mandated to report COVID-19 hospitalizations starting in mid-November 2024, indicating a resurgence of the disease. The COVID-19 pandemic strained healthcare resources and overloaded hospitals, exacerbating the dramatic loss of human life. SARS-CoV-2 spreads rapidly and can cause severe complications. Its spread is driven by its high reproduction rate, its ability to spread via asymptomatic individuals, the prevalence of close-contact settings in densely populated areas, continual mutation into more transmissible variants, and the inconsistent application of preventive public health measures across the U.S. As a result, the demand for travel nurses surged during the pandemic, following shifts in COVID-19 infection hotspots (Cole et al. 2021, Longyear et al. 2020). This was partly a geospatial problem of allocating limited human and medical resources in a timely manner. Reliable geographic forecasting of COVID-19 hospital admissions could have alleviated this burden through policy-relevant decision-making and proactive allocation of resources in regional hotspots (i.e.
- North America > United States > Colorado > Boulder County > Boulder (0.14)
- North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
- North America > United States > District of Columbia > Washington (0.04)
Informer in Algorithmic Investment Strategies on High Frequency Bitcoin Data
Stefaniuk, Filip, Ślepaczuk, Robert
The article investigates the use of the Informer architecture for building automated trading strategies on high-frequency Bitcoin data. Three strategies are proposed, each using the Informer model with a different loss function: Root Mean Squared Error (RMSE), Generalized Mean Absolute Directional Loss (GMADL), or Quantile loss. They are evaluated against the Buy and Hold benchmark and two benchmark strategies based on technical indicators. The evaluation is conducted on data at 5-minute, 15-minute, and 30-minute intervals over six different periods. Although the Informer-based model with Quantile loss did not outperform the benchmark, the other two models achieved better results. The performance of the model using RMSE loss worsens with higher-frequency data. In contrast, the model using the novel GMADL loss function benefits from higher-frequency data: when trained on 5-minute intervals, it beat all other strategies in most of the testing periods. The primary contribution of this study is the application and assessment of the RMSE, GMADL, and Quantile loss functions with the Informer model to forecast future returns, and the use of these forecasts to develop automated trading strategies. The research provides evidence that employing an Informer model trained with the GMADL loss function can result in superior trading outcomes compared to the Buy and Hold approach.
- North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.05)
- Europe > Poland > Masovia Province > Warsaw (0.04)
- Asia > China > Guangdong Province > Shenzhen (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Banking & Finance > Trading (1.00)
- Government > Regional Government > North America Government > United States Government (0.45)
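The abstract contrasts RMSE with a directional loss but does not define GMADL. The sketch below assumes the form mean over i of -(sigmoid(a * R_i * R_hat_i) - 0.5) * |R_i|^b. Here R is the realised return, R_hat the forecast, a controls how sharply the loss reacts to the sign of the forecast, and b controls how strongly large moves are weighted. Treat this formula, the default parameter values, and the function names as assumptions; the paper's exact definition should be checked.

```python
import math
from typing import Sequence


def _sigmoid(x: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gmadl(actual: Sequence[float], predicted: Sequence[float],
          a: float = 100.0, b: float = 1.0) -> float:
    """Assumed GMADL form: mean of -(sigmoid(a * R * R_hat) - 0.5) * |R| ** b.

    Correct-sign forecasts give a negative (better) loss, weighted by the
    size of the realised move. The defaults are illustrative, not tuned values.
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for r, r_hat in zip(actual, predicted):
        direction_score = _sigmoid(a * r * r_hat) - 0.5  # lies in (-0.5, 0.5)
        total += -direction_score * abs(r) ** b
    return total / len(actual)


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    # Root mean squared error: penalises magnitude errors, ignores direction.
    return math.sqrt(sum((r - p) ** 2 for r, p in zip(actual, predicted)) / len(actual))
```

The design point is that a forecast with the right sign but the wrong magnitude can have a poor RMSE yet a good (negative) GMADL. That matches what a long/short strategy actually trades on.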
Deep Reinforcement Learning and Mean-Variance Strategies for Responsible Portfolio Optimization
Acero, Fernando, Zehtabi, Parisa, Marchesotti, Nicolas, Cashmore, Michael, Magazzeni, Daniele, Veloso, Manuela
Portfolio optimization involves determining the optimal allocation of portfolio assets in order to maximize a given investment objective. Traditionally, some form of mean-variance optimization is used to maximize returns while minimizing risk; more recently, however, deep reinforcement learning formulations have been explored. Investors have increasingly shown interest in incorporating ESG objectives into investment decisions, and modifications to the classical mean-variance optimization framework have been developed to accommodate them. In this work, we study the use of deep reinforcement learning for responsible portfolio optimization by incorporating ESG states and objectives, and we provide comparisons against modified mean-variance approaches. Our results show that deep reinforcement learning policies can provide competitive performance against mean-variance approaches for responsible portfolio allocation across additive and multiplicative utility functions of financial and ESG responsibility objectives.
- Banking & Finance > Trading (1.00)
- Energy > Oil & Gas > Upstream (0.84)
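The abstract mentions additive and multiplicative utilities that combine financial and ESG objectives without giving their form. The sketch assumes a mean-variance term w·mu - (lambda/2) w'Sigma w. It combines that term with a portfolio ESG score w·s either additively, with a weight gamma, or multiplicatively. A brute-force grid over two-asset, long-only weights then picks the best allocation. These utility forms and all function names are hypothetical illustrations, not the paper's definitions.

```python
from typing import Callable, List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def mean_and_variance(w: Sequence[float], mu: Sequence[float], cov: Matrix) -> Tuple[float, float]:
    # Portfolio expected return w.mu and variance w' Sigma w.
    n = len(w)
    ret = sum(w[i] * mu[i] for i in range(n))
    var = sum(w[i] * cov[i][j] * w[j] for i in range(n) for j in range(n))
    return ret, var


def esg_score(w: Sequence[float], esg: Sequence[float]) -> float:
    # Weighted average ESG score of the portfolio.
    return sum(wi * si for wi, si in zip(w, esg))


def additive_utility(w: Sequence[float], mu: Sequence[float], cov: Matrix,
                     esg: Sequence[float], risk_aversion: float, esg_weight: float) -> float:
    # Mean-variance utility plus a weighted ESG term.
    ret, var = mean_and_variance(w, mu, cov)
    return ret - 0.5 * risk_aversion * var + esg_weight * esg_score(w, esg)


def multiplicative_utility(w: Sequence[float], mu: Sequence[float], cov: Matrix,
                           esg: Sequence[float], risk_aversion: float) -> float:
    # Mean-variance utility scaled by the portfolio's ESG score.
    ret, var = mean_and_variance(w, mu, cov)
    return (ret - 0.5 * risk_aversion * var) * esg_score(w, esg)


def best_two_asset_weight(utility: Callable[[List[float]], float], steps: int = 100) -> float:
    # Brute-force search over long-only weights (w, 1 - w) on an evenly spaced grid.
    grid = [k / steps for k in range(steps + 1)]
    return max(grid, key=lambda w: utility([w, 1.0 - w]))
```

With two uncorrelated assets of equal mean and variance, an ESG weight of zero gives a 50/50 split. A positive ESG weight pushes the allocation toward the higher-ESG asset. Note that the multiplicative form behaves oddly when the financial term is negative, which is one reason its scaling matters in practice.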
GDP nowcasting with artificial neural networks: How much does long-term memory matter?
Németh, Kristóf, Hadházi, Dániel
In our study, we apply artificial neural networks (ANNs) to nowcast quarterly GDP growth for the U.S. economy. Using the monthly FRED-MD database, we compare the nowcasting performance of five ANN architectures: the multilayer perceptron (MLP), the one-dimensional convolutional neural network (1D CNN), the Elman recurrent neural network (RNN), the long short-term memory network (LSTM), and the gated recurrent unit (GRU). The empirical analysis presents results from two distinctly different evaluation periods. The first (2012:Q1 to 2019:Q4) is characterized by balanced economic growth, while the second (2012:Q1 to 2022:Q4) also includes the COVID-19 recession. According to our results, longer input sequences yield more accurate nowcasts in periods of balanced economic growth. However, this effect ceases above a relatively low threshold of around six quarters (eighteen months). During periods of economic turbulence (e.g., the COVID-19 recession), longer input sequences do not improve the models' predictive performance; instead, they seem to weaken their generalization capability. Combined results from the two evaluation periods indicate that architectural features enabling long-term memory do not result in more accurate nowcasts. On the other hand, the 1D CNN has proved to be a highly suitable model for GDP nowcasting. The network showed good nowcasting performance among the competitors during the first evaluation period and achieved the best overall accuracy during the second. Consequently, we are the first in the literature to propose applying the 1D CNN to economic nowcasting.
- North America > United States > Oklahoma > Payne County > Cushing (0.04)
- North America > Canada (0.04)
- Europe > Switzerland (0.04)
- Health & Medicine (1.00)
- Government > Regional Government > North America Government > United States Government (1.00)
- Banking & Finance > Economy (1.00)
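As a rough illustration of how a 1D CNN reads a window of monthly inputs, here is a toy forward pass. It applies a single filter as a "valid" convolution, then ReLU, then global average pooling, then one linear output. It is not the paper's architecture. The single input channel, fixed weights, and function names are hypothetical, and a real model would learn many filters over many FRED-MD series.

```python
from typing import List, Sequence


def conv1d_valid(x: Sequence[float], kernel: Sequence[float], bias: float = 0.0) -> List[float]:
    """'Valid' 1D convolution as used in CNN layers (i.e. cross-correlation:
    the kernel is not flipped). Output length is len(x) - len(kernel) + 1."""
    k = len(kernel)
    if k == 0 or k > len(x):
        raise ValueError("kernel must be non-empty and no longer than the input")
    return [sum(x[t + i] * kernel[i] for i in range(k)) + bias
            for t in range(len(x) - k + 1)]


def relu(values: Sequence[float]) -> List[float]:
    # Elementwise max(0, v).
    return [max(0.0, v) for v in values]


def tiny_cnn_nowcast(monthly_inputs: Sequence[float], kernel: Sequence[float],
                     conv_bias: float, out_weight: float, out_bias: float) -> float:
    """Toy single-filter 1D CNN: convolution -> ReLU -> global average
    pooling -> linear output (a hypothetical GDP growth nowcast)."""
    feature_map = relu(conv1d_valid(monthly_inputs, kernel, conv_bias))
    pooled = sum(feature_map) / len(feature_map)
    return out_weight * pooled + out_bias
```

With an 18-month input window (the roughly six-quarter threshold reported above) and a 3-month kernel, the feature map has 16 positions. The same local filter is reused across the window, so the parameter count does not grow with sequence length the way an MLP's input layer would.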
A spatiotemporal machine learning approach to forecasting COVID-19 incidence at the county level in the United States
Lucas, Benjamin, Vahedi, Behzad, Karimzadeh, Morteza
With COVID-19 affecting every country and changing everyday life, the ability to forecast the spread of the disease is more important than in any previous epidemic. Compartmental models, the conventional approach to disease-spread modeling, assume that the virus spreads homogeneously in space and time, which may cause forecasts to underperform, especially at high spatial resolutions. In this paper, we approach the forecasting task with an alternative technique: spatiotemporal machine learning. We present COVID-LSTM, a data-driven model based on a Long Short-Term Memory deep learning architecture for forecasting COVID-19 incidence at the county level in the US. We use the weekly number of new positive cases as temporal input, together with hand-engineered spatial features from Facebook movement and connectedness datasets, to capture the spread of the disease in time and space. COVID-LSTM outperforms the COVID-19 Forecast Hub's Ensemble model (COVIDhub-ensemble) over our 17-week evaluation period, making it the first model to be more accurate than the COVIDhub-ensemble over one or more forecast periods. Over the 4-week forecast horizon, our model is on average more accurate than the COVIDhub-ensemble by 50 cases per county. We highlight that the underutilization of data-driven forecasting of disease spread before COVID-19 is likely due to the lack of sufficient data for previous diseases, as well as to the recency of advances in machine learning methods for spatiotemporal forecasting. We discuss the impediments to wider uptake of data-driven forecasting, and whether more deep learning-based models are likely to be used in the future.
- North America > United States > Colorado > Boulder County > Boulder (0.28)
- North America > United States > California > Los Angeles County (0.04)
- South America > Brazil (0.04)
- Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
- Health & Medicine > Therapeutic Area > Immunology (1.00)
- Health & Medicine > Epidemiology (1.00)
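The abstract says spatial features were hand-engineered from movement and connectedness data but does not give their definition. One plausible feature, assumed here purely for illustration, is a connectedness-weighted average of other counties' weekly new cases. This acts as a spatial lag that could be fed to the LSTM alongside a county's own case count at each time step. The function name and input layout are hypothetical.

```python
from typing import Dict


def connectedness_weighted_cases(
    weekly_cases: Dict[str, float],
    connectedness: Dict[str, Dict[str, float]],
) -> Dict[str, float]:
    """Hypothetical spatial-lag feature: for each county, the average of other
    counties' weekly new cases, weighted by a connectedness index.

    connectedness[a][b] is a non-negative link strength from county a to b.
    Self-links are ignored; counties with no links get 0.0.
    """
    features: Dict[str, float] = {}
    for county, links in connectedness.items():
        neighbours = {other: w for other, w in links.items() if other != county}
        total_weight = sum(neighbours.values())
        if total_weight == 0:
            features[county] = 0.0
            continue
        weighted = sum(w * weekly_cases.get(other, 0.0) for other, w in neighbours.items())
        features[county] = weighted / total_weight
    return features
```

Computing this feature for each week yields one extra input series per county. Stacking it with the county's own weekly cases gives a multivariate sequence that an LSTM can consume directly.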
Deep Distributional Time Series Models and the Probabilistic Forecasting of Intraday Electricity Prices
Klein, Nadja, Smith, Michael Stanley, Nott, David J.
Recurrent neural networks (RNNs) with rich feature vectors of past values can provide accurate point forecasts for series that exhibit complex serial dependence. We propose two approaches to constructing deep probabilistic time series models based on a variant of RNN called an echo state network (ESN). In the first, the output layer of the ESN has stochastic disturbances and a shrinkage prior for additional regularization. The second employs the implicit copula of an ESN with Gaussian disturbances, which is a deep copula process on the feature space. Combining this copula with a non-parametrically estimated marginal distribution produces a deep distributional time series model. The resulting probabilistic forecasts are deep functions of the feature vector and are also marginally calibrated. In both approaches, Bayesian Markov chain Monte Carlo methods are used to estimate the models and compute forecasts. The proposed deep time series models are well suited to the complex task of forecasting intraday electricity prices. Using data from the Australian National Electricity Market, we show that our models provide accurate probabilistic price forecasts. Moreover, the models provide a flexible framework for incorporating probabilistic forecasts of electricity demand as additional features. We demonstrate that doing so substantially increases price forecast accuracy, particularly in the deep distributional time series model.
- Oceania > Australia > New South Wales (0.04)
- Asia > Singapore (0.04)
- South America > Chile (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)
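The feature vectors that both approaches build on come from an ESN reservoir: fixed random weights update a hidden state, x_t = tanh(W_in u_t + W x_{t-1}), and only the output layer is estimated. The sketch shows just this reservoir step for a scalar input. It does not include the paper's shrinkage-prior output layer, copula, or MCMC estimation. Instead of the more common spectral-radius rescaling, which needs an eigenvalue routine, it rescales W so that every row's absolute sum is below 1. Because tanh is 1-Lipschitz, that makes the update a contraction, a sufficient condition for the echo state property. Names and default values are assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple


def make_reservoir(n_reservoir: int, input_scale: float = 0.5,
                   max_row_sum: float = 0.9,
                   seed: int = 0) -> Tuple[List[float], List[List[float]]]:
    """Random input weights W_in and recurrent weights W for a scalar-input ESN.

    W is rescaled so that every row's absolute sum is at most max_row_sum < 1,
    which makes the state update a contraction in the max-norm.
    """
    if not 0.0 < max_row_sum < 1.0:
        raise ValueError("max_row_sum must lie in (0, 1)")
    rng = random.Random(seed)
    w_in = [rng.uniform(-input_scale, input_scale) for _ in range(n_reservoir)]
    w = [[rng.uniform(-1.0, 1.0) for _ in range(n_reservoir)] for _ in range(n_reservoir)]
    largest = max(sum(abs(v) for v in row) for row in w)
    w = [[v * max_row_sum / largest for v in row] for row in w]
    return w_in, w


def reservoir_states(inputs: Sequence[float], w_in: Sequence[float],
                     w: Sequence[Sequence[float]]) -> List[List[float]]:
    """Run x_t = tanh(W_in * u_t + W x_{t-1}) from x_0 = 0 and return every x_t.
    These states are the feature vectors an output layer would be fitted on."""
    n = len(w_in)
    x = [0.0] * n
    states: List[List[float]] = []
    for u in inputs:
        # The right-hand side uses the previous state x throughout.
        x = [math.tanh(w_in[i] * u + sum(w[i][j] * x[j] for j in range(n)))
             for i in range(n)]
        states.append(x)
    return states
```

The contraction gives the reservoir fading memory. Two input histories that differ only in the distant past produce nearly identical current states, so the features summarise recent dynamics in a stable way.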
New Findings Show Artificial Intelligence Software Improves Breast Cancer Detection and Physician Accuracy
A New York City-based, high-volume private radiology practice conducted a quality assurance review that included an 18-month software evaluation in its breast center, which comprises nine specialist radiologists. The radiologists used FDA-cleared artificial intelligence software by Koios Medical, Inc. as a second opinion when analyzing and assessing lesions found during breast ultrasound examinations. Over the evaluation period, the radiologists analyzed more than 6,000 diagnostic breast ultrasound exams. They used Koios DS Breast decision support software (Koios Medical, Inc.) to assist in lesion classification and risk assessment. As part of the normal diagnostic workflow, radiologists would activate Koios DS and review the software findings alongside clinical details to decide on the most appropriate management. The physicians' diagnostic performance was then compared with their performance during the 18-month period before the introduction of the AI-enabled software.
- North America > United States > New York (0.26)
- North America > United States > Illinois > Cook County > Chicago (0.06)
- Press Release (0.87)
- Research Report > New Finding (0.85)
- Health & Medicine > Nuclear Medicine (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
- Government > Regional Government > North America Government > United States Government > FDA (0.58)
- Health & Medicine > Therapeutic Area > Oncology > Breast Cancer (0.53)