GIFT-Eval: A Benchmark For General Time Series Forecasting Model Evaluation

Aksu, Taha, Woo, Gerald, Liu, Juncheng, Liu, Xu, Liu, Chenghao, Savarese, Silvio, Xiong, Caiming, Sahoo, Doyen

arXiv.org Machine Learning

Time series foundation models excel in zero-shot forecasting, handling diverse tasks without explicit training. However, the advancement of these models has been hindered by the lack of comprehensive benchmarks. To address this gap, we introduce the General TIme Series ForecasTing Model Evaluation, GIFT-Eval, a pioneering benchmark aimed at promoting evaluation across diverse datasets. GIFT-Eval encompasses 23 datasets over 144,000 time series and 177 million data points, spanning seven domains, 10 frequencies, multivariate inputs, and prediction lengths ranging from short- to long-term forecasts. To facilitate the effective pretraining and evaluation of foundation models, we also provide a non-leaking pretraining dataset containing approximately 230 billion data points. Additionally, we provide a comprehensive analysis of 17 baselines, including statistical models, deep learning models, and foundation models. We discuss each model in the context of various benchmark characteristics and offer a qualitative analysis spanning both deep learning and foundation models. We believe the insights from this analysis, along with access to this new standard zero-shot time series forecasting benchmark, will guide future developments in time series foundation models.

The success of foundation model pretraining in language and vision modalities has catalyzed similar progress in time series forecasting. By pretraining on extensive time series datasets, a universal forecasting model can be developed, equipped to address varied downstream forecasting tasks across multiple domains, frequencies, prediction lengths, and numbers of variates in a zero-shot manner (Woo et al., 2024; Rasul et al., 2023; Ansari et al., 2024). A critical aspect of foundation model research is creating a high-quality benchmark that includes large, diverse evaluation data, and preferably non-leaking pretraining data, to fairly evaluate models and identify their weaknesses.
Research in Natural Language Processing (NLP) has produced key benchmarks such as GLUE and MMLU (Wang et al., 2018; Hendrycks et al., 2020; Srivastava et al., 2022; Chen et al., 2021), which are crucial for developing high-quality models. Unlike NLP, time series foundation models lack a unified, diverse benchmark for fair comparison. For instance, Woo et al. (2024) introduce LOTSA, which remains the largest collection of time series forecasting pretraining data to date. However, the proposed architecture, Moirai, is evaluated on existing benchmarks that are tailored to specific forecasting tasks, such as the LSF (Zhou et al., 2020) dataset for long-term forecasts and the Monash (Godahewa et al., 2021) dataset for univariate forecasts.
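The evaluation protocol the abstract describes — scoring a single pretrained model across many datasets, frequencies, and prediction lengths without task-specific training — can be sketched as follows. This is a minimal illustration, not GIFT-Eval's actual API: the dataset names and the naive last-value "model" are hypothetical stand-ins for a real foundation model, and MAPE is just one example metric.

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

def naive_forecast(history, horizon):
    """Zero-shot baseline: repeat the last observed value.

    A stand-in for calling a pretrained foundation model with no fine-tuning.
    """
    return np.full(horizon, history[-1])

def evaluate(series_by_dataset, horizons):
    """Score one model across datasets and prediction lengths."""
    scores = {}
    for name, series in series_by_dataset.items():
        for h in horizons:
            history, target = series[:-h], series[-h:]
            scores[(name, h)] = mape(target, naive_forecast(history, h))
    return scores

# Toy stand-ins for benchmark datasets from different domains/frequencies.
rng = np.random.default_rng(0)
data = {
    "energy_hourly": 1000 + rng.normal(0, 1, 500).cumsum(),
    "traffic_daily": 300 + rng.normal(0, 1, 200).cumsum(),
}
results = evaluate(data, horizons=[24, 48])  # one score per (dataset, horizon)
```

A real benchmark run would additionally vary the metric (e.g. probabilistic scores for models that output distributions) and hold the train/test split fixed so that results are comparable across model families.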


Artificial Intelligence and Statistical Techniques in Short-Term Load Forecasting: A Review

Nassif, Ali Bou, Soudan, Bassel, Azzeh, Mohammad, Attilli, Imtinan, AlMulla, Omar

arXiv.org Artificial Intelligence

Electrical utilities depend on short-term demand forecasting to proactively adjust production and distribution in anticipation of major variations. This systematic review analyzes 240 works published in scholarly journals between 2000 and 2019 that focus on applying Artificial Intelligence (AI), statistical, and hybrid models to short-term load forecasting (STLF). This work represents the most comprehensive review of works on this subject to date. A complete analysis of the literature is conducted to identify the most popular and accurate techniques as well as existing gaps. The findings show that although Artificial Neural Networks (ANN) remain the most commonly used standalone technique, researchers have increasingly opted for hybrid combinations of different techniques to leverage their complementary advantages. The review demonstrates that these hybrid combinations commonly achieve prediction accuracy exceeding 99%. The most successful short-term forecasting setup has been identified as one-day-ahead prediction at an hourly interval. The review has identified a deficiency in access to the datasets needed to train the models, and a significant gap in research covering regions other than Asia, Europe, North America, and Australia.
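The "one day at an hourly interval" setup and the "accuracy exceeding 99%" figure cited above can be made concrete with a small sketch. The data here is synthetic, and the seasonal-naive forecaster (repeat the same hour from the previous day) is a simple stand-in for the ANN and hybrid models the review surveys; the point is only to show the forecasting horizon and how accuracy is commonly reported as 100 minus MAPE.

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

# Synthetic hourly load: a daily sinusoidal profile plus small noise,
# standing in for two weeks of real utility demand data.
rng = np.random.default_rng(1)
hours = np.arange(14 * 24)
load = 500 + 100 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 2, hours.size)

# One-day-ahead forecast at an hourly interval: seasonal-naive baseline
# repeats the corresponding hour from the previous day.
history, target = load[:-24], load[-24:]
forecast = history[-24:]

# "Prediction accuracy" reported as 100 - MAPE: accuracy > 99% means
# the average percentage error over the 24 forecast hours is below 1%.
accuracy = 100.0 - mape(target, forecast)
```

On real load data, reaching this accuracy typically requires the calendar, weather, and lagged-load features that the reviewed hybrid models exploit; the smooth synthetic series here clears the bar with a naive method only because its noise is small relative to the daily profile.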