Papacharalampous, Georgia, Langousis, Andreas

Machine and statistical learning algorithms can be reliably automated and applied at scale. Therefore, they can constitute a considerable asset for designing practical forecasting systems, such as those related to urban water demand. Quantile regression algorithms are statistical and machine learning algorithms that can provide probabilistic forecasts in a straightforward way, and have not been applied so far for urban water demand forecasting. In this work, we aim to fill this gap by automating and extensively comparing several quantile-regression-based practical systems for probabilistic one-day ahead urban water demand forecasting. For designing the practical systems, we use five individual algorithms (i.e., the quantile regression, linear boosting, generalized random forest, gradient boosting machine and quantile regression neural network algorithms), their mean combiner and their median combiner. The comparison is conducted by exploiting a large urban water flow dataset, as well as several types of hydrometeorological time series (which are considered as exogenous predictor variables in the forecasting setting). The results mostly favour the practical systems designed using the linear boosting algorithm, probably due to the presence of trends in the urban water flow time series. The forecasts of the mean and median combiners are also found to be skilful in general terms.

Vasseur, Sebastián Pérez, Aznarte, José L.

In order to take preventive steps to maintain air quality, forecasting the evolution of pollution levels becomes a useful tool for decision makers: detecting pollution peaks beforehand could give cities enough time to take and communicate effective measures. Multiple research papers have focused on this issue and have dealt with the prediction of air quality. Bai et al. [1] describes the state of the art in this exercise and collects a range of diverse solutions applied to this problem. However, the prediction of the expected value of pollution concentrations through point-forecasting does not provide enough information about the likelihood of the pollutant levels reaching a certain threshold. Indeed, we have an estimate but we usually do not have a description of the confidence of the model nor the uncertainty in the predictions. Therefore, it is difficult to estimate the probability of the pollutant reaching above a certain threshold. The reason this probability estimation is so important is because the measures taken by cities to limit pollution (for example, limiting traffic) impact the daily routines of citizens and prove themselves to be quite unpopular. Therefore, local governments need to have an estimation of the confidence in the prediction to safely engage in those preventive measures.

We conduct a large-scale benchmark experiment aiming to advance the use of machine-learning quantile regression algorithms for probabilistic hydrological post-processing "at scale" within operational contexts. The experiment is set up using 34-year-long daily time series of precipitation, temperature, evapotranspiration and streamflow for 511 catchments over the contiguous United States. Point hydrological predictions are obtained using the Génie Rural à 4 paramètres Journalier (GR4J) hydrological model and exploited as predictor variables within quantile regression settings. Six machine-learning quantile regression algorithms and their equal-weight combiner are applied to predict conditional quantiles of the hydrological model errors. The individual algorithms are quantile regression, generalized random forests for quantile regression, generalized random forests for quantile regression emulating quantile regression forests, gradient boosting machine, model-based boosting with linear models as base learners and quantile regression neural networks.

Tyralis, Hristos, Papacharalampous, Georgia, Langousis, Andreas

Daily streamflow forecasting through data-driven approaches is traditionally performed using a single machine learning algorithm. Existing applications are mostly restricted to examination of few case studies, not allowing accurate assessment of the predictive performance of the algorithms involved. Here we propose super learning (a type of ensemble learning) by combining 10 machine learning algorithms. We apply the proposed algorithm in one-step ahead forecasting mode. For the application, we exploit a big dataset consisting of 10-year long time series of daily streamflow, precipitation and temperature from 511 basins. The super learner improves over the performance of the linear regression algorithm by 20.06%, outperforming the "hard to beat in practice" equal weight combiner. The latter improves over the performance of the linear regression algorithm by 19.21%. The best performing individual machine learning algorithm is neural networks, which improves over the performance of the linear regression algorithm by 16.73%, followed by extremely randomized trees (16.40%), XGBoost (15.92%), loess (15.36%), random forests (12.75%), polyMARS (12.36%), MARS (4.74%), lasso (0.11%) and support vector regression (-0.45%). Based on the obtained large-scale results, we propose super learning for daily streamflow forecasting.

Hatalis, Kostas, Lamadrid, Alberto J., Scheinberg, Katya, Kishore, Shalinee

Uncertainty analysis in the form of probabilistic forecasting can significantly improve decision making processes in the smart power grid for better integrating renewable energy sources such as wind. Whereas point forecasting provides a single expected value, probabilistic forecasts provide more information in the form of quantiles, prediction intervals, or full predictive densities. This paper analyzes the effectiveness of a novel approach for nonparametric probabilistic forecasting of wind power that combines a smooth approximation of the pinball loss function with a neural network architecture and a weighting initialization scheme to prevent the quantile cross over problem. A numerical case study is conducted using publicly available wind data from the Global Energy Forecasting Competition 2014. Multiple quantiles are estimated to form 10%, to 90% prediction intervals which are evaluated using a quantile score and reliability measures. Benchmark models such as the persistence and climatology distributions, multiple quantile regression, and support vector quantile regression are used for comparison where results demonstrate the proposed approach leads to improved performance while preventing the problem of overlapping quantile estimates.