Bayesian Learning
Interval Load Forecasting for Individual Households in the Presence of Electric Vehicle Charging
Skala, Raiden, Elgalhud, Mohamed Ahmed T. A., Grolinger, Katarina, Mir, Syed
The transition from vehicles with traditional internal combustion engines to Electric Vehicles (EVs) is increasing societal demand for electricity. The ability to integrate the additional demand from EV charging into electricity demand forecasting is critical for maintaining the reliability of electricity generation and distribution. Load forecasting studies typically exclude households with home EV charging, focusing instead on offices, schools, and public charging stations. Moreover, they provide point forecasts, which do not convey information about prediction uncertainty. Consequently, this paper proposes Long Short-Term Memory Bayesian Neural Networks (LSTM-BNNs) for household load forecasting in the presence of EV charging. The approach uses the LSTM model to capture time dependencies and uses dropout layers with Bayesian inference to generate prediction intervals. Results show that the proposed LSTM-BNNs achieve accuracy similar to point forecasts while additionally providing prediction intervals. Moreover, the impact of COVID-19 pandemic lockdowns on the load forecasting model is examined; the analysis shows no major change in model performance because, for the considered households, the randomness of EV charging outweighs the change due to the pandemic.
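Prediction intervals of this kind are commonly obtained by keeping dropout active at prediction time (Monte Carlo dropout) and summarising many stochastic forward passes. The sketch below illustrates only that mechanism on a tiny hypothetical feed-forward network with fixed random weights, not a trained LSTM. The function name `mc_dropout_interval`, the dropout rate `p`, the number of samples, and the 95% percentile interval are all illustrative assumptions, not the authors' settings.

```python
import numpy as np

def mc_dropout_interval(x, w1, b1, w2, b2, p=0.2, n_samples=200, alpha=0.05, seed=0):
    """Monte Carlo dropout: keep dropout switched on at prediction time and
    summarise the spread of many stochastic forward passes."""
    rng = np.random.default_rng(seed)
    preds = np.empty((n_samples, x.shape[0]))
    for t in range(n_samples):
        h = np.maximum(0.0, x @ w1 + b1)       # hidden ReLU layer
        keep = rng.random(h.shape) >= p        # drop each hidden unit with probability p
        h = h * keep / (1.0 - p)               # inverted-dropout rescaling
        preds[t] = (h @ w2 + b2).ravel()       # one stochastic forecast per input row
    mean = preds.mean(axis=0)
    lower = np.quantile(preds, alpha / 2, axis=0)
    upper = np.quantile(preds, 1 - alpha / 2, axis=0)
    return mean, lower, upper

# Hypothetical example: 5 time steps, 3 input features (e.g. lagged household load)
rng = np.random.default_rng(1)
w1, b1 = rng.normal(size=(3, 16)), np.zeros(16)
w2, b2 = rng.normal(size=(16, 1)), np.zeros(1)
x = rng.normal(size=(5, 3))
mean, lower, upper = mc_dropout_interval(x, w1, b1, w2, b2)
```

In an actual LSTM-BNN, the same sampling loop would wrap the trained recurrent model. A wider spread of sampled forecasts signals greater predictive uncertainty.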
Insights into the drivers and spatio-temporal trends of extreme Mediterranean wildfires with statistical deep-learning
Richards, Jordan, Huser, Raphaël, Bevacqua, Emanuele, Zscheischler, Jakob
Extreme wildfires are a significant cause of human death and biodiversity destruction within countries that encompass the Mediterranean Basin. Recent worrying trends in wildfire activity (i.e., occurrence and spread) suggest that wildfires are likely to be highly impacted by climate change. To facilitate appropriate risk mitigation, we must identify the main drivers of extreme wildfires and assess their spatio-temporal trends, with a view to understanding the impacts of global warming on fire activity. We analyse the monthly burnt area due to wildfires over a region encompassing most of Europe and the Mediterranean Basin from 2001 to 2020, and identify high fire activity during this period in Algeria, Italy and Portugal. We build an extreme quantile regression model with a high-dimensional predictor set describing meteorological conditions, land cover usage, and orography. To model the complex relationships between the predictor variables and wildfires, we use a hybrid statistical deep-learning framework that can disentangle the effects of vapour-pressure deficit (VPD), air temperature, and drought on wildfire activity. Our results highlight that whilst VPD, air temperature, and drought significantly affect wildfire occurrence, only VPD affects wildfire spread. To gain insights into the effect of climate trends on wildfires in the near future, we focus on August 2001 and perturb temperature according to its observed trends (median over Europe: +0.04 K per year). We find that, on average over Europe, these trends lead to relative increases of 17.1% and 1.6% in the expected frequency and severity, respectively, of wildfires in August 2001, with spatially non-uniform changes in both aspects.
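Quantile regression in general is fitted by minimising the check (pinball) loss, which weights under-prediction by τ and over-prediction by 1 − τ. The sketch below shows only that generic building block and checks that, for a constant prediction, the empirical τ-quantile scores better than nearby values. It does not reproduce the paper's extreme-value model or deep-learning architecture, and it is not claimed to be the loss the authors use. The function name and the synthetic "burnt area" data are hypothetical.

```python
import numpy as np

def pinball_loss(y, q_pred, tau):
    """Average check loss for quantile level tau in (0, 1)."""
    diff = y - q_pred
    # positive residuals (under-prediction) cost tau, negative ones cost (1 - tau)
    return np.mean(np.maximum(tau * diff, (tau - 1) * diff))

rng = np.random.default_rng(0)
burnt_area = rng.exponential(scale=1.0, size=1000)   # synthetic, right-skewed data
tau = 0.9
q_hat = np.quantile(burnt_area, tau)
# the empirical 0.9-quantile should beat nearby constant predictions
print(pinball_loss(burnt_area, q_hat, tau) <= pinball_loss(burnt_area, q_hat + 0.1, tau))
```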
Uncertainty in Natural Language Processing: Sources, Quantification, and Applications
Hu, Mengting, Zhang, Zhen, Zhao, Shiwan, Huang, Minlie, Wu, Bingzhe
As a major field of artificial intelligence, natural language processing (NLP) has achieved remarkable success with deep neural networks. Many NLP tasks are now addressed in a unified manner, with different tasks linked to one another by sharing the same paradigm. However, neural networks are black boxes that rely on probabilistic computation, so mistakes are inevitable. Estimating the reliability and trustworthiness (in other words, the uncertainty) of neural networks has therefore become a key research direction, one that plays a crucial role in reducing model risk and supporting better decisions. In this survey, we provide a comprehensive review of uncertainty-related work in the NLP field. Considering the characteristics of the data and paradigms, we first categorize the sources of uncertainty in natural language into three types: input, system, and output. We then systematically review uncertainty quantification approaches and their main applications. Finally, we discuss the challenges of uncertainty estimation in NLP and potential future directions, taking into account recent trends in the field. Although there have been a few surveys on uncertainty estimation, our work is the first to review uncertainty from the NLP perspective.
Deep Learning From Crowdsourced Labels: Coupled Cross-entropy Minimization, Identifiability, and Regularization
Ibrahim, Shahana, Nguyen, Tri, Fu, Xiao
Using noisy crowdsourced labels from multiple annotators, a deep learning-based end-to-end (E2E) system aims to learn the label correction mechanism and the neural classifier simultaneously. To this end, many E2E systems concatenate the neural classifier with multiple annotator-specific "label confusion" layers and co-train the two parts in a parameter-coupled manner. The resulting coupled cross-entropy minimization (CCEM)-type criteria are intuitive and work well in practice. Nonetheless, theoretical understanding of the CCEM criterion has been limited. The contribution of this work is twofold. First, performance guarantees for the CCEM criterion are presented. Our analysis reveals, for the first time, that CCEM can correctly identify the annotators' confusion characteristics and the desired "ground-truth" neural classifier under realistic conditions, e.g., when only incomplete annotator labeling and finite samples are available. Second, based on insights from our analysis, two regularized variants of CCEM are proposed. The regularization terms provably enhance the identifiability of the target model parameters in several more challenging cases. A series of experiments on synthetic and real data showcases the effectiveness of the approach.
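A coupled criterion of this type can be written down directly. Each annotator's observed label is scored under the classifier's class probabilities pushed through that annotator's confusion matrix, and missing labels are skipped. The sketch below is a minimal NumPy forward computation of an unregularized CCEM-type loss under assumed conventions: confusion matrices are indexed as `[annotator, given_label, true_class]`, with each column summing to one, and `-1` marks a missing label. The function names are hypothetical. In training, the logits and the confusion matrices would be optimized jointly, which is not shown here.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)      # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def ccem_loss(logits, confusions, labels):
    """logits: (N, K) classifier outputs.
    confusions: (M, K, K), confusions[m, k, j] = P(annotator m says k | true class j).
    labels: (N, M) integer labels, -1 where annotator m did not label item i."""
    f = softmax(logits)                       # classifier posterior over true classes
    total, count = 0.0, 0
    for m in range(confusions.shape[0]):
        p_m = f @ confusions[m].T             # (N, K): predicted distribution of annotator m's label
        idx = np.nonzero(labels[:, m] >= 0)[0]  # only items this annotator actually labelled
        total -= np.log(p_m[idx, labels[idx, m]]).sum()
        count += idx.size
    return total / count                      # mean negative log-likelihood over observed labels
```

With identity confusion matrices, the loss reduces to ordinary cross-entropy on the annotators' labels. With a uniform confusion matrix, an annotator carries no information about the true class.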
Probabilistic Unrolling: Scalable, Inverse-Free Maximum Likelihood Estimation for Latent Gaussian Models
Lin, Alexander, Tolooshams, Bahareh, Atchadé, Yves, Ba, Demba
Latent Gaussian models have a rich history in statistics and machine learning, with applications ranging from factor analysis to compressed sensing to time series analysis. The classical method for maximizing the likelihood of these models is the expectation-maximization (EM) algorithm. For problems with high-dimensional latent variables and large datasets, EM scales poorly because it must invert as many large covariance matrices as there are data points. We introduce probabilistic unrolling, a method that combines Monte Carlo sampling with iterative linear solvers to circumvent matrix inversion. Our theoretical analyses reveal that unrolling and backpropagation through the iterations of the solver can accelerate gradient estimation for maximum likelihood estimation. In experiments on simulated and real data, we demonstrate that probabilistic unrolling learns latent Gaussian models up to an order of magnitude faster than gradient EM, with minimal losses in model performance.
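The following sketch makes the idea of sampling without matrix inversion concrete. It assumes a simple latent Gaussian model y = A z + σε with z ~ N(0, I) and ε ~ N(0, I). Posterior samples are drawn by perturbing the right-hand side of the normal equations and solving with conjugate gradients, which needs only matrix-vector products with the posterior precision. This perturb-then-solve scheme is a standard trick chosen for illustration. It is not claimed to be the paper's exact algorithm, and the unrolling and backpropagation through solver iterations are not shown. All names are hypothetical.

```python
import numpy as np

def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    """Solve P x = b for symmetric positive-definite P, given only v -> P v."""
    x = np.zeros_like(b)
    r = b.copy()                              # residual b - P x for the start x = 0
    p = r.copy()
    rs = r @ r
    stop = (tol * np.linalg.norm(b)) ** 2     # relative stopping threshold on ||r||^2
    for _ in range(max_iter):
        if rs <= stop:
            break
        Ap = matvec(p)
        alpha = rs / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p             # next P-conjugate search direction
        rs = rs_new
    return x

def posterior_sample(A, y, sigma, rng):
    """One draw from p(z | y) for y = A z + sigma * eps, z ~ N(0, I), with no inverse formed."""
    precision_mv = lambda v: A.T @ (A @ v) / sigma**2 + v   # P v, P = A^T A / sigma^2 + I
    # Perturbed right-hand side: its covariance equals P, so P^{-1} b has covariance P^{-1}
    b = A.T @ (y + sigma * rng.standard_normal(y.shape)) / sigma**2 + rng.standard_normal(A.shape[1])
    return conjugate_gradient(precision_mv, b)
```

Averaging many such draws approximates the posterior mean P⁻¹Aᵀy/σ². Each draw costs only a handful of matrix-vector products.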
Causal Discovery using Bayesian Model Selection
Dhir, Anish, van der Wilk, Mark
With only observational data on two variables, and without further assumptions, it is not possible to infer which one causes the other. Much of the causal literature has focused on guaranteeing identifiability of the causal direction in statistical models for datasets where strong assumptions hold, such as additive noise or restrictions on parameter count. These methods are subsequently tested on realistic datasets, most of which violate their assumptions. Building on previous attempts, we show how to use causal assumptions within the Bayesian framework. This allows us to specify models with realistic assumptions while also encoding independent causal mechanisms, which leads to an asymmetry between the causal directions. Identifying the causal direction then becomes a Bayesian model selection problem. We analyse why Bayesian model selection works for known identifiable cases and for flexible model classes, and provide correctness guarantees about its behaviour. To demonstrate our approach, we construct a Bayesian non-parametric model that can flexibly model the joint distribution. This model outperforms previous methods on a wide range of benchmark datasets with varying data-generating assumptions, showing the usefulness of our method.
Local Boosting for Weakly-Supervised Learning
Zhang, Rongzhi, Yu, Yue, Shen, Jiaming, Cui, Xiquan, Zhang, Chao
Boosting is a commonly used technique that enhances the performance of a set of base models by combining them into a strong ensemble model. Though widely adopted, boosting is typically used in supervised learning, where the data are labeled accurately. However, in weakly supervised learning, where most of the data are labeled through weak and noisy sources, designing effective boosting approaches remains nontrivial. In this work, we show that the standard implementation of the convex combination of base learners can hardly work in the presence of noisy labels. Instead, we propose LocalBoost, a novel framework for weakly supervised boosting. LocalBoost iteratively boosts the ensemble model along two dimensions: intra-source and inter-source. Intra-source boosting introduces locality to the base learners and enables each base learner to focus on a particular feature regime by training new base learners on granularity-varying error regions. For inter-source boosting, we leverage a conditional function to indicate the weak source in which a sample is more likely to appear. To account for the weak labels, we further design an estimate-then-modify approach to compute the model weights. Experiments on seven datasets show that our method significantly outperforms vanilla boosting methods and other weakly supervised methods.
MM-DAG: Multi-task DAG Learning for Multi-modal Data -- with Application for Traffic Congestion Analysis
Lan, Tian, Li, Ziyue, Li, Zhishuai, Bai, Lei, Li, Man, Tsung, Fugee, Ketter, Wolfgang, Zhao, Rui, Zhang, Chen
This paper proposes to learn Multi-task, Multi-modal Directed Acyclic Graphs (MM-DAGs), which are commonly observed in complex systems such as traffic, manufacturing, and weather systems, whose variables are multi-modal, comprising scalars, vectors, and functions. This paper takes traffic congestion analysis as a concrete case, where a traffic intersection is usually regarded as a DAG. In a road network of multiple intersections, different intersections may observe only partially overlapping sets of variables. For example, a signalized intersection has traffic-light-related variables, whereas unsignalized ones do not. This motivates a multi-task design: with each DAG as a task, MM-DAG learns the multiple DAGs jointly so that their consensus and consistency are maximized. To this end, we propose a multi-modal regression to describe linear causal relationships among different variables. We then develop a novel Causality Difference (CD) measure and its differentiable approximator. Compared with existing state-of-the-art measures, CD can penalize the causal structural difference among DAGs with distinct nodes and better accounts for the uncertainty of causal orders. We rigorously prove our design's topological interpretation and consistency properties. We conduct thorough simulations and one case study to show the effectiveness of MM-DAG. The code is available at https://github.com/Lantian72/MM-DAG
Enhancing naive classifier for positive unlabeled data based on logistic regression approach
Płatek, Mateusz, Mielniczuk, Jan
We argue that, for the analysis of Positive Unlabeled (PU) data under the Selected Completely At Random (SCAR) assumption, it is fruitful to view the problem as fitting a misspecified model to the data. Namely, we show that results on misspecified fitting imply that, when the posterior probability of the response is modelled by logistic regression, fitting logistic regression to the observable PU data, which does not follow this model, still yields a vector of estimated parameters approximately collinear with the true vector of parameters. This observation, together with choosing the intercept of the classifier by optimising an analogue of the F1 measure, yields a classifier that performs on par with or better than its competitors on several real data sets considered.
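The two-step recipe above can be sketched in NumPy: fit an ordinary logistic regression to the observed labelled/unlabelled indicator, keep its direction, and re-choose the intercept. As an assumption, the intercept here is chosen by maximising recall² / P(ŷ = 1), a PU-estimable F1-type criterion from earlier PU-learning work. The paper's own F1 analogue may differ. The function names, the Newton iteration count, and the candidate-intercept grid are illustrative.

```python
import numpy as np

def fit_logistic(X, s, n_iter=25):
    """Naive step: logistic regression of the label indicator s on X via Newton's method."""
    Xb = np.column_stack([np.ones(len(X)), X])   # prepend an intercept column
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        grad = Xb.T @ (s - p)                    # gradient of the log-likelihood
        hess = Xb.T @ (Xb * (p * (1 - p))[:, None])
        w = w + np.linalg.solve(hess, grad)      # Newton update
    return w[0], w[1:]                           # (intercept, slope vector)

def tune_intercept(X, s, beta):
    """Keep the naive direction beta; pick the intercept maximising recall**2 / P(yhat = 1)."""
    scores = X @ beta
    best_b0, best_crit = 0.0, -np.inf
    # candidate intercepts place the decision threshold at quantiles of the scores
    for b0 in -np.quantile(scores, np.linspace(0.01, 0.99, 99)):
        yhat = scores + b0 > 0
        if not yhat.any():
            continue
        recall = yhat[s == 1].mean()             # SCAR: labelled examples are a random subset of positives
        crit = recall**2 / yhat.mean()
        if crit > best_crit:
            best_b0, best_crit = b0, crit
    return best_b0
```

Under SCAR, recall measured on the labelled examples estimates recall on all positives. The criterion can therefore be evaluated without knowing which unlabelled points are positive.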
Gibbs Sampling the Posterior of Neural Networks
Piccioli, Giovanni, Troiani, Emanuele, Zdeborová, Lenka
In this paper, we study sampling from a posterior derived from a neural network. We propose a new probabilistic model that adds noise at every pre- and post-activation in the network, and argue that the resulting posterior can be sampled using an efficient Gibbs sampler. The Gibbs sampler attains performance similar to state-of-the-art Markov chain Monte Carlo methods, such as Hamiltonian Monte Carlo or the Metropolis-adjusted Langevin algorithm, on both real and synthetic data. By framing our analysis in the teacher-student setting, we introduce a thermalization criterion that detects when an algorithm, run on data with synthetic labels, fails to sample from the posterior. The criterion is based on the fact that in the teacher-student setting we can initialize an algorithm directly at equilibrium.