Uncertainty
Learning Hierarchical Item Categories from Implicit Feedback Data for Efficient Recommendations and Browsing
Khawar, Farhan, Zhang, Nevin L.
Searching, browsing, and recommendations are common ways in which the "choice overload" faced by users in the online marketplace can be mitigated. In this paper we propose the use of hierarchical item categories, obtained from implicit feedback data, to enable efficient browsing and recommendations. We present a method of creating hierarchical item categories from implicit feedback data only i.e., without any other information on the items like name, genre etc. Categories created in this fashion are based on users' co-consumption of items. Thus, they can be more useful for users in finding interesting and relevant items while they are browsing through the hierarchy. We also show that this item hierarchy can be useful in making category based recommendations, which makes the recommendations more explainable and increases the diversity of the recommendations without compromising much on the accuracy. Item hierarchy can also be useful in the creation of an automatic item taxonomy skeleton by bypassing manual labeling and annotation. This can especially be useful for small vendors. Our data-driven hierarchical categories are based on hierarchical latent tree analysis (HLTA) which has been previously used for text analysis. We present a scaled up learning algorithm \emph{HLTA-Forest} so that HLTA can be applied to implicit feedback data.
Conditional probability calculation using restricted Boltzmann machine with application to system identification
There are many advantages to use probability method for nonlinear system identification, such as the noises and outliers in the data set do not affect the probability models significantly; the input features can be extracted in probability forms. The biggest obstacle of the probability model is the probability distributions are not easy to be obtained. In this paper, we form the nonlinear system identification into solving the conditional probability. Then we modify the restricted Boltzmann machine (RBM), such that the joint probability, input distribution, and the conditional probability can be calculated by the RBM training. Binary encoding and continue valued methods are discussed. The universal approximation analysis for the conditional probability based modelling is proposed. We use two benchmark nonlinear systems to compare our probability modelling method with the other black-box modeling methods. The results show that this novel method is much better when there are big noises and the system dynamics are complex.
A Finite Time Analysis of Temporal Difference Learning With Linear Function Approximation
Bhandari, Jalaj, Russo, Daniel, Singal, Raghav
Temporal difference learning (TD) is a simple iterative algorithm used to estimate the value function corresponding to a given policy in a Markov decision process. Although TD is one of the most widely used algorithms in reinforcement learning, its theoretical analysis has proved challenging and few guarantees on its statistical efficiency are available. In this work, we provide a simple and explicit finite time analysis of temporal difference learning with linear function approximation. Except for a few key insights, our analysis mirrors standard techniques for analyzing stochastic gradient descent algorithms, and therefore inherits the simplicity and elegance of that literature. A final section of the paper shows that all of our main results extend to the study of Q-learning applied to high-dimensional optimal stopping problems.
Variational Implicit Processes
Ma, Chao, Li, Yingzhen, Hernรกndez-Lobato, Josรฉ Miguel
This paper introduces the variational implicit processes (VIPs), a Bayesian nonparametric method based on a class of highly flexible priors over functions. Similar to Gaussian processes (GPs), in implicit processes (IPs), an implicit multivariate prior (data simulators, Bayesian neural networks, etc.) is placed over any finite collections of random variables. A novel and efficient variational inference algorithm for IPs is derived using wake-sleep updates, which gives analytic solutions and allows scalable hyper-parameter learning with stochastic optimization. Experiments on real-world regression datasets demonstrate that VIPs return better uncertainty estimates and superior performance over existing inference methods for GPs and Bayesian neural networks. With a Bayesian LSTM as the implicit prior, the proposed approach achieves state-of-the-art results on predicting power conversion efficiency of molecules based on raw chemical formulas.
Randomized Value Functions via Multiplicative Normalizing Flows
Touati, Ahmed, Satija, Harsh, Romoff, Joshua, Pineau, Joelle, Vincent, Pascal
Randomized value functions offer a promising approach towards the challenge of efficient exploration in complex environments with high dimensional state and action spaces. Unlike traditional point estimate methods, randomized value functions maintain a posterior distribution over action-space values. This prevents the agent's behavior policy from prematurely exploiting early estimates and falling into local optima. In this work, we leverage recent advances in variational Bayesian neural networks and combine these with traditional Deep Q-Networks (DQN) to achieve randomized value functions for high-dimensional domains. In particular, we augment DQN with multiplicative normalizing flows in order to track an approximate posterior distribution over its parameters. This allows the agent to perform approximate Thompson sampling in a computationally efficient manner via stochastic gradient methods. We demonstrate the benefits of our approach through an empirical comparison in high dimensional environments.
Doubly Robust Bayesian Inference for Non-Stationary Streaming Data with $\beta$-Divergences
Knoblauch, Jeremias, Jewson, Jack, Damoulas, Theodoros
We present the very first robust Bayesian Online Changepoint Detection algorithm through General Bayesian Inference (GBI) with $\beta$-divergences. The resulting inference procedure is doubly robust for both the predictive and the changepoint (CP) posterior, with linear time and constant space complexity. We provide a construction for exponential models and demonstrate it on the Bayesian Linear Regression model. In so doing, we make two additional contributions: Firstly, we make GBI scalable using Structural Variational approximations that are exact as $\beta \to 0$. Secondly, we give a principled way of choosing the divergence parameter $\beta$ by minimizing expected predictive loss on-line. We offer the state of the art and improve the False Discovery Rate of CPs by more than 80% on real world data.
Boosting Black Box Variational Inference
Locatello, Francesco, Dresdner, Gideon, Khanna, Rajiv, Valera, Isabel, Rรคtsch, Gunnar
Approximating a probability density in a tractable manner is a central task in Bayesian statistics. Variational Inference (VI) is a popular technique that achieves tractability by choosing a relatively simple variational family. Borrowing ideas from the classic boosting framework, recent approaches attempt to \emph{boost} VI by replacing the selection of a single density with a greedily constructed mixture of densities. In order to guarantee convergence, previous works impose stringent assumptions that require significant effort for practitioners. Specifically, they require a custom implementation of the greedy step (called the LMO) for every probabilistic model with respect to an unnatural variational family of truncated distributions. Our work fixes these issues with novel theoretical and algorithmic insights. On the theoretical side, we show that boosting VI satisfies a relaxed smoothness assumption which is sufficient for the convergence of the functional Frank-Wolfe (FW) algorithm. Furthermore, we rephrase the LMO problem and propose to maximize the Residual ELBO (RELBO) which replaces the standard ELBO optimization in VI. These theoretical enhancements allow for black box implementation of the boosting subroutine. Finally, we present a stopping criterion drawn from the duality gap in the classic FW analyses and exhaustive experiments to illustrate the usefulness of our theoretical and algorithmic contributions.
Degrees of Freedom and Model Selection for kmeans Clustering
This paper investigates the problem of model selection for kmeans clustering, based on conservative estimates of the model degrees of freedom. An extension of Stein's lemma, which is used in unbiased risk estimation, is used to obtain an expression which allows one to approximate the degrees of freedom. Empirically based estimates of this approximation are obtained. The degrees of freedom estimates are then used within the popular Bayesian Information Criterion to perform model selection. The proposed estimation procedure is validated in a thorough simulation study, and the robustness is assessed through relaxations of the modelling assumptions and on data from real applications. Comparisons with popular existing techniques suggest that this approach performs extremely well when the modelling assumptions
Evidential Deep Learning to Quantify Classification Uncertainty
Sensoy, Murat, Kandemir, Melih, Kaplan, Lance
Deterministic neural nets have been shown to learn effective predictors on a wide range of machine learning problems. However, as the standard approach is to train the network to minimize a prediction loss, the resultant model remains ignorant to its prediction confidence. Orthogonally to Bayesian neural nets that indirectly infer prediction uncertainty through weight uncertainties, we propose explicit modeling of the same using the theory of subjective logic. By placing a Dirichlet prior on the softmax output, we treat predictions of a neural net as subjective opinions and learn the function that collects the evidence leading to these opinions by a deterministic neural net from data. The resultant predictor for a multi-class classification problem is another Dirichlet distribution whose parameters are set by the continuous output of a neural net. We provide a preliminary analysis on how the peculiarities of our new loss function drive improved uncertainty estimation. We observe that our method achieves unprecedented success on detection of out-of-sample queries and endurance against adversarial perturbations.
New Hybrid Neuro-Evolutionary Algorithms for Renewable Energy and Facilities Management Problems
This Ph.D. thesis deals with the optimization of several renewable energy resources development as well as the improvement of facilities management in oceanic engineering and airports, using computational hybrid methods belonging to AI to this end. Energy is essential to our society in order to ensure a good quality of life. This means that predictions over the characteristics on which renewable energies depend are necessary, in order to know the amount of energy that will be obtained at any time. The second topic tackled in this thesis is related to the basic parameters that influence in different marine activities and airports, whose knowledge is necessary to develop a proper facilities management in these environments. Within this work, a study of the state-of-the-art Machine Learning have been performed to solve the problems associated with the topics above-mentioned, and several contributions have been proposed: One of the pillars of this work is focused on the estimation of the most important parameters in the exploitation of renewable resources. The second contribution of this thesis is related to feature selection problems. The proposed methodologies are applied to multiple problems: the prediction of $H_s$, relevant for marine energy applications and marine activities, the estimation of WPREs, undesirable variations in the electric power produced by a wind farm, the prediction of global solar radiation in areas from Spain and Australia, really important in terms of solar energy, and the prediction of low-visibility events at airports. All of these practical issues are developed with the consequent previous data analysis, normally, in terms of meteorological variables.