Country
More Efficient Off-Policy Evaluation through Regularized Targeted Learning
Bibaut, Aurélien F., Malenica, Ivana, Vlassis, Nikos, van der Laan, Mark J.
We study the problem of off-policy evaluation (OPE) in Reinforcement Learning (RL), where the aim is to estimate the performance of a new policy given historical data that may have been generated by a different policy, or policies. In particular, we introduce a novel doubly-robust estimator for the OPE problem in RL, based on the Targeted Maximum Likelihood Estimation principle from the statistical causal inference literature. We also introduce several variance reduction techniques that lead to impressive performance gains in off-policy evaluation. We show empirically that our estimator uniformly wins over existing off-policy evaluation methods across multiple RL environments and various levels of model misspecification. Finally, we further the existing theoretical analysis of estimators for the RL off-policy estimation problem by showing their $O_P(1/\sqrt{n})$ rate of convergence and characterizing their asymptotic distribution.
On Metrics to Assess the Transferability of Machine Learning Models in Non-Intrusive Load Monitoring
Klemenjak, Christoph, Faustine, Anthony, Makonin, Stephen, Elmenreich, Wilfried
To assess the performance of load disaggregation algorithms it is common practise to train a candidate algorithm on data from one or multiple households and subsequently apply cross-validation by evaluating the classification and energy estimation performance on unseen portions of the dataset derived from the same households. With an emerging discussion of transferability in Non-Intrusive Load Monitoring (NILM), there is a need for domain-specific metrics to assess the performance of NILM algorithms on new test scenarios being unseen buildings. In this paper, we discuss several metrics to assess the generalisation ability of NILM algorithms. These metrics target different aspects of performance evaluation in NILM and are meant to complement the traditional performance evaluation approach. We demonstrate how our metrics can be utilised to evaluate NILM algorithms by means of two case studies. We conduct our studies on several energy consumption datasets and take into consideration five state-of-the-art as well as four baseline NILM solutions. Finally, we formulate research challenges for future work.
Double descent in the condition number
Poggio, Tomaso, Kur, Gil, Banburski, Andrzej
In solving a system of $n$ linear equations in $d$ variables $Ax=b$, the condition number of the $n,d$ matrix $A$ measures how much errors in the data $b$ affect the solution $x$. Bounds of this type are important in many inverse problems. An example is machine learning where the key task is to estimate an underlying function from a set of measurements at random points in a high dimensional space and where low sensitivity to error in the data is a requirement for good predictive performance. Here we discuss the simple observation, which is well-known but surprisingly little quoted that when the columns of $A$ are random vectors, the condition number of $A$ is highest if $d=n$, that is when the inverse of $A$ exists. An overdetermined system ($n>d$) as well as an underdetermined system ($n
Training without training data: Improving the generalizability of automated medical abbreviation disambiguation
Skreta, Marta, Arbabi, Aryan, Wang, Jixuan, Brudno, Michael
Proceedings of Machine Learning Research XX:1-12, 2019 Machine Learning for Health (ML4H) at NeurIPS 2019 1 Training without training data: Improving the generalizability of automated medical abbreviation disambiguation* Marta Skreta 1,2 martaskreta@cs.toronto.edu Michael Brudno 1,2 brudno@cs.toronto.edu 1 University of Toronto, Department of Computer Science 2 The Hospital for Sick Children, Center for Computational Medicine 3 Vector Institute for Artifical Intelligence, Toronto, Canada Abstract Abbreviation disambiguation is important for automated clinical note processing due to the frequent use of abbreviations in clinical settings. Current models for automated abbreviation disambiguation are restricted by the scarcity and imbalance of labeled training data, decreasing their generalizability to orthogonal sources. In this work we propose a novel data augmentation technique that utilizes information from related medical concepts, which improves our model's ability to generalize. Furthermore, we show that incorporating the global context information within the whole medical note (in addition to the traditional local context window), can significantly improve the model's representation for abbreviations. We train our model on a public dataset (MIMIC III) and test its performance on datasets from different sources (CASI, i2b2). Together, these two techniques boost the accuracy of abbreviation disambiguation by almost 14% on the CASI dataset and 4% on i2b2. 1. Introduction Health care practitioners typically use abbreviations when preparing clinical records, saving time and space with the cost of increased ambiguity.
Calibrated model-based evidential clustering using bootstrapping
Evidential clustering is an approach to clustering in which cluster-membership uncertainty is represented by a collection of Dempster-Shafer mass functions forming an evidential partition. In this paper, we propose to construct these mass functions by bootstrapping finite mixture models. In the first step, we compute bootstrap percentile confidence intervals for all pairwise probabilities (the probabilities for any two objects to belong to the same class). We then construct an evidential partition such that the pairwise belief and plausibility degrees approximate the bounds of the confidence intervals. This evidential partition is calibrated, in the sense that the pairwise belief-plausibility intervals contain the true probabilities "most of the time", i.e., with a probability close to the defined confidence level. This frequentist property is verified by simulation, and the practical applicability of the method is demonstrated using several real datasets.
Control-Tutored Reinforcement Learning
De Lellis, Francesco, Auletta, Fabrizia, Russo, Giovanni, De Lellis, Piero, di Bernardo, Mario
We introduce a control-tutored reinforcement learning (CTRL) algorithm. The idea is to enhance tabular learning algorithms so as to improve the exploration of the state-space, and substantially reduce learning times by leveraging some limited knowledge of the plant encoded into a tutoring model-based control strategy. We illustrate the benefits of our novel approach and its effectiveness by using the problem of controlling one or more agents to herd and contain within a goal region a set of target free-roving agents in the plane.
Normalizing Constant Estimation with Gaussianized Bridge Sampling
Department of Physics, Department of Astronomy University of California, Berkeley, CA 94720, USA and Lawrence Berkeley National Lab, 1 Cyclotron Road, Berkeley, CA 94720, USA Abstract Normalizing constant (also called partition function, Bayesian evidence, or marginal likelihood) is one of the central goals of Bayesian inference, yet most of the existing methods are both expensive and inaccurate. Here we develop a new approach, starting from posterior samples obtained with a standard Markov Chain Monte Carlo (MCMC). We apply a novel Normalizing Flow (NF) approach to obtain an analytic density estimator from these samples, followed by Optimal Bridge Sampling (OBS) to obtain the normalizing constant. We compare our method which we call Gaussianized Bridge Sampling (GBS) to existing methods such as Nested Sampling (NS) and Annealed Importance Sampling (AIS) on several examples, showing our method is both significantly faster and substantially more accurate than these methods, and comes with a reliable error estimation. Keywords: Normalizing Constant, Bridge Sampling, Normalizing Flows 1. Introduction Normalizing constant, also called partition function, Bayesian evidence, or marginal likelihood, is the central object of Bayesian methodology.
Sublinear Time Numerical Linear Algebra for Structured Matrices
Shi, Xiaofei, Woodruff, David P.
We show how to solve a number of problems in numerical linear algebra, such as least squares regression, $\ell_p$-regression for any $p \geq 1$, low rank approximation, and kernel regression, in time $T(A) \poly(\log(nd))$, where for a given input matrix $A \in \mathbb{R}^{n \times d}$, $T(A)$ is the time needed to compute $A\cdot y$ for an arbitrary vector $y \in \mathbb{R}^d$. Since $T(A) \leq O(\nnz(A))$, where $\nnz(A)$ denotes the number of non-zero entries of $A$, the time is no worse, up to polylogarithmic factors, as all of the recent advances for such problems that run in input-sparsity time. However, for many applications, $T(A)$ can be much smaller than $\nnz(A)$, yielding significantly sublinear time algorithms. For example, in the overconstrained $(1+\epsilon)$-approximate polynomial interpolation problem, $A$ is a Vandermonde matrix and $T(A) = O(n \log n)$; in this case our running time is $n \cdot \poly(\log n) + \poly(d/\epsilon)$ and we recover the results of \cite{avron2013sketching} as a special case. For overconstrained autoregression, which is a common problem arising in dynamical systems, $T(A) = O(n \log n)$, and we immediately obtain $n \cdot \poly(\log n) + \poly(d/\epsilon)$ time. For kernel autoregression, we significantly improve the running time of prior algorithms for general kernels. For the important case of autoregression with the polynomial kernel and arbitrary target vector $b\in\mathbb{R}^n$, we obtain even faster algorithms. Our algorithms show that, perhaps surprisingly, most of these optimization problems do not require much more time than that of a polylogarithmic number of matrix-vector multiplications.
Grid Search, Random Search, Genetic Algorithm: A Big Comparison for NAS
Liashchynskyi, Petro, Liashchynskyi, Pavlo
In this paper, we compare the three most popular algorithms for hyperparameter optimization (Grid Search, Random Search, and Genetic Algorithm) and attempt to use them for neural architecture search (NAS). We use these algorithms for building a convolutional neural network (search architecture). Experimental results on CIFAR-10 dataset further demonstrate the performance difference between compared algorithms. The comparison results are based on the execution time of the above algorithms and accuracy of the proposed models.
Coloring graph neural networks for node disambiguation
Dasoulas, George, Santos, Ludovic Dos, Scaman, Kevin, Virmaux, Aladin
In this paper, we show that a simple coloring scheme can improve, both theoretically and empirically, the expressive power of Message Passing Neural Networks(MPNNs). More specifically, we introduce a graph neural network called Colored Local Iterative Procedure (CLIP) that uses colors to disambiguate identical node attributes, and show that this representation is a universal approximator of continuous functions on graphs with node attributes. Our method relies on separability , a key topological characteristic that allows to extend well-chosen neural networks into universal representations. Finally, we show experimentally that CLIP is capable of capturing structural characteristics that traditional MPNNs fail to distinguish,while being state-of-the-art on benchmark graph classification datasets.