Directed Networks
6. Machine Learning Algorithms -- Python 3: from None to Machine Learning
Algorithms are often grouped by similarity in terms of their function (how they work). For example, tree-based methods, and neural network inspired methods. I think this is the most useful way to group algorithms and it is the approach we will use here. This is a useful grouping method, but it is not perfect. There are still algorithms that could just as easily fit into multiple categories like Learning Vector Quantization that is both a neural network inspired method and an instance-based method.
VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning
Zintgraf, Luisa, Shiarlis, Kyriacos, Igl, Maximilian, Schulze, Sebastian, Gal, Yarin, Hofmann, Katja, Whiteson, Shimon
V ARIBAD: A V ERY G OOD M ETHOD FOR B AYES-A DAPTIVE D EEP RL VIA M ETA-L EARNING Luisa Zintgraf University of Oxford Kyriacos Shiarlis Latent Logic Maximilian Igl University of Oxford Sebastian Schulze University of Oxford Y arin Gal OA TML Group, University of Oxford Katja Hofmann Microsoft Research Shimon Whiteson University of Oxford Latent Logic A BSTRACT Trading off exploration and exploitation in an unknown environment is key to maximising expected return during learning. A Bayes-optimal policy, which does so optimally, conditions its actions not only on the environment state but on the agent's uncertainty about the environment. Computing a Bayes-optimal policy is however intractable for all but the smallest tasks. In this paper, we introduce variational Bayes-Adaptive Deep RL (variBAD), a way to meta-learn to perform approximate inference in an unknown environment, and incorporate task uncertainty directly during action selection. In a grid-world domain, we illustrate how variBAD performs structured online exploration as a function of task uncertainty. We also evaluate variBAD on MuJoCo domains widely used in meta-RL and show that it achieves higher return during training than existing methods. 1 I NTRODUCTION Reinforcement learning (RL) is typically concerned with finding an optimal policy that maximises expected return for a given Markov decision process (MDP) with an unknown reward and transition function. If these were known, the optimal policy could in theory be computed without interacting with the environment. By contrast, learning in an unknown environment typically requires trading off exploration (learning about the environment) and exploitation (taking promising actions). Balancing this tradeoff is key to maximising expected return during learning . A Bayes-optimal policy, which does so optimally, conditions actions not only on the environment state but on the agent's own uncertainty about the current MDP . In principle, a Bayes-optimal policy can be computed using the framework of Bayes-adaptive Markov decision processes (BAMDPs) (Martin, 1967; Duff & Barto, 2002). The agent maintains a belief, i.e., a posterior distribution, over possible environments. Augmenting the state space of the underlying MDP with this posterior distribution yields a BAMDP, a special case of a belief MDP (Kaelbling et al., 1998).
Coupling Oceanic Observation Systems to Study Mesoscale Ocean Dynamics
Cosne, Gautier, Maze, Guillaume, Tandeo, Pierre
Understanding local currents in the North Atlantic region of the ocean is a key part of modelling heat transfer and global climate patterns. Satellites provide a surface signature of the temperature of the ocean with a high horizontal resolution while in situ autonomous probes supply high vertical resolution, but horizontally sparse, knowledge of the ocean interior thermal structure. The objective of this paper is to develop a methodology to combine these complementary ocean observing systems measurements to obtain a three-dimensional time series of ocean temperatures with high horizontal and vertical resolution. Within an observation-driven framework, we investigate the extent to which mesoscale ocean dynamics in the North Atlantic region may be decomposed into a mixture of dynamical modes, characterized by different local regressions between Sea Surface Temperature (SST), Sea Level Anomalies (SLA) and Vertical Temperature fields. Ultimately we propose a Latent-class regression method to improve prediction of vertical ocean temperature.
Adversarial Regression. Generative Adversarial Networks for Non-Linear Regression: Theory and Assessment
Adversarial Regression is a proposition to perform high dimensional non-linear regression with uncertainty estimation. We used Conditional Generative Adversarial Network to obtain an estimate of the full predictive distribution for a new observation. Generative Adversarial Networks (GAN) are implicit generative models which produce samples from a distribution approximating the distribution of the data. The conditional version of it (CGAN) takes the following expression: $\min\limits_G \max\limits_D V(D, G) = \mathbb{E}_{x\sim p_{r}(x)} [log(D(x, y))] + \mathbb{E}_{z\sim p_{z}(z)} [log (1-D(G(z, y)))]$. An approximate solution can be found by training simultaneously two neural networks to model D and G and feeding G with a random noise vector $z$. After training, we have that $G(z, y)\mathrel{\dot\sim} p_{data}(x, y)$. By fixing $y$, we have $G(z|y) \mathrel{\dot\sim} p{data}(x|y)$. By sampling $z$, we can therefore obtain samples following approximately $p(x|y)$, which is the predictive distribution of $x$ for a new $y$. We ran experiments to test various loss functions, data distributions, sample size, size of the noise vector, etc. Even if we observed differences, no experiment outperformed consistently the others. The quality of CGAN for regression relies on fine-tuning a range of hyperparameters. In a broader view, the results show that CGANs are very promising methods to perform uncertainty estimation for high dimensional non-linear regression.
Masked Gradient-Based Causal Structure Learning
Ng, Ignavier, Fang, Zhuangyan, Zhu, Shengyu, Chen, Zhitang
Learning causal graphical models based on directed acyclic graphs is an important task in causal discovery and causal inference. We consider a general framework towards efficient causal structure learning with potentially large graphs. Within this framework, we propose a masked gradient-based structure learning method based on binary adjacency matrix that exists for any structural equation model. To enable first-order optimization methods, we use Gumbel-Softmax approach to approximate the binary valued entries of the adjacency matrix, which usually results in real values that are close to zero or one. The proposed method can readily include any differentiable score function and model function for learning causal structures. Experiments on both synthetic and real-world datasets are conducted to show the effectiveness of our approach.
Privacy-preserving Federated Bayesian Learning of a Generative Model for Imbalanced Classification of Clinical Data
In clinical research, the lack of events of interest often necessitates imbalanced learning. One approach to resolve this obstacle is data integration or sharing, but due to privacy concerns neither is practical. Therefore, there is an increasing demand for a platform on which an analysis can be performed in a federated environment while maintaining privacy. However, it is quite challenging to develop a federated learning algorithm that can address both privacy-preserving and class imbalanced issues. In this study, we introduce a federated generative model learning platform for generating samples in a data-distributed environment while preserving privacy. We specifically propose approximate Bayesian computation-based Gaussian Mixture Model called 'Federated ABC-GMM', which can oversample data in a minor class by estimating the posterior distribution of model parameters across institutions in a privacy-preserving manner. PhysioNet2012, a dataset for prediction of mortality of patients in an Intensive Care Unit (ICU), was used to verify the performance of the proposed method. Experimental results show that our method boosts classification performance in terms of F1 score up to nearly an ideal situation. It is believed that the proposed method can be a novel alternative to solving class imbalance problems.
Continual Learning in Neural Networks
Artificial neural networks have exceeded human-level performance in accomplishing several individual tasks (e.g. voice recognition, object recognition, and video games). However, such success remains modest compared to human intelligence that can learn and perform an unlimited number of tasks. Humans' ability of learning and accumulating knowledge over their lifetime is an essential aspect of their intelligence. Continual machine learning aims at a higher level of machine intelligence through providing the artificial agents with the ability to learn online from a non-stationary and never-ending stream of data. A key component of such a never-ending learning process is to overcome the catastrophic forgetting of previously seen data, a problem that neural networks are well known to suffer from. The work described in this thesis has been dedicated to the investigation of continual learning and solutions to mitigate the forgetting phenomena in neural networks. To approach the continual learning problem, we first assume a task incremental setting where tasks are received one at a time and data from previous tasks are not stored. Since the task incremental setting can't be assumed in all continual learning scenarios, we also study the more general online continual setting. We consider an infinite stream of data drawn from a non-stationary distribution with a supervisory or self-supervisory training signal. The proposed methods in this thesis have tackled important aspects of continual learning. They were evaluated on different benchmarks and over various learning sequences. Advances in the state of the art of continual learning have been shown and challenges for bringing continual learning into application were critically identified.
Probability Learning I: Bayes' Theorem - KDnuggets
This post assumes you have some basic knowledge of probability and statistics. If you don't, do not be afraid, I have gathered a list of the best resources I could find to introduce you to these subjects, so that you can read this post, understand it, and enjoy it to its fullest. In it, we will talk about one of the most famous and utilised theorems of probability theory: Bayes' Theorem. Then you are in for a treat! Know what it is already?
A Gentle Introduction to Information Entropy
Information theory is a subfield of mathematics concerned with transmitting data across a noisy channel. A cornerstone of information theory is the idea of quantifying how much information there is in a message. More generally, this can be used to quantify the information in an event and a random variable, called entropy, and is calculated using probability. Calculating information and entropy is a useful tool in machine learning and is used as the basis for techniques such as feature selection, building decision trees, and, more generally, fitting classification models. As such, a machine learning practitioner requires a strong understanding and intuition for information and entropy.
Supervised Machine Learning based Ensemble Model for Accurate Prediction of Type 2 Diabetes
Akula, Ramya, Nguyen, Ni, Garibay, Ivan
According to the American Diabetes Association(ADA), 30.3 million people in the United States have diabetes, but only 7.2 million may be undiagnosed and unaware of their condition. Type 2 diabetes is usually diagnosed for most patients later on in life whereas the less common Type 1 diabetes is diagnosed early on in life. People can live healthy and happy lives while living with diabetes, but early detection produces a better overall outcome on most patient's health. Thus, to test the accurate prediction of Type 2 diabetes, we use the patients' information from an electronic health records company called Practice Fusion, which has about 10,000 patient records from 2009 to 2012. This data contains individual key biometrics, including age, diastolic and systolic blood pressure, gender, height, and weight. We use this data on popular machine learning algorithms and for each algorithm, we evaluate the performance of every model based on their classification accuracy, precision, sensitivity, specificity/recall, negative predictive value, and F1 score. In our study, we find that all algorithms other than Naive Bayes suffered from very low precision. Hence, we take a step further and incorporate all the algorithms into a weighted average or soft voting ensemble model where each algorithm will count towards a majority vote towards the decision outcome of whether a patient has diabetes or not. The accuracy of the Ensemble model on Practice Fusion is 85\%, by far our ensemble approach is new in this space. We firmly believe that the weighted average ensemble model not only performed well in overall metrics but also helped to recover wrong predictions and aid in accurate prediction of Type 2 diabetes. Our accurate novel model can be used as an alert for the patients to seek medical evaluation in time.