Directed Networks
Distributionally Robust Language Modeling
Oren, Yonatan, Sagawa, Shiori, Hashimoto, Tatsunori B., Liang, Percy
Language models are generally trained on data spanning a wide range of topics (e.g., news, reviews, fiction), but they might be applied to an a priori unknown target distribution (e.g., restaurant reviews). In this paper, we first show that training on text outside the test distribution can degrade test performance when using standard maximum likelihood (MLE) training. To remedy this without the knowledge of the test distribution, we propose an approach which trains a model that performs well over a wide range of potential test distributions. In particular, we derive a new distributionally robust optimization (DRO) procedure which minimizes the loss of the model over the worst-case mixture of topics with sufficient overlap with the training distribution. Our approach, called topic conditional value at risk (topic CVaR), obtains a 5.5 point perplexity reduction over MLE when the language models are trained on a mixture of Yelp reviews and news and tested only on reviews.
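As a rough illustration of the "worst-case mixture of topics" idea, the sketch below computes only the inner maximization for fixed per-topic losses. It assumes that "sufficient overlap" means each topic weight q[z] may be at most p[z] / alpha, which is the standard CVaR constraint and is not stated in the abstract. The function name, arguments, and numbers are hypothetical, and this is not the paper's training procedure.

```python
def worst_case_topic_loss(topic_losses, topic_probs, alpha):
    """Return (loss, q) for the worst-case topic mixture q.

    Assumed constraint set: q[z] <= topic_probs[z] / alpha and sum(q) == 1.
    The maximum is reached greedily by filling the highest-loss topics first.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    n = len(topic_losses)
    # Visit topics from highest to lowest loss.
    order = sorted(range(n), key=lambda z: topic_losses[z], reverse=True)
    q = [0.0] * n
    remaining = 1.0
    for z in order:
        # Give this topic as much mass as the cap and the remaining budget allow.
        mass = min(topic_probs[z] / alpha, remaining)
        q[z] = mass
        remaining -= mass
        if remaining <= 0:
            break
    loss = sum(q[z] * topic_losses[z] for z in range(n))
    return loss, q


# Hypothetical per-topic losses and training-time topic proportions.
losses = [4.0, 3.0, 5.0]
probs = [0.5, 0.3, 0.2]
robust_loss, mixture = worst_case_topic_loss(losses, probs, alpha=0.5)
average_loss, _ = worst_case_topic_loss(losses, probs, alpha=1.0)
```

With these made-up numbers the worst-case loss at alpha = 0.5 is about 4.4, against an average loss of about 3.9 at alpha = 1, where the mixture equals the training proportions. A smaller alpha permits mixtures further from the training distribution.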
Efron-Stein PAC-Bayesian Inequalities
Kuzborskij, Ilja, Szepesvári, Csaba
We prove semi-empirical concentration inequalities for random variables which are given as possibly nonlinear functions of independent random variables. These inequalities characterize the concentration of the random variable in terms of the data/distribution-dependent Efron-Stein (ES) estimate of its variance and they do not require any additional assumptions on the moments. In particular, this allows us to state semi-empirical Bernstein inequalities for general functions of unbounded random variables, which gives user-friendly concentration bounds for cases where related methods (entropy method / bounded differences) might be more challenging to apply. We extend these results to Efron-Stein PAC-Bayesian inequalities which hold for arbitrary probability kernels that define a random, data-dependent choice of the function of interest. Finally, we demonstrate a number of applications, including PAC-Bayesian generalization bounds for unbounded loss functions, empirical Bernstein-type generalization bounds, new truncation-free bounds for off-policy evaluation with Weighted Importance Sampling (WIS), and off-policy PAC-Bayesian learning with WIS.
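For orientation, the classical Efron-Stein inequality that gives the variance estimate its name can be written as below. This is the textbook statement, not one of the paper's new results. The notation is an assumption of this sketch: X = (X_1, ..., X_n) has independent coordinates, and X^{(i)} is X with the i-th coordinate replaced by an independent copy X_i'.

```latex
\operatorname{Var}\bigl(f(X)\bigr)
  \le \frac{1}{2} \sum_{i=1}^{n}
      \mathbb{E}\Bigl[\bigl(f(X) - f(X^{(i)})\bigr)^{2}\Bigr],
\qquad
X^{(i)} = (X_1, \dots, X_{i-1}, X_i', X_{i+1}, \dots, X_n).
```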
Minimizing the Societal Cost of Credit Card Fraud with Limited and Imbalanced Data
Machine learning has automated much of financial fraud detection, notifying firms of, or even blocking, questionable transactions instantly. However, data imbalance starves traditionally trained models of the content necessary to detect fraud. This study examines three separate factors of credit card fraud detection via machine learning. First, it assesses the potential for two sampling methods, undersampling and the Synthetic Minority Oversampling Technique (SMOTE), to improve algorithm performance in data-starved environments. Additionally, five industry-practical machine learning algorithms are evaluated on total fraud cost savings in addition to traditional statistical metrics. Finally, an ensemble of individual models is trained with a genetic algorithm to attempt to generate higher cost efficiency than its components. Monte Carlo performance distributions showed that random undersampling outperformed SMOTE in lowering fraud costs and that an ensemble was unable to outperform its individual parts. Most notably, the F1 score, a traditional metric often used to measure performance with imbalanced data, was uncorrelated with derived cost efficiency. Assuming a realistic cost structure can be derived, cost-based metrics provide an essential supplement to objective statistical evaluation.
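A small sketch of how the F1 score and a cost-based metric can rank two models differently. The confusion-matrix counts and the per-error costs are made up for illustration and are not taken from the study; the function names are hypothetical as well.

```python
def f1_score(tp, fp, fn):
    """F1 score from confusion-matrix counts (fraud is the positive class)."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def fraud_cost(fp, fn, cost_false_alarm, cost_missed_fraud):
    """Total cost under an assumed structure with a fixed cost per error type."""
    return fp * cost_false_alarm + fn * cost_missed_fraud


# Hypothetical models: A is balanced, B flags aggressively.
model_a = {"tp": 80, "fp": 20, "fn": 20}
model_b = {"tp": 95, "fp": 200, "fn": 5}

# Assumed costs: 5 per false alarm, 500 per missed fraud.
f1_a = f1_score(**model_a)
f1_b = f1_score(**model_b)
cost_a = fraud_cost(model_a["fp"], model_a["fn"], 5, 500)
cost_b = fraud_cost(model_b["fp"], model_b["fn"], 5, 500)
```

Under these assumed costs, model A has the higher F1 (0.80 against roughly 0.48) but the higher total cost (10100 against 3500), because missed fraud is priced far above a false alarm.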
Deploying a Machine Learning Model as a REST API
As a Python developer and data scientist, I want to build web apps to showcase my work. As much as I like designing the front end, taking on both machine learning and app development becomes overwhelming. So I had to find a way to make my machine learning models easy to integrate for other developers, who could build a robust web app better than I can. By building a REST API for my model, I could keep my code separate from theirs. This gives a clear division of labor, which is nice for defining responsibilities and keeps me from directly blocking teammates who are not involved with the machine learning aspect of the project.
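To make the idea concrete, here is a minimal sketch of a prediction endpoint built only on the Python standard library. It is not the article's implementation. The `/predict` route, the `features` field, and the toy `predict` rule are all assumptions made for illustration; a real service would load a trained model instead.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features):
    """Stand-in for a trained model (hypothetical rule: threshold the mean)."""
    score = sum(features) / len(features) if features else 0.0
    label = "positive" if score > 0.5 else "negative"
    return {"label": label, "score": score}


class PredictHandler(BaseHTTPRequestHandler):
    """Handles POST /predict with a JSON body such as {"features": [0.2, 0.9]}."""

    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length))
            result = predict(payload["features"])
        except (ValueError, KeyError, TypeError):
            self.send_error(400, "expected a JSON body with a 'features' list")
            return
        body = json.dumps(result).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Start the server on localhost; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), PredictHandler).serve_forever()
```

Calling `serve()` starts the service. A front-end developer then only needs to know the route and the JSON shape, not anything about the model behind it.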
Stochastic quasi-Newton with line-search regularization
In this paper we present a novel quasi-Newton algorithm for use in stochastic optimisation. Quasi-Newton methods have had an enormous impact on deterministic optimisation problems because they afford rapid convergence and computationally attractive algorithms. In essence, this is achieved by learning the second-order (Hessian) information based on observing first-order gradients. We extend these ideas to the stochastic setting by employing a highly flexible model for the Hessian and infer its value based on observing noisy gradients. In addition, we propose a stochastic counterpart to standard line-search procedures and demonstrate the utility of this combination on maximum likelihood identification for general nonlinear state space models.
Data Selection for Short Term load forecasting
Pereira, Nestor, Herrera, Miguel Angel Hombrados, Gómez-Verdejo, Vanessa, Mammoli, Andrea A., Martínez-Ramón, Manel
Power load forecasting with machine learning is a fairly mature application of artificial intelligence, and it is indispensable in operation, control and planning. Data selection techniques have hardly been used in this application. However, such techniques could be beneficial, because the assumption that the data are identically distributed clearly does not hold in load forecasting; the data are instead cyclostationary. In this work we present a fully automatic methodology to determine which data are most adequate for training a predictor that is based on a fully Bayesian probabilistic model. We assess the performance of the method with experiments based on real, publicly available data recorded over several years in the United States of America.
Much Needed Mathematics for Machine Learning Algorithms
Data science, business analytics and business intelligence are all birds of the same nest and have some features in common; it is safe to say that they are "same same but different". One of the common features is the use of algorithms and models to compare, analyse and predict. Some of the most commonly used machine learning algorithms, with their mathematics, are explained as follows. Linear regression tries to represent the relationship between two variables by fitting a linear equation, where one variable is explanatory and the other is considered dependent.
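A short sketch of what "fitting a linear equation" means for two variables, using the closed-form least-squares solution. The function name and the sample data are made up for illustration.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for paired samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x, and co-variation of x with y, around their means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Illustrative data lying exactly on y = 2x + 1.
xs = [1, 2, 3, 4]  # the input variable
ys = [3, 5, 7, 9]  # the dependent variable
slope, intercept = fit_line(xs, ys)
```

Here the fit recovers a slope of 2.0 and an intercept of 1.0. With noisy data the same formulas give the line that minimizes the sum of squared vertical errors.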
Bayesian Machine Learning in Python: A/B Testing
Udemy course created by Lazy Programmer Inc. In this course, while we will do traditional A/B testing in order to appreciate its complexity, what we will eventually get to is the Bayesian machine learning way of doing things. First, we'll see if we can improve on traditional A/B testing with adaptive methods. These all help you solve the explore-exploit dilemma. What you'll learn: use adaptive algorithms to improve A/B testing performance; understand the difference between Bayesian and frequentist statistics; apply Bayesian methods to A/B testing.
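One adaptive method that tackles the explore-exploit dilemma in a Bayesian way is Thompson sampling. The course description does not name a specific algorithm, so this is an illustrative choice rather than the course's material. The sketch assumes binary outcomes (conversion or not) and a uniform Beta(1, 1) prior per variant. The conversion rates are made up and, in practice, unknown.

```python
import random


def thompson_sampling(true_rates, n_rounds, seed=0):
    """Simulate Thompson sampling over variants with Bernoulli outcomes.

    Returns (wins, losses), the success and failure counts per variant.
    """
    rng = random.Random(seed)
    k = len(true_rates)
    wins = [0] * k
    losses = [0] * k
    for _ in range(n_rounds):
        # Draw a plausible conversion rate for each variant from its posterior.
        samples = [rng.betavariate(1 + wins[i], 1 + losses[i]) for i in range(k)]
        # Show the variant whose sampled rate is highest.
        arm = max(range(k), key=lambda i: samples[i])
        # Simulate the visitor's response using the hidden true rate.
        if rng.random() < true_rates[arm]:
            wins[arm] += 1
        else:
            losses[arm] += 1
    return wins, losses


# Hypothetical variants: B converts much better than A.
wins, losses = thompson_sampling([0.05, 0.5], n_rounds=2000)
pulls = [w + l for w, l in zip(wins, losses)]
```

Unlike a fixed 50/50 split, the procedure shifts traffic toward the better variant as evidence accumulates, so `pulls` ends up heavily skewed toward the second variant.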
DeepHealth: Deep Learning for Health Informatics
Kwak, Gloria Hyun-Jung, Hui, Pan
Machine learning and deep learning have opened up a whole new research era. As more data and better computational power become available, they have been implemented in various fields. The demand for artificial intelligence in the field of health informatics is also increasing, and we can expect to see the potential benefits of artificial intelligence applications in healthcare. Deep learning can help clinicians diagnose disease, identify cancer sites, identify drug effects for each patient, understand the relationship between genotypes and phenotypes, explore new phenotypes, and predict infectious disease outbreaks with high accuracy. In contrast to traditional models, its approach does not require domain-specific data preprocessing, and it is expected to substantially change human life in the future. Despite its notable advantages, there are some challenges for practical use, both on the data side (high dimensionality, heterogeneity, time dependency, sparsity, irregularity, lack of labels) and on the model side (reliability, interpretability, feasibility, security, scalability). This article presents a comprehensive review of research applying deep learning in health informatics, with a focus on the last five years, in the fields of medical imaging, electronic health records, genomics, sensing, and online communication health, as well as challenges and promising directions for future research. We highlight ongoing research on popular approaches and identify several challenges in building deep learning models.
Scalable Reinforcement-Learning-Based Neural Architecture Search for Cancer Deep Learning Research
Balaprakash, Prasanna, Egele, Romain, Salim, Misha, Wild, Stefan, Vishwanath, Venkatram, Xia, Fangfang, Brettin, Tom, Stevens, Rick
Cancer is a complex disease, the understanding and treatment of which are being aided through increases in the volume of collected data and in the scale of deployed computing power. Consequently, there is a growing need for the development of data-driven and, in particular, deep learning methods for various tasks such as cancer diagnosis, detection, prognosis, and prediction. Despite recent successes, however, designing high-performing deep learning models for nonimage and nontext cancer data is a time-consuming, trial-and-error, manual task that requires both cancer domain and deep learning expertise. To that end, we develop a reinforcement-learning-based neural architecture search to automate deep-learning-based predictive model development for a class of representative cancer data. We develop custom building blocks that allow domain experts to incorporate the cancer-data-specific characteristics. We show that our approach discovers deep neural network architectures that have significantly fewer trainable parameters, shorter training time, and accuracy similar to or higher than those of manually designed architectures. We study and demonstrate the scalability of our approach on up to 1,024 Intel Knights Landing nodes of the Theta supercomputer at the Argonne Leadership Computing Facility.