Bayesian Learning
Adaptive Divergence for Rapid Adversarial Optimization
Borisyak, Maxim, Gaintseva, Tatiana, Ustyuzhanin, Andrey
Adversarial Optimization (AO) provides a reliable, practical way to match two implicitly defined distributions, one of which is usually represented by a sample of real data, and the other is defined by a generator. Typically, AO involves training a high-capacity model at each step of the optimization. In this work, we consider computationally heavy generators, for which training high-capacity models is associated with substantial computational costs. To address this problem, we introduce a novel family of divergences, which varies the capacity of the underlying model and allows for a significant acceleration with respect to the number of samples drawn from the generator. We demonstrate the performance of the proposed divergences on several tasks, including tuning the parameters of a physics simulator, namely the Pythia event generator.
I'm Bayesed and I know it
If you're too young to realize where the title reference comes from, I'm gonna make you lose your mind. It has something to do with parties and rocks and anthems. Actually, no, I just want you to have a good time so I'll instead ask you to take a look at the title picture. I am obviously drawing your attention to both the title and picture for a reason. With the title, you might not have realized there was a "pattern" to it till I pointed it out.
3 Main Approaches to Machine Learning Models - KDnuggets
In September 2018, I published a blog about my forthcoming book on The Mathematical Foundations of Data Science. The central question we address is: How can we bridge the gap between mathematics needed for Artificial Intelligence (Deep Learning and Machine learning) with that taught in high schools (up to ages 17/18)? In this post, we present a chapter from this book called "A Taxonomy of Machine Learning Models." The book is now available for an early bird discount released as chapters. If you are interested in getting early discounted copies, please contact ajit.jaokar at feynlabs.ai.
An Anomaly Contribution Explainer for Cyber-Security Applications
Zhang, Xiao, Marwah, Manish, Lee, I-ta, Arlitt, Martin, Goldwasser, Dan
In this paper we introduce Anomaly Contribution Explainer or ACE, a tool to explain security anomaly detection models in terms of the model features through a regression framework, and its variant, ACE-KL, which highlights the important anomaly contributors. ACE and ACE-KL provide insights in diagnosing which attributes significantly contribute to an anomaly by building a specialized linear model to locally approximate the anomaly score that a black-box model generates. We conducted experiments with these anomaly detection models to detect security anomalies on both synthetic data and real data. In particular, we evaluate performance on three public data sets: CERT insider threat, netflow logs, and Android malware. The experimental results are encouraging: our methods consistently identify the correct contributing feature in the synthetic data where ground truth is available; similarly, for real data sets, our methods point a security analyst in the direction of the underlying causes of an anomaly, including in one case leading to the discovery of previously overlooked network scanning activity. We have made our source code publicly available. Cyber-security is a key concern for both private and public organizations, given the high cost of security compromises and attacks; malicious cyber-activity cost the U.S. economy between $57 billion and $109 billion in 2016 [1]. As a result, spending on security research and development, and security products and services to detect and combat cyber-attacks has been increasing [2]. Organizations produce large amounts of network, host and application data that can be used to gain insights into cyber-security threats, misconfigurations, and network operations. While security domain experts can manually sift through some amount of data to spot attacks and understand them, it is virtually impossible to do so at scale, considering that even a medium-sized enterprise can produce terabytes of data in a few hours.
Dis-entangling Mixture of Interventions on a Causal Bayesian Network Using Aggregate Observations
Sinha, Gaurav, Chauhan, Ayush, Maiti, Aurghya, Poddar, Naman, Goel, Pulkit
We study the problem of separating a mixture of distributions, all of which come from interventions on a known causal Bayesian network. Given oracle access to marginals of all distributions resulting from interventions on the network, and estimates of marginals from the mixture distribution, we want to recover the mixing proportions of different mixture components. We show that in the worst case, mixing proportions cannot be identified using marginals only. If exact marginals of the mixture distribution were known, under a simple assumption of excluding a few distributions from the mixture, we show that the mixing proportions become identifiable. Our identifiability proof is constructive and gives an efficient algorithm recovering the mixing proportions exactly. When exact marginals are not available, we design an optimization framework to estimate the mixing proportions. Our problem is motivated by a real-world scenario of an e-commerce business, where multiple interventions occur at a given time, leading to deviations in expected metrics. We conduct experiments on the well-known publicly available ALARM network and on a proprietary dataset from a large e-commerce company, validating the performance of our method.
The Likelihood Principle, the MVUE, Ghosts, Cakes and Elves
In my prior blog post, I wrote of a clever elf that could predict the outcome of a mathematically fair process roughly ninety percent of the time. Actually, it is ninety-three percent of the time and why it is ninety-three percent instead of ninety percent is also important. The purpose of the prior blog post was to illustrate the weakness of using the minimum variance unbiased estimator (MVUE) in applied finance. Nonetheless, that raises a more general question of when and why it should be used, and when a Bayesian or Likelihood-based method should be applied instead. Fortunately, the prior blog post provides a way of looking at the problem. Fisher's Likelihood-based, Pearson and Neyman's Frequency-based and Laplace's method of inverse probability really are at odds with one another. Indeed, much of the literature of the mid-twentieth century had a polemical ring to it.
Learning and Planning for Time-Varying MDPs Using Maximum Likelihood Estimation
This paper proposes a formal approach to learning and planning for agents operating in a priori unknown, time-varying environments. The proposed method computes the maximally likely model of the environment, given the observations about the environment made by an agent earlier in the system run and assuming knowledge of a bound on the maximal rate of change of system dynamics. Such an approach generalizes the estimation method commonly used in learning algorithms for unknown Markov decision processes with time-invariant transition probabilities, but is also able to quickly and correctly identify the system dynamics following a change. Based on the proposed method, we generalize the exploration bonuses used in learning for time-invariant Markov decision processes by introducing a notion of uncertainty in a learned time-varying model, and develop a control policy for time-varying Markov decision processes based on the exploitation and exploration trade-off. We demonstrate the proposed methods on four numerical examples: a patrolling task with a change in system dynamics, a two-state MDP with periodically changing outcomes of actions, a wind flow estimation task, and a multi-arm bandit problem with periodically changing probabilities of different rewards.
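As a point of reference for the abstract above, the time-invariant estimate that the paper says it generalizes is usually the count-based maximum likelihood estimate, P(s' | s, a) = N(s, a, s') / N(s, a). The sketch below shows only that baseline, not the paper's time-varying method; the function name `mle_transitions` and the toy observations are illustrative assumptions.

```python
from collections import Counter, defaultdict


def mle_transitions(transitions):
    """Count-based MLE for a time-invariant MDP.

    `transitions` is an iterable of (state, action, next_state) tuples.
    Returns {(state, action): {next_state: probability}}.
    """
    counts = defaultdict(Counter)
    for state, action, next_state in transitions:
        counts[(state, action)][next_state] += 1

    model = {}
    for key, next_counts in counts.items():
        total = sum(next_counts.values())
        # Normalizing the counts gives the likelihood-maximizing probabilities.
        model[key] = {s2: n / total for s2, n in next_counts.items()}
    return model


# Hypothetical observations, invented purely for illustration.
observed = [
    ("s0", "go", "s1"),
    ("s0", "go", "s1"),
    ("s0", "go", "s0"),
    ("s1", "stay", "s1"),
]
model = mle_transitions(observed)
```

Because every observation is weighted equally regardless of when it was made, this estimate adapts slowly after the dynamics change, which is the limitation the abstract targets.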
Financial Time Series Forecasting with Deep Learning: A Systematic Literature Review: 2005-2019
Sezer, Omer Berat, Gudelek, Mehmet Ugur, Ozbayoglu, Ahmet Murat
Financial time series forecasting is, without a doubt, the top choice of computational intelligence for finance researchers from both academia and the financial industry due to its broad implementation areas and substantial impact. Machine Learning (ML) researchers came up with various models and a vast number of studies have been published accordingly. As such, a significant number of surveys exist covering ML for financial time series forecasting studies. Lately, Deep Learning (DL) models started appearing within the field, with results that significantly outperform traditional ML counterparts. Even though there is a growing interest in developing models for financial time series forecasting research, there is a lack of review papers that are solely focused on DL for finance. Hence, our motivation in this paper is to provide a comprehensive literature review on DL studies for financial time series forecasting implementations. We not only categorized the studies according to their intended forecasting implementation areas, such as index, forex, and commodity forecasting, but also grouped them based on their DL model choices, such as Convolutional Neural Networks (CNNs), Deep Belief Networks (DBNs), and Long Short-Term Memory (LSTM). We also tried to envision the future for the field by highlighting the possible setbacks and opportunities, so that interested researchers can benefit.
A Bayesian Dynamic Multilayered Block Network Model
Rodriguez-Deniz, Hector, Villani, Mattias, Voltes-Dorta, Augusto
As network data become increasingly available, new opportunities arise to understand dynamic and multilayer network systems in many applied disciplines. Statistical modeling for multilayer networks is currently an active research area that aims to develop methods to carry out inference on such data. Recent contributions focus on latent space representation of the multilayer structure with underlying stochastic processes to account for network dynamics. Existing multilayer models are however typically limited to rather small networks. In this paper we introduce a dynamic multilayer block network model with a latent space representation for blocks rather than nodes. A block structure is natural for many real networks, such as social or transportation networks, where community structure naturally arises. A Gibbs sampler based on Pólya-Gamma data augmentation is presented for the proposed model. Results from extensive simulations on synthetic data show that the inference algorithm scales well with the size of the network. We present a case study using real data from an airline system, a classic example of a hub-and-spoke network.
A Conceptual Explanation of Bayesian Hyperparameter Optimization for Machine Learning
These figures compare validation error for hyperparameter optimization of an image classification neural network with random search in grey and Bayesian Optimization (using the Tree Parzen Estimator or TPE) in green. Lower is better: a smaller validation set error generally means better test set performance, and a smaller number of trials means less time invested. Clearly, there are significant advantages to Bayesian methods, and these graphs, along with other impressive results, convinced me it was time to take the next step and learn model-based hyperparameter optimization. The one-sentence summary of Bayesian hyperparameter optimization is: build a probability model of the objective function and use it to select the most promising hyperparameters to evaluate in the true objective function. If you like to operate at a very high level, then this sentence may be all you need. However, if you want to understand the details, this article is my attempt to outline the concepts behind Bayesian optimization, in particular Sequential Model-Based Optimization (SMBO) with the Tree Parzen Estimator (TPE).
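The one-sentence summary above can be made concrete with a small loop. What follows is a minimal sketch of Sequential Model-Based Optimization with a TPE-flavoured surrogate, not the article's code or any library's implementation. The toy `objective`, the fixed kernel bandwidth, the search range, and names such as `smbo`, `gamma`, and `n_candidates` are all assumptions made for illustration.

```python
import math
import random


def objective(x):
    # Hypothetical stand-in for an expensive validation-error evaluation.
    return (x - 2.0) ** 2


def kde(points, x, bandwidth=0.5):
    # Kernel density estimate (up to a constant factor) from past trials.
    return sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points) / len(points)


def smbo(n_trials=30, n_startup=5, n_candidates=50, gamma=0.25, seed=0):
    rng = random.Random(seed)
    low, high = -5.0, 5.0
    history = []  # list of (hyperparameter, loss) pairs

    for trial in range(n_trials):
        if trial < n_startup:
            # No model yet: fall back to random search.
            x = rng.uniform(low, high)
        else:
            # Build the probability model: split past trials into the best
            # gamma fraction ("good") and the rest ("bad").
            ranked = sorted(history, key=lambda t: t[1])
            n_good = max(1, int(gamma * len(ranked)))
            good = [p for p, _ in ranked[:n_good]]
            bad = [p for p, _ in ranked[n_good:]]
            # Choose the candidate that looks most like "good" and least
            # like "bad"; only this one is sent to the true objective.
            candidates = [rng.uniform(low, high) for _ in range(n_candidates)]
            x = max(candidates, key=lambda c: kde(good, c) / (kde(bad, c) + 1e-12))
        history.append((x, objective(x)))

    best = min(history, key=lambda t: t[1])
    return best, history


best, history = smbo()
best_x, best_loss = best
```

The point of the design is that scoring fifty candidates against the cheap surrogate costs almost nothing, while the true objective is evaluated only once per trial.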