Directed Networks
Geometric Insights into Support Vector Machine Behavior using the KKT Conditions
Carmichael, Iain, Marron, J. S.
The Support Vector Machine (SVM) is a powerful and widely used classification algorithm. Its performance is well known to be impacted by a tuning parameter which is frequently selected by cross-validation. This paper uses the Karush-Kuhn-Tucker conditions to provide rigorous mathematical proof for new insights into the behavior of SVM in the large and small tuning parameter regimes. These insights provide perhaps unexpected relationships between SVM and naive Bayes and maximal data piling directions. We explore how characteristics of the training data affect the behavior of SVM in many cases including: balanced vs. unbalanced classes, low vs. high dimension, separable vs. non-separable data. These results present a simple explanation of SVM's behavior as a function of the tuning parameter. We also elaborate on the geometry of complete data piling directions in high dimensional space. The results proved in this paper suggest important implications for tuning SVM with cross-validation.
Stick-Breaking Variational Autoencoders
Nalisnick, Eric, Smyth, Padhraic
We extend Stochastic Gradient Variational Bayes to perform posterior inference for the weights of Stick-Breaking processes. This development allows us to define a Stick-Breaking Variational Autoencoder (SB-VAE), a Bayesian nonparametric version of the variational autoencoder that has a latent representation with stochastic dimensionality. We experimentally demonstrate that the SB-VAE, and a semi-supervised variant, learn highly discriminative latent representations that often outperform the Gaussian VAE's.
Distilling Information Reliability and Source Trustworthiness from Digital Traces
Tabibian, Behzad, Valera, Isabel, Farajtabar, Mehrdad, Song, Le, Schölkopf, Bernhard, Gomez-Rodriguez, Manuel
Online knowledge repositories typically rely on their users or dedicated editors to evaluate the reliability of their content. These evaluations can be viewed as noisy measurements of both information reliability and information source trustworthiness. Can we leverage these noisy evaluations, often biased, to distill a robust, unbiased and interpretable measure of both notions? In this paper, we argue that the temporal traces left by these noisy evaluations give cues on the reliability of the information and the trustworthiness of the sources. Then, we propose a temporal point process modeling framework that links these temporal traces to robust, unbiased and interpretable notions of information reliability and source trustworthiness. Furthermore, we develop an efficient convex optimization procedure to learn the parameters of the model from historical traces. Experiments on real-world data gathered from Wikipedia and Stack Overflow show that our modeling framework accurately predicts evaluation events, provides an interpretable measure of information reliability and source trustworthiness, and yields interesting insights about real-world events.
Causal Inference through the Method of Direct Estimation
Ratkovic, Marc, Tingley, Dustin
The intersection of causal inference and machine learning is a rapidly advancing field. We propose a new approach, the method of direct estimation, that draws on both traditions in order to obtain nonparametric estimates of treatment effects. The approach focuses on estimating the effect of fluctuations in a treatment variable on an outcome. A tensor-spline implementation enables rich interactions between functional bases allowing for the approach to capture treatment/covariate interactions. We show how new innovations in Bayesian sparse modeling readily handle the proposed framework, and then document its performance in simulation and applied examples. Furthermore we show how the method of direct estimation can easily extend to structural estimators commonly used in a variety of disciplines, like instrumental variables, mediation analysis, and sequential g-estimation.
Naive Bayes Example using Golf Dataset
The following notebook works through a really simple example of a Naive Bayes implementation. The aim of this machine learning application is to predict whether or not to play golf based on Weather conditions. Here we are going to read in the golf.csv This will read our CSV file into a pandas data frame. As with any Data Science application, data cleansing and feature selection play a vital role.
Membership Inference Attacks against Machine Learning Models
Shokri, Reza, Stronati, Marco, Song, Congzheng, Shmatikov, Vitaly
We quantitatively investigate how machine learning models leak information about the individual data records on which they were trained. We focus on the basic membership inference attack: given a data record and black-box access to a model, determine if the record was in the model's training dataset. To perform membership inference against a target model, we make adversarial use of machine learning and train our own inference model to recognize differences in the target model's predictions on the inputs that it trained on versus the inputs that it did not train on. We empirically evaluate our inference techniques on classification models trained by commercial "machine learning as a service" providers such as Google and Amazon. Using realistic datasets and classification tasks, including a hospital discharge dataset whose membership is sensitive from the privacy perspective, we show that these models can be vulnerable to membership inference attacks. We then investigate the factors that influence this leakage and evaluate mitigation strategies.
Spectral Methods for Nonparametric Models
Tung, Hsiao-Yu Fish, Wu, Chao-Yuan, Zaheer, Manzil, Smola, Alexander J.
Nonparametric models are versatile, albeit computationally expensive, tool for modeling mixture models. In this paper, we introduce spectral methods for the two most popular nonparametric models: the Indian Buffet Process (IBP) and the Hierarchical Dirichlet Process (HDP). We show that using spectral methods for the inference of nonparametric models are computationally and statistically efficient. In particular, we derive the lower-order moments of the IBP and the HDP, propose spectral algorithms for both models, and provide reconstruction guarantees for the algorithms. For the HDP, we further show that applying hierarchical models on dataset with hierarchical structure, which can be solved with the generalized spectral HDP, produces better solutions to that of flat models regarding likelihood performance.
On Bayesian Exponentially Embedded Family for Model Order Selection
In this paper, we derive a Bayesian model order selection rule by using the exponentially embedded family method, termed Bayesian EEF. Unlike many other Bayesian model selection methods, the Bayesian EEF can use vague proper priors and improper noninformative priors to be objective in the elicitation of parameter priors. Moreover, the penalty term of the rule is shown to be the sum of half of the parameter dimension and the estimated mutual information between parameter and observed data. This helps to reveal the EEF mechanism in selecting model orders and may provide new insights into the open problems of choosing an optimal penalty term for model order selection and choosing a good prior from information theoretic viewpoints. The important example of linear model order selection is given to illustrate the algorithms and arguments. Lastly, the Bayesian EEF that uses Jeffreys prior coincides with the EEF rule derived by frequentist strategies. This shows another interesting relationship between the frequentist and Bayesian philosophies for model selection.
Combinatorial Multi-armed Bandits for Real-Time Strategy Games
Games with large branching factors pose a significant challenge for game tree search algorithms. In this paper, we address this problem with a sampling strategy for Monte Carlo Tree Search (MCTS) algorithms called "naive sampling", based on a variant of the Multi-armed Bandit problem called "Combinatorial Multi-armed Bandits" (CMAB). We analyze the theoretical properties of several variants of naive sampling, and empirically compare it against the other existing strategies in the literature for CMABs. We then evaluate these strategies in the context of real-time strategy (RTS) games, a genre of computer games characterized by their very large branching factors. Our results show that as the branching factor grows, naive sampling outperforms the other sampling strategies.
Marginal likelihood based model comparison in Fuzzy Bayesian Learning
RADITIONAL rule based fuzzy systems encode expert opinion in the form of IF-THEN rules and optimise some performance metric (typically mean squared error in predicting a data-set) to obtain the hyper-parameters of the model (like the numeric values defining the shape of the membership functions etc.) [2]-[4]. The rule base is one of the core elements driving the model and slight change in the rule base would significantly affect the performance of the fuzzy inference system. Many automated methods have been proposed where the rule base structure and parameters can be automatically determined [5]-[7]. However interpretability of such models is an issue and various methods have been proposed to alleviate the issue [8]. In the present paper however, we are interested in the actual metric for comparison between different models having different rule bases derived from expert opinion. The comparison metric can nevertheless be embedded within an automated framework to evolve the best model if so required. The most straight forward way of comparing the fuzzy rule bases is to optimise the model parameters based on the prediction error (e.g.