bernoulli model
Reviews: Adversarially Robust Generalization Requires More Data
The paper considered theoretical results on adversarially robust generalization, which studies the robustness of classifiers in the presence of even small noise. In particular, the work studied the generalization of adversarially robust learning by investigating the sample complexity in a comparison to that of standard learning. Specifically, the study focused on two simple concrete distribution models: gaussian model and Bernoulli model. For both models, the authors established the lower and upper bounds for the sample complexities. From these results, they drew the conclusion that the sample complexity of robust generalization is much larger than standard generalization.
Probabilistic Modeling for Sequences of Sets in Continuous-Time
Chang, Yuxin, Boyd, Alex, Smyth, Padhraic
Neural marked temporal point processes have been a valuable addition to the existing toolbox of statistical parametric models for continuous-time event data. These models are useful for sequences where each event is associated with a single item (a single type of event or a "mark") -- but such models are not suited for the practical situation where each event is associated with a set of items. In this work, we develop a general framework for modeling set-valued data in continuous-time, compatible with any intensity-based recurrent neural point process model. In addition, we develop inference methods that can use such models to answer probabilistic queries such as "the probability of item $A$ being observed before item $B$," conditioned on sequence history. Computing exact answers for such queries is generally intractable for neural models due to both the continuous-time nature of the problem setting and the combinatorially-large space of potential outcomes for each event. To address this, we develop a class of importance sampling methods for querying with set-based sequences and demonstrate orders-of-magnitude improvements in efficiency over direct sampling via systematic experiments with four real-world datasets. We also illustrate how to use this framework to perform model selection using likelihoods that do not involve one-step-ahead prediction.
A majorization-minimization algorithm for nonnegative binary matrix factorization
Magron, Paul, Fรฉvotte, Cรฉdric
This paper tackles the problem of decomposing binary data using matrix factorization. We consider the family of mean-parametrized Bernoulli models, a class of generative models that are well suited for modeling binary data and enables interpretability of the factors. We factorize the Bernoulli parameter and consider an additional Beta prior on one of the factors to further improve the model's expressive power. While similar models have been proposed in the literature, they only exploit the Beta prior as a proxy to ensure a valid Bernoulli parameter in a Bayesian setting; in practice it reduces to a uniform or uninformative prior. Besides, estimation in these models has focused on costly Bayesian inference. In this paper, we propose a simple yet very efficient majorization-minimization algorithm for maximum a posteriori estimation. Our approach leverages the Beta prior whose parameters can be tuned to improve performance in matrix completion tasks. Experiments conducted on three public binary datasets show that our approach offers an excellent trade-off between prediction performance, computational complexity, and interpretability.
Probabilistic Embeddings with Laplacian Graph Priors
Yrjรคnรคinen, Vรคinรถ, Magnusson, Mรฅns
Probabilistic word embedding models have emerged as a way to model textual data in scientific applications [Rudolph We introduce probabilistic embeddings using et al., 2016, Bamler and Mandt, 2017]. The advantages include Laplacian priors (PELP). The proposed model enables flexible inclusion of prior knowledge, explicit handling incorporating graph side-information into of uncertainty, straightforward estimation, and usefulness static word embeddings. We theoretically show in decision-making [Ghahramani, 2015]. More versatile that the model unifies several previously proposed and flexible word embedding methods allow for increasingly embedding methods under one umbrella.
Dual Confidence Regions: A Simple Introduction - DataScienceCentral.com
This tutorial explains how to build confidence regions (the 2D version of a confidence interval) using as little statistical theory as possible. I also avoid the traditional terminology and notation such as ฮฑ, Z1-ฮฑ, critical value, confidence level, significance level and so on. These can be confusing to beginners and professionals alike. Instead, I use simulations and two keywords only: confidence region, and confidence level. The purpose is to explain the concept using a framework that will appeal to machine learning professionals, software engineers and non-statisticians.
More Data Can Expand the Generalization Gap Between Adversarially Robust and Standard Models
Chen, Lin, Min, Yifei, Zhang, Mingrui, Karbasi, Amin
As modern machine learning models continue to gain traction in the real world, a wide variety of novel problems have come to the forefront of the research community. One particularly important challenge has been that of adversarial attacks (Szegedy et al., 2013; Goodfellow et al., 2014; Kos et al., 2018; Carlini & Wagner, 2018). To be specific, given a model with excellent performance on a standard data set, one can add small perturbations to the test data that can fool the model and cause it to make wrong predictions. What is more worrying is that these small perturbations can possibly be designed to be imperceptible to human beings, which raises concerns about potential safety issues and risks, especially when it comes to applications such as autonomous vehicles where human lives are at stake. The problem of adversarial robustness in machine learning models has been explored from several different perspectives since its discovery. One direction has been to propose attacks that challenge these models and their training procedures (Carlini & Wagner, 2017; Gu & Rigazio, 2014; Athalye et al., 2018; Papernot et al., 2016a; Moosavi-Dezfooli et al., 2016).
Subspace Tracking from Missing and Outlier Corrupted Data
Narayanamurthy, Praneeth, Daneshpajooh, Vahid, Vaswani, Namrata
We study the related problems of subspace tracking in the presence of missing data (ST-miss) as well as robust subspace tracking with missing data (RST-miss). Here "robust" refers to robustness to sparse outliers. In recent work, we have studied the RST problem without missing data. In this work, we show that simple modifications of our solution approach for RST also provably solve ST-miss and RST-miss under weaker and similar assumptions respectively. To our knowledge, our result is the first complete guarantee for both ST-miss and RST-miss. This means we are able to show that, under assumptions on only the algorithm inputs (input data and/or initialization), the output subspace estimates are close to the true data subspaces at all times. Our guarantees hold under mild and easily interpretable assumptions and handle time-varying subspaces (unlike all previous work). We also show that our algorithm and its extensions are fast and have competitive experimental performance when compared with existing methods.
Reliable Learning of Bernoulli Mixture Models
Najafi, Amir, Motahari, Abolfazl, Rabiee, Hamid R.
In this paper, we have derived a set of sufficient conditions for reliable clustering of data produced by Bernoulli Mixture Models (BMM), when the number of clusters is unknown. A BMM refers to a random binary vector whose components are independent Bernoulli trials with cluster-specific frequencies. The problem of clustering BMM data arises in many real-world applications, most notably in population genetics where researchers aim at inferring the population structure from multilocus genotype data. Our findings stipulate a minimum dataset size and a minimum number of Bernoulli trials (or genotyped loci) per sample, such that the existence of a clustering algorithm with a sufficient accuracy is guaranteed. Moreover, the mathematical intuitions and tools behind our work can help researchers in designing more effective and theoretically-plausible heuristic methods for similar problems.
High Dimensional Low Rank plus Sparse Matrix Decomposition
Rahmani, Mostafa, Atia, George
This paper is concerned with the problem of low rank plus sparse matrix decomposition for big data. Conventional algorithms for matrix decomposition use the entire data to extract the low-rank and sparse components, and are based on optimization problems with complexity that scales with the dimension of the data, which limits their scalability. Furthermore, existing randomized approaches mostly rely on uniform random sampling, which is quite inefficient for many real world data matrices that exhibit additional structures (e.g. clustering). In this paper, a scalable subspace-pursuit approach that transforms the decomposition problem to a subspace learning problem is proposed. The decomposition is carried out using a small data sketch formed from sampled columns/rows. Even when the data is sampled uniformly at random, it is shown that the sufficient number of sampled columns/rows is roughly O(r\mu), where \mu is the coherency parameter and r the rank of the low rank component. In addition, adaptive sampling algorithms are proposed to address the problem of column/row sampling from structured data. We provide an analysis of the proposed method with adaptive sampling and show that adaptive sampling makes the required number of sampled columns/rows invariant to the distribution of the data. The proposed approach is amenable to online implementation and an online scheme is proposed.
Matrix completion with column manipulation: Near-optimal sample-robustness-rank tradeoffs
Chen, Yudong, Xu, Huan, Caramanis, Constantine, Sanghavi, Sujay
This paper considers the problem of matrix completion when some number of the columns are completely and arbitrarily corrupted, potentially by a malicious adversary. It is well-known that standard algorithms for matrix completion can return arbitrarily poor results, if even a single column is corrupted. One direct application comes from robust collaborative filtering. Here, some number of users are so-called manipulators who try to skew the predictions of the algorithm by calibrating their inputs to the system. In this paper, we develop an efficient algorithm for this problem based on a combination of a trimming procedure and a convex program that minimizes the nuclear norm and the $\ell_{1,2}$ norm. Our theoretical results show that given a vanishing fraction of observed entries, it is nevertheless possible to complete the underlying matrix even when the number of corrupted columns grows. Significantly, our results hold without any assumptions on the locations or values of the observed entries of the manipulated columns. Moreover, we show by an information-theoretic argument that our guarantees are nearly optimal in terms of the fraction of sampled entries on the authentic columns, the fraction of corrupted columns, and the rank of the underlying matrix. Our results therefore sharply characterize the tradeoffs between sample, robustness and rank in matrix completion.