Goto

Collaborating Authors

 Uncertainty


Parameterizing the semantics of fuzzy attribute implications by systems of isotone Galois connections

arXiv.org Artificial Intelligence

We study the semantics of fuzzy if-then rules called fuzzy attribute implications parameterized by systems of isotone Galois connections. The rules express dependencies between fuzzy attributes in object-attribute incidence data. The proposed parameterizations are general and include as special cases the parameterizations by linguistic hedges used in earlier approaches. We formalize the general parameterizations, propose bivalent and graded notions of semantic entailment of fuzzy attribute implications, show their characterization in terms of least models and complete axiomatization, and provide characterization of bases of fuzzy attribute implications derived from data.


Non-parametric Bayesian Learning with Deep Learning Structure and Its Applications in Wireless Networks

arXiv.org Machine Learning

Statistical models have been applied to the classification and prediction problems in machine learning and data analysis [1]. Some statistical methods make the hypothesis of mathematical models that are controlled by certain parameters to fit the latent structure of observed data [2]. The observed data are assumed to be generated by complex structures which have hierarchical layers and hidden causes [3]. One key challenge faced by modeling the data structure in this way is thus the determination of the numbers of layers and hidden variables. However, it is sometimes impractical and challenging to choose any fixed number for the model structure when making the hypothesis.


Bayesian matrix completion: prior specification

arXiv.org Machine Learning

Low-rank matrix estimation from incomplete measurements recently received increased attention due to the emergence of several challenging applications, such as recommender systems; see in particular the famous Netflix challenge. While the behaviour of algorithms based on nuclear norm minimization is now well understood, an as yet unexplored avenue of research is the behaviour of Bayesian algorithms in this context. In this paper, we briefly review the priors used in the Bayesian literature for matrix completion. A standard approach is to assign an inverse gamma prior to the singular values of a certain singular value decomposition of the matrix of interest; this prior is conjugate. However, we show that two other types of priors (again for the singular values) may be conjugate for this model: a gamma prior, and a discrete prior. Conjugacy is very convenient, as it makes it possible to implement either Gibbs sampling or Variational Bayes. Interestingly enough, the maximum a posteriori for these different priors is related to the nuclear norm minimization problems. We also compare all these priors on simulated datasets, and on the classical MovieLens and Netflix datasets.


Low-Rank Modeling and Its Applications in Image Analysis

arXiv.org Machine Learning

Low-rank modeling generally refers to a class of methods that solve problems by representing variables of interest as low-rank matrices. It has achieved great success in various fields including computer vision, data mining, signal processing and bioinformatics. Recently, much progress has been made in theories, algorithms and applications of low-rank modeling, such as exact low-rank matrix recovery via convex programming and matrix completion applied to collaborative filtering. These advances have brought more and more attentions to this topic. In this paper, we review the recent advance of low-rank modeling, the state-of-the-art algorithms, and related applications in image analysis. We first give an overview to the concept of low-rank modeling and challenging problems in this area. Then, we summarize the models and algorithms for low-rank matrix recovery and illustrate their advantages and limitations with numerical experiments. Next, we introduce a few applications of low-rank modeling in the context of image analysis. Finally, we conclude this paper with some discussions.


Variational Reformulation of Bayesian Inverse Problems

arXiv.org Machine Learning

The classical approach to inverse problems is based on the optimization of a misfit function. Despite its computational appeal, such an approach suffers from many shortcomings, e.g., non-uniqueness of solutions, modeling prior knowledge, etc. The Bayesian formalism to inverse problems avoids most of the difficulties encountered by the optimization approach, albeit at an increased computational cost. In this work, we use information theoretic arguments to cast the Bayesian inference problem in terms of an optimization problem. The resulting scheme combines the theoretical soundness of fully Bayesian inference with the computational efficiency of a simple optimization.


Variational Bayes for Merging Noisy Databases

arXiv.org Machine Learning

Bayesian entity resolution merges together multiple, noisy databases and returns the minimal collection of unique individuals represented, together with their true, latent record values. Bayesian methods allow flexible generative models that share power across databases as well as principled quantification of uncertainty for queries of the final, resolved database. However, existing Bayesian methods for entity resolution use Markov monte Carlo method (MCMC) approximations and are too slow to run on modern databases containing millions or billions of records. Instead, we propose applying variational approximations to allow scalable Bayesian inference in these models. We derive a coordinate-ascent approximation for mean-field variational Bayes, qualitatively compare our algorithm to existing methods, note unique challenges for inference that arise from the expected distribution of cluster sizes in entity resolution, and discuss directions for future work in this domain.


Thompson sampling with the online bootstrap

arXiv.org Machine Learning

Thompson sampling provides a solution to bandit problems in which new observations are allocated to arms with the posterior probability that an arm is optimal. While sometimes easy to implement and asymptotically optimal, Thompson sampling can be computationally demanding in large scale bandit problems, and its performance is dependent on the model fit to the observed data. We introduce bootstrap Thompson sampling (BTS), a heuristic method for solving bandit problems which modifies Thompson sampling by replacing the posterior distribution used in Thompson sampling by a bootstrap distribution. We first explain BTS and show that the performance of BTS is competitive to Thompson sampling in the well-studied Bernoulli bandit case. Subsequently, we detail why BTS using the online bootstrap is more scalable than regular Thompson sampling, and we show through simulation that BTS is more robust to a misspecified error distribution. BTS is an appealing modification of Thompson sampling, especially when samples from the posterior are otherwise not available or are costly.


Scene Image is Non-Mutually Exclusive - A Fuzzy Qualitative Scene Understanding

arXiv.org Artificial Intelligence

One of the biggest challenges in real world decision making process is to cope with uncertainty, complexity, volatility and ambiguity. How do we deal with this growing confusion in our world? In scene understanding, an important and yet difficult image understanding problem due to their variability, ambiguity, wide range of illumination and scale conditions falls into this category. The conventional goal of the works is to assign an unknown scene image to one of the several possible classes. For example, Figure 1(a) is a Coast class scene while Figure 1(c) is a Mountain class scene. Intentionally, most state-of-the-art approaches in scene understanding domain [1]-[4] are exemplar-based and assume that scene images are mutually exclusive, P (A B) 0. This simplifies the complex problem of scene understanding (uncertainty, complexity, volatility, and ambiguity) to a simple binary classification task. Such approaches learn patterns from a training set and subsequently, search for the images similar to it. As a result of this, classification errors often occur when the scene classes overlap in the selected feature space.


PAC-Bayesian AUC classification and scoring

arXiv.org Machine Learning

We develop a scoring and classification procedure based on the PAC-Bayesian approach and the AUC (Area Under Curve) criterion. We focus initially on the class of linear score functions. We derive PAC-Bayesian non-asymptotic bounds for two types of prior for the score parameters: a Gaussian prior, and a spike-and-slab prior; the latter makes it possible to perform feature selection. One important advantage of our approach is that it is amenable to powerful Bayesian computational tools. We derive in particular a Sequential Monte Carlo algorithm, as an efficient method which may be used as a gold standard, and an Expectation-Propagation algorithm, as a much faster but approximate method. We also extend our method to a class of non-linear score functions, essentially leading to a nonparametric procedure, by considering a Gaussian process prior.


Distributed Estimation, Information Loss and Exponential Families

arXiv.org Machine Learning

Distributed learning of probabilistic models from multiple data repositories with minimum communication is increasingly important. We study a simple communication-efficient learning framework that first calculates the local maximum likelihood estimates (MLE) based on the data subsets, and then combines the local MLEs to achieve the best possible approximation to the global MLE given the whole dataset. We study this framework's statistical properties, showing that the efficiency loss compared to the global setting relates to how much the underlying distribution families deviate from full exponential families, drawing connection to the theory of information loss by Fisher, Rao and Efron. We show that the "full-exponential-family-ness" represents the lower bound of the error rate of arbitrary combinations of local MLEs, and is achieved by a KL-divergence-based combination method but not by a more common linear combination method. We also study the empirical properties of both methods, showing that the KL method significantly outperforms linear combination in practical settings with issues such as model misspecification, non-convexity, and heterogeneous data partitions.