First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance.

The paper describes a Bayesian hierarchical model for handling mixed-type missing data (i.e., datasets that involve both continuous and discrete variables) in large databases. The model relies on latent Gaussian variables whose correlation is modeled using a bilinear latent factor model. Uncertainty about the number of latent factors is accounted for by placing an Indian buffet process prior on the factor indicators.

General comments:

1) Although the paper does not discuss the issue explicitly, the model treats the missingness mechanism (which determines the probability that a given value is missing) as ignorable. This is unlikely to hold for most of the databases considered in the experiments, and non-ignorable missingness is well known to be a serious issue (a classic reference is Rubin 1976, and there is an extensive statistics literature on the topic from the last 40 years).
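To make comment 1 concrete, here is a small hypothetical simulation (not taken from the paper or the review). Standard normal values are deleted either completely at random (MCAR, an ignorable mechanism) or with a probability that grows with the value itself (missing not at random, MNAR). The function name, sample size and logistic deletion rule are assumptions chosen only for illustration.

```python
import math
import random
from statistics import fmean


def observed_means(n=100_000, seed=0):
    """Return the mean of the full sample and of the values that remain
    observed under an MCAR and an MNAR deletion rule (both hypothetical)."""
    rng = random.Random(seed)
    full = [rng.gauss(0.0, 1.0) for _ in range(n)]

    # MCAR: each value is missing with probability 0.5, regardless of its size.
    mcar = [x for x in full if rng.random() >= 0.5]

    # MNAR: P(missing | x) = 1 / (1 + exp(-2x)), so large values vanish more often.
    mnar = [x for x in full if rng.random() >= 1.0 / (1.0 + math.exp(-2.0 * x))]

    return fmean(full), fmean(mcar), fmean(mnar)


full_mean, mcar_mean, mnar_mean = observed_means()
print(f"full: {full_mean:+.3f}  MCAR: {mcar_mean:+.3f}  MNAR: {mnar_mean:+.3f}")
```

Under MCAR the mean of the surviving values stays close to zero, while under the MNAR rule it is pulled well below zero because large values are preferentially removed. The observed entries alone do not reveal this shift, so a completion model that assumes ignorability has no direct way to correct for it unless the mechanism is modeled.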
General Table Completion using a Bayesian Nonparametric Model
Isabel Valera, Zoubin Ghahramani
Even though heterogeneous databases can be found in a broad variety of applications, there is a lack of tools for estimating missing data in such databases. In this paper, we provide an efficient and robust table completion tool based on a Bayesian nonparametric latent feature model. In particular, we propose a general observation model for the Indian buffet process (IBP) adapted to mixed continuous (real-valued and positive real-valued) and discrete (categorical, ordinal and count) observations. We then propose an inference algorithm that scales linearly with the number of observations. Finally, our experiments on five real databases show that the proposed approach provides more robust and accurate estimates than the standard IBP and Bayesian probabilistic matrix factorization with Gaussian observations.
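The abstract describes the model only at a high level. The sketch below shows one plausible generative reading of it under stated assumptions. A binary feature matrix Z is drawn from the IBP through the sequential "buffet" construction. Each column then gets Gaussian weights, and a Gaussian pseudo-observation Z b + noise is mapped to the column's data type. The link functions (identity, softplus, fixed ordinal thresholds, argmax over per-category scores, floored softplus for counts), the helper names sample_ibp and sample_table, and all parameter values are assumptions for illustration. They are not the paper's exact observation model, and the paper's inference algorithm is not shown.

```python
import numpy as np


def sample_ibp(n_rows, alpha, rng):
    """Draw a binary feature matrix Z from the Indian buffet process using the
    sequential construction: row i (0-based) takes existing feature k with
    probability m_k / (i + 1) and adds Poisson(alpha / (i + 1)) new features."""
    counts = []  # m_k: number of earlier rows that own feature k
    rows = []
    for i in range(n_rows):
        row = [int(rng.random() < m / (i + 1)) for m in counts]
        n_new = int(rng.poisson(alpha / (i + 1)))
        row += [1] * n_new
        counts = [m + z for m, z in zip(counts, row)] + [1] * n_new
        rows.append(row)
    Z = np.zeros((n_rows, len(counts)), dtype=int)
    for i, row in enumerate(rows):
        Z[i, : len(row)] = row
    return Z


def sample_table(Z, col_types, rng, noise_sd=0.5, n_categories=3):
    """Generate one column per entry of col_types from a shared Z, mapping a
    Gaussian pseudo-observation to each (assumed) data type."""
    N, K = Z.shape
    columns = []
    for t in col_types:
        if t == "categorical":
            # One score per category; the observed value is the largest score.
            B = rng.normal(size=(K, n_categories))
            scores = Z @ B + noise_sd * rng.normal(size=(N, n_categories))
            columns.append(scores.argmax(axis=1))
            continue
        b = rng.normal(size=K)
        y = Z @ b + noise_sd * rng.normal(size=N)  # pseudo-observation
        if t == "real":
            x = y
        elif t == "positive":
            x = np.logaddexp(0.0, y)  # softplus log(1 + e^y), numerically stable
        elif t == "count":
            x = np.floor(np.logaddexp(0.0, y)).astype(int)
        elif t == "ordinal":
            thresholds = np.array([-1.0, 0.0, 1.0])
            # Level r when thresholds[r-1] < y <= thresholds[r]; levels 0..3.
            x = np.searchsorted(thresholds, y)
        else:
            raise ValueError(f"unknown column type: {t}")
        columns.append(x)
    return columns


rng = np.random.default_rng(0)
Z = sample_ibp(n_rows=200, alpha=2.0, rng=rng)
table = sample_table(Z, ["real", "positive", "categorical", "ordinal", "count"], rng)
```

In this reading, every column depends on the same Z, which is what allows a latent feature model to share information across continuous and discrete columns. Table completion would then amount to inferring Z and the weights from the observed cells and predicting the missing ones.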