AITopics

1711.06598

Country:

Europe (1.00)
Asia (0.67)
North America > United States (0.46)

Genre: Research Report (0.82)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.93)

Liang, Dawen, Krishnan, Rahul G., Hoffman, Matthew D., Jebara, Tony

Variational Autoencoders for Collaborative Filtering

arXiv.org Machine LearningFeb-15-2018

We extend variational autoencoders (VAEs) to collaborative filtering for implicit feedback. This non-linear probabilistic model enables us to go beyond the limited modeling capacity of linear factor models which still largely dominate collaborative filtering research.We introduce a generative model with multinomial likelihood and use Bayesian inference for parameter estimation. Despite widespread use in language modeling and economics, the multinomial likelihood receives less attention in the recommender systems literature. We introduce a different regularization parameter for the learning objective, which proves to be crucial for achieving competitive performance. Remarkably, there is an efficient way to tune the parameter using annealing. The resulting model and learning algorithm has information-theoretic connections to maximum entropy discrimination and the information bottleneck principle. Empirically, we show that the proposed approach significantly outperforms several state-of-the-art baselines, including two recently-proposed neural network approaches, on several real-world datasets. We also provide extended experiments comparing the multinomial likelihood with other commonly used likelihood functions in the latent factor collaborative filtering literature and show favorable results. Finally, we identify the pros and cons of employing a principled Bayesian inference approach and characterize settings where it provides the most significant improvements.

artificial intelligence, bayesian inference, machine learning, (19 more...)

1802.05814

Country: North America > United States > California (0.28)

Genre: Research Report > New Finding (0.46)

Industry: Media (0.30)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(2 more...)

Lerasle, Matthieu, Szabo, Zoltan, Lecue, Guillaume, Massiot, Gaspar, Moulines, Eric

MONK -- Outlier-Robust Mean Embedding Estimation by Median-of-Means

arXiv.org Machine LearningFeb-15-2018

Mean embeddings provide an extremely flexible and powerful tool in machine learning and statistics to represent probability distributions and define a semi-metric (MMD, maximum mean discrepancy; also called N-distance or energy distance), with numerous successful applications. The representation is constructed as the expectation of the feature map defined by a kernel. As a mean, its classical empirical estimator, however, can be arbitrary severely affected even by a single outlier in case of unbounded features. To the best of our knowledge, unfortunately even the consistency of the existing few techniques trying to alleviate this serious sensitivity bottleneck is unknown. In this paper, we show how the recently emerged principle of median-of-means can be used to design minimax-optimal estimators for kernel mean embedding and MMD, with finite-sample strong outlier-robustness guarantees.

artificial intelligence, estimator, machine learning, (15 more...)

1802.04784

Country: Europe (0.28)

Genre: Research Report (0.51)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)

Marchetti, Sabina, Antonucci, Alessandro

Reliable Uncertain Evidence Modeling in Bayesian Networks by Credal Networks

arXiv.org Artificial IntelligenceFeb-15-2018

A reliable modeling of uncertain evidence in Bayesian networks based on a set-valued quantification is proposed. Both soft and virtual evidences are considered. We show that evidence propagation in this setup can be reduced to standard updating in an augmented credal network, equivalent to a set of consistent Bayesian networks. A characterization of the computational complexity for this task is derived together with an efficient exact procedure for a subclass of instances. In the case of multiple uncertain evidences over the same variable, the proposed procedure can provide a set-valued version of the geometric approach to opinion pooling.

artificial intelligence, bayesian network, machine learning, (18 more...)

arXiv.org Artificial Intelligence

1802.05639

Country:

Europe (0.68)
North America > United States (0.28)

Genre: Research Report (0.50)

Industry: Health & Medicine (0.94)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Vertex nomination: The canonical sampling and the extended spectral nomination schemes

Yoder, Jordan, Chen, Li, Pao, Henry, Bridgeford, Eric, Levin, Keith, Fishkind, Donniell, Priebe, Carey, Lyzinski, Vince

Suppose that one particular block in a stochastic block model is deemed "interesting," but block labels are only observed for a few of the vertices. Utilizing a graph realized from the model, the vertex nomination task is to order the vertices with unobserved block labels into a "nomination list" with the goal of having an abundance of interesting vertices near the top of the list. In this paper we extend and enhance two basic vertex nomination schemes; the canonical nomination scheme ${\mathcal L}^C$ and the spectral partitioning nomination scheme ${\mathcal L}^P$. The canonical nomination scheme ${\mathcal L}^C$ is provably optimal, but is computationally intractable, being impractical to implement even on modestly sized graphs. With this in mind, we introduce a scalable, Markov chain Monte Carlo-based nomination scheme, called the {\it canonical sampling nomination scheme} ${\mathcal L}^{CS}$, that converges to the canonical nomination scheme ${\mathcal L}^{C}$ as the amount of sampling goes to infinity. We also introduce a novel spectral partitioning nomination scheme called the {\it extended spectral partitioning nomination scheme} ${\mathcal L}^{EP}$. Real-data and simulation experiments are employed to illustrate the effectiveness of these vertex nomination schemes, as well as their empirical computational complexity.

artificial intelligence, bayesian inference, machine learning, (18 more...)

1802.0496

Country: North America > United States (0.67)

Genre: Research Report (0.64)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Health Care Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Data Science (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.93)

Superposition-Assisted Stochastic Optimization for Hawkes Processes

Xu, Hongteng, Chen, Xu, Carin, Lawrence

We consider the learning of multi-agent Hawkes processes, a model containing multiple Hawkes processes with shared endogenous impact functions and different exogenous intensities. In the framework of stochastic maximum likelihood estimation, we explore the associated risk bound. Further, we consider the superposition of Hawkes processes within the model, and demonstrate that under certain conditions such an operation is beneficial for tightening the risk bound. Accordingly, we propose a stochastic optimization algorithm assisted with a diversity-driven superposition strategy, achieving better learning results with improved convergence properties. The effectiveness of the proposed method is verified on synthetic data, and its potential to solve the cold-start problem of sequential recommendation systems is demonstrated on real-world data.

artificial intelligence, hawke process, machine learning, (15 more...)

1802.04725

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.69)

Inferring Tweedie Compound Poisson Mixed Models with Adversarial Variational Bayes

Yang, Yaodong, Demyanov, Sergey, Liu, Yunayuan, Wang, Jun

The Tweedie Compound Poisson-Gamma model is routinely used for modelling non-negative continuous data with a discrete probability mass at zero. Mixed models with random effects account for the covariance structure related to the grouping hierarchy in the data. An important application of Tweedie mixed models is estimating the aggregated loss for insurance policies. However, the intractable likelihood function, the unknown variance function, and the hierarchical structure of mixed effects have presented considerable challenges for drawing inferences on Tweedie. In this study, we tackle the Bayesian Tweedie mixed-effects models via variational approaches. In particular, we empower the posterior approximation by implicit models trained in an adversarial setting. To reduce the variance of gradients, we reparameterize random effects, and integrate out one local latent variable of Tweedie. We also employ a flexible hyper prior to ensure the richness of the approximation. Our method is evaluated on both simulated and real-world data. Results show that the proposed method has smaller estimation bias on the random effects compared to traditional inference methods including MCMC; it also achieves a state-of-the-art predictive performance, meanwhile offering a richer estimation of the variance function.

artificial intelligence, machine learning, random effect, (15 more...)

1706.05446

Genre: Research Report > New Finding (0.86)

Industry: Banking & Finance > Insurance (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Modeling & Simulation (0.93)
(2 more...)

Kim, Hyunjik, Teh, Yee Whye

Scaling up the Automatic Statistician: Scalable Structure Discovery using Gaussian Processes

Automating statistical modelling is a challenging problem in artificial intelligence. The Automatic Statistician takes a first step in this direction, by employing a kernel search algorithm with Gaussian Processes (GP) to provide interpretable statistical models for regression problems. However this does not scale due to its $O(N^3)$ running time for the model selection. We propose Scalable Kernel Composition (SKC), a scalable kernel search algorithm that extends the Automatic Statistician to bigger data sets. In doing so, we derive a cheap upper bound on the GP marginal likelihood that sandwiches the marginal likelihood with the variational lower bound . We show that the upper bound is significantly tighter than the lower bound and thus useful for model selection.

artificial intelligence, bayesian inference, machine learning, (16 more...)

1706.02524

Country: Europe (0.46)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.67)

Mattingly, Henry H., Transtrum, Mark K., Abbott, Michael C., Machta, Benjamin B.

Maximizing the information learned from finite data selects a simple model

We use the language of uninformative Bayesian prior choice to study the selection of appropriately simple effective models. We advocate for the prior which maximizes the mutual information between parameters and predictions, learning as much as possible from limited data. When many parameters are poorly constrained by the available data, we find that this prior puts weight only on boundaries of the parameter manifold. Thus it selects a lower-dimensional effective theory in a principled way, ignoring irrelevant parameter directions. In the limit where there is sufficient data to tightly constrain any number of parameters, this reduces to Jeffreys prior. But we argue that this limit is pathological when applied to the hyper-ribbon parameter manifolds generic in science, because it leads to dramatic dependence on effects invisible to experiment.

artificial intelligence, information, machine learning, (20 more...)

doi: 10.1073/pnas.1715306115

1705.01166

Country: North America > United States (1.00)

Genre: Research Report (0.50)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)

Zhang, Jiong, Raman, Parameswaran, Ji, Shihao, Yu, Hsiang-Fu, Vishwanathan, S. V. N., Dhillon, Inderjit S.

Extreme Stochastic Variational Inference: Distributed and Asynchronous

We propose extreme stochastic variational inference (ESVI), an asynchronous and lock-free algorithm to perform variational inference on massive real world datasets. Stochastic variational inference (SVI), the state-of-the-art algorithm for scaling variational inference to large-datasets, is inherently serial. Moreover, it requires the parameters to fit in the memory of a single processor; this is problematic when the number of parameters is in billions. ESVI overcomes these limitations by requiring that each processor only access a subset of the data and a subset of the parameters, thus providing data and model parallelism simultaneously. We demonstrate the effectiveness of ESVI by running Latent Dirichlet Allocation (LDA) on UMBC-3B, a dataset that has a vocabulary of 3 million and a token size of 3 billion. To best of our knowledge, this is an order of magnitude larger than the largest dataset on which results using variational inference have been reported in literature. In our experiments, we found that ESVI outperforms VI and SVI, and also achieves a better quality solution. In addition, we propose a strategy to speed up computation and save memory when fitting large number of topics.

artificial intelligence, bayesian inference, machine learning, (16 more...)

1605.09499

Country: North America > United States (0.46)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)