Uncertainty
Bayesian multi-tensor factorization
Khan, Suleiman A., Leppäaho, Eemeli, Kaski, Samuel
We introduce Bayesian multi-tensor factorization, a model that is the first Bayesian formulation for joint factorization of multiple matrices and tensors. The research problem generalizes the joint matrix-tensor factorization problem to arbitrary sets of tensors of any depth, including matrices, can be interpreted as unsupervised multi-view learning from multiple data tensors, and can be generalized to relax the usual trilinear tensor factorization assumptions. The result is a factorization of the set of tensors into factors shared by any subsets of the tensors, and factors private to individual tensors. We demonstrate the performance against existing baselines in multiple tensor factorization tasks in structural toxicogenomics and functional neuroimaging.
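The sketch below generates data with the structure just described, which makes "shared" and "private" factors concrete. It builds a matrix X (samples × features) and a three-way tensor Y (samples × d_y × d_z) that share the sample mode, and each component is switched on in X, in Y, or in both. It is a minimal pure-Python illustration that uses hypothetical names (`joint_cp_sample`, `active`) and a CP-style trilinear term for the tensor. It only simulates noise-free data and does not perform the Bayesian inference of the paper.

```python
import random

def joint_cp_sample(n, d_x, d_y, d_z, active, seed=0):
    """Simulate a matrix X (n x d_x) and a 3-way tensor Y (n x d_y x d_z).

    Both share the sample mode through latent factors Z (n x K). Component k
    contributes to X only if "X" in active[k], and to Y only if "Y" in active[k].
    """
    rng = random.Random(seed)
    K = len(active)
    Z = [[rng.gauss(0, 1) for _ in range(K)] for _ in range(n)]  # shared sample-mode factors
    # Loadings are zeroed for components that are switched off in a data set
    W = [[rng.gauss(0, 1) if "X" in active[k] else 0.0 for k in range(K)] for _ in range(d_x)]
    U = [[rng.gauss(0, 1) if "Y" in active[k] else 0.0 for k in range(K)] for _ in range(d_y)]
    V = [[rng.gauss(0, 1) for _ in range(K)] for _ in range(d_z)]
    # Matrix: X[i][j] = sum_k Z[i][k] * W[j][k]
    X = [[sum(Z[i][k] * W[j][k] for k in range(K)) for j in range(d_x)] for i in range(n)]
    # Tensor (CP / trilinear): Y[i][j][t] = sum_k Z[i][k] * U[j][k] * V[t][k]
    Y = [[[sum(Z[i][k] * U[j][k] * V[t][k] for k in range(K)) for t in range(d_z)]
          for j in range(d_y)] for i in range(n)]
    return X, Y

# Component 0 is shared, component 1 is private to X, component 2 is private to Y
X, Y = joint_cp_sample(n=4, d_x=3, d_y=2, d_z=5, active=[{"X", "Y"}, {"X"}, {"Y"}])
```

Here `active` is fixed by hand. In the model described above, the point is to learn from the data which factors are shared by which subsets of tensors.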
Ongoing Developments and Outlook for Deep Learning
There are a huge number of variants of deep architectures, as this is a fast-developing field, so it helps to mention other leading algorithms. The list is intended to be comprehensive but not exhaustive, since so many algorithms are being developed [1], [2]. Additionally, fuzzy logic models can be combined with other models, such as decision trees, hidden Markov models, Bayesian networks and artificial neural networks, to model complicated risk issues like policyholder behaviours. A risk assessment and decision-making platform for ratemaking built on a fuzzy logic system can provide consistency when analyzing risks with limited data and knowledge. It allows people to focus on the foundation of risk assessment: the cause-and-effect relationships between key factors, as well as the exposure for each individual risk.
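A fuzzy rule base of the kind described can be sketched in a few lines. The example below is a minimal zero-order Sugeno-style system: triangular membership functions, min for AND, max for OR, and a weighted-average output. The two inputs (claim frequency and exposure, both scaled to [0, 1]) and the rule outputs are hypothetical choices for illustration, not a ratemaking standard.

```python
def tri(x, a, b, c):
    """Triangular membership: 0 outside (a, c), rising linearly to 1 at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def risk_score(freq, exposure):
    """Zero-order Sugeno inference on two hypothetical risk factors scaled to [0, 1]."""
    f_low, f_high = tri(freq, -1, 0, 1), tri(freq, 0, 1, 2)
    e_low, e_high = tri(exposure, -1, 0, 1), tri(exposure, 0, 1, 2)
    # (firing strength, crisp output) for each hypothetical rule; AND = min, OR = max
    rules = [
        (min(f_high, e_high), 0.9),                          # both high -> high risk
        (min(f_low, e_low), 0.1),                            # both low -> low risk
        (max(min(f_high, e_low), min(f_low, e_high)), 0.5),  # mixed -> medium risk
    ]
    total = sum(w for w, _ in rules)
    if total == 0:
        raise ValueError("no rule fired; inputs are outside the modelled range")
    # Defuzzify: average of rule outputs weighted by firing strength
    return sum(w * z for w, z in rules) / total

print(risk_score(0.8, 0.8), risk_score(0.2, 0.2))
```

Each rule states a cause-and-effect relationship between the factors and the risk level. Because the memberships overlap, the score changes smoothly between rules and does not jump at hard cut-offs.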
Columbia University Free Online Course on Machine Learning
Columbia University is offering a free online course on machine learning, a subfield of computer science that evolved from the study of pattern recognition and computational learning theory in artificial intelligence. In this course, participants will master the essentials of machine learning and of the algorithms that learn from data without human intervention. The course will start on January 16, 2017. Columbia University is one of the world's most important centers of research and, at the same time, a distinctive and distinguished learning environment for undergraduates and graduate students in many scholarly and professional fields.
Phase transitions and optimal algorithms in high-dimensional Gaussian mixture clustering
Lesieur, Thibault, De Bacco, Caterina, Banks, Jess, Krzakala, Florent, Moore, Cris, Zdeborová, Lenka
We consider the problem of Gaussian mixture clustering in the high-dimensional limit where the data consist of m points in n dimensions, with n, m → ∞ while α = m/n stays finite. Using exact but non-rigorous methods from statistical physics, we determine the critical value of α and the distance between the clusters at which it becomes information-theoretically possible to reconstruct cluster membership better than chance. We also determine the accuracy achievable by the Bayes-optimal estimation algorithm. In particular, we find that when the number of clusters is sufficiently large, r > 4 + 2√α, there is a gap between the threshold for information-theoretically optimal performance and the threshold at which known algorithms succeed. Clustering m points in n-dimensional space is a ubiquitous problem in statistical inference and data science.
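The sketch below illustrates the data model. It draws m points in n dimensions from two clusters with means ±μ and identity-covariance noise, then clusters them with a simple spectral (PCA) method: power iteration finds the top eigenvector of the sample covariance, and each point is labelled by the sign of its projection. The function names, the two-cluster setup and the separation parameter `sep` (the norm of μ) are assumptions for illustration. This is a standard baseline, not the Bayes-optimal algorithm analysed in the paper. Overlap is measured up to a swap of the two labels, so 0.5 corresponds to chance.

```python
import math
import random

def sample_two_cluster(m, n, sep, seed=0):
    """Draw m points in n dimensions: x = s * mu + noise, s = +/-1, noise ~ N(0, I)."""
    rng = random.Random(seed)
    g = [rng.gauss(0, 1) for _ in range(n)]
    norm = math.sqrt(sum(v * v for v in g))
    mu = [sep * v / norm for v in g]  # random direction with ||mu|| = sep
    labels = [rng.choice((-1, 1)) for _ in range(m)]
    X = [[s * mu[j] + rng.gauss(0, 1) for j in range(n)] for s in labels]
    return X, labels

def spectral_labels(X, iters=30, seed=1):
    """Label points by the sign of their projection on the top principal direction."""
    m, n = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / m for j in range(n)]
    Xc = [[row[j] - mean[j] for j in range(n)] for row in X]  # centre the data
    rng = random.Random(seed)
    v = [rng.gauss(0, 1) for _ in range(n)]
    for _ in range(iters):
        # Power iteration: v <- Xc^T (Xc v), normalised (multiply by the scatter matrix)
        proj = [sum(r[j] * v[j] for j in range(n)) for r in Xc]
        w = [sum(proj[i] * Xc[i][j] for i in range(m)) for j in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return [1 if sum(r[j] * v[j] for j in range(n)) >= 0 else -1 for r in Xc]

def overlap(pred, truth):
    """Fraction of correct labels, maximised over the two ways of naming the clusters."""
    agree = sum(p == t for p, t in zip(pred, truth)) / len(truth)
    return max(agree, 1 - agree)

X, labels = sample_two_cluster(m=300, n=30, sep=3.0)  # alpha = m / n = 10
print(overlap(spectral_labels(X), labels))
```

With α = 10 and ‖μ‖ = 3, the clusters are easy to recover. The regime studied in the paper lies near the thresholds, where simple methods like this one can fail.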
A Bayesian Information Criterion for Singular Models
On Wednesday, Mathias Drton and I will be presenting a read paper on Bayesian model choice for singular models at the Royal Statistical Society in London. You can read more about it on the RSS web site, where you can also download a preprint. The paper is scheduled to appear, with the discussion, in Series B of the Journal of the Royal Statistical Society next year. The CRAN package sBIC by Luca Weihs implements the ideas in the paper and includes a series of vignettes that allow you to step through some of the examples in the paper.
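For background, the block below compares two approximations of the log marginal likelihood. The first is the usual BIC approximation. The second is Watanabe's asymptotic expansion for singular models. In both, ℓ_n(θ̂) is the maximized log-likelihood, n is the sample size and d is the number of free parameters. The block summarizes standard results and is not the paper's criterion itself. As far as we understand it, sBIC is designed around the second form, whose learning coefficient λ and multiplicity m depend on the unknown true distribution.

```latex
% Regular models (standard BIC)
\log p(Y_n \mid M) = \ell_n(\hat\theta) - \frac{d}{2}\log n + O_p(1)

% Singular models (Watanabe): learning coefficient \lambda, multiplicity m
\log p(Y_n \mid M) = \ell_n(\hat\theta) - \lambda \log n + (m - 1)\log\log n + O_p(1)
```

For a regular model, λ = d/2 and m = 1, so the second expression reduces to the first.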
Combining local and global smoothing in multivariate density estimation
Nonparametric estimation of a multivariate density is tackled via a method that combines traditional local smoothing with a form of global smoothing, but without imposing a rigid structure. Simulation work gives encouraging indications of the method's effectiveness. An application to density-based clustering illustrates a possible usage. Consider estimation of the probability density function f(·) of a continuous random variable in cases where a parametric formulation for f is not considered appropriate. Given a random sample drawn from f, a variety of nonparametric estimation methods are available.
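The "traditional local smoothing" ingredient is typified by the kernel density estimator f̂(x) = (1/n) Σ_i K_h(x − x_i), which places a small bump at each observation. The sketch below is a minimal multivariate version with a product Gaussian kernel and one bandwidth per coordinate. It covers only the local-smoothing baseline and does not reproduce the global-smoothing component of the method described above. The function name `kde` and the bandwidths are illustrative assumptions.

```python
import math

def kde(x, sample, h):
    """Product-Gaussian kernel density estimate at point x.

    x: a point with d coordinates; sample: list of d-dimensional observations;
    h: list of d bandwidths, one per coordinate.
    """
    d = len(x)
    # Normalising constant: n * prod(h_j) * (2*pi)^(d/2)
    norm = len(sample) * math.prod(h) * (2 * math.pi) ** (d / 2)
    total = 0.0
    for s in sample:
        u2 = sum(((x[j] - s[j]) / h[j]) ** 2 for j in range(d))
        total += math.exp(-0.5 * u2)  # each observation contributes a local bump
    return total / norm

sample = [[0.0, 0.0], [1.0, 1.0], [0.5, 2.0]]
print(kde([0.5, 0.5], sample, h=[0.5, 0.7]))
```

In practice, the choice of bandwidth matters far more than the shape of the kernel. The example fixes h only to stay short.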
Pseudo-Bayesian Robust PCA: Algorithms and Analyses
Oh, Tae-Hyun, Matsushita, Yasuyuki, Kweon, In So, Wipf, David
Commonly used in computer vision and other applications, robust PCA represents an algorithmic attempt to reduce the sensitivity of classical PCA to outliers. The basic idea is to learn a decomposition of some data matrix of interest into low rank and sparse components, the latter representing unwanted outliers. Although the resulting optimization problem is typically NP-hard, convex relaxations provide a computationally expedient alternative with theoretical support. However, in practical regimes performance guarantees break down and a variety of non-convex alternatives, including Bayesian-inspired models, have been proposed to boost estimation quality. Unfortunately, without additional a priori knowledge none of these methods can significantly expand the critical operational range such that exact principal subspace recovery is possible. Into this mix, we propose a novel pseudo-Bayesian algorithm that explicitly compensates for design weaknesses in many existing non-convex approaches leading to state-of-the-art performance with a sound analytical foundation. Surprisingly, our algorithm can even outperform convex matrix completion despite the fact that the latter is provided with perfect knowledge of which entries are not corrupted.
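For reference, the NP-hard problem and its convex relaxation mentioned above are usually written as follows. Here an observed matrix Y is split into a low-rank part L and a sparse outlier part S. ‖L‖_* is the nuclear norm (the sum of singular values), and ‖S‖_1 is the entrywise ℓ1 norm. This is the standard principal component pursuit formulation, not the pseudo-Bayesian algorithm proposed in the paper. The weight λ is a tuning parameter; a common default for an n_1 × n_2 matrix is λ = 1/√max(n_1, n_2).

```latex
% Original (NP-hard) decomposition: rank plus count of nonzero outliers
\min_{L,S}\ \operatorname{rank}(L) + \lambda \|S\|_0
  \quad \text{s.t.} \quad Y = L + S

% Convex relaxation (principal component pursuit)
\min_{L,S}\ \|L\|_* + \lambda \|S\|_1
  \quad \text{s.t.} \quad Y = L + S
```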
Inductive Coherence
Garrabrant, Scott, Fallenstein, Benya, Demski, Abram, Soares, Nate
While probability theory is normally applied to external environments, there has been some recent interest in probabilistic modeling of the outputs of computations that are too expensive to run. Since mathematical logic is a powerful tool for reasoning about computer programs, we consider this problem from the perspective of integrating probability and logic. Recent work on assigning probabilities to mathematical statements has used the concept of coherent distributions, which satisfy logical constraints such as the probability of a sentence and its negation summing to one. Although there are algorithms which converge to a coherent probability distribution in the limit, this yields only weak guarantees about finite approximations of these distributions. In our setting, this is a significant limitation: Coherent distributions assign probability one to all statements provable in a specific logical theory, such as Peano Arithmetic, which can prove what the output of any terminating computation is; thus, a coherent distribution must assign probability one to the output of any terminating computation. To model uncertainty about computations, we propose to work with approximations to coherent distributions. We introduce inductive coherence, a strengthening of coherence that provides appropriate constraints on finite approximations, and propose an algorithm which satisfies this criterion.
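In a finite propositional setting, building a coherent distribution is straightforward. You put probabilities on complete truth assignments ("worlds") and define the probability of a sentence as the total weight of the worlds where it holds. Constraints such as P(φ) + P(¬φ) = 1, and probability one for tautologies, then hold automatically. The sketch below uses hypothetical atoms p and q with made-up weights and illustrates only this definition. Enumerating worlds works here only because the toy language is finite, and that is exactly what is unavailable for a theory like Peano Arithmetic. The sketch is not the inductive-coherence algorithm of the paper.

```python
from itertools import product

ATOMS = ("p", "q")  # hypothetical toy language with two atomic sentences

# Hypothetical weights on the four worlds (truth values of p, q); they sum to 1
WORLD_WEIGHTS = {(False, False): 0.1, (False, True): 0.2,
                 (True, False): 0.3, (True, True): 0.4}

def worlds():
    """Enumerate every truth assignment to the atoms."""
    for values in product((False, True), repeat=len(ATOMS)):
        yield dict(zip(ATOMS, values))

def weight(world):
    return WORLD_WEIGHTS[tuple(world[a] for a in ATOMS)]

def prob(sentence):
    """P(sentence) = total weight of the worlds in which the sentence is true."""
    return sum(weight(w) for w in worlds() if sentence(w))

def negation_gap(sentence):
    """|P(s) + P(not s) - 1|, which is zero (up to rounding) for a coherent P."""
    return abs(prob(sentence) + prob(lambda w: not sentence(w)) - 1.0)

def p(w):
    return w["p"]

def q(w):
    return w["q"]

print(prob(p), negation_gap(p))
```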
Gamma Belief Networks
Zhou, Mingyuan, Cong, Yulai, Chen, Bo
To infer multilayer deep representations of high-dimensional discrete and nonnegative real vectors, we propose an augmentable gamma belief network (GBN) that factorizes each of its hidden layers into the product of a sparse connection weight matrix and the nonnegative real hidden units of the next layer. The GBN's hidden layers are jointly trained with an upward-downward Gibbs sampler that solves each layer with the same subroutine. The gamma-negative binomial process combined with a layer-wise training strategy allows inferring the width of each layer given a fixed budget on the width of the first layer. Example results illustrate interesting relationships between the width of the first layer and the inferred network structure, and demonstrate that the GBN can add more layers to improve its performance both in extracting features without supervision and in predicting held-out data. For exploratory data analysis, we extract trees and subnetworks from the learned deep network to visualize how the very specific factors discovered at the first hidden layer and the increasingly more general factors discovered at deeper hidden layers are related to each other, and we generate synthetic data by propagating random variables through the deep network from the top hidden layer back to the bottom data layer.
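The last sentence above describes generating data by propagating random variables from the top hidden layer down to the data layer. The sketch below does this under an assumed Poisson-observation form of the GBN:

- θ^(T) ~ Gamma(r, 1/c) at the top layer;
- θ^(t) ~ Gamma(Φ^(t+1) θ^(t+1), 1/c) for the lower hidden layers;
- x ~ Poisson(Φ^(1) θ^(1)) at the data layer.

Each column of every Φ is Dirichlet-distributed. The fixed scale c, the layer widths and the helper names are assumptions for illustration. The upward-downward Gibbs sampler used for training is not shown.

```python
import math
import random

def dirichlet(k, rng, conc=1.0):
    """Draw a length-k probability vector by normalising Gamma(conc, 1) draws."""
    g = [rng.gammavariate(conc, 1.0) for _ in range(k)]
    total = sum(g)
    return [x / total for x in g]

def poisson(lam, rng):
    """Knuth's multiplication method, applied in chunks so exp(-lam) cannot underflow."""
    count = 0
    while lam > 0:
        step = min(lam, 30.0)
        lam -= step
        threshold, p = math.exp(-step), rng.random()
        while p > threshold:
            count += 1
            p *= rng.random()
    return count

def sample_gbn(widths, vocab, r=1.0, c=1.0, conc=1.0, seed=0):
    """Sample one count vector from a Poisson gamma belief network, top-down.

    widths: hidden-layer widths [K_1, ..., K_T], bottom to top; vocab: data dimension V.
    """
    rng = random.Random(seed)
    sizes = [vocab] + list(widths)
    # phi[t] is sizes[t] x sizes[t+1]; every column is a Dirichlet probability vector
    phi = []
    for t in range(len(widths)):
        cols = [dirichlet(sizes[t], rng, conc) for _ in range(sizes[t + 1])]
        phi.append([[col[j] for col in cols] for j in range(sizes[t])])
    theta = [rng.gammavariate(r, 1.0 / c) for _ in range(sizes[-1])]  # top hidden layer
    for t in range(len(widths) - 1, 0, -1):
        # Gamma shape of each unit in layer t is a weighted sum of the layer above
        shapes = [sum(phi[t][j][k] * theta[k] for k in range(sizes[t + 1]))
                  for j in range(sizes[t])]
        # max() only guards against floating-point underflow to a zero shape
        theta = [rng.gammavariate(max(a, 1e-300), 1.0 / c) for a in shapes]
    rates = [sum(phi[0][v][k] * theta[k] for k in range(sizes[1])) for v in range(vocab)]
    return [poisson(lam, rng) for lam in rates]

counts = sample_gbn(widths=[6, 4, 3], vocab=20)
print(counts)
```

Lowering `conc` below 1 makes the Dirichlet columns, and hence the connection weights, sparser. This is in the spirit of the sparse connection matrices described above.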
A primer on universal function approximation with deep learning (in Torch and R)
Arthur C. Clarke famously stated that "any sufficiently advanced technology is indistinguishable from magic." No current technology embodies this statement more than neural networks and deep learning. And like any good magic, it not only dazzles and inspires but also puts fear into people's hearts. One known property of artificial neural networks (ANNs) is that they are universal function approximators. This means that any continuous function on a closed, bounded domain can be approximated to arbitrary accuracy by a neural network with enough hidden units.
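Universal approximation is an existence statement: it says that a sufficiently wide network exists, not that training will find it. The sketch below fits a one-hidden-layer tanh network to sin(x) on [−π, π] using full-batch gradient descent and hand-written backpropagation. It is written in plain Python rather than Torch or R. The width, learning rate and number of epochs are arbitrary illustrative choices.

```python
import math
import random

def train_mlp(f, xs, hidden=8, lr=0.05, epochs=1500, seed=0):
    """Fit y = sum_k v_k * tanh(w_k * x + b_k) + c to f on the points xs by gradient descent."""
    rng = random.Random(seed)
    w = [rng.gauss(0, 1) for _ in range(hidden)]    # input-to-hidden weights
    b = [rng.gauss(0, 1) for _ in range(hidden)]    # hidden biases
    v = [rng.gauss(0, 0.5) for _ in range(hidden)]  # hidden-to-output weights
    c = 0.0                                         # output bias
    ys = [f(x) for x in xs]
    n = len(xs)
    history = []  # mean squared error at the start of each epoch
    for _ in range(epochs):
        gw, gb, gv, gc = [0.0] * hidden, [0.0] * hidden, [0.0] * hidden, 0.0
        loss = 0.0
        for x, y in zip(xs, ys):
            h = [math.tanh(w[k] * x + b[k]) for k in range(hidden)]
            err = sum(v[k] * h[k] for k in range(hidden)) + c - y
            loss += err * err / n
            d = 2 * err / n  # derivative of the mean squared error w.r.t. this prediction
            gc += d
            for k in range(hidden):
                gv[k] += d * h[k]
                dz = d * v[k] * (1 - h[k] * h[k])  # backpropagate through tanh
                gw[k] += dz * x
                gb[k] += dz
        history.append(loss)
        for k in range(hidden):
            w[k] -= lr * gw[k]
            b[k] -= lr * gb[k]
            v[k] -= lr * gv[k]
        c -= lr * gc

    def predict(x):
        return sum(v[k] * math.tanh(w[k] * x + b[k]) for k in range(hidden)) + c

    return predict, history

xs = [-math.pi + 2 * math.pi * i / 24 for i in range(25)]
predict, history = train_mlp(math.sin, xs)
print(history[0], history[-1])
```

Increasing `hidden` gives the network more tanh "basis functions" to combine. The approximation theorem relies on this mechanism.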