 Uncertainty


Upper Bound of Bayesian Generalization Error in Non-negative Matrix Factorization

arXiv.org Machine Learning

Recently, nonnegative matrix factorization (NMF) [1, 2] has been applied to text mining [3], signal processing [4, 5, 6], bioinformatics [7], and consumer analysis [8]. Experiments have shown that NMF yields a new knowledge discovery method; however, its mathematical properties as a learning machine have not yet been clarified, since it is not a regular statistical model. A statistical model is called regular if the function from a parameter to a probability density function is one-to-one and if the likelihood function can be approximated by a Gaussian function. It has been proved that, if a statistical model is regular and if the true distribution is realizable by the statistical model, then the generalization error is asymptotically equal to d/(2n), where d, n, and the generalization error are the dimension of the parameter, the sample size, and the expected Kullback-Leibler divergence between the true distribution and the estimated learning machine, respectively. However, the statistical model used in NMF is not regular because the map from a parameter to a probability density function is not injective.
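The non-injectivity mentioned above can be seen in a small numerical sketch. The example below is illustrative only and is not taken from the paper: the matrix shapes, the names `W`, `H`, and `D`, and the use of a diagonal rescaling are all assumptions chosen to show that two different non-negative parameter pairs can produce the same product matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical small non-negative factors; the model describes data through W @ H.
W = rng.uniform(0.1, 1.0, size=(4, 2))
H = rng.uniform(0.1, 1.0, size=(2, 5))

# A positive diagonal rescaling D gives a different parameter pair (W @ D, inv(D) @ H).
D = np.diag([2.0, 0.5])
W2 = W @ D
H2 = np.linalg.inv(D) @ H

# The rescaled factors are still non-negative and give the same product,
# so the map from parameters to the modelled matrix is not one-to-one.
same_product = np.allclose(W @ H, W2 @ H2)
different_params = not np.allclose(W, W2)
print(same_product, different_params)
```

Permuting the columns of `W` together with the rows of `H` gives another family of equivalent parameters under the same assumptions.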


Bayesian estimation from few samples: community detection and related problems

arXiv.org Machine Learning

We propose an efficient meta-algorithm for Bayesian estimation problems that is based on low-degree polynomials, semidefinite programming, and tensor decomposition. The algorithm is inspired by recent lower bound constructions for sum-of-squares and related to the method of moments. Our focus is on sample complexity bounds that are as tight as possible (up to additive lower-order terms) and often achieve statistical thresholds or conjectured computational thresholds. Our algorithm recovers the best known bounds for community detection in the sparse stochastic block model, a widely-studied class of estimation problems for community detection in graphs. We obtain the first recovery guarantees for the mixed-membership stochastic block model (Airoldi et al.) in constant average degree graphs---up to what we conjecture to be the computational threshold for this model. We show that our algorithm exhibits a sharp computational threshold for the stochastic block model with multiple communities beyond the Kesten--Stigum bound---giving evidence that this task may require exponential time. The basic strategy of our algorithm is strikingly simple: we compute the best-possible low-degree approximation for the moments of the posterior distribution of the parameters and use a robust tensor decomposition algorithm to recover the parameters from these approximate posterior moments.


Bayesian Learning for Statistical Classification – Stats and Bots

#artificialintelligence

A well-calibrated estimator for the conditional probabilities should obey this equation. Once we have derived a statistical classifier, we need to validate it on some test data. This data should be different from that used to train the classifier, otherwise skill scores will be unduly optimistic. This is known as cross-validation. The confusion matrix expresses everything about the accuracy of a discrete classifier over a given database and you can use it to compose any possible skill score. Here, we are going to cover two that are rarely seen in the literature, but are nonetheless important for reasons that will become clear.
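As a minimal sketch of the confusion matrix described above, the following builds one from test labels and derives a single score from it. The labels are made-up toy data, the helper name `confusion_matrix` is hypothetical, and plain accuracy is used only as a familiar example; it is not one of the two skill scores the article goes on to cover, which this excerpt does not name.

```python
import numpy as np

def confusion_matrix(truth, predicted, n_classes):
    """Count how often true class t (row) was predicted as class p (column)."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(truth, predicted):
        cm[t, p] += 1
    return cm

# Toy test-set labels (illustrative only, held out from training).
truth = [0, 0, 1, 1, 1, 2]
predicted = [0, 1, 1, 1, 2, 2]

cm = confusion_matrix(truth, predicted, n_classes=3)

# Any skill score is a function of cm; accuracy is the diagonal mass.
accuracy = np.trace(cm) / cm.sum()
print(cm)
print(accuracy)
```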


Computer Assisted Composition with Recurrent Neural Networks

arXiv.org Artificial Intelligence

Sequence modeling with neural networks has led to powerful models of symbolic music data. We address the problem of exploiting these models to reach creative musical goals, by combining with human input. To this end we generalise previous work, which sampled Markovian sequence models under the constraint that the sequence belong to the language of a given finite state machine provided by the human. We consider more expressive non-Markov models, thereby requiring approximate sampling which we provide in the form of an efficient sequential Monte Carlo method. In addition we provide and compare with a beam search strategy for conditional probability maximisation. Our algorithms are capable of convincingly re-harmonising famous musical works. To demonstrate this we provide visualisations, quantitative experiments, a human listening test and audio examples. We find both the sampling and optimisation procedures to be effective, yet complementary in character. For the case of highly permissive constraint sets, we find that sampling is to be preferred due to the overly regular nature of the optimisation based results. The generality of our algorithms permits countless other creative applications.


The Mathematics of Machine Learning

#artificialintelligence

Finally, the main aim of this blog post is to give well-intentioned advice about the importance of Mathematics in Machine Learning and the necessary topics and useful resources for a mastery of these topics. However, some Machine Learning enthusiasts are novices in Maths and will probably find this post disheartening (seriously, this is not my aim). For beginners, you don't need a lot of Mathematics to start doing Machine Learning. The fundamental prerequisite is data analysis as described in this blog post and you can learn the maths on the go as you master more techniques and algorithms. This entry was originally published on my LinkedIn page.


User behavior analytics: separating hype from reality

#artificialintelligence

I've been involved in the data analytics and high-tech industries long enough to have seen plenty of new technologies subjected to a degree of hype so great they could never ever measure up. Some of these (fuzzy logic or Google Glass, anyone?) flamed out quickly; others, like artificial intelligence (AI), have had seesawing fortunes spanning decades -- here subject to the loftiest expectations only to be followed there by a 'trough of disillusionment' (one of Gartner's hype-cycle stages, and a term I like) as physical, technical and other limitations became evident. Within the sub-domain of AI for security, a collection of technologies known as user behavior analytics (UBA) is now enjoying its own moment of high expectations, much as security information and event management (SIEM) systems did about a decade ago. UBA differs from SIEM in not just aggregating and correlating alerts from different network events but by using a combination of AI and analytical approaches -- including rules-based, pattern-matching and statistical methods, plus supervised and unsupervised machine learning -- to establish baselines of how systems, networks and devices typically behave, and then to detect significant anomalies in their behavior and send alerts to security teams for further investigation. Gartner industry analysts in particular have spent lots of time thinking about UBA.


The detour problem in a stochastic environment: Tolman revisited

arXiv.org Machine Learning

We designed a grid world task to study human planning and re-planning behavior in an unknown stochastic environment. In our grid world, participants were asked to travel from a random starting point to a random goal position while maximizing their reward. Because they were not familiar with the environment, they needed to learn its characteristics from experience to plan optimally. Later in the task, we randomly blocked the optimal path to investigate whether and how people adjust their original plans to find a detour. To this end, we developed and compared 12 different models. These models differed in how they learned and represented the environment and how they planned to reach the goal. The majority of our participants were able to plan optimally. We also showed that people were capable of revising their plans when an unexpected event occurred. The result from the model comparison showed that the model-based reinforcement learning approach provided the best account for the data and outperformed heuristics in explaining the behavioral data in the re-planning trials.


Unsupervised Generative Modeling Using Matrix Product States

arXiv.org Machine Learning

Generative modeling, a typical unsupervised learning task that makes use of huge amounts of unlabeled data, lies at the heart of the rapid development of modern machine learning techniques [1]. Different from discriminative tasks such as pattern recognition, the goal of generative modeling is to model the probability distribution of input data and thus be able to generate new samples according to the distribution. At the research frontier of generative modeling, it is used for finding good data representations and dealing with tasks with missing data. Popular generative machine learning models include the Boltzmann Machines (BM) [2, 3] and their generalizations [4], variational autoencoders (VAE) [5], autoregressive models [6, 7], nonlinear density estimations [8-10], and the generative adversarial networks (GAN) [11]. For generative model design, one tries to balance the representational power and efficiency of learning and sampling. There is a long history of relation between generative modeling and physics, especially statistical physics. Some celebrated models, such as the Hopfield model [12] and the Boltzmann machine [2, 3], are closely related to the Ising model in statistical physics, and its inverse version which learns couplings in the Ising model based on given training configurations [13, 14]. The task of generative modeling also shares many similarities with quantum physics research in the sense that both of them try to model probability distributions in an enormously large space. In the past decades, tensor network (TN) states and algorithms have been shown to be an incredibly potent tool set for studying many-body quantum physics, with their power in expressing quantum states relevant to realistic situations [15, 16].


Multi-way Interacting Regression via Factorization Machines

arXiv.org Machine Learning

We propose a Bayesian regression method that accounts for multi-way interactions of arbitrary orders among the predictor variables. Our model makes use of a factorization mechanism for representing the regression coefficients of interactions among the predictors, while the interaction selection is guided by a prior distribution on random hypergraphs, a construction which generalizes the Finite Feature Model. We present a posterior inference algorithm based on Gibbs sampling, and establish posterior consistency of our regression model. Our method is evaluated with extensive experiments on simulated data and demonstrated to be able to identify meaningful interactions in applications in genetics and retail demand forecasting.
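To make the idea of a factorization mechanism for interaction coefficients concrete, here is a minimal sketch of a standard second-order factorization machine predictor. This is a simpler, well-known construction swapped in for illustration; it is not the paper's multi-way model, hypergraph prior, or Gibbs sampler. The names `fm_predict`, `fm_predict_naive`, `w0`, `w`, and `V` are assumptions for this example.

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """Second-order factorization machine prediction.

    The coefficient of the interaction x_i * x_j is the inner product
    <V[i], V[j]> of two low-dimensional factor vectors.
    """
    s = V.T @ x                     # per-factor sum of V[i, f] * x[i]
    s_sq = (V ** 2).T @ (x ** 2)    # per-factor sum of (V[i, f] * x[i])^2
    # Identity: sum_{i<j} <V_i, V_j> x_i x_j = 0.5 * sum_f (s_f^2 - s_sq_f)
    return w0 + w @ x + 0.5 * np.sum(s ** 2 - s_sq)

def fm_predict_naive(x, w0, w, V):
    """Same prediction via the explicit double loop, for checking."""
    total = w0 + w @ x
    d = len(x)
    for i in range(d):
        for j in range(i + 1, d):
            total += (V[i] @ V[j]) * x[i] * x[j]
    return total

rng = np.random.default_rng(1)
d, k = 6, 3                         # number of predictors, factor dimension
x = rng.normal(size=d)
w0, w, V = 0.5, rng.normal(size=d), rng.normal(size=(d, k))

fast = fm_predict(x, w0, w, V)
slow = fm_predict_naive(x, w0, w, V)
print(np.isclose(fast, slow))
```

The factorized form needs only d * k interaction parameters instead of one per pair of predictors, which is the kind of saving that motivates factorization for higher-order interactions as well.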


The Consciousness Prior

arXiv.org Machine Learning

A new prior is proposed for representation learning, which can be combined with other priors in order to help disentangle abstract factors from each other. It is inspired by the phenomenon of consciousness seen as the formation of a low-dimensional combination of a few concepts constituting a conscious thought, i.e., consciousness as awareness at a particular time instant. This provides a powerful constraint on the representation in that such low-dimensional thought vectors can correspond to statements about reality which are true, highly probable, or very useful for taking decisions. The fact that a few elements of the current state can be combined into such a predictive or useful statement is a strong constraint and deviates considerably from the maximum likelihood approaches to modelling data and how states unfold in the future based on an agent's actions. Instead of making predictions in the sensory (e.g. pixel) space, the consciousness prior allows the agent to make predictions in the abstract space, with only a few dimensions of that space being involved in each of these predictions. The consciousness prior also makes it natural to map conscious states to natural language utterances or to express classical AI knowledge in the form of facts and rules, although the conscious states may be richer than what can be expressed easily in the form of a sentence, a fact or a rule.