Learning Graphical Models
Learning without recall in directed circles and rooted trees
Rahimian, M. Amin, Jadbabaie, Ali
This work investigates the case of a network of agents that attempt to learn some unknown state of the world among finitely many possibilities. At each time step, agents all receive random, independently distributed private signals whose distributions depend on the unknown state of the world. However, it may be the case that some or any of the agents cannot distinguish between two or more of the possible states based only on their private observations, as when several states result in the same distribution of the private signals. In our model, the agents form some initial belief (probability distribution) about the unknown state and then refine their beliefs in accordance with their private observations, as well as the beliefs of their neighbors. An agent learns the unknown state when her belief converges to a point mass that is concentrated at the true state. A rational agent would use Bayes' rule to incorporate her neighbors' beliefs and her own private signals over time. While such repeated applications of Bayes' rule in networks can become computationally intractable, in this paper we show that in the canonical cases of directed star, circle or path networks and their combinations, one can derive a class of memoryless update rules that replicate the update of a single Bayesian agent but replace the self beliefs with the beliefs of the neighbors. This way, one can realize an exponentially fast rate of learning similar to the case of Bayesian (fully rational) agents. The proposed rules are a special case of Learning without Recall.
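The single-agent Bayesian update that the abstract refers to can be sketched as follows. This is a minimal illustration and not the paper's network rule: the state names, signal values, likelihood numbers, and the function name `bayes_update` are all assumptions made up for the example.

```python
def bayes_update(belief, likelihoods, signal):
    """Apply Bayes' rule once over a finite set of states.

    belief:      dict mapping state -> current probability
    likelihoods: dict mapping state -> dict mapping signal -> P(signal | state)
    signal:      the private signal observed at this time step
    """
    unnormalized = {s: belief[s] * likelihoods[s][signal] for s in belief}
    total = sum(unnormalized.values())
    return {s: p / total for s, p in unnormalized.items()}


# Hypothetical two-state world with binary signals (illustrative numbers only).
likelihoods = {
    "A": {0: 0.7, 1: 0.3},
    "B": {0: 0.4, 1: 0.6},
}
belief = {"A": 0.5, "B": 0.5}   # uniform initial belief
signals = [0, 0, 1, 0, 0, 0, 1, 0]  # a fixed sequence, typical of state "A"

for signal in signals:
    belief = bayes_update(belief, likelihoods, signal)
```

After these eight signals the belief places most of its mass on state "A". If two states had identical signal distributions, this update alone could never separate them, which is the situation in which the neighbors' beliefs become necessary.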
Machine Learning on Human Connectome Data from MRI
Brown, Colin J, Hamarneh, Ghassan
Functional MRI (fMRI) and diffusion MRI (dMRI) are non-invasive imaging modalities that allow in-vivo analysis of a patient's brain network (known as a connectome). Use of these technologies has enabled faster and better diagnoses and treatments of neurological disorders and a deeper understanding of the human brain. Recently, researchers have been exploring the application of machine learning models to connectome data in order to predict clinical outcomes and analyze the importance of subnetworks in the brain. Connectome data has unique properties, which present both special challenges and opportunities when used for machine learning. The purpose of this work is to review the literature on the topic of applying machine learning models to MRI-based connectome data. This field is growing rapidly and now encompasses a large body of research. To summarize the research done to date, we provide a comparative, structured summary of 77 relevant works, tabulated according to different criteria, that represent the majority of the literature on this topic. (We also published a living version of this table online at http://connectomelearning.cs.sfu.ca that the community can continue to contribute to.) After giving an overview of how connectomes are constructed from dMRI and fMRI data, we discuss the variety of machine learning tasks that have been explored with connectome data. We then compare the advantages and drawbacks of different machine learning approaches that have been employed, discussing different feature selection and feature extraction schemes, as well as the learning models and regularization penalties themselves. Throughout this discussion, we focus particularly on how the methods are adapted to the unique nature of graphical connectome data. Finally, we conclude by summarizing the current state of the art and by outlining what we believe are strategic directions for future research.
These Are The Most Elegant, Useful Algorithms In Machine Learning
Most elegant: the Perceptron algorithm. Developed back in the 50s by Rosenblatt and colleagues, this extremely simple algorithm can be viewed as the foundation for some of the most successful classifiers today, including support vector machines and logistic regression, solved using stochastic gradient descent. The convergence proof for the Perceptron algorithm is one of the most elegant pieces of math I've seen in ML. Most useful: Boosting, especially boosted decision trees. This intuitive approach allows you to build highly accurate ML models by combining many simple ones. Boosting is one of the most practical methods in ML: it's widely used in industry, can handle a wide variety of data types, and can be implemented at scale.
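To show how simple the Perceptron algorithm is, here is a minimal sketch of the classic mistake-driven update. The toy data, the epoch limit, and the function names `train_perceptron` and `predict` are assumptions chosen for illustration.

```python
def train_perceptron(samples, labels, epochs=100):
    """Classic perceptron: adjust the weights only on misclassified points.

    samples: list of feature tuples
    labels:  list of labels in {-1, +1}
    """
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(samples, labels):
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * activation <= 0:  # wrong side of (or on) the boundary
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:  # a full pass with no errors: stop
            break
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


# Toy, linearly separable data (made up for this example).
samples = [(2.0, 1.0), (1.0, 3.0), (3.0, 3.0),
           (-1.0, -2.0), (-2.0, -1.0), (-3.0, -3.0)]
labels = [1, 1, 1, -1, -1, -1]

w, b = train_perceptron(samples, labels)
```

On linearly separable data such as this, the loop stops after finitely many updates, which is what the convergence proof guarantees.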
A Benchmark and Comparison of Active Learning for Logistic Regression
Various active learning methods based on logistic regression have been proposed. In this paper, we investigate seven state-of-the-art strategies, present an extensive benchmark, and provide a better understanding of their underlying characteristics. Experiments are carried out on 3 synthetic datasets and 43 real-world datasets, providing insights into the behaviour of these active learning methods with respect to classification accuracy and computational cost.
An Overview on Data Representation Learning: From Traditional Feature Learning to Recent Deep Learning
Zhong, Guoqiang, Wang, Li-Na, Dong, Junyu
Over roughly the past 100 years, many representation learning approaches have been proposed to learn the intrinsic structure of data, including linear and nonlinear ones, as well as supervised and unsupervised ones. In recent years, deep architectures in particular have been widely applied to representation learning and have delivered top results in many tasks, such as image classification, object detection and speech recognition. In this paper, we review the development of data representation learning methods. Specifically, we investigate both traditional feature learning algorithms and state-of-the-art deep learning models. The history of data representation learning is introduced, and available resources (e.g. online courses, tutorials and books) and toolboxes are provided. Finally, we conclude this paper with remarks and some interesting research directions on data representation learning.
Quantum Enhanced Inference in Markov Logic Networks
Wittek, Peter, Gogolin, Christian
Markov logic networks (MLNs) reconcile two opposing schools in machine learning and artificial intelligence: causal networks, which account for uncertainty extremely well, and first-order logic, which allows for formal deduction. An MLN is essentially a first-order logic template to generate Markov networks. Inference in MLNs is probabilistic and it is often performed by approximate methods such as Markov chain Monte Carlo (MCMC) Gibbs sampling. An MLN has many regular, symmetric structures that can be exploited at both first-order level and in the generated Markov network. We analyze the graph structures that are produced by various lifting methods and investigate the extent to which quantum protocols can be used to speed up Gibbs sampling with state preparation and measurement schemes. We review different such approaches, discuss their advantages, theoretical limitations, and their appeal to implementations. We find that a straightforward application of a recent result yields exponential speedup compared to classical heuristics in approximate probabilistic inference, thereby demonstrating another example where advanced quantum resources can potentially prove useful in machine learning.
Machine Learning Basics with Naive Bayes
After researching and looking into the different algorithms associated with Machine Learning, I've found that there is an abundance of great material showing you how to use certain algorithms in a specific language. However, what's usually missing is a simple mathematical explanation of how the algorithm works. This may not always be possible without a strong mathematical background, but for some algorithms I know I would definitely find it useful. This post requires just basic mathematics knowledge and an interest in data science and machine learning. I will be talking about Naive Bayes as a classifier and explaining in simple terms how it works and when you might use it.
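As a rough illustration of the idea, here is a minimal Naive Bayes text classifier. It is a sketch under stated assumptions rather than the post's own code: it uses the multinomial variant with add-one (Laplace) smoothing, and the tiny dataset and the function names `train_naive_bayes` and `classify` are made up for the example.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(documents, labels):
    """Count how often each class and each (class, word) pair occurs."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for words, label in zip(documents, labels):
        word_counts[label].update(words)
        vocabulary.update(words)
    return class_counts, word_counts, vocabulary


def classify(model, words):
    """Pick the class maximizing log P(class) + sum of log P(word | class)."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in words:
            if word not in vocabulary:
                continue  # ignore words never seen in training
            # Add-one smoothing keeps unseen (class, word) pairs above zero.
            p = (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            score += math.log(p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up training data for illustration.
documents = [
    ["win", "money", "now"],
    ["cheap", "money", "offer"],
    ["meeting", "schedule", "today"],
    ["project", "meeting", "notes"],
]
labels = ["spam", "spam", "ham", "ham"]

model = train_naive_bayes(documents, labels)
```

The "naive" part is the assumption that words are independent given the class, which is why the per-word log probabilities can simply be added. Working in log space avoids numerical underflow when many small probabilities are multiplied.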
Piecewise Deterministic Markov Processes for Continuous-Time Monte Carlo
Fearnhead, Paul, Bierkens, Joris, Pollock, Murray, Roberts, Gareth O
Monte Carlo methods, such as MCMC and SMC, have been central to the application of Bayesian statistics to real-world problems (Robert and Casella, 2011; McGrayne, 2011). These established Monte Carlo methods are based upon simulating discrete-time Markov processes. For example, MCMC algorithms simulate a discrete-time Markov chain constructed to have a target distribution of interest, the posterior distribution in Bayesian inference, as its stationary distribution. SMC methods, by contrast, involve propagating and re-weighting particles so that a final set of weighted particles approximates a target distribution. The propagation step here also involves simulating from a discrete-time Markov chain. In the past few years there have been exciting developments in MCMC and SMC methods based on continuous-time versions of these Monte Carlo methods. For example, continuous-time MCMC algorithms have been proposed (Peters and de With, 2012; Bouchard-Côté et al., 2015; Bierkens and Roberts, 2015; Bierkens et al., 2016) that involve simulating a continuous-time Markov process that has been designed to have a target distribution of interest as its stationary distribution. These continuous-time MCMC algorithms were originally motivated by the fact that they are examples of nonreversible Markov processes. There is substantial evidence that nonreversible MCMC algorithms will be more efficient than standard MCMC algorithms that are reversible (Neal, 1998; Diaconis et al., 2000; Neal, 2004; Bierkens, 2015), and there is empirical evidence that these continuous-time MCMC algorithms are more efficient than their discrete-time counterparts (see e.g.
Infinite Variational Autoencoder for Semi-Supervised Learning
Abbasnejad, Ehsan, Dick, Anthony, Hengel, Anton van den
This paper presents an infinite variational autoencoder (VAE) whose capacity adapts to suit the input data. This is achieved using a mixture model where the mixing coefficients are modeled by a Dirichlet process, allowing us to integrate over the coefficients when performing inference. Critically, this then allows us to automatically vary the number of autoencoders in the mixture based on the data. Experiments show the flexibility of our method, particularly for semi-supervised learning, where only a small number of training samples are available.
Parsimonious modeling with Information Filtering Networks
Barfuss, Wolfram, Massara, Guido Previde, Di Matteo, T., Aste, Tomaso
We introduce a methodology to construct parsimonious probabilistic models. This method makes use of Information Filtering Networks to produce a robust estimate of the global sparse inverse covariance from a simple sum of local inverse covariances computed on small sub-parts of the network. Being based on local and low-dimensional inversions, this method is computationally very efficient and statistically robust, even for the estimation of the inverse covariance of high-dimensional, noisy and short time-series. Applied to financial data, our method proves computationally more efficient than state-of-the-art methodologies such as Glasso, producing, in a fraction of the computation time, models that can have equivalent or better performance but with a sparser inference structure. We also discuss performance with sparse factor models, where we notice that relative performance decreases with the number of factors. The local nature of this approach allows us to perform computations in parallel and provides a tool for dynamical adaptation by partial updating when the properties of some variables change, without the need to recompute the whole model. This makes the approach particularly suitable for handling big datasets with large numbers of variables. Examples of practical application for forecasting, stress testing and risk allocation in financial systems are also provided.