Uncertainty
A Matrix Splitting Perspective on Planning with Options
Bacon, Pierre-Luc, Precup, Doina
We show that the Bellman operator underlying the options framework leads to a matrix splitting, an approach traditionally used to speed up convergence of iterative solvers for large linear systems of equations. Based on standard comparison theorems for matrix splittings, we then show how the asymptotic rate of convergence varies as a function of the inherent timescales of the options. This new perspective highlights a trade-off between asymptotic performance and the cost of computation associated with building a good set of options.
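As a rough illustration of the matrix splitting idea this abstract refers to, the sketch below solves the policy evaluation system (I - gamma P) v = r with two splittings A = M - N and compares the spectral radius of M^{-1} N, which governs the asymptotic convergence rate. The 3-state chain, the discount factor, and the function names are hypothetical. The Jacobi splitting is a textbook example, not the options-based splitting studied in the paper.

```python
import numpy as np

def splitting_iteration(M, N, b, num_iters=500):
    """Iterate x_{k+1} = M^{-1} (N x_k + b) for the splitting A = M - N."""
    x = np.zeros_like(b, dtype=float)
    for _ in range(num_iters):
        x = np.linalg.solve(M, N @ x + b)
    return x

def spectral_radius(M, N):
    """Largest eigenvalue magnitude of M^{-1} N (smaller means faster convergence)."""
    return float(np.max(np.abs(np.linalg.eigvals(np.linalg.solve(M, N)))))

# Hypothetical 3-state Markov chain and rewards (illustrative numbers only).
gamma = 0.9
P = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.6, 0.3],
              [0.0, 0.2, 0.8]])
r = np.array([1.0, 0.0, 2.0])
A = np.eye(3) - gamma * P

# Splitting 1: M = I, N = gamma * P (plain iterative policy evaluation).
M1, N1 = np.eye(3), gamma * P
# Splitting 2: Jacobi, M = diag(A), N = M - A.
M2 = np.diag(np.diag(A))
N2 = M2 - A

v_exact = np.linalg.solve(A, r)
v1 = splitting_iteration(M1, N1, r)
v2 = splitting_iteration(M2, N2, r)
rho1 = spectral_radius(M1, N1)
rho2 = spectral_radius(M2, N2)
print(f"rho(I splitting) = {rho1:.3f}, rho(Jacobi splitting) = {rho2:.3f}")
```

Both iterations reach the same fixed point; the splitting that moves more of A into M has the smaller spectral radius, at the price of a more expensive solve with M at each step.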
Work on leveraging optimization with mixed individual and social learning appears in Applied Soft Computing
We present CGO-AS, a generalized Ant System (AS) implemented in the framework of Cooperative Group Optimization (CGO), to show how optimization can be leveraged by mixing individual and social learning. An ant colony is a simple yet efficient natural system for understanding the effects of primary intelligence on optimization. However, existing AS algorithms mostly focus on their capability of using social heuristic cues while ignoring individual learning. CGO can integrate the advantages of a cooperative group and a low-level algorithm portfolio design, and the agents of CGO can explore both individual and social search. In CGO-AS, each ant (agent) is given an individual memory and follows a novel search strategy that uses individual and social cues in a controlled proportion.
An Efficient Minibatch Acceptance Test for Metropolis-Hastings
Seita, Daniel, Pan, Xinlei, Chen, Haoyu, Canny, John
We present a novel Metropolis-Hastings method for large datasets that uses small expected-size minibatches of data. Previous work on reducing the cost of Metropolis-Hastings tests yields a variable amount of data consumed per sample, with only constant-factor reductions versus using the full dataset for each sample. Here we present a method that can be tuned to provide arbitrarily small batch sizes by adjusting either the proposal step size or the temperature. Our test uses the noise-tolerant Barker acceptance test with a novel additive correction variable. The resulting test has a cost similar to that of a normal SGD update. Our experiments demonstrate speedups of several orders of magnitude over previous work.
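For readers unfamiliar with the Barker test named in this abstract, here is a minimal full-data sketch. It relies on the standard fact that accepting with probability 1 / (1 + exp(-delta)) is equivalent to accepting when delta plus a standard logistic random variable is positive. The function names and the toy Gaussian target are assumptions for illustration; the paper's minibatch estimate of delta and its additive correction variable are not reproduced here.

```python
import numpy as np

def barker_accept(delta, rng):
    """Barker test: accept iff delta + X_log > 0, with X_log ~ Logistic(0, 1).

    This accepts with probability 1 / (1 + exp(-delta)).
    """
    x_log = rng.logistic(loc=0.0, scale=1.0)
    return bool(delta + x_log > 0)

def barker_mh(log_target, x0, step, n_samples, rng):
    """Random-walk sampler with a symmetric proposal and the Barker test."""
    x = x0
    samples = np.empty(n_samples)
    for i in range(n_samples):
        proposal = x + step * rng.normal()
        # Log ratio of target densities (the proposal terms cancel by symmetry).
        delta = log_target(proposal) - log_target(x)
        if barker_accept(delta, rng):
            x = proposal
        samples[i] = x
    return samples

# Toy target (an assumption): standard normal, log density up to a constant.
rng = np.random.default_rng(0)
samples = barker_mh(lambda x: -0.5 * x**2, x0=0.0, step=2.0,
                    n_samples=20000, rng=rng)
print(samples.mean(), samples.std())
```

Because the decision is a comparison against additive logistic noise, noise in an estimate of delta can in principle be absorbed into that random variable, which is the property the abstract calls noise-tolerant.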
A case study of Empirical Bayes in User-Movie Recommendation system
Dey, Arabin Kumar, Somani, Raghav, Acharyya, Sreangsu
In this article we provide a formulation of empirical Bayes, as described by Atchade (2011), to tune the hyperparameters of the priors used in a Bayesian setup of collaborative filtering. We implement it on the MovieLens small dataset. We see that it can be used to get a good initial choice for the parameters. It can also be used to guess an initial choice for the hyperparameters in a grid search procedure, even for datasets where MCMC oscillates around the true value or takes a long time to converge.
Bayesian Models of Data Streams with Hierarchical Power Priors
Masegosa, Andres, Nielsen, Thomas D., Langseth, Helge, Ramos-Lopez, Dario, Salmeron, Antonio, Madsen, Anders L.
Making inferences from data streams is a pervasive problem in many modern data analysis applications. However, it requires addressing the problem of continuous model updating and adapting to changes or drifts in the underlying data-generating distribution. In this paper, we approach these problems from a Bayesian perspective covering general conjugate exponential models. Our proposal makes use of non-conjugate hierarchical priors to explicitly model temporal changes of the model parameters. We also derive a novel variational inference scheme which overcomes the use of non-conjugate priors while maintaining the computational efficiency of variational methods over conjugate models. The approach is validated on three real data sets over three latent variable models.
Exhaustive search for sparse variable selection in linear regression
Igarashi, Yasuhiko, Takenaka, Hikaru, Nakanishi-Ohno, Yoshinori, Uemura, Makoto, Ikeda, Shiro, Okada, Masato
We propose a K-sparse exhaustive search (ES-K) method and a K-sparse approximate exhaustive search method (AES-K) for selecting variables in linear regression. With these methods, K-sparse combinations of variables are tested exhaustively, assuming that the optimal combination of explanatory variables is K-sparse. By collecting the results of exhaustively computing ES-K, various approximate methods for selecting sparse variables can be summarized as a density of states. With this density of states, we can compare different methods for selecting sparse variables, such as relaxation and sampling. For large problems where the combinatorial explosion of explanatory variables is crucial, the AES-K method enables the density of states to be effectively reconstructed by using the replica-exchange Monte Carlo method and the multiple histogram method. Applying the ES-K and AES-K methods to type Ia supernova data, we confirmed the conventional understanding in astronomy when an appropriate K is given beforehand. However, we found it difficult to determine K from the data. Using virtual measurement and analysis, we argue that this is caused by data shortage.
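The exhaustive K-sparse search described in this abstract can be sketched as follows: fit ordinary least squares on every K-subset of columns, record an error for each, and histogram those errors as a simple stand-in for the density of states. The function name, the synthetic data, and the use of training mean squared error as the score are assumptions for illustration; the paper's own scoring criterion and the AES-K sampling scheme are not reproduced.

```python
from itertools import combinations

import numpy as np

def exhaustive_k_sparse(X, y, K):
    """Fit least squares on every K-subset of columns.

    Returns the best subset and the full list of (subset, mse), sorted by mse.
    """
    results = []
    for subset in combinations(range(X.shape[1]), K):
        cols = list(subset)
        coef, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
        mse = float(np.mean((y - X[:, cols] @ coef) ** 2))
        results.append((subset, mse))
    results.sort(key=lambda item: item[1])
    return results[0][0], results

# Synthetic data (an assumption): only columns 1 and 4 drive the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
y = 2.0 * X[:, 1] - 3.0 * X[:, 4] + 0.1 * rng.normal(size=100)

best, results = exhaustive_k_sparse(X, y, K=2)
errors = np.array([mse for _, mse in results])
# Histogram of errors over all combinations: a crude "density of states".
density, bin_edges = np.histogram(errors, bins=10)
print(best, density)
```

The number of subsets grows as "p choose K", so this brute-force loop is only practical for small problems, which is the regime where the abstract turns to the approximate AES-K method.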
How Can Finance Catch Up With Other Intelligent Real-Time Systems?
When it comes to bringing intelligence to real-time engineering systems, the world of finance has been hindered by its legacy. Compared to things like self-driving cars, incumbent financial infrastructure takes a very long time to update, and is siloed into systems that cannot really talk to each other. Paul Bilokon, founder of Thalesians, an organisation to promote deeper thinking and philosophy within finance, points out that many non-financial systems are using software techniques that are far ahead. But he also sees this changing thanks to improved infrastructure tools and advancements in machine learning within finance. Paul will be speaking about new infrastructure and showing off some machine learning libraries at the forthcoming IBT data science and capital markets event.
Laplacian Mixture Modeling for Network Analysis and Unsupervised Learning on Graphs
Extracting meaningful knowledge from large and nonlinearly-connected data structures is of primary importance for efficiently utilizing data. Big data problems (e.g. 1 GB/s) often contain superpositions of multiple distinct processes, sources, or latent factors. Estimating or inferring the component distributions or statistical factors is called the mixture problem. Methods for solving mixture problems are known as mixture models [Everitt, 1996], and in machine learning they are used to define Bayes classifiers [Bishop, 2006]. Mixture models are a widely applicable pattern recognition and dimensionality reduction approach for extracting meaningful content from large and complex datasets. Only finite mixture models are described here, although countably or uncountably infinite numbers of mixture components are also possible [McAuliffe et al., 2006]. In terms of dimensionality reduction methods, Laplacian mixture models provide global and nonhierarchical analyses of massive datasets using scalable algorithms.
How Bayesian Inference Works
Brandon is an author and deep learning developer. He has worked as Principal Data Scientist at Microsoft, as well as for DuPont Pioneer and Sandia National Laboratories. Brandon earned a Ph.D. in Mechanical Engineering from the Massachusetts Institute of Technology. Bayesian inference is a way to get sharper predictions from your data. It's particularly useful when you don't have as much data as you would like and want to juice every last bit of predictive strength from it. Although it is sometimes described with reverence, Bayesian inference isn't magic or mystical. And even though the math under the hood can get dense, the concepts behind it are completely accessible. In brief, Bayesian inference lets you draw stronger conclusions from your data by folding in what you already know about the answer. Bayesian inference is based on the ideas of Thomas Bayes, a nonconformist Presbyterian minister in London about 300 years ago. He wrote two books, one on theology, and one on probability.
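To make "folding in what you already know" concrete, here is a small sketch using the standard Beta-Binomial conjugate update: a Beta(a, b) prior on a success probability, combined with observed successes and failures, gives a Beta(a + successes, b + failures) posterior. The function names, the prior values, and the four observations are made up for illustration and do not come from the article.

```python
def beta_binomial_posterior(prior_a, prior_b, successes, failures):
    """Conjugate update: Beta(a, b) prior -> Beta(a + s, b + f) posterior."""
    return prior_a + successes, prior_b + failures

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)

def beta_variance(a, b):
    """Variance of a Beta(a, b) distribution."""
    return a * b / ((a + b) ** 2 * (a + b + 1))

# Hypothetical small dataset: 3 successes and 1 failure.
successes, failures = 3, 1

# Flat prior (no prior knowledge) versus an informative prior.
flat_post = beta_binomial_posterior(1, 1, successes, failures)
informed_post = beta_binomial_posterior(8, 2, successes, failures)

for name, (a, b) in [("flat", flat_post), ("informed", informed_post)]:
    print(f"{name}: mean={beta_mean(a, b):.3f}, variance={beta_variance(a, b):.4f}")
```

With the same four observations, the informative prior yields a posterior with lower variance than the flat prior, which is the sense in which prior knowledge sharpens conclusions when data are scarce.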