AITopics

Belief propagation (BP) is an iterative method to perform approximate inference on arbitrary graphical models. Whether BP converges and if the solution is a unique fixed point depends on both the structure and the parametrization of the model. To understand this dependence it is interesting to find \emph{all} fixed points. In this work, we formulate a set of polynomial equations, the solutions of which correspond to BP fixed points. To solve such a nonlinear system we present the numerical polynomial-homotopy-continuation (NPHC) method. Experiments on binary Ising models and on error-correcting codes show how our method is capable of obtaining all BP fixed points. On Ising models with fixed parameters we show how the structure influences both the number of fixed points and the convergence properties. We further asses the accuracy of the marginals and weighted combinations thereof. Weighting marginals with their respective partition function increases the accuracy in all experiments. Contrary to the conjecture that uniqueness of BP fixed points implies convergence, we find graphs for which BP fails to converge, even though a unique fixed point exists. Moreover, we show that this fixed point gives a good approximation, and the NPHC method is able to obtain this fixed point.

artificial intelligence, belief revision, graph, (18 more...)

1605.06451

Country: Europe (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Energy > Oil & Gas (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Belief Revision (0.71)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.46)

Borboudakis, Giorgos, Tsamardinos, Ioannis

Forward-Backward Selection with Early Dropping

Forward-backward selection is one of the most basic and commonly-used feature selection algorithms available. It is also general and conceptually applicable to many different types of data. In this paper, we propose a heuristic that significantly improves its running time, while preserving predictive accuracy. The idea is to temporarily discard the variables that are conditionally independent with the outcome given the selected variable set. Depending on how those variables are reconsidered and reintroduced, this heuristic gives rise to a family of algorithms with increasingly stronger theoretical guarantees. In distributions that can be faithfully represented by Bayesian networks or maximal ancestral graphs, members of this algorithmic family are able to correctly identify the Markov blanket in the sample limit. In experiments we show that the proposed heuristic increases computational efficiency by about two orders of magnitude in high-dimensional problems, while selecting fewer variables and retaining predictive performance. Furthermore, we show that the proposed algorithm and feature selection with LASSO perform similarly when restricted to select the same number of variables, making the proposed algorithm an attractive alternative for problems where no (efficient) algorithm for LASSO exists.

algorithm, artificial intelligence, machine learning, (15 more...)

1705.1077

Country:

Europe (0.46)
North America > United States (0.28)

Genre: Research Report > Experimental Study (0.94)

Industry: Health & Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.88)

Decorrelation of Neutral Vector Variables: Theory and Applications

Ma, Zhanyu, Xue, Jing-Hao, Leijon, Arne, Tan, Zheng-Hua, Yang, Zhen, Guo, Jun

In this paper, we propose novel strategies for neutral vector variable decorrelation. Two fundamental invertible transformations, namely serial nonlinear transformation and parallel nonlinear transformation, are proposed to carry out the decorrelation. For a neutral vector variable, which is not multivariate Gaussian distributed, the conventional principal component analysis (PCA) cannot yield mutually independent scalar variables. With the two proposed transformations, a highly negatively correlated neutral vector can be transformed to a set of mutually independent scalar variables with the same degrees of freedom. We also evaluate the decorrelation performances for the vectors generated from a single Dirichlet distribution and a mixture of Dirichlet distributions. The mutual independence is verified with the distance correlation measurement. The advantages of the proposed decorrelation strategies are intensively studied and demonstrated with synthesized data and practical application evaluations.

artificial intelligence, machine learning, vector, (16 more...)

1705.10524

Country:

Europe (0.92)
North America > United States (0.92)

Genre: Research Report (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)

Hazan, Tamir, Orabona, Francesco, Sarwate, Anand D., Maji, Subhransu, Jaakkola, Tommi

High Dimensional Inference with Random Maximum A-Posteriori Perturbations

This paper presents a new approach, called perturb-max, for high-dimensional statistical inference that is based on applying random perturbations followed by optimization. This framework injects randomness to maximum a-posteriori (MAP) predictors by randomly perturbing the potential function for the input. A classic result from extreme value statistics asserts that perturb-max operations generate unbiased samples from the Gibbs distribution using high-dimensional perturbations. Unfortunately, the computational cost of generating so many high-dimensional random variables can be prohibitive. However, when the perturbations are of low dimension, sampling the perturb-max prediction is as efficient as MAP optimization. This paper shows that the expected value of perturb-max inference with low dimensional perturbations can be used sequentially to generate unbiased samples from the Gibbs distribution. Furthermore the expected value of the maximal perturbations is a natural bound on the entropy of such perturb-max models. A measure concentration result for perturb-max values shows that the deviation of their sampled average from its expectation decays exponentially in the number of samples, allowing effective approximation of the expectation.

artificial intelligence, machine learning, natural language, (20 more...)

1602.03571

Country:

Europe (1.00)
North America > United States > Massachusetts (0.46)
North America > United States > New Jersey (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.67)

Owhadi, Houman, Scovel, Clint

Universal Scalable Robust Solvers from Computational Information Games and fast eigenspace adapted Multiresolution Analysis

We show how the discovery of robust scalable numerical solvers for arbitrary bounded linear operators can be automated as a Game Theory problem by reformulating the process of computing with partial information and limited resources as that of playing underlying hierarchies of adversarial information games. When the solution space is a Banach space $B$ endowed with a quadratic norm $\|\cdot\|$, the optimal measure (mixed strategy) for such games (e.g. the adversarial recovery of $u\in B$, given partial measurements $[\phi_i, u]$ with $\phi_i\in B^*$, using relative error in $\|\cdot\|$-norm as a loss) is a centered Gaussian field $\xi$ solely determined by the norm $\|\cdot\|$, whose conditioning (on measurements) produces optimal bets. When measurements are hierarchical, the process of conditioning this Gaussian field produces a hierarchy of elementary bets (gamblets). These gamblets generalize the notion of Wavelets and Wannier functions in the sense that they are adapted to the norm $\|\cdot\|$ and induce a multi-resolution decomposition of $B$ that is adapted to the eigensubspaces of the operator defining the norm $\|\cdot\|$. When the operator is localized, we show that the resulting gamblets are localized both in space and frequency and introduce the Fast Gamblet Transform (FGT) with rigorous accuracy and (near-linear) complexity estimates. As the FFT can be used to solve and diagonalize arbitrary PDEs with constant coefficients, the FGT can be used to decompose a wide range of continuous linear operators (including arbitrary continuous linear bijections from $H^s_0$ to $H^{-s}$ or to $L^2$) into a sequence of independent linear systems with uniformly bounded condition numbers and leads to $\mathcal{O}(N \operatorname{polylog} N)$ solvers and eigenspace adapted Multiresolution Analysis (resulting in near linear complexity approximation of all eigensubspaces).

game theory, operator, upstream oil & gas, (21 more...)

1703.10761

Country:

North America > United States > California (0.27)
North America > United States > Massachusetts (0.27)
North America > United States > Wisconsin (0.13)
(5 more...)

Genre:

Research Report > New Finding (0.67)
Instructional Material > Course Syllabus & Notes (0.46)

Industry:

Energy > Oil & Gas > Upstream (0.67)
Leisure & Entertainment > Games (0.48)
Government > Regional Government > North America Government > United States Government (0.45)

Technology:

Information Technology > Mathematics of Computing (1.00)
Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
(4 more...)

Auto-Encoding Sequential Monte Carlo

Le, Tuan Anh, Igl, Maximilian, Jin, Tom, Rainforth, Tom, Wood, Frank

Probabilistic machine learning [Ghahramani, 2015] allows us to model the structure and dependencies of latent variables and observational data as a joint probability distribution. Once a model is defined, we can perform inference to update our prior beliefs about latent variables in light of observed data to obtain the posterior distribution. The posterior can be used to answer any questions we might have about the latent quantities while coherently accounting for our uncertainty about the world. We introduce a method for simultaneous model learning and inference amortization [Gershman and Goodman, 2014], given an unlabeled dataset of observations. The model is specified partially, the rest being specified using a generative network whose weights are to be learned. Inference amortization refers to spending additional time before inference to obtain an amortization artifact which is used to speed up inference during test time.

artificial intelligence, machine learning, smc, (19 more...)

1705.10306

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Qin, Chao, Klabjan, Diego, Russo, Daniel

Improving the Expected Improvement Algorithm

The expected improvement (EI) algorithm is a popular strategy for information collection in optimization under uncertainty. The algorithm is widely known to be too greedy, but nevertheless enjoys wide use due to its simplicity and ability to handle uncertainty and noise in a coherent decision theoretic framework. To provide rigorous insight into EI, we study its properties in a simple setting of Bayesian optimization where the domain consists of a finite grid of points. This is the so-called best-arm identification problem, where the goal is to allocate measurement effort wisely to confidently identify the best arm using a small number of measurements. In this framework, one can show formally that EI is far from optimal. To overcome this shortcoming, we introduce a simple modification of the expected improvement algorithm. Surprisingly, this simple change results in an algorithm that is asymptotically optimal for Gaussian best-arm identification problems, and provably outperforms standard EI by an order of magnitude.

algorithm, artificial intelligence, machine learning, (16 more...)

1705.10033

Country: North America > United States (0.68)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.34)

Cortés, Efrén Cruz, Scott, Clayton

Consistent Kernel Density Estimation with Non-Vanishing Bandwidth

Consistency of the kernel density estimator requires that the kernel bandwidth tends to zero as the sample size grows. In this paper we investigate the question of whether consistency is possible when the bandwidth is fixed, if we consider a more general class of weighted KDEs. To answer this question in the affirmative, we introduce the fixed-bandwidth KDE (fbKDE), obtained by solving a quadratic program, and prove that it consistently estimates any continuous square-integrable density. We also establish rates of convergence for the fbKDE with radial kernels and the box kernel under appropriate smoothness assumptions. Furthermore, in an experimental study we demonstrate that the fbKDE compares favorably to the standard KDE and the previously proposed variable bandwidth KDE.

artificial intelligence, kernel, machine learning, (15 more...)

1705.08921

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.41)

Differentially Private Bayesian Learning on Distributed Data

Heikkilä, Mikko, Lagerspetz, Eemil, Kaski, Samuel, Shimizu, Kana, Tarkoma, Sasu, Honkela, Antti

Many applications of machine learning, for example in health care, would benefit from methods that can guarantee privacy of data subjects. Differential privacy (DP) has become established as a standard for protecting learning results. The standard DP algorithms require a single trusted party to have access to the entire data, which is a clear weakness. We consider DP Bayesian learning in a distributed setting, where each party only holds a single sample or a few samples of the data. We propose a learning strategy based on a secure multi-party sum function for aggregating summaries from data holders and the Gaussian mechanism for DP. Our method builds on an asymptotically optimal and practically efficient DP Bayesian inference with rapidly diminishing extra cost.

artificial intelligence, machine learning, mechanism, (16 more...)

1703.01106

Country: North America > United States (0.68)

Genre: Research Report (0.82)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Robust Stochastic Configuration Networks with Kernel Density Estimation

Wang, Dianhui, Li, Ming

Neural networks have been widely used as predictive models to fit data distribution, and they could be implemented through learning a collection of samples. In many applications, however, the given dataset may contain noisy samples or outliers which may result in a poor learner model in terms of generalization. This paper contributes to a development of robust stochastic configuration networks (RSCNs) for resolving uncertain data regression problems. RSCNs are built on original stochastic configuration networks with weighted least squares method for evaluating the output weights, and the input weights and biases are incrementally and randomly generated by satisfying with a set of inequality constrains. The kernel density estimation (KDE) method is employed to set the penalty weights for each training samples, so that some negative impacts, caused by noisy data or outliers, on the resulting learner model can be reduced. The alternating optimization technique is applied for updating a RSCN model with improved penalty weights computed from the kernel density estimation function. Performance evaluation is carried out by a function approximation, four benchmark datasets and a case study on engineering application. Comparisons to other robust randomised neural modelling techniques, including the probabilistic robust learning algorithm for neural networks with random weights and improved RVFL networks, indicate that the proposed RSCNs with KDE perform favourably and demonstrate good potential for real-world applications.

algorithm, artificial intelligence, machine learning, (17 more...)

1702.04459

Genre: Research Report (0.82)

Industry: Education (0.55)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)