Bayesian Inference
Information-Maximization Clustering based on Squared-Loss Mutual Information
Sugiyama, Masashi, Yamada, Makoto, Kimura, Manabu, Hachiya, Hirotaka
Information-maximization clustering learns a probabilistic classifier in an unsupervised manner so that mutual information between feature vectors and cluster assignments is maximized. A notable advantage of this approach is that it only involves continuous optimization of model parameters, which is substantially easier to solve than discrete optimization of cluster assignments. However, existing methods still involve non-convex optimization problems, and therefore finding a good local optimal solution is not straightforward in practice. In this paper, we propose an alternative information-maximization clustering method based on a squared-loss variant of mutual information. This novel approach gives a clustering solution analytically in a computationally efficient way via kernel eigenvalue decomposition. Furthermore, we provide a practical model selection procedure that allows us to objectively optimize tuning parameters included in the kernel function. Through experiments, we demonstrate the usefulness of the proposed approach.
Structure Learning of Probabilistic Graphical Models: A Comprehensive Survey
Probabilistic graphical models combine the graph theory and probability theory to give a multivariate statistical modeling. They provide a unified description of uncertainty using probability and complexity using the graphical model. Especially, graphical models provide the following several useful properties: - Graphical models provide a simple and intuitive interpretation of the structures of probabilistic models. On the other hand, they can be used to design and motivate new models. - Graphical models provide additional insights into the properties of the model, including the conditional independence properties. - Complex computations which are required to perform inference and learning in sophisticated models can be expressed in terms of graphical manipulations, in which the underlying mathematical expressions are carried along implicitly. The graphical models have been applied to a large number of fields, including bioinformatics, social science, control theory, image processing, marketing analysis, among others. However, structure learning for graphical models remains an open challenge, since one must cope with a combinatorial search over the space of all possible structures. In this paper, we present a comprehensive survey of the existing structure learning algorithms.
Bayesian Causal Induction
Discovering causal relationships is a hard task, often hindered by the need for intervention, and often requiring large amounts of data to resolve statistical uncertainty. However, humans quickly arrive at useful causal relationships. One possible reason is that humans extrapolate from past experience to new, unseen situations: that is, they encode beliefs over causal invariances, allowing for sound generalization from the observations they obtain from directly acting in the world. Here we outline a Bayesian model of causal induction where beliefs over competing causal hypotheses are modeled using probability trees. Based on this model, we illustrate why, in the general case, we need interventions plus constraints on our causal hypotheses in order to extract causal information from our experience.
Optimal exponential bounds on the accuracy of classification
We consider a standard binary classification problem. The performance of any binary classifier based on the training data is characterized by the excess risk. We study Bahadur's type exponential bounds on the minimax accuracy confidence function based on the excess risk. We study how this quantity depends on the complexity of the class of distributions characterized by exponents of entropies of the class of regression functions or of the class of Bayes classifiers corresponding to the distributions from the class. We also study its dependence on margin parameters of the classification problem.
On l_1 Mean and Variance Filtering
Wahlberg, Bo, Rojas, Cristian R., Annergren, Mariette
This paper addresses the problem of segmenting a time-series with respect to changes in the mean value or in the variance. The first case is when the time data is modeled as a sequence of independent and normal distributed random variables with unknown, possibly changing, mean value but fixed variance. The main assumption is that the mean value is piecewise constant in time, and the task is to estimate the change times and the mean values within the segments. The second case is when the mean value is constant, but the variance can change. The assumption is that the variance is piecewise constant in time, and we want to estimate change times and the variance values within the segments. To find solutions to these problems, we will study an l_1 regularized maximum likelihood method, related to the fused lasso method and l_1 trend filtering, where the parameters to be estimated are free to vary at each sample. To penalize variations in the estimated parameters, the l_1-norm of the time difference of the parameters is used as a regularization term. This idea is closely related to total variation denoising. The main contribution is that a convex formulation of this variance estimation problem, where the parametrization is based on the inverse of the variance, can be formulated as a certain l_1 mean estimation problem. This implies that results and methods for mean estimation can be applied to the challenging problem of variance segmentation/estimation.
Self-Avoiding Random Dynamics on Integer Complex Systems
Hamze, Firas, Wang, Ziyu, de Freitas, Nando
This paper introduces a new specialized algorithm for equilibrium Monte Carlo sampling of binary-valued systems, which allows for large moves in the state space. This is achieved by constructing self-avoiding walks (SAWs) in the state space. As a consequence, many bits are flipped in a single MCMC step. We name the algorithm SARDONICS, an acronym for Self-Avoiding Random Dynamics on Integer Complex Systems. The algorithm has several free parameters, but we show that Bayesian optimization can be used to automatically tune them. SARDONICS performs remarkably well in a broad number of sampling tasks: toroidal ferromagnetic and frustrated Ising models, 3D Ising models, restricted Boltzmann machines and chimera graphs arising in the design of quantum computers.
Receiver Architectures for MIMO-OFDM Based on a Combined VMP-SP Algorithm
Manchรณn, Carles Navarro, Kirkelund, Gunvor E., Riegler, Erwin, Christensen, Lars P. B., Fleury, Bernard H.
Iterative information processing, either based on heuristics or analytical frameworks, has been shown to be a very powerful tool for the design of efficient, yet feasible, wireless receiver architectures. Within this context, algorithms performing message-passing on a probabilistic graph, such as the sum-product (SP) and variational message passing (VMP) algorithms, have become increasingly popular. In this contribution, we apply a combined VMP-SP message-passing technique to the design of receivers for MIMO-ODFM systems. The message-passing equations of the combined scheme can be obtained from the equations of the stationary points of a constrained region-based free energy approximation. When applied to a MIMO-OFDM probabilistic model, we obtain a generic receiver architecture performing iterative channel weight and noise precision estimation, equalization and data decoding. We show that this generic scheme can be particularized to a variety of different receiver structures, ranging from high-performance iterative structures to low complexity receivers. This allows for a flexible design of the signal processing specially tailored for the requirements of each specific application. The numerical assessment of our solutions, based on Monte Carlo simulations, corroborates the high performance of the proposed algorithms and their superiority to heuristic approaches.
Optimization with Sparsity-Inducing Penalties
Bach, Francis, Jenatton, Rodolphe, Mairal, Julien, Obozinski, Guillaume
Sparse estimation methods are aimed at using or obtaining parsimonious representations of data or models. They were first dedicated to linear variable selection but numerous extensions have now emerged such as structured sparsity or kernel selection. It turns out that many of the related estimation problems can be cast as convex optimization problems by regularizing the empirical risk with appropriate non-smooth norms. The goal of this paper is to present from a general perspective optimization tools and techniques dedicated to such sparsity-inducing penalties. We cover proximal methods, block-coordinate descent, reweighted $\ell_2$-penalized techniques, working-set and homotopy methods, as well as non-convex formulations and extensions, and provide an extensive set of experiments to compare various algorithms from a computational point of view.
Representations and Ensemble Methods for Dynamic Relational Classification
Rossi, Ryan A., Neville, Jennifer
Temporal networks are ubiquitous and evolve over time by the addition, deletion, and changing of links, nodes, and attributes. Although many relational datasets contain temporal information, the majority of existing techniques in relational learning focus on static snapshots and ignore the temporal dynamics. We propose a framework for discovering temporal representations of relational data to increase the accuracy of statistical relational learning algorithms. The temporal relational representations serve as a basis for classification, ensembles, and pattern mining in evolving domains. The framework includes (1) selecting the time-varying relational components (links, attributes, nodes), (2) selecting the temporal granularity, (3) predicting the temporal influence of each time-varying relational component, and (4) choosing the weighted relational classifier. Additionally, we propose temporal ensemble methods that exploit the temporal-dimension of relational data. These ensembles outperform traditional and more sophisticated relational ensembles while avoiding the issue of learning the most optimal representation. Finally, the space of temporal-relational models are evaluated using a sample of classifiers. In all cases, the proposed temporal-relational classifiers outperform competing models that ignore the temporal information. The results demonstrate the capability and necessity of the temporal-relational representations for classification, ensembles, and for mining temporal datasets.
Making Decisions Using Sets of Probabilities: Updating, Time Consistency, and Calibration
Grunwald, P. D., Halpern, J. Y.
We consider how an agent should update her beliefs when her beliefs are represented by a set P of probability distributions, given that the agent makes decisions using the minimax criterion, perhaps the best-studied and most commonly-used criterion in the literature. We adopt a game-theoretic framework, where the agent plays against a bookie, who chooses some distribution from P. We consider two reasonable games that differ in what the bookie knows when he makes his choice. Anomalies that have been observed before, like time inconsistency, can be understood as arising because different games are being played, against bookies with different information. We characterize the important special cases in which the optimal decision rules according to the minimax criterion amount to either conditioning or simply ignoring the information. Finally, we consider the relationship between updating and calibration when uncertainty is described by sets of probabilities. Our results emphasize the key role of the rectangularity condition of Epstein and Schneider.