Collaborating Authors

 Ikeda, Shiro


Exhaustive search for sparse variable selection in linear regression

arXiv.org Machine Learning

We propose a K-sparse exhaustive search (ES-K) method and a K-sparse approximate exhaustive search (AES-K) method for selecting variables in linear regression. With these methods, K-sparse combinations of variables are tested exhaustively under the assumption that the optimal combination of explanatory variables is K-sparse. By collecting the exhaustive results of ES-K, various approximate methods for selecting sparse variables can be summarized in terms of a density of states. With this density of states, we can compare different sparse variable selection methods, such as relaxation and sampling. For large problems, where the combinatorial explosion of explanatory variables is crucial, the AES-K method enables the density of states to be reconstructed efficiently by using the replica-exchange Monte Carlo method and the multiple histogram method. Applying the ES-K and AES-K methods to type Ia supernova data, we confirmed the conventional understanding in astronomy when an appropriate K is given beforehand. However, we found it difficult to determine K from the data. Using virtual measurement and analysis, we argue that this difficulty is caused by a shortage of data.
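
As a toy illustration of the exhaustive-search idea (a sketch, not the paper's implementation, and without the replica-exchange machinery of AES-K), the following Python snippet scores every K-subset of columns by ordinary least squares; the collection of per-subset errors is the raw material a density of states would be built from. All names and data here are illustrative.

from itertools import combinations
import numpy as np

def es_k(X, y, K):
    """Score every K-subset of explanatory variables by OLS training error."""
    n, p = X.shape
    errors = {}
    for S in combinations(range(p), K):
        Xs = X[:, list(S)]
        coef, *_ = np.linalg.lstsq(Xs, y, rcond=None)
        errors[S] = np.mean((y - Xs @ coef) ** 2)
    best = min(errors, key=errors.get)
    return best, errors   # all subset errors, from which a density of states is built

# Example: y depends only on columns 1 and 3; ES-K with K=2 should recover them.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
y = X[:, 1] - 2.0 * X[:, 3] + 0.1 * rng.normal(size=100)
print(es_k(X, y, K=2)[0])   # expected: (1, 3)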


Information-Geometrical Significance of Sparsity in Gallager Codes

Neural Information Processing Systems

We report a perturbation analysis of the decoding error of the belief propagation decoder for Gallager codes. The analysis is based on information geometry and shows that the principal term of the decoding error at equilibrium comes from the m-embedding curvature of the log-linear submanifold spanned by the estimated pseudoposteriors: one for the full marginal and K for the partial posteriors, each of which takes a single check into account, where K is the number of checks in the Gallager code. We then show that the principal error term vanishes when the parity-check matrix of the code is so sparse that no two columns have an overlap greater than 1.
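
The sparsity condition in the final sentence, no two columns of the parity-check matrix overlapping in more than one row, is straightforward to test; it is equivalent to the code's Tanner graph containing no 4-cycles. A minimal sketch (illustrative code, not from the paper):

import numpy as np

def overlap_at_most_one(H):
    """True if no two columns of the binary matrix H share more than one 1."""
    H = np.asarray(H, dtype=int)
    gram = H.T @ H               # gram[i, j] = number of rows where columns i, j both have a 1
    np.fill_diagonal(gram, 0)    # ignore each column's overlap with itself
    return int(gram.max()) <= 1

# A small parity-check matrix satisfying the condition:
H = np.array([[1, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1]])
print(overlap_at_most_one(H))   # True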


Information Geometrical Framework for Analyzing Belief Propagation Decoder

Neural Information Processing Systems

The mystery of the belief propagation (BP) decoder, especially of turbo decoding, is studied from an information-geometrical viewpoint. The loopy belief network (BN) of turbo codes makes it difficult to obtain the true "belief" by BP, and the characteristics of the algorithm and its equilibrium are not clearly understood. Our study gives an intuitive understanding of the mechanism and a new framework for analysis. Based on this framework, we reveal basic properties of turbo decoding.
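
The difficulty alluded to here, that BP beliefs on a loopy graph differ from the true marginals, already shows up in a toy pairwise model on a triangle. The sketch below is generic loopy BP, far simpler than the turbo decoder the paper analyzes, but it exhibits the same effect: the converged BP beliefs are overconfident relative to the exact marginals. Everything in it (potentials, graph) is an illustrative assumption.

from itertools import product
import numpy as np

psi = np.array([[1.0, 0.5],     # symmetric pairwise potential on each edge
                [0.5, 2.0]])
edges = [(0, 1), (1, 2), (2, 0)]
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1]}

# Exact marginals by enumerating all 2^3 joint states.
p = np.zeros((2, 2, 2))
for s in product([0, 1], repeat=3):
    p[s] = np.prod([psi[s[i], s[j]] for i, j in edges])
p /= p.sum()
exact = [p.sum(axis=tuple(a for a in range(3) if a != i)) for i in range(3)]

# Loopy BP on the same triangle: iterate directed messages to a fixed point.
msgs = {(i, j): np.ones(2) for i in range(3) for j in neighbors[i]}
for _ in range(200):
    new = {}
    for (i, j) in msgs:
        incoming = np.prod([msgs[(k, i)] for k in neighbors[i] if k != j], axis=0)
        m = psi.T @ incoming     # sum over x_i of psi(x_i, x_j) * incoming(x_i)
        new[(i, j)] = m / m.sum()
    msgs = new
beliefs = []
for i in range(3):
    b = np.prod([msgs[(k, i)] for k in neighbors[i]], axis=0)
    beliefs.append(b / b.sum())

# The BP belief is biased away from the exact marginal:
# exact[0] is about [0.18, 0.82], beliefs[0] about [0.15, 0.85].
print(exact[0], beliefs[0])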


Convergence of the Wake-Sleep Algorithm

Neural Information Processing Systems

The W-S (wake-sleep) algorithm is a simple learning rule for models with hidden variables. It can be applied to a factor analysis model, which is a linear version of the Helmholtz machine. However, even for the factor analysis model, its general convergence has not been proved theoretically. In this article, we describe a geometrical understanding of the W-S algorithm in contrast with the EM (expectation-maximization) algorithm and the em algorithm. As a result, we prove the convergence of the W-S algorithm for the factor analysis model. We also give a condition for convergence in general models.
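
A minimal sketch of the wake-sleep updates for a linear factor analysis model may make the setting concrete: the generative model x ≈ Gz and the recognition model z ≈ Rx are fitted in alternating wake and sleep phases. The learning rate, noise levels, and dimensions below are illustrative assumptions, not the paper's formulation.

import numpy as np

rng = np.random.default_rng(0)
n, d, k = 2000, 5, 2                   # samples, observed dim., latent dim. (assumed)
G_true = rng.normal(size=(d, k))
X = rng.normal(size=(n, k)) @ G_true.T + 0.1 * rng.normal(size=(n, d))

G = 0.1 * rng.normal(size=(d, k))      # generative model:  x ≈ G z
R = 0.1 * rng.normal(size=(k, d))      # recognition model: z ≈ R x
lr = 0.02                              # illustrative step size
for _ in range(2000):
    # Wake phase: infer z on real data with the recognition model,
    # then move the generative weights to better reconstruct x from z.
    z = X @ R.T
    G += lr * (X - z @ G.T).T @ z / n
    # Sleep phase: "dream" (z, x) pairs from the generative model,
    # then move the recognition weights to better recover z from x.
    z_s = rng.normal(size=(n, k))
    x_s = z_s @ G.T + 0.1 * rng.normal(size=(n, d))
    R += lr * (z_s - x_s @ R.T).T @ x_s / n

print(np.mean((X - (X @ R.T) @ G.T) ** 2))   # wake-phase reconstruction error after training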