Asymptotic Bayesian Generalization Error in a General Stochastic Matrix Factorization for Markov Chain and Bayesian Network

arXiv.org Machine Learning

Stochastic matrix factorization (SMF) can be regarded as a restriction of non-negative matrix factorization (NMF). SMF is useful for inference in topic models, NMF for binary matrix data, Markov chains, and Bayesian networks. However, SMF requires strong assumptions to reach a unique factorization, and its theoretical prediction accuracy has not yet been clarified. In this paper, we study the maximum pole of the zeta function (real log canonical threshold) of a general SMF and derive an upper bound of the generalization error in Bayesian inference. The results give a foundation for a widely applicable and rigorous factorization method based on SMF, and show that under Bayesian inference the generalization error of SMF becomes smaller than that of regular statistical models.
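
For orientation, this is the standard asymptotic form from singular learning theory (background, not this paper's specific derivation): the expected Bayes generalization error of a model with real log canonical threshold \(\lambda\) behaves as

```latex
\mathbb{E}[G_n] \;=\; \frac{\lambda}{n} + o\!\left(\frac{1}{n}\right),
\qquad \lambda \;\le\; \frac{d}{2},
```

where \(n\) is the sample size and \(d\) the parameter dimension. In singular models such as SMF the inequality is typically strict, which is why the Bayesian generalization error falls below the regular-model value \(d/(2n)\).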


Exhaustive search for sparse variable selection in linear regression

arXiv.org Machine Learning

We propose a K-sparse exhaustive search (ES-K) method and a K-sparse approximate exhaustive search (AES-K) method for selecting variables in linear regression. With these methods, K-sparse combinations of variables are tested exhaustively, assuming that the optimal combination of explanatory variables is K-sparse. By collecting the results of the exhaustive ES-K computation, various approximate methods for selecting sparse variables can be summarized as a density of states. With this density of states, we can compare different methods for selecting sparse variables, such as relaxation and sampling. For large problems, where the combinatorial explosion of explanatory variables is crucial, the AES-K method enables the density of states to be reconstructed effectively by using the replica-exchange Monte Carlo method and the multiple histogram method. Applying the ES-K and AES-K methods to type Ia supernova data, we confirmed the conventional understanding in astronomy when an appropriate K is given beforehand. However, we found it difficult to determine K from the data. Using virtual measurement and analysis, we argue that this difficulty is caused by data shortage.
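
The core ES-K idea described above can be sketched as follows. This is a hypothetical illustration, not the authors' implementation; the function name `es_k` and the synthetic data are our own. For a given sparsity K, every K-subset of explanatory variables is fit by ordinary least squares, and the collection of training errors over all subsets plays the role of the density of states.

```python
# Hypothetical sketch of the ES-K idea (not the paper's code): enumerate
# every K-subset of variables, fit each by ordinary least squares, and
# record its training error.
from itertools import combinations
import numpy as np

def es_k(X, y, K):
    """Exhaustive search over all K-sparse variable combinations."""
    n, p = X.shape
    results = []  # (error, subset) pairs over all C(p, K) combinations
    for subset in combinations(range(p), K):
        Xs = X[:, subset]
        coef, *_ = np.linalg.lstsq(Xs, y, rcond=None)
        err = np.mean((y - Xs @ coef) ** 2)  # mean squared training error
        results.append((err, subset))
    results.sort()  # lowest-error combination first
    return results

# Small synthetic check: y depends only on columns 0 and 2.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
y = 2.0 * X[:, 0] - 3.0 * X[:, 2] + 0.01 * rng.normal(size=100)
best_err, best_subset = es_k(X, y, K=2)[0]
print(best_subset)  # expected to recover (0, 2)
```

The sorted error list over all C(p, K) subsets is what a density-of-states histogram would be built from; the AES-K method replaces the full enumeration with replica-exchange Monte Carlo sampling when C(p, K) is too large.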


The Effect of Singularities in a Learning Machine when the True Parameters Do Not Lie on such Singularities

Neural Information Processing Systems

Many learning machines with hidden variables used in information science have singularities in their parameter spaces. At singularities, the Fisher information matrix becomes degenerate, so that the learning theory of regular statistical models does not hold. Recently, it was proven that, if the true parameter is contained in the singularities, then the coefficient of the Bayes generalization error is equal to the maximum pole of the zeta function of the Kullback information.
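
As background (the standard construction in singular learning theory, stated here for the reader's convenience), the zeta function referred to above is

```latex
\zeta(z) \;=\; \int K(w)^{z}\,\varphi(w)\,dw,
```

where \(K(w)\) is the Kullback information (the Kullback–Leibler divergence between the true distribution and the model at parameter \(w\)) and \(\varphi(w)\) is the prior. The largest pole \(z = -\lambda\) of \(\zeta\) gives the learning coefficient \(\lambda\) that appears in the asymptotic Bayes generalization error.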


Hybrid Model-Based Diagnosis of Web Service Compositions

AAAI Conferences

Fault diagnosis of web service compositions at run time is appealing for building consolidated distributed applications. For this purpose, we propose a hybrid model-based diagnosis method that exploits the service process description or historical execution information to enhance the service composition model, and localizes faults by comparing the exceptional execution with the correct execution under maximum likelihood. Experiments are conducted to evaluate the effectiveness of our method in diagnosing web service composition faults.
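
A toy illustration of the comparison step described above (a hypothetical sketch, not the paper's algorithm; the service names and the scoring rule are our own assumptions): score each service by how much more often it appears in exceptional executions than in correct ones, and pick the maximum-likelihood candidate.

```python
# Hypothetical illustration (not the paper's method): localize the faulty
# service by comparing exceptional execution traces with correct ones.
from collections import Counter

def localize_fault(correct_runs, failed_runs):
    ok = Counter(s for run in correct_runs for s in run)
    bad = Counter(s for run in failed_runs for s in run)
    # Laplace-smoothed ratio: how strongly a service is associated
    # with exceptional executions relative to correct ones.
    def score(s):
        return (bad[s] + 1) / (ok[s] + 1)
    candidates = set(ok) | set(bad)
    return max(candidates, key=score)

correct = [["auth", "search", "pay"], ["auth", "search"]]
failed = [["auth", "pay", "ship"], ["auth", "ship"]]
print(localize_fault(correct, failed))  # "ship" appears only in failures
```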


Bayesian Self-Organization

Neural Information Processing Systems

S. M. Smirnakis (Lyman Laboratory of Physics, Harvard University, Cambridge, MA 02138) and Lei Xu* (Dept. of Computer Science, HSH ENG BLDG, Room 1006, The Chinese University of Hong Kong, Shatin, NT, Hong Kong)

Abstract: Recent work by Becker and Hinton (Becker and Hinton, 1992) shows a promising mechanism, based on maximizing mutual information assuming spatial coherence, by which a system can self-organize to learn visual abilities such as binocular stereo. We introduce a more general criterion, based on Bayesian probability theory, and thereby demonstrate a connection to Bayesian theories of visual perception and to other organization principles for early vision (Atick and Redlich, 1990). Methods for implementation using variants of stochastic learning are described and, for the special case of linear filtering, we derive an analytic expression for the output.

1 Introduction

The input intensity patterns received by the human visual system are typically complicated functions of the object surfaces and light sources in the world. Thus the visual system must be able to extract information from the input intensities that is relatively independent of the actual intensity values.

*Lei Xu was a research scholar in the Division of Applied Sciences at Harvard University while this work was performed.
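
To make the mutual-information criterion concrete, here is a minimal numerical sketch (our own illustration, not the paper's derivation): for two jointly Gaussian linear outputs, the mutual information reduces to I(a; b) = -0.5 log(1 - rho^2), where rho is their correlation. A Becker–Hinton style learner maximizes this quantity across spatially coherent patches.

```python
# Minimal sketch (an assumption for illustration): mutual information of
# two jointly Gaussian outputs, I(a; b) = -0.5 * log(1 - rho**2).
import numpy as np

def gaussian_mutual_information(a, b):
    rho = np.corrcoef(a, b)[0, 1]
    return -0.5 * np.log(1.0 - rho ** 2)

rng = np.random.default_rng(1)
shared = rng.normal(size=5000)           # coherent component across patches
a = shared + 0.3 * rng.normal(size=5000)  # output of filter on patch 1
b = shared + 0.3 * rng.normal(size=5000)  # output of filter on patch 2
print(gaussian_mutual_information(a, b))  # large when outputs share structure
```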