Jia, Zhichun (Dalian Maritime University) | Chen, Rong (Dalian Maritime University)

Fault diagnosis of web services composition at run time is appealing in creating a consolidated distributed application. For this purpose, we propose a hybrid model-based diagnosis method which exploits service process description or historical execution information to enhance service composition model, and localize faults by comparing the exceptional execution and the correct execution with the maximum likelihood. Experiments are conducted to evaluate the effectiveness of our method in web service composition fault diagnosis.

Igarashi, Yasuhiko, Takenaka, Hikaru, Nakanishi-Ohno, Yoshinori, Uemura, Makoto, Ikeda, Shiro, Okada, Masato

We propose a K-sparse exhaustive search (ES-K) method and a K-sparse approximate exhaustive search method (AES-K) for selecting variables in linear regression. With these methods, K-sparse combinations of variables are tested exhaustively assuming that the optimal combination of explanatory variables is K-sparse. By collecting the results of exhaustively computing ES-K, various approximate methods for selecting sparse variables can be summarized as density of states. With this density of states, we can compare different methods for selecting sparse variables such as relaxation and sampling. For large problems where the combinatorial explosion of explanatory variables is crucial, the AES-K method enables density of states to be effectively reconstructed by using the replica-exchange Monte Carlo method and the multiple histogram method. Applying the ES-K and AES-K methods to type Ia supernova data, we confirmed the conventional understanding in astronomy when an appropriate K is given beforehand. However, we found the difficulty to determine K from the data. Using virtual measurement and analysis, we argue that this is caused by data shortage.

Kabashima, Yoshiyuki, Obuchi, Tomoyuki, Uemura, Makoto

Cross-validation (CV) is a technique for evaluating the ability of statistical models/learning systems based on a given data set. Despite its wide applicability, the rather heavy computational cost can prevent its use as the system size grows. To resolve this difficulty in the case of Bayesian linear regression, we develop a formula for evaluating the leave-one-out CV error approximately without actually performing CV. The usefulness of the developed formula is tested by statistical mechanical analysis for a synthetic model. This is confirmed by application to a real-world supernova data set as well.

Hayashi, Naoki, Watanabe, Sumio

Stochastic matrix factorization (SMF) can be regarded as a restriction of non-negative matrix factorization (NMF). SMF is useful for inference of topic models, NMF for binary matrices data, Markov chains, and Bayesian networks. However, SMF needs strong assumptions to reach a unique factorization and its theoretical prediction accuracy has not yet been clarified. In this paper, we study the maximum the pole of zeta function (real log canonical threshold) of a general SMF and derive an upper bound of the generalization error in Bayesian inference. The results give a foundation for a widely applicable and rigorous factorization method of SMF and mean that the generalization error in SMF becomes smaller than regular statistical models by Bayesian inference.

Fidaner, Isik B., Cemgil, Taylan

Infinite mixture models are commonly used for clustering. One can sample from the posterior of mixture assignments by Monte Carlo methods or find its maximum a posteriori solution by optimization. However, in some problems the posterior is diffuse and it is hard to interpret the sampled partitionings. In this paper, we introduce novel statistics based on block sizes for representing sample sets of partitionings and feature allocations. We develop an element-based definition of entropy to quantify segmentation among their elements. Then we propose a simple algorithm called entropy agglomeration (EA) to summarize and visualize this information. Experiments on various infinite mixture posteriors as well as a feature allocation dataset demonstrate that the proposed statistics are useful in practice.