Genre
Multi-label Learning via Structured Decomposition and Group Sparsity
In multi-label learning, each sample is associated with several labels. Existing works indicate that exploring correlations between labels improve the prediction performance. However, embedding the label correlations into the training process significantly increases the problem size. Moreover, the mapping of the label structure in the feature space is not clear. In this paper, we propose a novel multi-label learning method "Structured Decomposition + Group Sparsity (SDGS)". In SDGS, we learn a feature subspace for each label from the structured decomposition of the training data, and predict the labels of a new sample from its group sparse representation on the multi-subspace obtained from the structured decomposition. In particular, in the training stage, we decompose the data matrix $X\in R^{n\times p}$ as $X=\sum_{i=1}^kL^i+S$, wherein the rows of $L^i$ associated with samples that belong to label $i$ are nonzero and consist a low-rank matrix, while the other rows are all-zeros, the residual $S$ is a sparse matrix. The row space of $L_i$ is the feature subspace corresponding to label $i$. This decomposition can be efficiently obtained via randomized optimization. In the prediction stage, we estimate the group sparse representation of a new sample on the multi-subspace via group \emph{lasso}. The nonzero representation coefficients tend to concentrate on the subspaces of labels that the sample belongs to, and thus an effective prediction can be obtained. We evaluate SDGS on several real datasets and compare it with popular methods. Results verify the effectiveness and efficiency of SDGS.
Regularization Strategies and Empirical Bayesian Learning for MKL
Multiple kernel learning (MKL), structured sparsity, and multi-task learning have recently received considerable attention. In this paper, we show how different MKL algorithms can be understood as applications of either regularization on the kernel weights or block-norm-based regularization, which is more common in structured sparsity and multi-task learning. We show that these two regularization strategies can be systematically mapped to each other through a concave conjugate operation. When the kernel-weight-based regularizer is separable into components, we can naturally consider a generative probabilistic model behind MKL. Based on this model, we propose learning algorithms for the kernel weights through the maximization of marginal likelihood. We show through numerical experiments that $\ell_2$-norm MKL and Elastic-net MKL achieve comparable accuracy to uniform kernel combination. Although uniform kernel combination might be preferable from its simplicity, $\ell_2$-norm MKL and Elastic-net MKL can learn the usefulness of the information sources represented as kernels. In particular, Elastic-net MKL achieves sparsity in the kernel weights.
Variational approximation for heteroscedastic linear models and matching pursuit algorithms
Nott, David J., Tran, Minh-Ngoc, Leng, Chenlei
Modern statistical applications involving large data sets have focused attention on statistical methodologies which are both efficient computationally and able to deal with the screening of large numbers of different candidate models. Here we consider computationally efficient variational Bayes approaches to inference in high-dimensional heteroscedastic linear regression, where both the mean and variance are described in terms of linear functions of the predictors and where the number of predictors can be larger than the sample size. We derive a closed form variational lower bound on the log marginal likelihood useful for model selection, and propose a novel fast greedy search algorithm on the model space which makes use of one step optimization updates to the variational lower bound in the current model for screening large numbers of candidate predictor variables for inclusion/exclusion in a computationally thrifty way. We show that the model search strategy we suggest is related to widely used orthogonal matching pursuit algorithms for model search but yields a framework for potentially extending these algorithms to more complex models. The methodology is applied in simulations and in two real examples involving prediction for food constituents using NIR technology and prediction of disease progression in diabetes.
Estimation of low-rank tensors via convex optimization
Tomioka, Ryota, Hayashi, Kohei, Kashima, Hisashi
In this paper, we propose three approaches for the estimation of the Tucker decomposition of multi-way arrays (tensors) from partial observations. All approaches are formulated as convex minimization problems. Therefore, the minimum is guaranteed to be unique. The proposed approaches can automatically estimate the number of factors (rank) through the optimization. Thus, there is no need to specify the rank beforehand. The key technique we employ is the trace norm regularization, which is a popular approach for the estimation of low-rank matrices. In addition, we propose a simple heuristic to improve the interpretability of the obtained factorization. The advantages and disadvantages of three proposed approaches are demonstrated through numerical experiments on both synthetic and real world datasets. We show that the proposed convex optimization based approaches are more accurate in predictive performance, faster, and more reliable in recovering a known multilinear structure than conventional approaches.
Stochastic Stepwise Ensembles for Variable Selection
The ensemble approach for statistical modelling was first made popular by such algorithms as boosting (Freund and Schapire 1996; Friedman et al. 2000), bagging (Breiman 1996), random forest (Breiman 2001), and the gradient boosting machine (Friedman 2001). They are powerful algorithms for solving prediction problems. This article is concerned with using the ensemble approach for a different problem, variable selection. We shall use the terms "prediction ensemble" and "variableselection ensemble" to differentiate ensembles used for these different purposes.
Loopy Belief Propagation, Bethe Free Energy and Graph Zeta Function
Watanabe, Yusuke, Fukumizu, Kenji
We propose a new approach to the theoretical analysis of Loopy Belief Propagation (LBP) and the Bethe free energy (BFE) by establishing a formula to connect LBP and BFE with a graph zeta function. The proposed approach is applicable to a wide class of models including multinomial and Gaussian types. The connection derives a number of new theoretical results on LBP and BFE. This paper focuses two of such topics. One is the analysis of the region where the Hessian of the Bethe free energy is positive definite, which derives the non-convexity of BFE for graphs with multiple cycles, and a condition of convexity on a restricted set. This analysis also gives a new condition for the uniqueness of the LBP fixed point. The other result is to clarify the relation between the local stability of a fixed point of LBP and local minima of the BFE, which implies, for example, that a locally stable fixed point of the Gaussian LBP is a local minimum of the Gaussian Bethe free energy.
Efficient Planning under Uncertainty with Macro-actions
He, R., Brunskill, E., Roy, N.
Deciding how to act in partially observable environments remains an active area of research. Identifying good sequences of decisions is particularly challenging when good control performance requires planning multiple steps into the future in domains with many states. Towards addressing this challenge, we present an online, forward-search algorithm called the Posterior Belief Distribution (PBD). PBD leverages a novel method for calculating the posterior distribution over beliefs that result after a sequence of actions is taken, given the set of observation sequences that could be received during this process. This method allows us to efficiently evaluate the expected reward of a sequence of primitive actions, which we refer to as macro-actions. We present a formal analysis of our approach, and examine its performance on two very large simulation experiments: scientific exploration and a target monitoring domain. We also demonstrate our algorithm being used to control a real robotic helicopter in a target monitoring experiment, which suggests that our approach has practical potential for planning in real-world, large partially observable domains where a multi-step lookahead is required to achieve good performance.
Decision Making Agent Searching for Markov Models in Near-Deterministic World
Reinforcement learning has solid foundations, but becomes inefficient in partially observed (non-Markovian) environments. Thus, a learning agent -born with a representation and a policy- might wish to investigate to what extent the Markov property holds. We propose a learning architecture that utilizes combinatorial policy optimization to overcome non-Markovity and to develop efficient behaviors, which are easy to inherit, tests the Markov property of the behavioral states, and corrects against non-Markovity by running a deterministic factored Finite State Model, which can be learned. We illustrate the properties of architecture in the near deterministic Ms. Pac-Man game. We analyze the architecture from the point of view of evolutionary, individual, and social learning.
Back and Forth Between Rules and SE-Models (Extended Version)
Rules in logic programming encode information about mutual interdependencies between literals that is not captured by any of the commonly used semantics. This information becomes essential as soon as a program needs to be modified or further manipulated. We argue that, in these cases, a program should not be viewed solely as the set of its models. Instead, it should be viewed and manipulated as the set of sets of models of each rule inside it. With this in mind, we investigate and highlight relations between the SE-model semantics and individual rules. We identify a set of representatives of rule equivalence classes induced by SE-models, and so pinpoint the exact expressivity of this semantics with respect to a single rule. We also characterise the class of sets of SE-interpretations representable by a single rule. Finally, we discuss the introduction of two notions of equivalence, both stronger than strong equivalence [1] and weaker than strong update equivalence [2], which seem more suitable whenever the dependency information found in rules is of interest.
A hybrid model for bankruptcy prediction using genetic algorithm, fuzzy c-means and mars
Martin, A., Gayathri, V., Saranya, G., Gayathri, P., Venkatesan, Prasanna
Bankruptcy prediction is very important for all the organization since it affects the economy and rise many social problems with high costs. There are large number of techniques have been developed to predict the bankruptcy, which helps the decision makers such as investors and financial analysts. One of the bankruptcy prediction models is the hybrid model using Fuzzy C-means clustering and MARS, which uses static ratios taken from the bank financial statements for prediction, which has its own theoretical advantages. The performance of existing bankruptcy model can be improved by selecting the best features dynamically depend on the nature of the firm. This dynamic selection can be accomplished by Genetic Algorithm and it improves the performance of prediction model..