Training Restricted Boltzmann Machines on Word Observations

arXiv.org Machine Learning

The restricted Boltzmann machine (RBM) is a flexible tool for modeling complex data; however, there have been significant computational difficulties in using RBMs to model high-dimensional multinomial observations. In natural language processing applications, words are naturally modeled by K-ary discrete distributions, where K is determined by the vocabulary size and can easily be in the hundreds of thousands. The conventional approach to training RBMs on word observations is limited because it requires sampling the states of K-way softmax visible units during block Gibbs updates, an operation that takes time linear in K. In this work, we address this issue by employing a more general class of Markov chain Monte Carlo operators on the visible units, yielding updates with computational complexity independent of K. We demonstrate the success of our approach by training RBMs on hundreds of millions of word n-grams using larger vocabularies than previously feasible, and by using the learned features to improve performance on chunking and sentiment classification tasks, achieving state-of-the-art results on the latter.
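One way to picture an update whose cost does not grow with K is a Metropolis-Hastings step with an independent proposal. The sketch below is only an illustration under stated assumptions, not the operator used in the paper: the function name `mh_sample_word`, the parameter layout (`W`, `b`, `h`), and the choice of a fixed proposal `q` (for example, word frequencies) are all hypothetical. Each step scores only the current and the proposed word, so no K-way normalization is needed.

```python
import numpy as np

def mh_sample_word(W, b, h, q, k0, n_steps, rng):
    """Metropolis-Hastings updates for one K-way softmax visible unit.

    Target:   p(k) proportional to exp(b[k] + W[k] @ h)
    Proposal: independent draws from a fixed distribution q over the K words
              (assumed strictly positive at k0 and wherever it proposes).
    """
    # Proposals are drawn in one batch for brevity; an alias table would
    # make each individual draw O(1) in K.
    proposals = rng.choice(len(q), size=n_steps, p=q)
    uniforms = rng.random(n_steps)

    k = k0
    score_k = b[k] + W[k] @ h
    for k_new, u in zip(proposals, uniforms):
        # Only two unnormalized scores are ever needed per step.
        score_new = b[k_new] + W[k_new] @ h
        # Log acceptance ratio for an independence sampler:
        # log p(k_new) - log p(k) + log q(k) - log q(k_new)
        log_ratio = (score_new - score_k) + (np.log(q[k]) - np.log(q[k_new]))
        if np.log(u) < log_ratio:
            k, score_k = int(k_new), score_new
    return k
```

With a proposal close to the target, a handful of steps per visible unit is usually enough for the chain to move; how many steps are needed in practice is an empirical question that this sketch does not answer.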


Super-Mixed Multiple Attribute Group Decision Making Method Based on Hybrid Fuzzy Grey Relation Approach Degree

arXiv.org Artificial Intelligence

Multiple attribute decision making (MADM), in which attributes take real-number, interval real-number, linguistic, and uncertain linguistic values, has already been applied in practice, for example in the evaluation of enterprise effect, the selection of investment projects, the selection of personnel, the research of military equipment schemes, the evaluation of strategy effect, reliability assessment, and maintainability assessment (Yongqi Xia, 2004; Dang Luo, Sifeng Liu, 2005; Yongqing Wei, Peide Liu, 2009). Fei Ye (2010) studied an extended TOPSIS method with interval-valued intuitionistic fuzzy numbers for virtual enterprise partner selection. Chuanming Ding (2007,a) defined a new similarity degree for various types of attributes and normalized the calculation of the similarity degree of the attribute values of each type in a unified metric space. Using this similarity degree, each plan was compared with the ideal plan and a decision making method was given. Chuanming Ding (2007,b), based on TOPSIS (Technique for Order Preference by Similarity to Ideal Solution), transformed the attribute values of a plan into four-dimensional attribute values, unified the various types of attribute value, defined a four-dimensional approach degree, and used this approach degree to solve the mixed-type multiple attribute decision-making problem involving real-number, interval real-number, linguistic, and uncertain linguistic values. Yongqi Xia (2004) studied a method that considers the insufficiency degree of information and the preference toward risk, on the basis of the grey-fuzzy comprehensive evaluation method of interval value preference. In this method, the weight and the attribute value are each represented by a pair of interval numbers, so that membership and grey degree are considered at the same time.
Sifeng Liu, Yaoguo Dang, Jiangling Wang, and Zhengpeng Wu (2009), based on the definition of entropy, proposed a method for obtaining weights that takes into account the character of grey cluster decision-making and 2-tuple linguistic assessment, and proposed a method of 2-tuple linguistic assessment based on grey clusters. Zhen Zhang and Chonghui Guo (2012) transformed the uncertain linguistic evaluation information of each decision maker into trapezoidal fuzzy numbers, and then, by solving two optimization models, expressed the collective evaluation of the alternatives as trapezoidal fuzzy numbers.


Higher-Order Partial Least Squares (HOPLS): A Generalized Multi-Linear Regression Method

arXiv.org Artificial Intelligence

A new generalized multilinear regression model, termed the Higher-Order Partial Least Squares (HOPLS), is introduced with the aim of predicting a tensor (multiway array) $\tensor{Y}$ from a tensor $\tensor{X}$ by projecting the data onto the latent space and performing regression on the corresponding latent variables. HOPLS differs substantially from other regression models in that it explains the data by a sum of orthogonal Tucker tensors, while the number of orthogonal loadings serves as a parameter to control model complexity and prevent overfitting. The low dimensional latent space is optimized sequentially via a deflation operation, yielding the best joint subspace approximation for both $\tensor{X}$ and $\tensor{Y}$. Instead of decomposing $\tensor{X}$ and $\tensor{Y}$ individually, higher order singular value decomposition on a newly defined generalized cross-covariance tensor is employed to optimize the orthogonal loadings. A systematic comparison on both synthetic data and real-world decoding of 3D movement trajectories from electrocorticogram (ECoG) signals demonstrates the advantages of HOPLS over the existing methods in terms of better predictive ability, suitability for handling small sample sizes, and robustness to noise.


On-the-fly Macros

arXiv.org Artificial Intelligence

Macros have long been studied in AI planning [9, 18]. Many domain-dependent applications of macros have been exhibited and studied [15, 17, 12]; also, a number of domain-independent methods for learning, inferring, filtering, and applying macros have been the topic of research continuing up to the present [2, 7, 20]. In this paper, we present a domain-independent algorithm that computes macros in a novel way. Our algorithm computes macros "on-the-fly" for a given set of states and does not require previously learned or inferred information, nor does it need any prior domain knowledge. We exhibit the power of our algorithm by using it to define new domain-independent tractable classes of classical planning that strictly extend previously defined such classes [6], and can be proved to include Blocksworld-arm and Towers of Hanoi. We believe that this is notable as theoretically defined, domain-independent tractable classes have generally struggled to incorporate construction-type domains such as these two. We hence give theoretically grounded evidence of the computational value of macros in planning.


The DLR Hierarchy of Approximate Inference

arXiv.org Machine Learning

We propose a hierarchy for approximate inference based on the Dobrushin, Lanford, Ruelle (DLR) equations. This hierarchy includes existing algorithms, such as belief propagation, and also motivates novel algorithms such as factorized neighbors (FN) algorithms and variants of mean field (MF) algorithms. In particular, we show that extrema of the Bethe free energy correspond to approximate solutions of the DLR equations. In addition, we demonstrate a close connection between these approximate algorithms and Gibbs sampling. Finally, we compare and contrast several of the algorithms in the DLR hierarchy on spin-glass problems. The experiments show that algorithms higher up in the hierarchy give more accurate results when they converge but tend to be less stable.
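For orientation, the textbook naive mean-field iteration on an Ising-type model with spins in {-1, +1} can be sketched as below. This is the standard fixed-point update m_i = tanh(h_i + sum_j J_ij m_j), shown as a baseline illustration only; it is not one of the MF variants or FN algorithms proposed in the paper, and the function name, the damping scheme, and the default parameters are assumptions.

```python
import numpy as np

def naive_mean_field(J, h, n_iters=200, damping=0.5):
    """Damped naive mean-field iteration for an Ising-type model.

    J: symmetric (n, n) coupling matrix with zero diagonal
    h: length-n vector of external fields
    Returns the approximate magnetizations m_i = E[s_i], s_i in {-1, +1}.
    """
    m = np.zeros(len(h))
    for _ in range(n_iters):
        # Each spin responds to its own field plus the mean of its neighbors.
        m_new = np.tanh(h + J @ m)
        # Damping mixes old and new values to reduce oscillation.
        m = damping * m + (1.0 - damping) * m_new
    return m
```

On strongly coupled spin-glass instances this iteration may fail to converge or may settle in one of several fixed points, which is consistent with the stability trade-off the abstract reports for approximate schemes.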


Discovery of non-gaussian linear causal models using ICA

arXiv.org Machine Learning

In recent years, several methods have been proposed for the discovery of causal structure from non-experimental data (Spirtes et al. 2000; Pearl 2000). Such methods make various assumptions on the data generating process to facilitate its identification from purely observational data. Continuing this line of research, we show how to discover the complete causal structure of continuous-valued data, under the assumptions that (a) the data generating process is linear, (b) there are no unobserved confounders, and (c) disturbance variables have non-gaussian distributions of non-zero variances. The solution relies on the use of the statistical method known as independent component analysis (ICA), and does not require any pre-specified time-ordering of the variables. We provide a complete Matlab package for performing this LiNGAM analysis (short for Linear Non-Gaussian Acyclic Model), and demonstrate the effectiveness of the method using artificially generated data.


Ordering-Based Search: A Simple and Effective Algorithm for Learning Bayesian Networks

arXiv.org Machine Learning

One of the basic tasks for Bayesian networks (BNs) is that of learning a network structure from data. The BN-learning problem is NP-hard, so the standard solution is heuristic search. Many approaches have been proposed for this task, but only a very small number outperform the baseline of greedy hill-climbing with tabu lists; moreover, many of the proposed algorithms are quite complex and hard to implement. In this paper, we propose a very simple and easy-to-implement method for addressing this task. Our approach is based on the well-known fact that the best network (of bounded in-degree) consistent with a given node ordering can be found very efficiently. We therefore propose a search not over the space of structures, but over the space of orderings, selecting for each ordering the best network consistent with it. This search space is much smaller, makes more global search steps, has a lower branching factor, and avoids costly acyclicity checks. We present results for this algorithm on both synthetic and real data sets, evaluating both the score of the network found and the running time. We show that ordering-based search outperforms the standard baseline, and is competitive with recent algorithms that are much harder to implement.
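The fact that the best bounded in-degree network for a fixed ordering is easy to find can be sketched as follows. The sketch assumes a decomposable score, so each node's parent set can be chosen independently from among its predecessors; `best_network_for_ordering` and the `local_score(node, parents)` callback are hypothetical names for illustration, and the search over orderings itself is not shown.

```python
from itertools import combinations

def best_network_for_ordering(ordering, local_score, max_parents):
    """Best DAG consistent with `ordering` under a decomposable score.

    ordering:    sequence of node names; parents must precede children
    local_score: callable (node, parents_tuple) -> float, higher is better
    max_parents: bound on the in-degree of every node
    Returns (parents, total_score), where parents maps node -> tuple.
    """
    parents = {}
    total = 0.0
    for i, node in enumerate(ordering):
        predecessors = ordering[:i]
        # Start from the empty parent set, then try every allowed subset.
        best_set, best_val = (), local_score(node, ())
        for size in range(1, min(max_parents, len(predecessors)) + 1):
            for candidate in combinations(predecessors, size):
                val = local_score(node, candidate)
                if val > best_val:
                    best_set, best_val = candidate, val
        parents[node] = best_set
        total += best_val
    return parents, total
```

Because every edge points from an earlier node to a later one, the result is acyclic by construction, which is why no acyclicity check is needed when moving between orderings.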


Mining Associated Text and Images with Dual-Wing Harmoniums

arXiv.org Machine Learning

We propose a multi-wing harmonium model for mining multimedia data that extends and improves on earlier models based on two-layer random fields, which capture bidirectional dependencies between hidden topic aspects and observed inputs. This model can be viewed as an undirected counterpart of two-layer directed models such as LDA for similar tasks, but differs significantly in inference/learning cost tradeoffs, latent topic representations, and topic mixing mechanisms. In particular, our model facilitates efficient inference and robust topic mixing, and potentially provides high flexibility in modeling the latent topic spaces. A contrastive divergence algorithm and a variational algorithm are derived for learning. We specialize our model to a dual-wing harmonium for captioned images, incorporating a multivariate Poisson for word counts and a multivariate Gaussian for color histograms. We present empirical results on the application of this model to classification, retrieval, and image annotation on news video collections, and we report an extensive comparison with various extant models.


A Function Approximation Approach to Estimation of Policy Gradient for POMDP with Structured Policies

arXiv.org Machine Learning

We consider the estimation of the policy gradient in partially observable Markov decision processes (POMDP) with a special class of structured policies that are finite-state controllers. We show that the gradient estimation can be done in the Actor-Critic framework, by making the critic compute a "value" function that does not depend on the states of the POMDP. This function is the conditional mean of the true value function that depends on the states. We show that the critic can be implemented using temporal difference (TD) methods with linear function approximations, and that the analytical results on TD and Actor-Critic can be transferred to this case. Although Actor-Critic algorithms have been used extensively in Markov decision processes (MDP), up to now they have not been proposed for POMDP as an alternative to the earlier proposed GPOMDP algorithm, an actor-only method. Furthermore, we show that the same idea applies to semi-Markov problems with a subset of finite-state controllers.


Two-Way Latent Grouping Model for User Preference Prediction

arXiv.org Machine Learning

We introduce a novel latent grouping model for predicting the relevance of a new document to a user. The model assumes a latent group structure for both users and documents. We compare the model against a state-of-the-art method, the User Rating Profile model, where only users have a latent group structure. We estimate both models by Gibbs sampling. The new method predicts relevance more accurately for new documents that have few known ratings. The reason is that generalization over documents then becomes necessary, and hence the two-way grouping is profitable.