Learning Graphical Models
Echo-State Conditional Restricted Boltzmann Machines
Chatzis, Sotirios (Cyprus University of Technology)
Restricted Boltzmann machines (RBMs) are a powerful generative modeling technique, based on a complex graphical model of hidden (latent) variables. Conditional RBMs (CRBMs) are an extension of RBMs tailored to modeling temporal data. A drawback of CRBMs is their consideration of linear temporal dependencies, which limits their capability to capture complex temporal structure. They also require many variables to model long temporal dependencies, a fact that might provoke overfitting proneness. To resolve these issues, in this paper we propose the echo-state CRBM (ES-CRBM): our model uses an echo-state network reservoir in the context of CRBMs to efficiently capture long and complex temporal dynamics, with much fewer trainable parameters compared to conventional CRBMs. In addition, we introduce an (implicit) mixture of ES-CRBM experts (im-ES-CRBM) to enhance even further the capabilities of our ES-CRBM model. The introduced im-ES-CRBM allows for better modeling temporal observations which might comprise a number of latent or observable subpatterns that alternate in a dynamic fashion. It also allows for performing sequence segmentation using our framework. We apply our methods to sequential data modeling and classification experiments using public datasets. As we show, our approach outperforms both existing RBM-based approaches as well as related state-of-the-art methods, such as conditional random fields.
Dynamic Bayesian Probabilistic Matrix Factorization
Chatzis, Sotirios (Cyprus University of Technology)
Collaborative filtering algorithms generally rely on the assumption that user preference patterns remain stationary. However, real-world relational data are seldom stationary. User preference patterns may change over time, giving rise to the requirement of designing collaborative filtering systems capable of detecting and adapting to preference pattern shifts. Motivated by this observation, in this paper we propose a dynamic Bayesian probabilistic matrix factorization model, designed for modeling time-varying distributions. Formulation of our model is based on imposition of a dynamic hierarchical Dirichlet process (dHDP) prior over the space of probabilistic matrix factorization models to capture the time-evolving statistical properties of modeled sequential relational datasets. We develop a simple Markov Chain Monte Carlo sampler to perform inference. We present experimental results to demonstrate the superiority of our temporal model.
Multilabel Classification with Label Correlations and Missing Labels
Bi, Wei (Hong Kong University of Science and Technology) | Kwok, James T (Hong Kong University of Science and Technology)
Many real-world applications involve multilabel classification, in which the labels can have strong inter-dependencies and some of them may even be missing.Existing multilabel algorithms are unable to handle both issues simultaneously.In this paper, we propose a probabilistic model that can automatically learn and exploit multilabel correlations.By integrating out the missing information, it also provides a disciplinedapproach to the handling of missing labels. The inference procedure is simple, and the optimization subproblems are convex. Experiments on a number of real-world data sets with both complete and missing labelsdemonstrate that the proposed algorithm can consistently outperform state-of-the-art multilabel classification algorithms.
Instance-Based Domain Adaptation in NLP via In-Target-Domain Logistic Approximation
Xia, Rui (Nanjing University of Science and Technology) | Yu, Jianfei (Nanjing University of Science and Technology) | Xu, Feng (Nanjing University of Science and Technology) | Wang, Shumei (Nanjing University of Science and Technology)
In the field of NLP, most of the existing domain adaptation studies belong to the feature-based adaptation, while the research of instance-based adaptation is very scarce. In this work, we propose a new instance-based adaptation model, called in-target-domain logistic approximation (ILA). In ILA, we adapt the source-domain data to the target domain by a logistic approximation. The normalized in-target-domain probability is assigned as an instance weight to each of the source-domain training data. An instance-weighted classification model is trained finally for the cross-domain classification problem. Compared to the previous techniques, ILA conducts instance adaptation in a dimensionality-reduced linear feature space to ensure efficiency in high-dimensional NLP tasks. The instance weights in ILA are learnt by leveraging the criteria of both maximum likelihood and minimum statistical distance. The empirical results on two NLP tasks including text categorization and sentiment classification show that our ILA model beats the state-of-the-art instance adaptation methods significantly, in cross-domain classification accuracy, parameter stability and computational efficiency.
Learning Scripts as Hidden Markov Models
Orr, John Walker (Oregon State Univserity) | Tadepalli, Prasad (Oregon State Univserity) | Doppa, Janardhan Rao (Oregon State University) | Fern, Xiaoli (Oregon State University) | Dietterich, Thomas G. (Oregon State Univserity)
Scripts have been proposed to model the stereotypical event sequences found in narratives. They can be applied to make a variety of inferences including fillinggaps in the narratives and resolving ambiguous references. This paper proposes the first formal frameworkfor scripts based on Hidden Markov Models (HMMs). Our framework supports robust inference and learning algorithms, which are lacking in previous clustering models. We develop an algorithm for structure andparameter learning based on Expectation Maximizationand evaluate it on a number of natural datasets. The results show that our algorithm is superior to several informed baselines for predicting missing events in partialobservation sequences.
Multiagent Metareasoning through Organizational Design
Sleight, Jason (University of Michigan) | Durfee, Edmund H. (University of Michigan)
We formulate an approach to multiagent metareasoning that uses organizational design to focus each agent's reasoning on the aspects of its local problem that let it make the most worthwhile contributions to joint behavior. By employing the decentralized Markov decision process framework, we characterize an organizational design problem that explicitly considers the quantitative impact that a design has on both the quality of the agents' behaviors and their reasoning costs. We describe an automated organizational design process that can approximately solve our organizational design problem via incremental search, and present techniques that efficiently estimate the incremental impact of a candidate organizational influence. Our empirical evaluation confirms that our process generates organizational designs that impart a desired metareasoning regime upon the agents.
Robust Distance Metric Learning in the Presence of Label Noise
Wang, Dong (Nanjing University of Aeronautics and Astronautics) | Tan, Xiaoyang (Nanjing University of Aeronautics and Astronautics)
Many distance learning algorithms have been developed in recent years. However, few of them consider the problem when the class labels of training data are noisy, and this may lead to serious performance deterioration. In this paper, we present a robust distance learning method in the presence of label noise, by extending a previous non-parametric discriminative distance learning algorithm, i.e., Neighbourhood Components Analysis (NCA). Particularly, we analyze the effect of label noise on the derivative of likelihood with respect to the transformation matrix, and propose to model the conditional probability of the true label of each point so as to reduce that effect. The model is then optimized within the EM framework, with additional regularization used to avoid overfitting. Our experiments on several UCI datasets and a real dataset with unknown noise patterns show that the proposed RNCA is more tolerant to class label noise compared to the original NCA method.
Agent Behavior Prediction and Its Generalization Analysis
Tian, Fei (University of Science and Technology of China) | Li, Haifang (Chinese Academy of Sciences) | Chen, Wei (Microsoft Research) | Qin, Tao (Microsoft Research) | Chen, Enhong (University of Science and Technology of China) | Liu, Tie-Yan (Microsoft Research)
Machine learning algorithms have been applied to predict agent behaviors in real-world dynamic systems, such as advertiser behaviors in sponsored search and worker behaviors in crowdsourcing. Behavior data in these systems are generated by live agents: once systems change due to adoption of prediction models learnt from behavior data, agents will observe and respond to these changes by changing their own behaviors accordingly. Therefore, the evolving behavior data will not be identically and independently distributed, posing great challenges to theoretical analysis. To tackle this challenge, in this paper, we propose to use Markov Chain in Random Environments (MCRE) to describe the behavior data, and perform generalization analysis of machine learning algorithms on its basis. We propose a novel technique that transforms the original time-variant MCRE into a higher-dimensional time-homogeneous Markov chain, which is easier to deal with. We prove the convergence of the new Markov chain when time approaches infinity. Then we obtain a generalization bound for the machine learning algorithms on the behavior data generated by the new Markov chain. To the best of our knowledge, this is the first work that performs the generalization analysis on data generated by complex processes in real-world dynamic systems.
Learning Latent Engagement Patterns of Students in Online Courses
Ramesh, Arti (University Of Maryland, College Park) | Goldwasser, Dan (University of Maryland, College Park) | Huang, Bert (University of Maryland, College Park) | III, Hal Daume (University of Maryland, College Park) | Getoor, Lise (University of California, Santa Cruz)
Maintaining and cultivating student engagement is critical for learning. Understanding factors affecting student engagement will help in designing better courses and improving student retention. The large number of participants in massive open online courses (MOOCs) and data collected from their interaction with the MOOC open up avenues for studying student engagement at scale. In this work, we develop a framework for modeling and understanding student engagement in online courses based on student behavioral cues. Our first contribution is the abstraction of student engagement types using latent representations and using that in a probabilistic model to connect student behavior with course completion. We demonstrate that the latent formulation for engagement helps in predicting student survival across three MOOCs. Next, in order to initiate better instructor interventions, we need to be able to predict student survival early in the course. We demonstrate that we can predict student survival early in the course reliably using the latent model. Finally, we perform a closer quantitative analysis of user interaction with the MOOC and identify student activities that are good indicators for survival at different points in the course.
Discovering Better AAAI Keywords via Clustering with Community-Sourced Constraints
Moran, Kelly H. (Google Inc.) | Wallace, Byron C. (Brown University) | Brodley, Carla E. (Tufts University)
Selecting good conference keywords is important because they often determine the composition of review committees and hence which papers are reviewed by whom. But presently conference keywords are generated in an ad-hoc manner by a small set of conference organizers. This approach is plainly not ideal. There is no guarantee, for example, that the generated keyword set aligns with what the community is actually working on and submitting to the conference in a given year. This is especially true in fast moving fields such as AI. The problem is exacerbated by the tendency of organizers to draw heavily on preceding years' keyword lists when generating a new set. Rather than a select few ordaining a keyword set that that represents AI at large, it would be preferable to generate these keywords more directly from the data, with input from research community members. To this end, we solicited feedback from seven AAAI PC members regarding a previously existing keyword set and used these 'community-sourced constraints' to inform a clustering over the abstracts of all submissions to AAAI 2013. We show that the keywords discovered via this data-driven, human-in-the-loop method are at least as preferred (by AAAI PC members) as 2013's manually generated set, and that they include categories previously overlooked by organizers. Many of the discovered terms were used for this year's conference.