 Learning Graphical Models


An Unsupervised Framework of Exploring Events on Twitter: Filtering, Extraction and Categorization

AAAI Conferences

Twitter, as a popular microblogging service, has become a new information channel for users to receive and exchange the most up-to-date information on current events. However, since there is no control over how users publish messages on Twitter, finding newsworthy events from Twitter becomes a difficult task like "finding a needle in a haystack". In this paper we propose a general unsupervised framework to explore events from tweets, which consists of a pipeline process of filtering, extraction and categorization. To filter out noisy tweets, the filtering step exploits a lexicon-based approach to separate tweets that are event-related from those that are not. Then, based on these event-related tweets, the structured representations of events are extracted and categorized automatically using an unsupervised Bayesian model without the use of any labelled data. Moreover, the categorized events are assigned event type labels without human intervention. The proposed framework has been evaluated on over 60 million tweets collected over one month in December 2010. A precision of 70.49% is achieved in event extraction, outperforming a competitive baseline by nearly 6%. Events are also clustered into coherent groups with the automatically assigned event type label.
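
The lexicon-based filtering step described in this abstract can be illustrated with a minimal sketch. It assumes a filter that keeps a tweet when it contains at least one word from an event lexicon; the lexicon contents, the `min_hits` threshold and all function names are hypothetical and are not taken from the paper.

```python
import re

# Hypothetical event lexicon for illustration only; the paper's lexicon is not given here.
EVENT_LEXICON = {"earthquake", "election", "protest", "concert", "crash"}


def tokenize(text):
    """Lowercase a tweet and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def is_event_related(tweet, lexicon=EVENT_LEXICON, min_hits=1):
    """Return True if the tweet contains at least `min_hits` lexicon words."""
    hits = sum(1 for tok in tokenize(tweet) if tok in lexicon)
    return hits >= min_hits


def filter_tweets(tweets, lexicon=EVENT_LEXICON):
    """Keep only the tweets judged event-related by the lexicon test."""
    return [t for t in tweets if is_event_related(t, lexicon)]


if __name__ == "__main__":
    sample = ["Earthquake hits the coast", "lol so bored today"]
    print(filter_tweets(sample))
```

A filter of this kind only removes noise; the extraction and categorization stages would then operate on the tweets that remain.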


Extracting Verb Expressions Implying Negative Opinions

AAAI Conferences

Identifying aspect-based opinions has been studied extensively in recent years. However, existing work has primarily focused on adjective, adverb, and noun expressions. Clearly, verb expressions can imply opinions too. We found that in many domains verb expressions can be even more important to applications because they often describe major issues of products or services. These issues enable brands and businesses to directly improve their products or services. To the best of our knowledge, this problem has not received much attention in the literature. In this paper, we make an attempt to solve this problem. Our proposed method first extracts verb expressions from reviews and then employs Markov Networks to model rich linguistic features and long distance relationships to identify negative issue expressions. Since our training data is obtained from titles of reviews whose labels are automatically inferred from review ratings, our approach is applicable to any domain without manual involvement. Experimental results using real-life review datasets show that our approach outperforms strong baselines.


Microblog Sentiment Classification with Contextual Knowledge Regularization

AAAI Conferences

Microblog sentiment classification is an important research topic which has wide applications in both academia and industry. Because microblog messages are short, noisy and contain masses of acronyms and informal words, microblog sentiment classification is a very challenging task. Fortunately, taken collectively, the contextual information about these idiosyncratic words provides knowledge about their sentiment orientations. In this paper, we propose to use the microblogs' contextual knowledge mined from a large amount of unlabeled data to help improve microblog sentiment classification. We define two kinds of contextual knowledge: word-word association and word-sentiment association. The contextual knowledge is formulated as regularization terms in supervised learning algorithms. An efficient optimization procedure is proposed to learn the model. Experimental results on benchmark datasets show that our method can consistently and significantly outperform the state-of-the-art methods.
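
To make the idea of "contextual knowledge as regularization terms" concrete, here is one possible form such an objective could take. This is an assumption, not the paper's formulation: a logistic loss plus a word-word term that penalizes different weights for associated words, and a word-sentiment term that pulls weights toward prior sentiment scores. The names `assoc`, `prior`, `alpha` and `beta` are hypothetical.

```python
import math


def objective(w, X, y, assoc, prior, alpha=0.1, beta=0.1):
    """Regularized logistic objective (illustrative form, not the paper's).

    w     : list of word weights
    X     : list of feature vectors (same length as w)
    y     : list of labels in {-1, +1}
    assoc : dict mapping (i, j) word-index pairs to an association strength
    prior : dict mapping word index i to a prior sentiment score
    """
    # Supervised part: logistic loss over the labeled examples.
    loss = 0.0
    for xi, yi in zip(X, y):
        margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
        loss += math.log1p(math.exp(-margin))

    # Word-word association: associated words should get similar weights.
    word_word = sum(s * (w[i] - w[j]) ** 2 for (i, j), s in assoc.items())

    # Word-sentiment association: weights should stay close to prior scores.
    word_sent = sum((w[i] - p) ** 2 for i, p in prior.items())

    return loss + alpha * word_word + beta * word_sent


if __name__ == "__main__":
    X = [[1.0, 0.0], [0.0, 1.0]]
    y = [1, -1]
    print(objective([0.5, -0.5], X, y, assoc={(0, 1): 1.0}, prior={0: 1.0}))
```

Both penalty terms can be computed from unlabeled data alone, which is what would let such a model benefit from a large unlabeled corpus.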


Online Bayesian Models for Personal Analytics in Social Media

AAAI Conferences

Latent author attribute prediction in social media provides a novel set of conditions for the construction of supervised classification models. With individual authors as training and test instances, their associated content ("features") is made available incrementally over time, as they converse over discussion forums. We propose various approaches to handling this dynamic data, from traditional batch training and testing, to incremental bootstrapping, and then active learning via crowdsourcing. Our underlying model relies on an intuitive application of Bayes rule, which should be easy to adopt by the community, thus allowing for a general shift towards online modeling for social media.
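
The incremental use of Bayes rule mentioned in this abstract can be sketched as follows. This is a generic naive-Bayes-style update under stated assumptions, not the authors' model: each new message multiplies the current posterior over an author attribute by the likelihood of its tokens, and the result becomes the prior for the next message. The labels, token likelihoods and the `unk_logp` fallback are hypothetical.

```python
import math


def update_posterior(log_post, tokens, log_likelihood, unk_logp=math.log(1e-6)):
    """One Bayes-rule step: new posterior is proportional to prior times likelihood.

    log_post       : dict label -> current log posterior (used as the prior)
    tokens         : tokens of the newly observed message
    log_likelihood : dict label -> dict token -> log P(token | label)
    unk_logp       : log probability assigned to unseen tokens (assumed value)
    """
    updated = {}
    for label, lp in log_post.items():
        ll = sum(log_likelihood[label].get(tok, unk_logp) for tok in tokens)
        updated[label] = lp + ll

    # Normalize in log space (log-sum-exp) so the posterior sums to one.
    m = max(updated.values())
    log_z = m + math.log(sum(math.exp(v - m) for v in updated.values()))
    return {label: v - log_z for label, v in updated.items()}


if __name__ == "__main__":
    # Hypothetical two-valued attribute with toy token likelihoods.
    likelihood = {
        "A": {"x": math.log(0.8), "y": math.log(0.2)},
        "B": {"x": math.log(0.2), "y": math.log(0.8)},
    }
    posterior = {"A": math.log(0.5), "B": math.log(0.5)}
    for message in (["x"], ["x"], ["y"]):
        posterior = update_posterior(posterior, message, likelihood)
        print({k: round(math.exp(v), 3) for k, v in posterior.items()})
```

Because each step needs only the previous posterior and the new message, the author's history never has to be reprocessed, which is the property that makes an online setting practical.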


Weakly-Supervised Grammar-Informed Bayesian CCG Parser Learning

AAAI Conferences

Combinatory Categorial Grammar (CCG) is a lexicalized grammar formalism in which words are associated with categories that, in combination with a small universal set of rules, specify the syntactic configurations in which they may occur. Categories are selected from a large, recursively-defined set; this leads to high word-to-category ambiguity, which is one of the primary factors that make learning CCG parsers difficult, especially in the face of little data. Previous work has shown that learning sequence models for CCG tagging can be improved by using linguistically-motivated prior probability distributions over potential categories. We extend this approach to the task of learning a CCG parser from weak supervision. We present a Bayesian formulation for CCG parser induction that assumes only supervision in the form of an incomplete tag dictionary mapping some word types to sets of potential categories. Our approach outperforms a baseline model trained with uniform priors by exploiting universal, intrinsic properties of the CCG formalism to bias the model toward simpler, more cross-linguistically common categories.


Dataless Text Classification with Descriptive LDA

AAAI Conferences

Manually labeling documents for training a text classifier is expensive and time-consuming. Moreover, a classifier trained on labeled documents may suffer from overfitting and adaptability problems. Dataless text classification (DLTC) has been proposed as a solution to these problems, since it does not require labeled documents. Previous research in DLTC has used explicit semantic analysis of Wikipedia content to measure semantic distance between documents, which is in turn used to classify test documents based on nearest neighbours. The semantic-based DLTC method has a major drawback in that it relies on a large-scale, finely-compiled semantic knowledge base, which is difficult to obtain in many scenarios. In this paper we propose a novel kind of model, descriptive LDA (DescLDA), which performs DLTC with only category description words and unlabeled documents. In DescLDA, the LDA model is assembled with a describing device to infer Dirichlet priors from prior descriptive documents created with category description words. The Dirichlet priors are then used by LDA to induce category-aware latent topics from unlabeled documents. Experimental results with the 20Newsgroups and RCV1 datasets show that: (1) our DLTC method is more effective than the semantic-based DLTC baseline method; and (2) the accuracy of our DLTC method is very close to state-of-the-art supervised text classification methods. As neither external knowledge resources nor labeled documents are required, our DLTC method is applicable to a wider range of scenarios.


Unsupervised Word Sense Disambiguation Using Markov Random Field and Dependency Parser

AAAI Conferences

Word Sense Disambiguation (WSD) is a difficult problem to solve in the unsupervised setting. This is because in this setting inference becomes more dependent on the interplay between different senses in the context due to the unavailability of learning resources. Using two basic ideas, sense dependency and selective dependency, we model the WSD problem as a Maximum A Posteriori (MAP) Inference Query on a Markov Random Field (MRF) built using WordNet and Link Parser or Stanford Parser. To the best of our knowledge this combination of dependency and MRF is novel, and our graph-based unsupervised WSD system beats the state-of-the-art system on SensEval-2, SensEval-3 and SemEval-2007 English all-words datasets while being over 35 times faster.


Generalization Analysis for Game-Theoretic Machine Learning

AAAI Conferences

For Internet applications like sponsored search, caution must be taken when using machine learning to optimize their mechanisms (e.g., auction) since self-interested agents in these applications may change their behaviors (and thus the data distribution) in response to the mechanisms. To tackle this problem, a framework called game-theoretic machine learning (GTML) was recently proposed, which first learns a Markov behavior model to characterize agents' behaviors, and then learns the optimal mechanism by simulating agents' behavior changes in response to the mechanism. While GTML has demonstrated practical success, its generalization analysis is challenging because the behavior data are non-i.i.d. and dependent on the mechanism. To address this challenge, first, we decompose the generalization error for GTML into the behavior learning error and the mechanism learning error; second, for the behavior learning error, we obtain novel non-asymptotic error bounds for both parametric and non-parametric behavior learning methods; third, for the mechanism learning error, we derive a uniform convergence bound based on a new concept called the nested covering number of the mechanism space and the generalization analysis techniques developed for mixing sequences.


Scalable Planning and Learning for Multiagent POMDPs

AAAI Conferences

Online, sample-based planning algorithms for POMDPs have shown great promise in scaling to problems with large state spaces, but they become intractable for large action and observation spaces. This is particularly problematic in multiagent POMDPs where the action and observation spaces grow exponentially with the number of agents. To combat this intractability, we propose a novel scalable approach based on sample-based planning and factored value functions that exploits structure present in many multiagent settings. This approach applies not only in the planning case, but also in the Bayesian reinforcement learning setting. Experimental results show that we are able to provide high quality solutions to large multiagent planning and learning problems.


Sparse Bayesian Multiview Learning for Simultaneous Association Discovery and Diagnosis of Alzheimer's Disease

AAAI Conferences

In the analysis and diagnosis of many diseases, such as Alzheimer's disease (AD), two important and related tasks are usually required: i) selecting genetic and phenotypical markers for diagnosis, and ii) identifying associations between genetic and phenotypical features. While previous studies treat these two tasks separately, they are tightly coupled due to the same underlying biological basis. To harness their potential benefits for each other, we propose a new sparse Bayesian approach to jointly carry out the two tasks. In our approach, we extract common latent features from different data sources by sparse projection matrices and then use the latent features to predict disease severity levels; in return, the disease status can guide the learning of sparse projection matrices, which not only reveal interactions between data sources but also select groups of related biomarkers. In order to boost the learning of sparse projection matrices, we further incorporate graph Laplacian priors encoding the valuable linkage disequilibrium (LD) information. To efficiently estimate the model, we develop a variational inference algorithm. Analysis of an imaging genetics dataset for AD study shows that our model discovers biologically meaningful associations between single nucleotide polymorphisms (SNPs) and magnetic resonance imaging (MRI) features, and achieves significantly higher accuracy for predicting ordinal AD stages than competitive methods.