Goto

Collaborating Authors

 Uncertainty


Retweet Behavior Prediction Using Hierarchical Dirichlet Process

AAAI Conferences

The task of predicting retweet behavior is an important and essential step for various social network applications, such as business intelligence, popular event prediction, and so on. Due to the increasing requirements, in recent years, the task has attracted extensive attentions. In this work, we propose a novel method using non-parametric statistical models to combine structural, textual, and temporal information together to predict retweet behavior. To evaluate the proposed method, we collect a large number of microblogs and their corresponding social networks from a real microblog service. Experimental results on the constructed dataset demonstrate that the proposed method can achieve better performance than state-of-the-art methods. The relative improvement of the the proposed over the method using only textual information is more than 38.5% in terms of F1-Score.


Collaborative Topic Ranking: Leveraging Item Meta-Data for Sparsity Reduction

AAAI Conferences

Pair-wise ranking methods have been widely used in recommender systems to deal with implicit feedback. They attempt to discriminate between a handful of observed items and the large set of unobserved items. In these approaches, however, user preferences and item characteristics cannot be estimated reliably due to overfitting given highly sparse data. To alleviate this problem, in this paper, we propose a novel hierarchical Bayesian framework which incorporates ``bag-of-words'' type meta-data on items into pair-wise ranking models for one-class collaborative filtering. The main idea of our method lies in extending the pair-wise ranking with a probabilistic topic modeling. Instead of regularizing item factors through a zero-mean Gaussian prior, our method introduces item-specific topic proportions as priors for item factors. As a by-product, interpretable latent factors for users and items may help explain recommendations in some applications. We conduct an experimental study on a real and publicly available dataset, and the results show that our algorithm is effective in providing accurate recommendation and interpreting user factors and item factors.


A Probabilistic Model for Bursty Topic Discovery in Microblogs

AAAI Conferences

Bursty topics discovery in microblogs is important for people to grasp essential and valuable information. However, the task is challenging since microblog posts are particularly short and noisy. This work develops a novel probabilistic model, namely Bursty Biterm Topic Model (BBTM), to deal with the task. BBTM extends the Biterm Topic Model (BTM) by incorporating the burstiness of biterms as prior knowledge for bursty topic modeling, which enjoys the following merits: 1) It can well solve the data sparsity problem in topic modeling over short texts as the same as BTM; 2) It can automatical discover high quality bursty topics in microblogs in a principled and efficient way. Extensive experiments on a standard Twitter dataset show that our approach outperforms the state-of-the-art baselines significantly.


Relating Romanized Comments to News Articles by Inferring Multi-Glyphic Topical Correspondence

AAAI Conferences

Commenting is a popular facility provided by news sites. Analyzing such user-generated content has recently attracted research interest. However, in multilingual societies such as India, analyzing such user-generated content is hard due to several reasons: (1) There are more than 20 official languages but linguistic resources are available mainly for Hindi. It is observed that people frequently use romanized text as it is easy and quick using an English keyboard, resulting in multi-glyphic comments, where the texts are in the same language but in different scripts. Such romanized texts are almost unexplored in machine learning so far. (2) In many cases, comments are made on a specific part of the article rather than the topic of the entire article. Off-the-shelf methods such as correspondence LDA are insufficient to model such relationships between articles and comments. In this paper, we extend the notion of correspondence to model multi-lingual, multi-script, and inter-lingual topics in a unified probabilistic model called the Multi-glyphic Correspondence Topic Model (MCTM). Using several metrics, we verify our approach and show that it improves over the state-of-the-art.


Effectively Predicting Whether and When a Topic Will Become Prevalent in a Social Network

AAAI Conferences

Effective forecasting of future prevalent topics plays animportant role in social network business development.It involves two challenging aspects: predicting whethera topic will become prevalent, and when. This cannotbe directly handled by the existing algorithms in topicmodeling, item recommendation and action forecasting.The classic forecasting framework based on time seriesmodels may be able to predict a hot topic when a seriesof periodical changes to user-addressed frequency in asystematic way. However, the frequency of topics discussedby users often changes irregularly in social networks.In this paper, a generic probabilistic frameworkis proposed for hot topic prediction, and machine learningmethods are explored to predict hot topic patterns.Two effective models, PreWHether and PreWHen, areintroduced to predict whether and when a topic will becomeprevalent. In the PreWHether model, we simulatethe constructed features of previously observed frequencychanges for better prediction. In the PreWHen model,distributions of time intervals associated with the emergenceto prevalence of a topic are modeled. Extensiveexperiments on real datasets demonstrate that ourmethod outperforms the baselines and generates moreeffective predictions.


Kernel Density Estimation for Text-Based Geolocation

AAAI Conferences

Text-based geolocation classifiers often operate with a grid-based view of the world. Predicting document location of origin based on text content on a geodesic grid is computationally attractive since many standard methods for supervised document classification carry over unchanged to geolocation in the form of predicting a most probable grid cell for a document. However, the grid-based approach suffers from sparse data problems if one wants to improve classification accuracy by moving to smaller cell sizes. In this paper we investigate an enhancement of common methods for determining the geographic point of origin of a text document by kernel density estimation. For geolocation of tweets we obtain a improvements upon non-kernel methods on datasets of U.S. and global Twitter content.


Hamiltonian ABC

arXiv.org Machine Learning

Approximate Bayesian computation (ABC) is a powerful and elegant framework for performing inference in simulation-based models. However, due to the difficulty in scaling likelihood estimates, ABC remains useful for relatively low-dimensional problems. We introduce Hamiltonian ABC (HABC), a set of likelihood-free algorithms that apply recent advances in scaling Bayesian learning using Hamiltonian Monte Carlo (HMC) and stochastic gradients. We find that a small number forward simulations can effectively approximate the ABC gradient, allowing Hamiltonian dynamics to efficiently traverse parameter spaces. We also describe a new simple yet general approach of incorporating random seeds into the state of the Markov chain, further reducing the random walk behavior of HABC. We demonstrate HABC on several typical ABC problems, and show that HABC samples comparably to regular Bayesian inference using true gradients on a high-dimensional problem from machine learning.


A Bayesian Model of node interaction in networks

arXiv.org Machine Learning

We are concerned with modeling the strength of links in networks by taking into account how often those links are used. Link usage is a strong indicator of how closely two nodes are related, but existing network models in Bayesian Statistics and Machine Learning are able to predict only wether a link exists at all. As priors for latent attributes of network nodes we explore the Chinese Restaurant Process (CRP) and a multivariate Gaussian with fixed dimensionality. The model is applied to a social network dataset and a word coocurrence dataset.


On the Bayes-optimality of F-measure maximizers

arXiv.org Machine Learning

The F-measure, which has originally been introduced in information retrieval, is nowadays routinely used as a performance metric for problems such as binary classification, multi-label classification, and structured output prediction. Optimizing this measure is a statistically and computationally challenging problem, since no closed-form solution exists. Adopting a decision-theoretic perspective, this article provides a formal and experimental analysis of different approaches for maximizing the F-measure. We start with a Bayes-risk analysis of related loss functions, such as Hamming loss and subset zero-one loss, showing that optimizing such losses as a surrogate of the F-measure leads to a high worst-case regret. Subsequently, we perform a similar type of analysis for F-measure maximizing algorithms, showing that such algorithms are approximate, while relying on additional assumptions regarding the statistical distribution of the binary response variables. Furthermore, we present a new algorithm which is not only computationally efficient but also Bayes-optimal, regardless of the underlying distribution. To this end, the algorithm requires only a quadratic (with respect to the number of binary responses) number of parameters of the joint distribution. We illustrate the practical performance of all analyzed methods by means of experiments with multi-label classification problems.


Local Expectation Gradients for Doubly Stochastic Variational Inference

arXiv.org Machine Learning

We introduce local expectation gradients which is a general purpose stochastic variational inference algorithm for constructing stochastic gradients through sampling from the variational distribution. This algorithm divides the problem of estimating the stochastic gradients over multiple variational parameters into smaller sub-tasks so that each sub-task exploits intelligently the information coming from the most relevant part of the variational distribution. This is achieved by performing an exact expectation over the single random variable that mostly correlates with the variational parameter of interest resulting in a Rao-Blackwellized estimate that has low variance and can work efficiently for both continuous and discrete random variables. Furthermore, the proposed algorithm has interesting similarities with Gibbs sampling but at the same time, unlike Gibbs sampling, it can be trivially parallelized.