Learning Graphical Models
Efficient Planning for Factored Infinite-Horizon DEC-POMDPs
Pajarinen, Joni Kristian (Aalto University) | Peltonen, Jaakko Tapani (Aalto University)
Decentralized partially observable Markov decision processes (DEC-POMDPs) are used to plan policies for multiple agents that must maximize a joint reward function but do not communicate with each other. The agents act under uncertainty about each other and the environment. This planning task arises in optimization of wireless networks, and other scenarios where communication between agents is restricted by costs or physical limits. DEC-POMDPs are a promising solution, but optimizing policies quickly becomes computationally intractable when problem size grows. Factored DEC-POMDPs allow large problems to be described in compact form, but have the same worst case complexity as non-factored DEC-POMDPs. We propose an efficient optimization algorithm for large factored infinite-horizon DEC-POMDPs. We formulate expectation-maximization based optimization into a new form, where complexity can be kept tractable by factored approximations. Our method performs well, and it can solve problems with more agents and larger state spaces than state of the art DEC-POMDP methods. We give results for factored infinite-horizon DEC-POMDP problems with up to 10 agents.
On Learning Discrete Graphical Models Using Greedy Methods
Jalali, Ali, Johnson, Chris, Ravikumar, Pradeep
In this paper, we address the problem of learning the structure of a pairwise graphical model from samples in a high-dimensional setting. Our first main result studies the sparsistency, or consistency in sparsity pattern recovery, properties of a forward-backward greedy algorithm as applied to general statistical models. As a special case, we then apply this algorithm to learn the structure of a discrete graphical model via neighborhood estimation. As a corollary of our general result, we derive sufficient conditions on the number of samples n, the maximum node-degree d and the problem size p, as well as other conditions on the model parameters, so that the algorithm recovers all the edges with high probability. Our result guarantees graph selection for samples scaling as n = Omega(d^2 log(p)), in contrast to existing convex-optimization based algorithms that require a sample complexity of \Omega(d^3 log(p)). Further, the greedy algorithm only requires a restricted strong convexity condition which is typically milder than irrepresentability assumptions. We corroborate these results using numerical simulations at the end.
Viral Actions: Predicting Video View Counts Using Synchronous Sharing Behaviors
Shamma, David A. (Yahoo! Research) | Yew, Jude (University of Michigan) | Kennedy, Lyndon (Yahoo! Research) | Churchill, Elizabeth F. (Yahoo! Research)
In this article, we present a method for predicting the view count of a YouTube video using a small feature set collected from a synchronous sharing tool. We hypothesize that videos which have a high YouTube view count will exhibit a unique sharing pattern when shared in synchronous environments. Using a one-day sample of 2,188 dyadic sessions from the Yahoo! Zync synchronous sharing tool, we demonstrate how to predict the video's view count on YouTube, specifically if a video has over 10 million views. The prediction model is 95.8% accurate and done with a relatively small training set; only 15% of the videos had more than one session viewing; in effect, the classifier had a precision of 76.4% and a recall of 81%. We describe a prediction model that relies on using implicit social shared viewing behavior such as how many times a video was paused, rewound, or fast-forwarded as well as the duration of the session. Finally, we present some new directions for future virality research and for the design of future social media tools.
Hierarchical Bayesian Models for Latent Attribute Detection in Social Media
Rao, Delip (Johns Hopkins University) | Paul, Michael (Johns Hopkins University) | Fink, Clay (Johns Hopkins University) | Yarowsky, David (Johns Hopkins University) | Oates, Timothy (University of Maryland Baltimore County) | Coppersmith, Glen (JHU Human Language Technology Center of Excellence)
We present several novel minimally-supervised models for detecting latent attributes of social media users, with a focus on ethnicity and gender. Previouswork on ethnicity detection has used coarse-grained widely separated classes of ethnicity and assumed the existence of large amounts of training data such as the US census, simplifying the problem. Instead, we examine content generated by users in addition to name morpho-phonemics to detect ethnicity and gender. Further, weaddress this problem in a challenging setting where the ethnicity classes are more fine grained -- ethnicity classes in Nigeria -- and with very limited training data.
Latent Set Models for Two-Mode Network Data
DuBois, Christopher (University of California, Irvine) | Foulds, James (University of California, Irvine) | Smyth, Padhraic (University of California, Irvine)
Two-mode networks are a natural representation for many kinds of relational data. These networks are bipartite graphs consisting of two distinct sets ("modes") of entities. For example, one can model multiple recipient email data as a two-mode network of (a) individuals and (b) the emails that they send or receive. In this work we present a statistical model for two-mode network data which posits that individuals belong to latent sets and that the members of a particular set tend to co-appear. We show how to infer these latent sets from observed data using a Markov chain Monte Carlo inference algorithm. We apply the model to the Enron email corpus, using it to discover interpretable latent structure as well as evaluating its predictive accuracy on a missing data task. Extensions to the model are discussed that incorporate additional side information such as the email's sender or text content, further improving the accuracy of the model.
Timing Tweets to Increase Effectiveness of Information Campaigns
Dabeer, Onkar (Tata Institute of Fundamental Research) | Mehendale, Prachi (Tata Institute of Fundamental Research) | Karnik, Aditya (India Science Lab) | Saroop, Atul (India Science Lab)
Microblogging websites such as Twitter are increasingly being used by businesses/campaigners for timely dissemination of information to their followers. The diffusion of a tweet depends on several factors: the activity of the follower nodes, the responsiveness of follower nodes to tweets from the source node, the out-degree of the follower nodes, the content of recent related tweets seen by the follower node, etc. Using such factors, in this paper, we propose a framework to measure the effectiveness of an information campaign over Twitter. We consider a positive as well as a negative metric to measure the impact of a tweet: while retweets are used to measure the positive impact, the lack of a timely response from an active follower node is taken as a potential negative impact. We investigate the scheduling of tweets to increase the net positive impact while keeping the net negative impact below a desired level. We propose and study several scheduling algorithms by casting the problem in a Markov Decision Process (MDP) framework. In order to compare our algorithms, we estimate the model parameters from tweet data collected using the Twitter API from an arbitrarily selected node and its 6837 followers over several months. For this dataset, we find that if successive tweets in the campaign are novel, then substantial gains over user activity based scheduling can be obtained by scheduling tweets in time slots where the ratio of the expected positive and negative metrics is high. We call this the MaxRatio policy and we show that it is optimal under certain conditions. In cases where we are not certain about the response of users to successive related tweets, we identify another algorithm (which we call MaxReach) as a robust alternative.
Event Summarization Using Tweets
Chakrabarti, Deepayan (Yahoo! Research) | Punera, Kunal (Yahoo! Research)
Twitter has become exceedingly popular, with hundreds of millions of tweets being posted every day on a wide variety of topics. This has helped make real-time search applications possible with leading search engines routinely displaying relevant tweets in response to user queries. Recent research has shown that a considerable fraction of these tweets are about "events," and the detection of novel events in the tweet-stream has attracted a lot of research interest. However, very little research has focused on properly displaying this real-time information about events. For instance, the leading search engines simply display all tweets matching the queries in reverse chronological order. In this paper we argue that for some highly structured and recurring events, such as sports, it is better to use more sophisticated techniques to summarize the relevant tweets. We formalize the problem of summarizing event-tweets and give a solution based on learning the underlying hidden state representation of the event via Hidden Markov Models. In addition, through extensive experiments on real-world data we show that our model significantly outperforms some intuitive and competitive baselines.
Finding Consensus Bayesian Network Structures
Suppose that multiple experts (or learning algorithms) provide us with alternative Bayesian network (BN) structures over a domain, and that we are interested in combining them into a single consensus BN structure. Specifically, we are interested in that the consensus BN structure only represents independences all the given BN structures agree upon and that it has as few parameters associated as possible. In this paper, we prove that there may exist several nonequivalent consensus BN structures and that finding one of them is NPhard. Thus, we decide to resort to heuristics to find an approximated consensus BN structure. In this paper, we consider the heuristic proposed in (Matzkevich and Abramson, 1992, 1993a,b). This heuristic builds upon two algorithms, called Methods A and B, for efficiently deriving the minimal directed independence map of a BN structure relative to a given node ordering. Methods A and B are claimed to be correct although no proof is provided (a proof is just sketched). In this paper, we show that Methods A and B are not correct and propose a correction of them.
Transfer Learning by Reusing Structured Knowledge
Yang, Qiang (Hong Kong University of Science and Technology) | Zheng, Vincent W. (Hong Kong University of Science and Technology) | Li, Bin (Institute TELECOM SudParis) | Zhuo, Hankz Hankui (Sun Yat-sen University)
Transfer learning aims to solve new learning problems by extracting and making use of the common knowledge found in related domains. A key element of transfer learning is to identify structured knowledge to enable the knowledge transfer. Structured knowledge comes in different forms, depending on the nature of the learning problem and characteristics of the domains. In this article, we describe three of our recent works on transfer learning in a progressively more sophisticated order of the structured knowledge being transferred. We show that optimization methods, and techniques inspired by the concerns of data reuse can be applied to extract and transfer deep structural knowledge between a variety of source and target problems. In our examples, this knowledge spans explicit data labels, model parameters, relations between data clusters and relational action descriptions.
AI-Based Software Defect Predictors: Applications and Benefits in a Case Study
Misirli, Ayse Tosun (Bogazici University) | Bener, Ayse (Ryerson University) | Kale, Resat (Turkcell Technology)
Software defect prediction aims to reduce software testing efforts by guiding testers through the defect-prone sections of software systems. Defect predictors are widely used in organizations to predict defects in order to save time and effort as an alternative to other techniques such as manual code reviews. The usage of a defect prediction model in a real-life setting is difficult because it requires software metrics and defect data from past projects to predict the defect-proneness of new projects. It is, on the other hand, very practical because it is easy to apply, can detect defects using less time and reduces the testing effort. We have built a learning-based defect prediction model for a telecommunication company in the space of one year. In this study, we have briefly explained our model, presented its pay-off and described how we have implemented the model in the company. Furthermore, we compared the performance of our model with that of another testing strategy applied in a pilot project that implemented a new process called Team Software Process (TSP). Our results show that defect predictors can predict 87 percent of code defects, decrease inspection efforts by 72 percent and hence, reduces post-release defects by 44 percent. Furthermore, they can be used as complementary tools for a new process implementation whose effects on testing activities are limited.