Indian Institute of Technology Kanpur
Non-Negative Inductive Matrix Completion for Discrete Dyadic Data
Rai, Piyush (Indian Institute of Technology Kanpur)
We present a non-negative inductive latent factor model for binary- and count-valued matrices containing dyadic data, with side information along the rows and/or the columns of the matrix. The side information is incorporated by conditioning the row and column latent factors on the available side information via a regression model. Our model can not only perform matrix factorization and completion with side information, but also infer interpretable latent topics that explain and summarize the data. An appealing aspect of our model is the full local conjugacy of all its parts, including both the main latent factor model and the regression model that leverages the side information. This enables us to design scalable, simple-to-implement Gibbs sampling and Expectation Maximization algorithms for inference in the model. The inference cost of our model scales with the number of nonzero entries in the data matrix, which makes it particularly attractive for massive, sparse matrices. We demonstrate the effectiveness of our model on several real-world data sets, comparing it with state-of-the-art baselines.
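The claim that inference cost scales with the number of nonzeros can be illustrated for one common likelihood choice. The sketch below assumes a Poisson likelihood, Y_ij ~ Poisson(sum_k U_ik V_jk) with non-negative factors U and V; the abstract does not state the exact likelihood or link used, and the function name `poisson_loglik_nonzeros` is hypothetical. Zero entries contribute only -lambda_ij to the log-likelihood, and the sum of all rates factorizes over the latent dimension, so the dense N x M rate matrix never has to be formed.

```python
import numpy as np


def poisson_loglik_nonzeros(rows, cols, vals, U, V):
    """Poisson log-likelihood of a sparse count matrix Y ~ Poisson(U @ V.T),
    dropping the constant -log(y!) terms. ASSUMED likelihood, for illustration.

    rows, cols, vals: coordinates and values of the nonzero entries of Y.
    U: (N, K) non-negative row factors; V: (M, K) non-negative column factors.
    Cost is O(nnz * K + (N + M) * K) instead of O(N * M * K).
    """
    # Rates at the nonzero entries only: lambda_ij = sum_k U[i, k] * V[j, k].
    rates_nz = np.einsum("ik,ik->i", U[rows], V[cols])
    # Sum of ALL N * M rates without forming U @ V.T:
    # sum_ij lambda_ij = sum_k (sum_i U[i, k]) * (sum_j V[j, k]).
    total_rate = U.sum(axis=0) @ V.sum(axis=0)
    return float(np.sum(vals * np.log(rates_nz)) - total_rate)
```

On a small random matrix the result matches the dense computation `sum(Y * log(U @ V.T) - U @ V.T)`; on a large sparse matrix only the nonzero triplets and the two column-sum vectors are touched.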
Scalable Optimization of Multivariate Performance Measures in Multi-instance Multi-label Learning
Aggarwal, Apoorv (Indian Institute of Technology Bombay) | Ghoshal, Sandip (Indian Institute of Technology Bombay) | Shetty, Ankith M. S. (Indian Institute of Technology Bombay) | Sinha, Suhit (Indian Institute of Technology Bombay) | Ramakrishnan, Ganesh (Indian Institute of Technology Bombay) | Kar, Purushottam (Indian Institute of Technology Kanpur) | Jain, Prateek (Microsoft Research)
The problem of multi-instance multi-label learning (MIML) requires a bag of instances to be assigned a set of labels most relevant to the bag as a whole. The problem finds numerous applications in machine learning, computer vision, and natural language processing settings where only partial or distant supervision is available. We present a novel method for optimizing multivariate performance measures in the MIML setting. Our approach, MIML-perf, uses a novel plug-in technique and offers a seamless way to optimize a wide variety of performance measures, such as macro- and micro-F-measure and average precision, which are the performance measures of choice in multi-label learning domains. MIML-perf offers two key benefits over the state of the art. First, across a diverse range of benchmark tasks, from relation extraction to text categorization and scene classification, MIML-perf outperforms state-of-the-art methods designed specifically for these tasks. Second, MIML-perf runs significantly faster than other methods, often by an order of magnitude or more.
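As a rough illustration of the plug-in idea for a non-decomposable measure such as F1, the sketch below takes real-valued scores from an already-trained model for a single label and chooses the decision threshold that maximizes F1 on held-out data. This is a generic plug-in thresholding rule assumed here for illustration, not MIML-perf's actual algorithm, and `best_f1_threshold` is a hypothetical name. Sorting the scores once makes the sweep O(n log n).

```python
from typing import List, Tuple


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_f1_threshold(scores: List[float], labels: List[int]) -> Tuple[float, float]:
    """Return (threshold, F1) for the rule "predict 1 iff score >= threshold"
    that maximizes F1 on the given scored examples."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    total_pos = sum(labels)
    best_t, best_f1 = float("inf"), 0.0  # threshold +inf: predict nothing positive
    tp = fp = 0
    for i, (s, y) in enumerate(ranked):
        # Lowering the threshold to s turns this example into a positive prediction.
        if y == 1:
            tp += 1
        else:
            fp += 1
        # Tied scores are switched on together, so evaluate only at the end of a tie run.
        if i + 1 < len(ranked) and ranked[i + 1][0] == s:
            continue
        f1 = f1_from_counts(tp, fp, total_pos - tp)
        if f1 > best_f1:
            best_t, best_f1 = s, f1
    return best_t, best_f1
```

Because macro-F averages per-label F1 scores, a macro-F variant could, under the same assumptions, repeat this sweep independently for each label.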
Automatic Generation of Alternative Starting Positions for Simple Traditional Board Games
Ahmed, Umair Z. (Indian Institute of Technology Kanpur) | Chatterjee, Krishnendu (The Institute of Science and Technology) | Gulwani, Sumit (Microsoft Research, Redmond)
Simple board games, like Tic-Tac-Toe and CONNECT-4, play an important role not only in the development of mathematical and logical skills, but also in emotional and social development. In this paper, we address the problem of generating targeted starting positions for such games. This can facilitate new approaches for bringing novice players to mastery, and can also lead to the discovery of interesting game variants. We present an approach that generates starting states of varying hardness levels for player 1 in a two-player board game, given the rules of the game, the desired number of steps required for player 1 to win, and the expertise levels of the two players. Our approach leverages symbolic methods and iterative simulation to efficiently search the extremely large state space. Our experimental results include the discovery of states of varying hardness levels for several simple grid-based board games. The existence of such states for standard game variants like 4 x 4 Tic-Tac-Toe opens up new games that have never been played, since the default start state is heavily biased.
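To make "hardness" concrete, the sketch below measures it as the smallest number k of player 1's own moves within which player 1 (X) can force a win from a given 3 x 3 Tic-Tac-Toe position. It assumes player 2 (O) always replies optimally, which corresponds to only one end of the expertise spectrum the abstract mentions, and it uses plain memoized game-tree search rather than the paper's symbolic methods and iterative simulation. The board encoding and function names are hypothetical.

```python
from functools import lru_cache
from typing import Optional

# The eight winning lines on a 3 x 3 board; cells are indexed 0..8 row by row, "." is empty.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]


def winner(board: str) -> str:
    """Return "X" or "O" if that player has three in a row, else ""."""
    for a, b, c in LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return ""


def place(board: str, i: int, mark: str) -> str:
    return board[:i] + mark + board[i + 1:]


@lru_cache(maxsize=None)
def x_forces_win(board: str, x_moves_left: int) -> bool:
    """True if X, to move, can force a win within x_moves_left of its own
    moves, ASSUMING O always replies optimally."""
    if x_moves_left == 0:
        return False
    for i, cell in enumerate(board):
        if cell != ".":
            continue
        after_x = place(board, i, "X")
        if winner(after_x) == "X":
            return True
        replies = [j for j, c in enumerate(after_x) if c == "."]
        if not replies:
            continue  # board full without a win: a draw, not a forced win
        # This X move works only if every O reply still leaves X a forced win.
        if all(
            winner(place(after_x, j, "O")) != "O"
            and x_forces_win(place(after_x, j, "O"), x_moves_left - 1)
            for j in replies
        ):
            return True
    return False


def hardness(board: str, max_moves: int = 5) -> Optional[int]:
    """Smallest k such that X can force a win within k moves, or None."""
    for k in range(1, max_moves + 1):
        if x_forces_win(board, k):
            return k
    return None
```

For example, `hardness("XO..X...O")` is 2 because X can create a double threat, while `hardness("." * 9)` is `None`, reflecting that standard Tic-Tac-Toe is a draw under perfect play.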
Identifying Purchase Intent from Social Posts
Gupta, Vineet (Adobe Research India Labs, Adobe Systems) | Varshney, Devesh (Indian Institute of Technology Roorkee) | Jhamtani, Harsh (Indian Institute of Technology Roorkee) | Kedia, Deepam (Indian Institute of Technology Kanpur) | Karwa, Shweta (Indian Institute of Technology Delhi)
At present, social forums such as Quora and Yahoo! Answers constitute powerful media through which people discuss a variety of topics and express their intentions and thoughts. In these forums, people often reveal their potential intent to purchase, or 'Purchase Intent' (PI). A purchase intent is defined as a text expression showing a desire to purchase a product or a service in the future. Extracting posts that express PI from a user's social posts offers significant opportunities for web personalization, targeted marketing, and improving community observing systems. In this paper, we explore the novel problem of detecting PIs in social posts and classifying them. We find that using linguistic features along with statistical features of PI expressions achieves a significant improvement in PI classification over the 'bag-of-words' features used in many present-day social media classification tasks. Our approach takes into account the specifics of social posts, such as limited contextual information, incorrect grammar, and language ambiguities, by extracting features at two levels of text granularity: word- and phrase-based features, and grammatical dependency-based features. Apart from these, the patterns observed in PI posts help us identify some additional specific features.
Automatically Generating Problems and Solutions for Natural Deduction
Ahmed, Umair Z. (Indian Institute of Technology Kanpur) | Gulwani, Sumit (Microsoft Research Redmond) | Karkare, Amey (Indian Institute of Technology Kanpur)
Natural deduction, a method for establishing the validity of propositional arguments, helps develop important reasoning skills and is thus a key ingredient of an introductory logic course. We present two core components, namely solution generation and practice problem generation, for enabling computer-aided education in this important subject domain. The key enabling technology is an offline-computed data structure called the Universal Proof Graph (UPG), which encodes all possible applications of inference rules over all small propositions, abstracted using their bitvector-based truth-table representation. This allows an efficient forward search for solution generation. More interestingly, it also allows generating fresh practice problems with given solution characteristics by performing a backward search in the UPG. We obtained around 300 natural deduction problems from various textbooks. Our solution generation procedure can solve many more of these problems than the traditional forward-chaining-based procedure, while our problem generation procedure can efficiently generate several variants with the desired characteristics.
Predicting the Importance of Newsfeed Posts and Social Network Friends
Paek, Tim (Microsoft Research) | Gamon, Michael (Microsoft Research) | Counts, Scott (Microsoft Research) | Chickering, David Maxwell (Microsoft Research) | Dhesi, Aman (Indian Institute of Technology Kanpur)
As users of social networking websites expand their network of friends, they are often flooded with newsfeed posts and status updates, most of which they consider "unimportant" and not newsworthy. To better understand how people judge the importance of their newsfeed, we conducted a study in which Facebook users were asked to rate the importance of their newsfeed posts as well as of their friends. We learned classifiers of newsfeed and friend importance to identify predictive sets of features related to social media properties, the message text, and shared background information. For classifying friend importance, the best-performing model achieved 85% accuracy and a 25% error reduction. By leveraging this model to classify newsfeed posts, the best newsfeed classifier achieved 64% accuracy and a 27% error reduction.
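The accuracy and error-reduction figures can be related with a short calculation. The abstract does not define the baseline, so the sketch below assumes error reduction is measured relative to a baseline classifier's error rate, i.e. (e_base - e_model) / e_base, and backs out the baseline accuracy this would imply; `implied_baseline_accuracy` is a hypothetical helper.

```python
def implied_baseline_accuracy(accuracy: float, error_reduction: float) -> float:
    """Baseline accuracy implied by a model's accuracy and its relative error
    reduction, ASSUMING error_reduction = (e_base - e_model) / e_base."""
    if not 0.0 <= error_reduction < 1.0:
        raise ValueError("error_reduction must lie in [0, 1)")
    model_error = 1.0 - accuracy
    baseline_error = model_error / (1.0 - error_reduction)
    return 1.0 - baseline_error


friend_baseline = implied_baseline_accuracy(0.85, 0.25)    # friend-importance model
newsfeed_baseline = implied_baseline_accuracy(0.64, 0.27)  # newsfeed classifier
```

Under this reading, 85% accuracy with a 25% error reduction implies a baseline accuracy of about 80% (error falling from 20% to 15%), and 64% accuracy with a 27% error reduction implies a baseline of roughly 51%.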