Goto

Collaborating Authors

 Media


Provable Inductive Matrix Completion

arXiv.org Machine Learning

Consider a movie recommendation system where apart from the ratings information, side information such as user's age or movie's genre is also available. Unlike standard matrix completion, in this setting one should be able to predict inductively on new users/movies. In this paper, we study the problem of inductive matrix completion in the exact recovery setting. That is, we assume that the ratings matrix is generated by applying feature vectors to a low-rank matrix and the goal is to recover back the underlying matrix. Furthermore, we generalize the problem to that of low-rank matrix estimation using rank-1 measurements. We study this generic problem and provide conditions that the set of measurements should satisfy so that the alternating minimization method (which otherwise is a non-convex method with no convergence guarantees) is able to recover back the {\em exact} underlying low-rank matrix. In addition to inductive matrix completion, we show that two other low-rank estimation problems can be studied in our framework: a) general low-rank matrix sensing using rank-1 measurements, and b) multi-label regression with missing labels. For both the problems, we provide novel and interesting bounds on the number of measurements required by alternating minimization to provably converges to the {\em exact} low-rank matrix. In particular, our analysis for the general low rank matrix sensing problem significantly improves the required storage and computational cost than that required by the RIP-based matrix sensing methods \cite{RechtFP2007}. Finally, we provide empirical validation of our approach and demonstrate that alternating minimization is able to recover the true matrix for the above mentioned problems using a small number of measurements.


Meta Path-Based Collective Classification in Heterogeneous Information Networks

arXiv.org Machine Learning

Collective classification has been intensively studied due to its impact in many important applications, such as web mining, bioinformatics and citation analysis. Collective classification approaches exploit the dependencies of a group of linked objects whose class labels are correlated and need to be predicted simultaneously. In this paper, we focus on studying the collective classification problem in heterogeneous networks, which involves multiple types of data objects interconnected by multiple types of links. Intuitively, two objects are correlated if they are linked by many paths in the network. However, most existing approaches measure the dependencies among objects through directly links or indirect links without considering the different semantic meanings behind different paths. In this paper, we study the collective classification problem taht is defined among the same type of objects in heterogenous networks. Moreover, by considering different linkage paths in the network, one can capture the subtlety of different types of dependencies among objects. We introduce the concept of meta-path based dependencies among objects, where a meta path is a path consisting a certain sequence of linke types. We show that the quality of collective classification results strongly depends upon the meta paths used. To accommodate the large network size, a novel solution, called HCC (meta-path based Heterogenous Collective Classification), is developed to effectively assign labels to a group of instances that are interconnected through different meta-paths. The proposed HCC model can capture different types of dependencies among objects with respect to different meta paths. Empirical studies on real-world networks demonstrate that effectiveness of the proposed meta path-based collective classification approach.


A Gramulator Analysis of Gendered Language in Cable News Reportage

AAAI Conferences

News reportage is intended to serve the public in terms of nurturing a better understanding of political and societal concerns. But such a goal may be stymied if reporters lack a sufficient understanding of the effect gendered language may have on the conveyance and interpretation of news. To address this issue, we use the Gramulator to conduct an applied natural language processing study of the linguistic and topical features of gendered language in news reportage. Our goal is to offer some insights as to how the choice of language and topics might affect the efficacy of news reportage. Results suggest that current news reportage largely conforms to an established gender divide: Specifically, we find evidence that male reportage is more quantitative and likely to focus on topics such as politics, crime, and the military. By contrast, female reportage is more qualitative, and likely to focus on issues such as home and education. The study is of interest to all current affairs writers (e.g., journalists) because it offers a systematic approach to identifying and assessing the linguistic and topical differences that contribute to gendered language in new reportage.


Benefits of Semantics on Web Service Composition from a Complex Network Perspective

arXiv.org Artificial Intelligence

The number of publicly available Web services (WS) is continuously growing, and in parallel, we are witnessing a rapid development in semantic-related web technologies. The intersection of the semantic web and WS allows the development of semantic WS. In this work, we adopt a complex network perspective to perform a comparative analysis of the syntactic and semantic approaches used to describe WS. From a collection of publicly available WS descriptions, we extract syntactic and semantic WS interaction networks. We take advantage of tools from the complex network field to analyze them and determine their properties. We show that WS interaction networks exhibit some of the typical characteristics observed in real-world networks, such as short average distance between nodes and community structure. By comparing syntactic and semantic networks through their properties, we show the introduction of semantics in WS descriptions should improve the composition process.


Parsimonious module inference in large networks

arXiv.org Machine Learning

We investigate the detectability of modules in large networks when the number of modules is not known in advance. We employ the minimum description length (MDL) principle which seeks to minimize the total amount of information required to describe the network, and avoid overfitting. According to this criterion, we obtain general bounds on the detectability of any prescribed block structure, given the number of nodes and edges in the sampled network. We also obtain that the maximum number of detectable blocks scales as $\sqrt{N}$, where $N$ is the number of nodes in the network, for a fixed average degree $$. We also show that the simplicity of the MDL approach yields an efficient multilevel Monte Carlo inference algorithm with a complexity of $O(\tau N\log N)$, if the number of blocks is unknown, and $O(\tau N)$ if it is known, where $\tau$ is the mixing time of the Markov chain. We illustrate the application of the method on a large network of actors and films with over $10^6$ edges, and a dissortative, bipartite block structure.


Reports on the 2012 AIIDE Workshops

AI Magazine

The 2012 AIIDE Conference included four workshops: Artificial Intelligence in Adversarial Real-Time Games, Human Computation in Deigital Entertainment and AI for Serious Games, Intelligent Narrative Technologies, and Musican Metacreation. The workshops took place October 8-9, 2012 at Stanford University. This report contains summaries of the activities of those four workshops.


Interactive Narrative: An Intelligent Systems Approach

AI Magazine

Interactive narrative is a form of digital interactive experience in which users create or influence a dramatic storyline through their actions. The goal of an interactive narrative system is to immerse the user in a virtual world such that he or she believes that they are an integral part of an unfolding story and that their actions can significantly alter the direction and/or outcome of the story.In this article we review the ways in which artificial intelligence can be brought to bear on the creation of interactive narrative systems. We lay out the landscape of about 20 years of interactive narrative research and explore the successes as well as open research questions pertaining to the novel use of computational narrative intelligence in the pursuit of entertainment, education, and training.


An Architecture for Probabilistic Concept-Based Information Retrieval

arXiv.org Artificial Intelligence

While concept-based methods for information retrieval can provide improved performance over more conventional techniques, they require large amounts of effort to acquire the concepts and their qualitative and quantitative relationships. This paper discusses an architecture for probabilistic concept-based information retrieval which addresses the knowledge acquisition problem. The architecture makes use of the probabilistic networks technology for representing and reasoning about concepts and includes a knowledge acquisition component which partially automates the construction of concept knowledge bases from data. We describe two experiments that apply the architecture to the task of retrieving documents about terrorism from a set of documents from the Reuters news service. The experiments provide positive evidence that the architecture design is feasible and that there are advantages to concept-based methods.


Topic Discovery through Data Dependent and Random Projections

arXiv.org Machine Learning

We present algorithms for topic modeling based on the geometry of cross-document word-frequency patterns. This perspective gains significance under the so called separability condition. This is a condition on existence of novel-words that are unique to each topic. We present a suite of highly efficient algorithms based on data-dependent and random projections of word-frequency patterns to identify novel words and associated topics. We will also discuss the statistical guarantees of the data-dependent projections method based on two mild assumptions on the prior density of topic document matrix. Our key insight here is that the maximum and minimum values of cross-document frequency patterns projected along any direction are associated with novel words. While our sample complexity bounds for topic recovery are similar to the state-of-art, the computational complexity of our random projection scheme scales linearly with the number of documents and the number of words per document. We present several experiments on synthetic and real-world datasets to demonstrate qualitative and quantitative merits of our scheme.


Visualizing and Interacting with Concept Hierarchies

arXiv.org Machine Learning

Concept Hierarchies and Formal Concept Analysis are theoretically well grounded and largely experimented methods. They rely on line diagrams called Galois lattices for visualizing and analysing object-attribute sets. Galois lattices are visually seducing and conceptually rich for experts. However they present important drawbacks due to their concept oriented overall structure: analysing what they show is difficult for non experts, navigation is cumbersome, interaction is poor, and scalability is a deep bottleneck for visual interpretation even for experts. In this paper we introduce semantic probes as a means to overcome many of these problems and extend usability and application possibilities of traditional FCA visualization methods. Semantic probes are visual user centred objects which extract and organize reduced Galois sub-hierarchies. They are simpler, clearer, and they provide a better navigation support through a rich set of interaction possibilities. Since probe driven sub-hierarchies are limited to users focus, scalability is under control and interpretation is facilitated. After some successful experiments, several applications are being developed with the remaining problem of finding a compromise between simplicity and conceptual expressivity.