Industry
Materializing and Persisting Inferred and Uncertain Knowledge in RDF Datasets
McGlothlin, James P. (The University of Texas at Dallas) | Khan, Latifur (The University of Texas At Dallas)
As the semantic web grows in popularity and enters the mainstream of computer technology, RDF (Resource Description Framework) datasets are becoming larger and more complex. Advanced semantic web ontologies, especially in medicine and science, are developing. As more complex ontologies are developed, there is a growing need for efficient queries that handle inference. In areas such as research, it is vital to be able to perform queries that retrieve not just facts but also inferred knowledge and uncertain information. OWL (Web Ontology Language) defines rules that govern provable inference in semantic web datasets. In this paper, we detail a database schema using bit vectors that is designed specifically for RDF datasets. We introduce a framework for materializing and storing inferred triples. Our bit vector schema enables storage of inferred knowledge without a query performance penalty. Inference queries are simplified and performance is improved. Our evaluation results demonstrate that our inference solution is more scalable and efficient than the current state-of-the-art. There are also standards being developed for representing probabilistic reasoning within OWL ontologies. We specify a framework for materializing uncertain information and probabilities using these ontologies. We define a multiple vector schema for representing probabilities and classifying uncertain knowledge using thresholds. This solution increases the breadth of information that can be efficiently retrieved.
Optimal Social Trust Path Selection in Complex Social Networks
Liu, Guanfeng (Macquarie University) | Wang, Yan (Macquarie University) | Orgun, Mehmet A (Macquarie University)
Online social networks are becoming increasingly popular and are being used as the means for a variety of rich activities. This demands the evaluation of the trustworthiness between two unknown participants along a certain social trust path between them in the social network. However, there are usually many social trust paths between participants. Thus, a challenging problem is finding which social trust path is the optimal one that can yield the most trustworthy evaluation result. In this paper, we first present a new complex social network structure and a new concept of Quality of Trust (QoT) to illustrate the ability to guarantee a certain level of trustworthiness in trust evaluation. We then model the optimal social trust path selection as a Multi-Constrained Optimal Path (MCOP) selection problem which is NP-Complete. For solving this problem, we propose an efficient approximation algorithm MONTE K based on the Monte Carlo method. The results of our experiments conducted on a real dataset of social networks illustrate that our proposed algorithm significantly outperforms existing approaches in both efficiency and the quality of selected social trust paths.
Temporal Information Extraction
Ling, Xiao (University of Washington) | Weld, Daniel S. (University of Washington)
Research on information extraction (IE) seeks to distill relational tuples from natural language text, such as the contents of the WWW. Most IE work has focussed on identifying static facts, encoding them as binary relations. This is unfortunate, because the vast majority of facts are fluents, only holding true during an interval of time. It is less helpful to extract PresidentOf(Bill-Clinton, USA) without the temporal scope 1/20/93 โ 1/20/01. This paper presents TIE, a novel, information-extraction system, which distills facts from text while inducing as much temporal information as possible. In addition to recognizing temporal relations between times and events, TIE performs global inference, enforcing transitivity to bound the start and ending times for each event. We introduce the notion of temporal entropy as a way to evaluate the performance of temporal IE systems and present experiments showing that TIE outperforms three alternative approaches.
Sentiment Analysis with Global Topics and Local Dependency
Li, Fangtao (Tsinghua University) | Huang, Minlie (Tsinghua University) | Zhu, Xiaoyan (Tsinghua University)
With the development of Web 2.0, sentiment analysis has now become a popular research problem to tackle. Recently, topic models have been introduced for the simultaneous analysis for topics and the sentiment in a document. These studies, which jointly model topic and sentiment, take the advantage of the relationship between topics and sentiment, and are shown to be superior to traditional sentiment analysis tools. However, most of them make the assumption that, given the parameters, the sentiments of the words in the document are all independent. In our observation, in contrast, sentiments are expressed in a coherent way. The local conjunctive words, such as โandโ or โbutโ, are often indicative of sentiment transitions. In this paper, we propose a major departure from the previous approaches by making two linked contributions. First, we assume that the sentiments are related to the topic in the document, and put forward a joint sentiment and topic model, i.e. Sentiment-LDA. Second, we observe that sentiments are dependent on local context. Thus, we further extend the Sentiment-LDA model to Dependency-Sentiment-LDA model by relaxing the sentiment independent assumption in Sentiment-LDA. The sentiments of words are viewed as a Markov chain in Dependency-Sentiment-LDA. Through experiments, we show that exploiting the sentiment dependency is clearly advantageous, and that the Dependency-Sentiment-LDA is an effective approach for sentiment analysis.
Learning to Predict Opinion Share in Social Networks
Kimura, Masahiro (Ryukoku University) | Saito, Kazumi (University of Shizuoka) | Ohara, Kouzou (Aoyama Gakuin University) | Motoda, Hiroshi (Osaka University)
Blogosphere and sites such as for social networking, There has been a variety of work on the voter model. Dynamical knowledge-sharing and media-sharing in the World Wide properties of the basic model, including how the degree Web have enabled to form various kinds of large social distribution and the network size affect the mean time networks, through which behaviors, ideas and opinions to reach consensus, have been extensively studied (Liggett can spread. Thus, substantial attention has been directed 1999; Sood and Redner 2005) from mathematical point to investigating the spread of influence in these networks of view. Several variants of the voter model are also investigated (Leskovec, Adamic, and Huberman 2007; Crandall et al.
Prioritization of Domain-Specific Web Information Extraction
Huang, Jian (Pennsylvania State University) | Yu, Cong (Yahoo! Research)
It is often desirable to extract structured information from raw web pages for better information browsing, query answering, and pattern mining. many such Information Extraction (IE) technologies are costly and applying them at the web-scale is impractical. In this paper, we propose a novel prioritization approach where candidate pages from the corpus are ordered according to their expected contribution to the extraction results and those with higher estimated potential are extracted earlier. Systems employing this approach can stop the extraction process at any time when the resource gets scarce (i.e., not all pages in the corpus can be processed), without worrying about wasting extraction effort on unimportant pages. More specifically, we define a novel notion to measure the value of extraction results and design various mechanisms for estimating a candidate pageโs contribution to this value. We further design and build the Extraction Prioritization (EP) system with efficient scoring and scheduling algorithms, and experimentally demonstrate that EP significantly outperforms the naive approach and is more flexible than the classifier approach.
Visual Contextual Advertising: Bringing Textual Advertisements to Images
Chen, Yuqiang (Shanghai Jiao Tong University) | Jin, Ou (Shanghai Jiao Tong University) | Xue, Gui-Rong (Shanghai Jiao Tong University) | Chen, Jia (Shanghai Jiao Tong University) | Yang, Qiang (Hong Kong University of Science and Technology)
Advertising in the case of textual Web pages has been studied extensively by many researchers. However, with the increasing amount of multimedia data such as image, audio and video on the Web, the need for recommending advertisement for the multimedia data is becoming a reality. In this paper, we address the novel problem of visual contextual advertising, which is to directly advertise when users are viewing images which do not have any surrounding text. A key challenging issue of visual contextual advertising is that images and advertisements are usually represented in image space and word space respectively, which are quite different with each other inherently. As a result, existing methods for Web page advertising are inapplicable since they represent both Web pages and advertisement in the same word space. In order to solve the problem, we propose to exploit the social Web to link these two feature spaces together. In particular, we present a unified generative model to integrate advertisements, words and images. Specifically, our solution combines two parts in a principled approach: First, we transform images from a image feature space to a word space utilizing the knowledge from images with annotations from social Web. Then, a language model based approach is applied to estimate the relevance between transformed images and advertisements. Moreover, in this model, the probability of recommending an advertisement can be inferred efficiently given an image, which enables potential applications to online advertising.
Toward an Architecture for Never-Ending Language Learning
Carlson, Andrew (Carnegie Mellon University) | Betteridge, Justin (Carnegie Mellon University) | Kisiel, Bryan (Carnegie Mellon University) | Settles, Burr (Carnegie Mellon University) | Hruschka, Estevam R. (Federal University of Sao Carlos) | Mitchell, Tom M. (Carnegie Mellon University)
We consider here the problem of building a never-ending language learner; that is, an intelligent computer agent that runs forever and that each day must (1) extract, or read, information from the web to populate a growing structured knowledge base, and (2) learn to perform this task better than on the previous day. In particular, we propose an approach and a set of design principles for such an agent, describe a partial implementation of such a system that has already learned to extract a knowledge base containing over 242,000 beliefs with an estimated precision of 74% after running for 67 days, and discuss lessons learned from this preliminary attempt to build a never-ending learning agent.
GTPA: A Generative Model For Online Mentor-Apprentice Networks
Ahmad, Muhammad Aurangzeb (University of Minnesota) | Huffakar, David (University of Michigan) | Wang, Jing (Northwestern University) | Treem, Jeff (Northwestern University) | Poole, Marshall Scott (University of Illinois - Urbana-Champaign) | Srivastava, Jaideep (University of Minnesota)
There is a large body of work on the evolution of graphs in various domains, which shows that many real graphs evolve in a similar manner. In this paper we study a novel type of network formed by mentor-apprentice relationships in a massively multiplayer online role playing game. We observe that some of the static and dynamic laws which have been observed in many other real world networks are not observed in this network. Consequently well known graph generators like Preferential Attachment, Forest Fire, Butterfly, RTM, etc., cannot be applied to such mentoring networks. We propose a novel generative model to generate networks with the characteristics of mentoring networks.
A Fast Heuristic Search Algorithm for Finding the Longest Common Subsequence of Multiple Strings
Wang, Qingguo (University of Missouri) | Pan, Mian (University of Missouri) | Shang, Yi (University of Missouri) | Korkin, Dmitry (University of Missouri)
Finding the longest common subsequence (LCS) of multiple strings is an NP-hard problem, with many applications in the areas of bioinformatics and computational genomics. Although significant efforts have been made to address the problem and its special cases, the increasing complexity and size of biological data require more efficient methods applicable to an arbitrary number of strings. In this paper, a novel search algorithm, MLCS-A*, is presented for the general case of multiple LCS (or MLCS) problems. MLCS-A* is a variant of the A* algorithm. It maximizes a new heuristic estimate of the LCS in each search step so that the longest common subsequence can be found. As a natural extension of MLCS-A*, a fast algorithm, MLCS-APP, is also proposed to deal with large volume of biological data for which finding a LCS within reasonable time is impossible. The benchmark test shows that MLCS-APP is able to extract common subsequences close to the optimal ones and that MLCS-APP significantly outperforms existing heuristic approaches. When applied to 8 protein domain families, MLCS-APP produced more accurate results than existing multiple sequence alignment methods.