The advent of the Web of Data kindled interest in link-traversal (or lookup-based) query processing methods, in which queries are answered by dereferencing a potentially large number of small, interlinked sources. While several algorithms for query evaluation have been proposed, there exists no notion of completeness for the results of queries evaluated in this way. In this paper, we motivate the need for clearly defined completeness classes, present several notions of completeness for queries over Linked Data based on the idea of authoritativeness of sources, and show the relation between the different completeness classes.
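To make the link-traversal idea concrete, the following is a minimal sketch, not the algorithms or completeness classes discussed in the paper. It assumes a hypothetical in-memory "web" (`WEB`) in place of real HTTP dereferencing, and the names `dereference`, `traverse`, `names`, and the `max_sources` cut-off are illustrative assumptions. It only shows that the answer to a query depends on which sources were actually dereferenced, which is why a notion of completeness is needed.

```python
from collections import deque

# Hypothetical in-memory "web": each URI maps to the triples returned
# when that URI is dereferenced. A real system would issue HTTP lookups.
WEB = {
    "ex:alice": [("ex:alice", "foaf:knows", "ex:bob")],
    "ex:bob": [
        ("ex:bob", "foaf:knows", "ex:carol"),
        ("ex:bob", "foaf:name", '"Bob"'),
    ],
    "ex:carol": [("ex:carol", "foaf:name", '"Carol"')],
}


def dereference(uri):
    """Return the triples published at `uri` (empty if nothing is there)."""
    return WEB.get(uri, [])


def is_uri(term):
    """In this toy encoding, literals are the terms wrapped in double quotes."""
    return not term.startswith('"')


def traverse(seeds, max_sources=100):
    """Breadth-first link traversal starting from the seed URIs.

    Returns the collected triples and the set of dereferenced sources.
    """
    seen, queue, triples = set(), deque(seeds), set()
    while queue and len(seen) < max_sources:
        uri = queue.popleft()
        if uri in seen:
            continue
        seen.add(uri)
        for s, p, o in dereference(uri):
            triples.add((s, p, o))
            # Follow URIs found in subject and object position.
            for term in (s, o):
                if is_uri(term) and term not in seen:
                    queue.append(term)
    return triples, seen


def names(triples):
    """Evaluate the pattern (?x foaf:name ?n) over the collected triples."""
    return {o for (_, p, o) in triples if p == "foaf:name"}


if __name__ == "__main__":
    all_triples, all_sources = traverse(["ex:alice"])
    print(sorted(all_sources), sorted(names(all_triples)))

    # Stopping after two sources yields a smaller answer for the same query.
    some_triples, some_sources = traverse(["ex:alice"], max_sources=2)
    print(sorted(some_sources), sorted(names(some_triples)))
```

In this toy setting, the second call stops before `ex:carol` is dereferenced, so the same query returns fewer names; whether such a result should count as complete depends on the chosen definition.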
Information retrieval and integration systems typically must handle incomplete and inconsistent data. Current approaches attempt to reconcile discrepant information by leveraging data quality, user preferences, or source provenance information. Such approaches may overlook the fact that information is interpreted relative to its context. Therefore, discrepancies may be explained, and thereby resolved, if contexts are taken into account. In this paper, we describe an information integrator that is capable of explaining its results. We focus on using knowledge of an assumption context, learned through decision tree-based classification, to inform the explanations. We further discuss some benefits and difficulties of applying assumption context in information retrieval. Finally, we indicate how to use Inference Web to explain discrepancies resulting from information retrieval and integration applications.
It also has the ability to generate operators during planning from Web pages using keyword extraction methods. When a user wants to understand a concept, it is useful to browse for relevant Web pages on the WWW. In general, however, this task is very hard because the user does not know where such Web pages are and has to search for them in the vast WWW search space.
IMAGE: Brian Davison, Associate Professor of Computer Science Engineering at Lehigh University, is principal investigator of an NSF-backed project to develop a search engine intended to help scientists and others locate...
There was a time--not that long ago--when the phrases "Google it" or "check Yahoo" would have been interpreted as sneezes, or perhaps as symptoms of an oncoming seizure, rather than as coherent thoughts. Today, these are key to answering all of life's questions. It's one thing to use the Web to keep up with a Kardashian, shop for ironic T-shirts, argue with our in-laws about politics, or do any of the other myriad things we use the Web for in today's world. But if you are a serious researcher looking for real data that can help you advance your ideas, how useful are the underlying technologies that support the search engines we've all come to take for granted? "Not very," says Brian Davison, associate professor of computer science at Lehigh University.
With the increase in information on the World Wide Web, it has become difficult to quickly find desired information without using multiple queries or a topic-specific search engine. One way to help in the search is to group together HTML pages that appear to be related in some way. In order to better understand this task, we performed an initial study of human clustering of web pages, in the hope that it would provide some insight into the difficulty of automating this task. Our results show that subjects did not cluster identically; in fact, on average, any two subjects had little similarity in their web page clusters. We also found that subjects generally created rather small clusters, and that those with access only to URLs created fewer clusters than those with access to the full text of each web page. Generally, the overlap of documents between clusters for any given subject increased when the subject was given the full text, as did the percentage of documents clustered. When analyzing individual subjects, we found that each behaved differently across queries in terms of overlap, size of clusters, and number of clusters. These results provide a sobering note on any quest for a single clearly correct clustering method for web pages.
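The abstract does not say how similarity between two subjects' clusterings was measured. As an illustration only, the sketch below assumes one plausible measure: the Jaccard overlap of co-clustered document pairs, chosen here because it tolerates overlapping clusters and unclustered documents, both of which the study reports. The function names and the sample data are hypothetical.

```python
from itertools import combinations


def co_clustered_pairs(clusters):
    """Return the unordered document pairs that share at least one cluster."""
    pairs = set()
    for cluster in clusters:
        # sorted() gives each pair a canonical order, so (a, b) == (b, a).
        for a, b in combinations(sorted(set(cluster)), 2):
            pairs.add((a, b))
    return pairs


def pair_jaccard(clusters_a, clusters_b):
    """Jaccard similarity of the co-clustered pair sets of two subjects."""
    pairs_a = co_clustered_pairs(clusters_a)
    pairs_b = co_clustered_pairs(clusters_b)
    union = pairs_a | pairs_b
    if not union:
        return 1.0  # neither subject grouped any two documents together
    return len(pairs_a & pairs_b) / len(union)


def mean_pairwise_agreement(subjects):
    """Average pair_jaccard over every pair of subjects (dict: name -> clusters)."""
    scores = [
        pair_jaccard(subjects[s], subjects[t])
        for s, t in combinations(sorted(subjects), 2)
    ]
    return sum(scores) / len(scores) if scores else 1.0


if __name__ == "__main__":
    # Made-up clusterings of five pages; p2 appears in two of subject_2's clusters.
    subjects = {
        "subject_1": [["p1", "p2", "p3"], ["p4", "p5"]],
        "subject_2": [["p1", "p2"], ["p3", "p4"], ["p2", "p5"]],
    }
    print(pair_jaccard(subjects["subject_1"], subjects["subject_2"]))
    print(mean_pairwise_agreement(subjects))
```

In the made-up example, the two subjects agree on only one of the six page pairs that either of them grouped together, giving a score of 1/6; low scores of this kind are what "little similarity" would look like under this assumed measure.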