Learning Semantic Definitions of Online Information Sources
Carman, M. J., Knoblock, C. A.
Journal of Artificial Intelligence Research
The Internet contains a very large number of information sources providing many types of data, from weather forecasts to travel deals and financial information. These sources can be accessed via Web forms, Web Services, RSS feeds, and so on. To make automated use of these sources, we need to model them semantically, but writing semantic descriptions for Web Services is both tedious and error-prone. In this paper we investigate the problem of automatically generating such models. We introduce a framework for learning Datalog definitions of Web sources. To learn these definitions, our system actively invokes the sources and compares the data they produce with that of known sources of information. It then performs an inductive logic search through the space of plausible source definitions to learn the best possible semantic model for each new source. We perform an empirical evaluation of the system using real-world Web sources. The evaluation demonstrates the effectiveness of the approach, showing that we can automatically learn complex models for real sources in reasonable time. We also compare our system with a complex schema matching system, showing that our approach can handle the kinds of problems tackled by the latter.
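As a rough illustration of the invoke-and-compare idea described in the abstract, the sketch below scores candidate definitions for a new source by how closely the tuples they predict from known sources match what the new source actually returns. Everything in it is hypothetical: the toy relations, the `new_source` stand-in, the two candidate definitions, and the Jaccard score are assumptions chosen for illustration, not the paper's algorithm, search procedure, or evaluation metric.

```python
# Toy known sources, modelled as relations (sets of tuples). All data is made up.
ZIP_TO_CITY = {("90001", "Los Angeles"), ("80301", "Boulder")}
CITY_TO_TEMP = {("Los Angeles", 24), ("Boulder", 12)}
CITY_TO_STATE = {("Los Angeles", "CA"), ("Boulder", "CO")}


def new_source(zip_code):
    """Stand-in for invoking the unknown source: zip code -> set of output tuples."""
    observed = {"90001": {(24,)}, "80301": {(12,)}}
    return observed.get(zip_code, set())


def join(first, second):
    """Evaluate the conjunctive query first(x, y), second(y, z) and return (x, z) pairs."""
    return {(x, z) for (x, y1) in first for (y2, z) in second if y1 == y2}


# Hypothetical candidate definitions, written Datalog-style, with their evaluated tuples.
CANDIDATES = {
    "new(zip, temp) :- zip_to_city(zip, city), city_to_temp(city, temp)":
        join(ZIP_TO_CITY, CITY_TO_TEMP),
    "new(zip, state) :- zip_to_city(zip, city), city_to_state(city, state)":
        join(ZIP_TO_CITY, CITY_TO_STATE),
}


def jaccard(a, b):
    """Overlap between two tuple sets; 1.0 means identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def score(candidate_tuples, sample_inputs):
    """Average overlap between the candidate's predictions and the source's real output."""
    total = 0.0
    for value in sample_inputs:
        predicted = {(out,) for (inp, out) in candidate_tuples if inp == value}
        total += jaccard(predicted, new_source(value))
    return total / len(sample_inputs)


def best_definition(sample_inputs):
    """Score every candidate and return the highest-scoring one with all scores."""
    scores = {d: score(t, sample_inputs) for d, t in CANDIDATES.items()}
    return max(scores, key=scores.get), scores
```

Calling `best_definition(["90001", "80301"])` ranks the two toy candidates against the sampled inputs. A real system would generate candidates by searching the space of definitions rather than enumerating them by hand.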
Sep-11-2007
- Country:
  - Asia > India
  - Europe
    - Greece (0.04)
    - Ireland (0.04)
    - Italy
      - Trentino-Alto Adige/Südtirol > Trentino Province
        - Trento (0.04)
      - Tuscany > Pisa Province
        - Pisa (0.04)
  - North America
    - Canada (0.04)
    - United States
      - California
        - Los Angeles County > Los Angeles (0.04)
        - Santa Clara County > San Jose (0.04)
      - Colorado > Boulder County
        - Boulder (0.04)
      - District of Columbia > Washington (0.04)
      - Illinois > Cook County
        - Chicago (0.04)
      - Maryland > Montgomery County
        - Rockville (0.04)
      - Massachusetts > Suffolk County
        - Boston (0.04)
  - Oceania > Australia (0.04)
- Industry:
  - Banking & Finance (1.00)
  - Consumer Products & Services (0.93)
  - Government > Regional Government
  - Transportation (0.68)
- Technology:
  - Information Technology
    - Artificial Intelligence
      - Machine Learning > Statistical Learning (1.00)
      - Natural Language (1.00)
      - Representation & Reasoning
        - Information Fusion (0.93)
        - Logic & Formal Reasoning (0.93)
        - Ontologies (0.66)
        - Search (0.68)
    - Communications > Web (1.00)
    - Information Management (1.00)