
Knoblock

AAAI Conferences

Much of the focus on big data has been on the problem of processing very large sources. There is an equally hard problem of how to normalize, integrate, and transform the data from many sources into the format required to run large-scale analysis and visualization tools. We have previously developed an approach to semi-automatically mapping diverse sources into a shared domain ontology so that they can be quickly combined. In this paper we describe our approach to building and executing integration and restructuring plans to support analysis and visualization tools on very large and diverse datasets.


Exploiting Semantics for Big Data Integration

AI Magazine

An equally important dimension of big data is variety, where the focus is on processing highly heterogeneous data sets. We describe how we use semantics to address the problem of big data variety. We also describe Karma, a system that implements our approach, and show how Karma can be applied to integrate data in the cultural heritage domain. In this use case, Karma integrates data across many museums even though the data sets from different museums are highly heterogeneous. Volume, by contrast, refers to the problem of how to deal with very large data sets, which typically requires execution in a distributed cloud-based infrastructure.


AI Magazine

As in past years, papers were solicited in two categories: (1) deployed applications and (2) emerging applications and technologies. Deployed application papers describe systems that have been in use for at least several months by individuals or organizations other than their developers, have measurable benefits, and incorporate AI technologies. Emerging applications are technologies and systems that are close to deployment and clearly show an innovative implementation of AI technologies. These papers are of value not only to other application developers looking for guidance in applying various techniques to their own applications but also to researchers who need to understand the unique technical challenges provided by real-world problems. For IAAI-2002, we received 54 submissions, containing a wealth of outstanding applications and emerging technology papers (15 deployed and 39 emerging).


Beyond the Elves: Making Intelligent Agents Intelligent

AI Magazine

The goal of the Electric Elves project was to develop software agent technology to support human organizations. We developed a variety of applications of the Elves, including scheduling visitors, managing a research group (the Office Elves), and monitoring travel (the Travel Elves). The Travel Elves were eventually deployed at DARPA, where things did not go exactly as planned. In this article, we describe some of the things that went wrong and then present some of the lessons learned and new research that arose from our experience in building the Travel Elves. The project was quite successful with impressive prototypes and many papers on the research.


Automatically Utilizing Secondary Sources to Align Information Across Sources

AI Magazine

XML, web services, and the semantic web have opened the door for new and exciting information-integration applications. Information sources on the web are controlled by different organizations or people, utilize different text formats, and have varying inconsistencies. Therefore, any system that integrates information from different data sources must identify common entities from these sources. Data from many data sources on the web does not contain enough information to link the records accurately using state-of-the-art record-linkage systems. However, it is possible to exploit secondary data sources on the web to improve the record-linkage process.
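As a rough illustration of the idea in this abstract, the sketch below links two records whose names alone are too dissimilar to match, by consulting a secondary source. It is a minimal sketch under stated assumptions, not the method of the article: the token-overlap similarity, the two thresholds, the `GEOCODER` lookup table, and all record values are hypothetical.

```python
import re
from typing import Dict, Optional, Tuple

Coordinates = Tuple[float, float]


def name_tokens(name: str) -> set:
    """Lowercase a name, drop apostrophes, and split it into word tokens."""
    return set(re.findall(r"[a-z0-9]+", name.lower().replace("'", "")))


def name_similarity(a: str, b: str) -> float:
    """Jaccard overlap of the two token sets (a stand-in similarity measure)."""
    ta, tb = name_tokens(a), name_tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def same_entity(r1: dict, r2: dict,
                secondary: Optional[Dict[str, Coordinates]] = None,
                strong: float = 0.8, weak: float = 0.3) -> bool:
    """Link on names alone if they agree strongly; otherwise accept a weak
    name match only when the secondary source places both records at the
    same location."""
    sim = name_similarity(r1["name"], r2["name"])
    if sim >= strong:
        return True
    if secondary is None:
        return False
    loc1 = secondary.get(r1["address"])
    loc2 = secondary.get(r2["address"])
    return sim >= weak and loc1 is not None and loc1 == loc2


# Hypothetical records from two primary sources.
A = {"name": "Art's Deli", "address": "100 Example Blvd, Studio City"}
B = {"name": "Arts Delicatessen", "address": "100 Example Boulevard, Los Angeles"}
C = {"name": "Arts Cafe", "address": "9 Sample Ave, Los Angeles"}

# Hypothetical secondary source: a geocoder that resolves address strings.
GEOCODER: Dict[str, Coordinates] = {
    A["address"]: (34.14, -118.40),
    B["address"]: (34.14, -118.40),
    C["address"]: (34.05, -118.25),
}
```

Here `same_entity(A, B)` fails on the primary data alone, while `same_entity(A, B, GEOCODER)` succeeds because the geocoder resolves both address strings to the same point.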


The Value of AI Tools: Some Lessons Learned

AI Magazine

We are in the midst of an AI Spring, and it’s an exciting time for the AI community. AI is poised to change the world. Nevertheless, for many new AI technologies, it is still unclear how these technologies will be successfully productized, and which type of companies will be winners and losers. In this column, I reflect on some of the potential difficulties in commercializing AI technology, based on my personal experience developing information extraction software.


Exploiting Semantics for Big Data Integration

AI Magazine

There is a great deal of interest in big data, focusing mostly on data set size. The use of semantics in this integration process is key to building an approach that scales to large numbers of heterogeneous sources. ... descriptions and then integrating the data within this unified framework. Finally, we conclude by ... and (4) integrate the data across sources using this model. Karma has been used on a variety of types of data, including biological data, mobile phone data, geospatial data, and cultural heritage data. In order to illustrate the approach to integrating data in Karma, we will use an example from the cultural heritage domain. One challenge in integrating diverse data sources is the ability to import different data formats into a ... For example, in our museum use case, we received data in spreadsheets (figure 1), comma-separated values (CSV), JSON (figure 3), XML, and relational databases (figure ...


An Expressive Language and Efficient Execution System for Software Agents

arXiv.org Artificial Intelligence

Software agents can be used to automate many of the tedious, time-consuming information processing tasks that humans currently have to complete manually. However, to do so, agent plans must be capable of representing the myriad of actions and control flows required to perform those tasks. In addition, since these tasks can require integrating multiple sources of remote information (typically a slow, I/O-bound process), it is desirable to make execution as efficient as possible. To address both of these needs, we present a flexible software agent plan language and a highly parallel execution system that enable the efficient execution of expressive agent plans. The plan language allows complex tasks to be more easily expressed by providing a variety of operators for flexibly processing the data as well as supporting subplans (for modularity) and recursion (for indeterminate looping). The executor is based on a streaming dataflow model of execution to maximize the amount of operator and data parallelism possible at runtime. We have implemented both the language and executor in a system called THESEUS. Our results from testing THESEUS show that streaming dataflow execution can yield significant speedups over both the traditional serial (von Neumann) execution and the non-streaming dataflow-style execution that existing software and robot agent execution systems currently support. In addition, we show how plans written in the language we present can represent certain types of subtasks that cannot be accomplished using the languages supported by network query engines. Finally, we demonstrate that the increased expressivity of our plan language does not hamper performance; specifically, we show how data can be integrated from multiple remote sources just as efficiently using our architecture as is possible with a state-of-the-art streaming-dataflow network query engine.
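The streaming aspect of this execution model can be illustrated with a small sketch. This is not the THESEUS plan language or executor: it is an assumption-laden stand-in that uses Python generators, and the operator names (`fetch`, `select`, `project`) and the `FLIGHTS` data are hypothetical. Generators show only pipelining, where each row flows through every operator before the next row is fetched; they do not show the multithreaded operator parallelism the abstract describes.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Row = dict
Log = List[Tuple[str, int]]


def fetch(source: Iterable[Row], log: Log) -> Iterator[Row]:
    """Emit rows one at a time, as a slow remote source would."""
    for row in source:
        log.append(("fetch", row["id"]))
        yield row


def select(rows: Iterable[Row], keep: Callable[[Row], bool], log: Log) -> Iterator[Row]:
    """Pass along only the rows that satisfy the predicate."""
    for row in rows:
        if keep(row):
            log.append(("select", row["id"]))
            yield row


def project(rows: Iterable[Row], fields: List[str], log: Log) -> Iterator[Row]:
    """Keep only the named fields of each row."""
    for row in rows:
        log.append(("project", row["id"]))
        yield {f: row[f] for f in fields}


def run_plan(source: Iterable[Row], log: Log) -> List[Row]:
    """Chain the operators; each row streams through the whole plan."""
    cheap = select(fetch(source, log), lambda r: r["price"] < 300, log)
    return list(project(cheap, ["id", "price"], log))


# Hypothetical input rows.
FLIGHTS = [
    {"id": 1, "price": 250, "airline": "A"},
    {"id": 2, "price": 410, "airline": "B"},
    {"id": 3, "price": 180, "airline": "C"},
]
```

Running `run_plan(FLIGHTS, log)` with an empty `log` list records that row 1 is selected and projected before row 2 is even fetched, which is the property that lets downstream work overlap with slow remote I/O in a real streaming executor.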


Wrapper Maintenance: A Machine Learning Approach

arXiv.org Artificial Intelligence

The proliferation of online information sources has led to an increased use of wrappers for extracting data from Web sources. While most of the previous research has focused on quick and efficient generation of wrappers, the development of tools for wrapper maintenance has received less attention. This is an important research problem because Web sources often change in ways that prevent the wrappers from extracting data correctly. We present an efficient algorithm that learns structural information about data from positive examples alone. We describe how this information can be used for two wrapper maintenance applications: wrapper verification and reinduction. The wrapper verification system detects when a wrapper is not extracting correct data, usually because the Web source has changed its format. The reinduction algorithm automatically recovers from changes in the Web source by identifying data on Web pages so that a new wrapper may be generated for this source. To validate our approach, we monitored 27 wrappers over a period of a year. The verification algorithm correctly discovered 35 of the 37 wrapper changes, and made 16 mistakes, resulting in precision of 0.73 and recall of 0.95. We validated the reinduction algorithm on ten Web sources. We were able to successfully reinduce the wrappers, obtaining precision and recall values of 0.90 and 0.80 on the data extraction task.
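The verification idea can be sketched in a few lines. The code below is a simplified stand-in, not the learning algorithm of the paper: it assumes that "structural information" can be approximated by the sequence of coarse token classes in each value, and the function names, the 0.8 threshold, and the phone-number examples are all hypothetical.

```python
import re
from typing import Iterable, Set, Tuple

Signature = Tuple[str, ...]


def token_class(token: str) -> str:
    """Map a token to a coarse class; punctuation is kept literally."""
    if token.isdigit():
        return "NUM"
    if token.isalpha():
        return "CAP" if token[0].isupper() else "LOWER"
    return token


def signature(value: str) -> Signature:
    """Describe a value by the sequence of its token classes."""
    tokens = re.findall(r"[A-Za-z]+|\d+|[^\sA-Za-z\d]", value)
    return tuple(token_class(t) for t in tokens)


def learn(examples: Iterable[str]) -> Set[Signature]:
    """Collect the signatures seen in known-good (positive) examples only."""
    return {signature(v) for v in examples}


def verify(patterns: Set[Signature], extracted: Iterable[str],
           threshold: float = 0.8) -> bool:
    """Report the wrapper as healthy if enough extracted values look like
    the positive examples it was trained on."""
    values = list(extracted)
    if not values:
        return False
    matched = sum(signature(v) in patterns for v in values)
    return matched / len(values) >= threshold


# Hypothetical values extracted while the wrapper was known to work.
PATTERNS = learn(["(310) 555-0101", "(213) 555-0199"])
```

With these patterns, `verify(PATTERNS, ["(818) 555-0123", "(626) 555-0144"])` passes, while `verify(PATTERNS, ["Call us today!", "Fax:"])` fails, signalling that the source has probably changed its format.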


Beyond the Elves: Making Intelligent Agents Intelligent

AI Magazine

In fact, DARPA, which funded the project, ... Elves) (Scerri, Pynadath, and Tambe 2002; Pynadath and Tambe 2003) and required detailed information about the calendars of people using the system. Thus, we decided to deploy a new application of the Electric Elves, called the Travel Elves. This application appeared to be ideal for wider deployment since it could be hosted entirely outside an organization and communication could be performed over wireless devices, such as cellular telephones. The mission of the Travel Elves (Ambite et al. 2002, Knoblock 2004) was to facilitate planning a trip and to ensure that the resulting travel plan would execute smoothly. Initial deployment of the Travel Elves at DARPA went smoothly.

... ways. Finally, we will present some lessons learned and recent research that was motivated by our experiences in deploying the ... The Travel Elves introduced two major advantages over traditional approaches to travel planning. First, the Travel Elves provided an interactive approach to making travel plans in which all of the data required to make informed choices is ... For example, when deciding whether to park at the airport or take a taxi, the system compares the cost of parking and the cost of a taxi given other selections, such as the airport, the specific parking lot, and the starting location ...