Edmonton
Improving a Page Classifier with Anchor Extraction and Link Analysis
Most text categorization systems use simple models of documents and document collections. In this paper we describe a technique that improves a simple web page classifier's performance on pages from a new, unseen web site, by exploiting link structure within a site as well as page structure within hub pages. On real-world test cases, this technique significantly and substantially improves the accuracy of a bag-of-words classifier, reducing error rate by about half, on average. The system uses a variant of co-training to exploit unlabeled data from a new site. Pages are labeled using the base classifier; the results are used by a restricted wrapper-learner to propose potential "main-category anchor wrappers"; and finally, these wrappers are used as features by a third learner to find a categorization of the site that implies a simple hub structure, but which also largely agrees with the original bag-of-words classifier.
Efficient Solution Algorithms for Factored MDPs
Guestrin, C., Koller, D., Parr, R., Venkataraman, S.
This paper addresses the problem of planning under uncertainty in large Markov Decision Processes (MDPs). Factored MDPs represent a complex state space using state variables and the transition model using a dynamic Bayesian network. This representation often allows an exponential reduction in the representation size of structured MDPs, but the complexity of exact solution algorithms for such MDPs can grow exponentially in the representation size. In this paper, we present two approximate solution algorithms that exploit structure in factored MDPs. Both use an approximate value function represented as a linear combination of basis functions, where each basis function involves only a small subset of the domain variables. A key contribution of this paper is that it shows how the basic operations of both algorithms can be performed efficiently in closed form, by exploiting both additive and context-specific structure in a factored MDP. A central element of our algorithms is a novel linear program decomposition technique, analogous to variable elimination in Bayesian networks, which reduces an exponentially large LP to a provably equivalent, polynomial-sized one. One algorithm uses approximate linear programming, and the second approximate dynamic programming. Our dynamic programming algorithm is novel in that it uses an approximation based on max-norm, a technique that more directly minimizes the terms that appear in error bounds for approximate MDP algorithms. We provide experimental results on problems with over 10^40 states, demonstrating a promising indication of the scalability of our approach, and compare our algorithm to an existing state-of-the-art approach, showing, in some problems, exponential gains in computation time.
Optimal Schedules for Parallelizing Anytime Algorithms: The Case of Shared Resources
Finkelstein, L., Markovitch, S., Rivlin, E.
The performance of anytime algorithms can be improved by simultaneously solving several instances of algorithm-problem pairs. These pairs may include different instances of a problem (such as starting from a different initial state), different algorithms (if several alternatives exist), or several runs of the same algorithm (for non-deterministic algorithms). In this paper we present a methodology for designing an optimal scheduling policy based on the statistical characteristics of the algorithms involved. We formally analyze the case where the processes share resources (a single-processor model), and provide an algorithm for optimal scheduling. We analyze, theoretically and empirically, the behavior of our scheduling algorithm for various distribution types.
Exploiting Contextual Independence In Probabilistic Inference
Bayesian belief networks have grown to prominence because they provide compact representations for many problems for which probabilistic inference is appropriate, and there are algorithms to exploit this compactness. The next step is to allow compact representations of the conditional probabilities of a variable given its parents. In this paper we present such a representation that exploits contextual independence in terms of parent contexts; which variables act as parents may depend on the value of other variables. The internal representation is in terms of contextual factors (confactors) that is simply a pair of a context and a table. The algorithm, contextual variable elimination, is based on the standard variable elimination algorithm that eliminates the non-query variables in turn, but when eliminating a variable, the tables that need to be multiplied can depend on the context. This algorithm reduces to standard variable elimination when there is no contextual independence structure to exploit. We show how this can be much more efficient than variable elimination when there is structure to exploit. We explain why this new method can exploit more structure than previous methods for structured belief network inference and an analogous algorithm that uses trees.
The AAAI-2002 Robot Challenge
Kuipers, Benjamin J., Stroupe, Ashley
The Eighteenth National Conference on Artificial Intelligence (AAAI-2002) Robot Challenge is part of an annual series of robot challenges and competitions. It is intended to promote the development of robot systems that interact intelligently with humans in natural environments. The Challenge task calls for a robot to attend the AAAI conference, which includes registering for the conference and giving a talk about itself. In this article, we review the task requirements, introduce the robots that participated at AAAI-2002 and describe the strengths and weaknesses of their performance.
Wrapper Maintenance: A Machine Learning Approach
Lerman, K., Minton, S. N., Knoblock, C. A.
The proliferation of online information sources has led to an increased use of wrappers for extracting data from Web sources. While most of the previous research has focused on quick and efficient generation of wrappers, the development of tools for wrapper maintenance has received less attention. This is an important research problem because Web sources often change in ways that prevent the wrappers from extracting data correctly. We present an efficient algorithm that learns structural information about data from positive examples alone. We describe how this information can be used for two wrapper maintenance applications: wrapper verification and reinduction. The wrapper verification system detects when a wrapper is not extracting correct data, usually because the Web source has changed its format. The reinduction algorithm automatically recovers from changes in the Web source by identifying data on Web pages so that a new wrapper may be generated for this source. To validate our approach, we monitored 27 wrappers over a period of a year. The verification algorithm correctly discovered 35 of the 37 wrapper changes, and made 16 mistakes, resulting in precision of 0.73 and recall of 0.95. We validated the reinduction algorithm on ten Web sources. We were able to successfully reinduce the wrappers, obtaining precision and recall values of 0.90 and 0.80 on the data extraction task.
AAAI 2002 Workshops
Blake, Brian, Haigh, Karen, Hexmoor, Henry, Falcone, Rino, Soh, Leen-Kiat, Baral, Chitta, McIlraith, Sheila, Gmytrasiewicz, Piotr, Parsons, Simon, Malaka, Rainer, Krueger, Antonio, Bouquet, Paolo, Smart, Bill, Kurumantani, Koichi, Pease, Adam, Brenner, Michael, desJardins, Marie, Junker, Ulrich, Delgrande, Jim, Doyle, Jon, Rossi, Francesca, Schaub, Torsten, Gomes, Carla, Walsh, Toby, Guo, Haipeng, Horvitz, Eric J., Ide, Nancy, Welty, Chris, Anger, Frank D., Guegen, Hans W., Ligozat, Gerald
The Association for the Advancement of Artificial Intelligence (AAAI) presented the AAAI-02 Workshop Program on Sunday and Monday, 28-29 July 2002 at the Shaw Convention Center in Edmonton, Alberta, Canada. The AAAI-02 workshop program included 18 workshops covering a wide range of topics in AI. The workshops were Agent-Based Technologies for B2B Electronic-Commerce; Automation as a Caregiver: The Role of Intelligent Technology in Elder Care; Autonomy, Delegation, and Control: From Interagent to Groups; Coalition Formation in Dynamic Multiagent Environments; Cognitive Robotics; Game-Theoretic and Decision-Theoretic Agents; Intelligent Service Integration; Intelligent Situation-Aware Media and Presentations; Meaning Negotiation; Multiagent Modeling and Simulation of Economic Systems; Ontologies and the Semantic Web; Planning with and for Multiagent Systems; Preferences in AI and CP: Symbolic Approaches; Probabilistic Approaches in Search; Real-Time Decision Support and Diagnosis Systems; Semantic Web Meets Language Resources; and Spatial and Temporal Reasoning.
Editorial Introduction: The Fifteenth Innovative Applications of Artificial Intelligence Conference (IAAI-2002)
The Fourteenth Innovative Applications of Artificial Intelligence Conference (IAAI-2002) was held from 28 July to 1 August in Edmonton, Alberta, Canada, in conjunction with the Seventeenth National Conference on Artificial Intelligence (AAAI-2002). As in past years, papers were solicited in two categories: (1) deployed applications and (2) emerging applications and technologies. Deployed application papers describe systems that have been in use for at least several months by individuals or organizations other than their developers, have measurable benefits, and incorporate AI technologies. Emerging applications are technologies and systems that are close to deployment and clearly show an innovative implementation of AI technologies. These papers are of value not only to other application developers looking for guidance in applying various techniques to their own applications but also to researchers who need to understand the unique technical challenges provided by real-world problems.
An Analysis of Phase Transition in NK Landscapes
In this paper, we analyze the decision version of the NK landscape model from the perspective of threshold phenomena and phase transitions under two random distributions, the uniform probability model and the fixed ratio model. For the uniform probability model, we prove that the phase transition is easy in the sense that there is a polynomial algorithm that can solve a random instance of the problem with the probability asymptotic to 1 as the problem size tends to infinity. For the fixed ratio model, we establish several upper bounds for the solubility threshold, and prove that random instances with parameters above these upper bounds can be solved polynomially. This, together with our empirical study for random instances generated below and in the phase transition region, suggests that the phase transition of the fixed ratio model is also easy.