Goto

Collaborating Authors

 Country


Wrapper Maintenance: A Machine Learning Approach

Journal of Artificial Intelligence Research

The proliferation of online information sources has led to an increased use of wrappers for extracting data from Web sources. While most of the previous research has focused on quick and efficient generation of wrappers, the development of tools for wrapper maintenance has received less attention. This is an important research problem because Web sources often change in ways that prevent the wrappers from extracting data correctly. We present an efficient algorithm that learns structural information about data from positive examples alone. We describe how this information can be used for two wrapper maintenance applications: wrapper verification and reinduction. The wrapper verification system detects when a wrapper is not extracting correct data, usually because the Web source has changed its format. The reinduction algorithm automatically recovers from changes in the Web source by identifying data on Web pages so that a new wrapper may be generated for this source. To validate our approach, we monitored 27 wrappers over a period of a year. The verification algorithm correctly discovered 35 of the 37 wrapper changes, and made 16 mistakes, resulting in precision of 0.73 and recall of 0.95. We validated the reinduction algorithm on ten Web sources. We were able to successfully reinduce the wrappers, obtaining precision and recall values of 0.90 and 0.80 on the data extraction task.


Translation of Pronominal Anaphora between English and Spanish: Discrepancies and Evaluation

Journal of Artificial Intelligence Research

This paper evaluates the different tasks carried out in the translation of pronominal anaphora in a machine translation (MT) system. The MT interlingua approach named AGIR (Anaphora Generation with an Interlingua Representation) improves upon other proposals presented to date because it is able to translate intersentential anaphors, detect co-reference chains, and translate Spanish zero pronouns into English---issues hardly considered by other systems. The paper presents the resolution and evaluation of these anaphora problems in AGIR with the use of different kinds of knowledge (lexical, morphological, syntactic, and semantic). The translation of English and Spanish anaphoric third-person personal pronouns (including Spanish zero pronouns) into the target language has been evaluated on unrestricted corpora. We have obtained a precision of 80.4% and 84.8% in the translation of Spanish and English pronouns, respectively. Although we have only studied the Spanish and English languages, our approach can be easily extended to other languages such as Portuguese, Italian, or Japanese.


An Evolutionary Algorithm with Advanced Goal and Priority Specification for Multi-objective Optimization

Journal of Artificial Intelligence Research

This paper presents an evolutionary algorithm with a new goal-sequence domination scheme for better decision support in multi-objective optimization. The approach allows the inclusion of advanced hard/soft priority and constraint information on each objective component, and is capable of incorporating multiple specifications with overlapping or non-overlapping objective functions via logical "OR" and "AND" connectives to drive the search towards multiple regions of trade-off. In addition, we propose a dynamic sharing scheme that is simple and adaptively estimated according to the on-line population distribution without needing any a priori parameter setting. Each feature in the proposed algorithm is examined to show its respective contribution, and the performance of the algorithm is compared with other evolutionary optimization methods. It is shown that the proposed algorithm has performed well in the diversity of evolutionary search and uniform distribution of non-dominated individuals along the final trade-offs, without significant computational effort. The algorithm is also applied to the design optimization of a practical servo control system for hard disk drives with a single voice-coil-motor actuator. Results of the evolutionary designed servo control system show a superior closed-loop performance compared to classical PID or RPT approaches.


Learning to Order BDD Variables in Verification

Journal of Artificial Intelligence Research

The size and complexity of software and hardware systems have significantly increased in the past years. As a result, it is harder to guarantee their correct behavior. One of the most successful methods for automated verification of finite-state systems is model checking. Most of the current model-checking systems use binary decision diagrams (BDDs) for the representation of the tested model and in the verification process of its properties. Generally, BDDs allow a canonical compact representation of a boolean function (given an order of its variables). The more compact the BDD is, the better performance one gets from the verifier. However, finding an optimal order for a BDD is an NP-complete problem. Therefore, several heuristic methods based on expert knowledge have been developed for variable ordering. We propose an alternative approach in which the variable ordering algorithm gains 'ordering experience' from training models and uses the learned knowledge for finding good orders. Our methodology is based on offline learning of pair precedence classifiers from training models, that is, learning which variable pair permutation is more likely to lead to a good order. For each training model, a number of training sequences are evaluated. Every training model variable pair permutation is then tagged based on its performance on the evaluated orders. The tagged permutations are then passed through a feature extractor and are given as examples to a classifier creation algorithm. Given a model for which an order is requested, the ordering algorithm consults each precedence classifier and constructs a pair precedence table which is used to create the order. Our algorithm was integrated with SMV, which is one of the most widely used verification systems. Preliminary empirical evaluation of our methodology, using real benchmark models, shows performance that is better than random ordering and is competitive with existing algorithms that use expert knowledge. We believe that in sub-domains of models (alu, caches, etc.) our system will prove even more valuable. This is because it features the ability to learn sub-domain knowledge, something that no other ordering algorithm does.


Acquiring Word-Meaning Mappings for Natural Language Interfaces

Journal of Artificial Intelligence Research

This paper focuses on a system, WOLFIE (WOrd Learning From Interpreted Examples), that acquires a semantic lexicon from a corpus of sentences paired with semantic representations. The lexicon learned consists of phrases paired with meaning representations. WOLFIE is part of an integrated system that learns to transform sentences into representations such as logical database queries. Experimental results are presented demonstrating WOLFIE's ability to learn useful lexicons for a database interface in four different natural languages. The usefulness of the lexicons learned by WOLFIE are compared to those acquired by a similar system, with results favorable to WOLFIE. A second set of experiments demonstrates WOLFIE's ability to scale to larger and more difficult, albeit artificially generated, corpora. In natural language acquisition, it is difficult to gather the annotated data needed for supervised learning; however, unannotated data is fairly plentiful. Active learning methods attempt to select for annotation and training only the most informative examples, and therefore are potentially very useful in natural language applications. However, most results to date for active learning have only considered standard classification tasks. To reduce annotation effort while maintaining accuracy, we apply active learning to semantic lexicons. We show that active learning can significantly reduce the number of annotated examples required to achieve a given level of performance.


The Emergence of Multiple Movement Units in the Presence of Noise and Feedback Delay

Neural Information Processing Systems

Tangential hand velocity profiles of rapid human arm movements often appearas sequences of several bell-shaped acceleration-deceleration phases called submovements or movement units. This suggests how the nervous system might efficiently control a motor plant in the presence of noise and feedback delay. Another critical observation is that stochasticity ina motor control problem makes the optimal control policy essentially differentfrom the optimal control policy for the deterministic case. We use a simplified dynamic model of an arm and address rapid aimed arm movements. We use reinforcement learning as a tool to approximate the optimal policy in the presence of noise and feedback delay. Using a simplified model we show that multiple submovements emerge as an optimal policy in the presence of noise and feedback delay. The optimal policy in this situation is to drive the arm's end point close to the target by one fast submovement and then apply a few slow submovements to accurately drivethe arm's end point into the target region. In our simulations, the controller sometimes generates corrective submovements before the initial fast submovement is completed, much like the predictive corrections observedin a number of psychophysical experiments.



Information Geometrical Framework for Analyzing Belief Propagation Decoder

Neural Information Processing Systems

The mystery of belief propagation (BP) decoder, especially of the turbo decoding, is studied from information geometrical viewpoint. The loopy belief network (BN) of turbo codes makes it difficult to obtain the true "belief" by BP, and the characteristics of the algorithm and its equilibrium arenot clearly understood. Our study gives an intuitive understanding of the mechanism, and a new framework for the analysis. Based on the framework, we reveal basic properties of the turbo decoding.


The Unified Propagation and Scaling Algorithm

Neural Information Processing Systems

In this paper we will show that a restricted class of constrained minimum divergenceproblems, named generalized inference problems, can be solved by approximating the KL divergence with a Bethe free energy. The algorithm we derive is closely related to both loopy belief propagation anditerative scaling. This unified propagation and scaling algorithm reduces to a convergent alternative to loopy belief propagation when no constraints are present. Experiments show the viability of our algorithm.


Tree-based reparameterization for approximate inference on loopy graphs

Neural Information Processing Systems

We develop a tree-based reparameterization framework that provides anew conceptual view of a large class of iterative algorithms for computing approximate marginals in graphs with cycles. It includes belief propagation (BP), which can be reformulated as a very local form of reparameterization. More generally, we consider algorithms that perform exact computations over spanning trees of the full graph. On the practical side, we find that such tree reparameterization (TRP) algorithms have convergence properties superior to BP. The reparameterization perspective also provides a number of theoretical insights into approximate inference, including anew characterization of fixed points; and an invariance intrinsic to TRP /BP. These two properties enable us to analyze and bound the error between the TRP /BP approximations and the actual marginals. While our results arise naturally from the TRP perspective, most of them apply in an algorithm-independent manner to any local minimum of the Bethe free energy.