Goto

Collaborating Authors

 Europe


Automatically Utilizing Secondary Sources to Align Information Across Sources

AI Magazine

XML, web services, and the semantic web have opened the door for new and exciting informationintegration applications. Information sources on the web are controlled by different organizations or people, utilize different text formats, and have varying inconsistencies. Therefore, any system that integrates information from different data sources must identify common entities from these sources. Data from many data sources on the web does not contain enough information to link the records accurately using state-of-the-art record-linkage systems. However, it is possible to exploit secondary data sources on the web to improve the recordlinkage process. We present an approach to accurately and automatically match entities from various data sources by utilizing a state-of-the-art record-linkage system in conjunction with a data-integration system. The data-integration system is able to automatically determine which secondary sources need to be queried when linking records from various data sources. In turn, the record-linkage system is then able to utilize this additional information to improve the accuracy of the linkage between datasets.


Combining Knowledge- and Corpus-based Word-Sense-Disambiguation Methods

Journal of Artificial Intelligence Research

In this paper we concentrate on the resolution of the lexical ambiguity that arises when a given word has several different meanings. This specific task is commonly referred to as word sense disambiguation (WSD). The task of WSD consists of assigning the correct sense to words using an electronic dictionary as the source of word definitions. We present two WSD methods based on two main methodological approaches in this research area: a knowledge-based method and a corpus-based method. Our hypothesis is that word-sense disambiguation requires several knowledge sources in order to solve the semantic ambiguity of the words. These sources can be of different kinds--- for example, syntagmatic, paradigmatic or statistical information. Our approach combines various sources of knowledge, through combinations of the two WSD methods mentioned above. Mainly, the paper concentrates on how to combine these methods and sources of information in order to achieve good results in the disambiguation. Finally, this paper presents a comprehensive study and experimental work on evaluation of the methods and their combinations.


Learning From Labeled And Unlabeled Data: An Empirical Study Across Techniques And Domains

Journal of Artificial Intelligence Research

There has been increased interest in devising learning techniques that combine unlabeled data with labeled data - i.e. semi-supervised learning. However, to the best of our knowledge, no study has been performed across various techniques and different types and amounts of labeled and unlabeled data. Moreover, most of the published work on semi-supervised learning techniques assumes that the labeled and unlabeled data come from the same distribution. It is possible for the labeling process to be associated with a selection bias such that the distributions of data points in the labeled and unlabeled sets are different. Not correcting for such bias can result in biased function approximation with potentially poor performance. In this paper, we present an empirical study of various semi-supervised learning techniques on a variety of datasets. We attempt to answer various questions such as the effect of independence or relevance amongst features, the effect of the size of the labeled and unlabeled sets and the effect of noise. We also investigate the impact of sample-selection bias on the semi -supervised learning techniques under study and implement a bivariate probit technique particularly designed to correct for such bias.


Graduality in Argumentation

Journal of Artificial Intelligence Research

Argumentation is based on the exchange and valuation of interacting arguments, followed by the selection of the most acceptable of them (for example, in order to take a decision, to make a choice). Starting from the framework proposed by Dung in 1995, our purpose is to introduce 'graduality' in the selection of the best arguments, i.e., to be able to partition the set of the arguments in more than the two usual subsets of 'selected' and 'non-selected' arguments in order to represent different levels of selection. Our basic idea is that an argument is all the more acceptable if it can be preferred to its attackers. First, we discuss general principles underlying a 'gradual' valuation of arguments based on their interactions. Following these principles, we define several valuation models for an abstract argumentation system. Then, we introduce 'graduality' in the concept of acceptability of arguments. We propose new acceptability classes and a refinement of existing classes taking advantage of an available 'gradual' valuation.


Strong Asymptotic Assertions for Discrete MDL in Regression and Classification

arXiv.org Artificial Intelligence

We study the properties of the MDL (or maximum penalized complexity) estimator for Regression and Classification, where the underlying model class is countable. We show in particular a finite bound on the Hellinger losses under the only assumption that there is a "true" model contained in the class. This implies almost sure convergence of the predictive distribution to the true one at a fast rate. It corresponds to Solomonoff's central theorem of universal induction, however with a bound that is exponentially larger.


Reinforcement Learning for Agents with Many Sensors and Actuators Acting in Categorizable Environments

Journal of Artificial Intelligence Research

In this paper, we confront the problem of applying reinforcement learning to agents that perceive the environment through many sensors and that can perform parallel actions using many actuators as is the case in complex autonomous robots. We argue that reinforcement learning can only be successfully applied to this case if strong assumptions are made on the characteristics of the environment in which the learning is performed, so that the relevant sensor readings and motor commands can be readily identified. The introduction of such assumptions leads to strongly-biased learning systems that can eventually lose the generality of traditional reinforcement-learning algorithms. In this line, we observe that, in realistic situations, the reward received by the robot depends only on a reduced subset of all the executed actions and that only a reduced subset of the sensor inputs (possibly different in each situation and for each action) are relevant to predict the reward. We formalize this property in the so called 'categorizability assumption' and we present an algorithm that takes advantage of the categorizability of the environment, allowing a decrease in the learning time with respect to existing reinforcement-learning algorithms. Results of the application of the algorithm to a couple of simulated realistic-robotic problems (landmark-based navigation and the six-legged robot gait generation) are reported to validate our approach and to compare it to existing flat and generalization-based reinforcement-learning approaches.


Combining Spatial and Temporal Logics: Expressiveness vs. Complexity

Journal of Artificial Intelligence Research

In this paper, we construct and investigate a hierarchy of spatio-temporal formalisms that result from various combinations of propositional spatial and temporal logics such as the propositional temporal logic PTL, the spatial logics RCC-8, BRCC-8, S4u and their fragments. The obtained results give a clear picture of the trade-off between expressiveness and `computational realisability' within the hierarchy. We demonstrate how different combining principles as well as spatial and temporal primitives can produce NP-, PSPACE-, EXPSPACE-, 2EXPSPACE-complete, and even undecidable spatio-temporal logics out of components that are at most NP- or PSPACE-complete.


Extremal Behaviour in Multiagent Contract Negotiation

Journal of Artificial Intelligence Research

We examine properties of a model of resource allocation in which several agents exchange resources in order to optimise their individual holdings. The schemes discussed relate to well-known negotiation protocols proposed in earlier work and we consider a number of alternative notions of ``rationality'' covering both quantitative measures, e.g. cooperative and individual rationality and more qualitative forms, e.g. Pigou-Dalton transfers. While it is known that imposing particular rationality and structural restrictions may result in some reallocations of the resource set becoming unrealisable, in this paper we address the issue of the number of restricted rational deals that may be required to implement a particular reallocation when it is possible to do so. We construct examples showing that this number may be exponential (in the number of resources m), even when all of the agent utility functions are monotonic. We further show that k agents may achieve in a single deal a reallocation requiring exponentially many rational deals if at most k-1 agents can participate, this same reallocation being unrealisable by any sequences of rational deals in which at most k-2 agents are involved.


Finding Approximate POMDP solutions Through Belief Compression

Journal of Artificial Intelligence Research

Standard value function approaches to finding policies for Partially Observable Markov Decision Processes (POMDPs) are generally considered to be intractable for large models. The intractability of these algorithms is to a large extent a consequence of computing an exact, optimal policy over the entire belief space. However, in real-world POMDP problems, computing the optimal policy for the full belief space is often unnecessary for good control even for problems with complicated policy classes. The beliefs experienced by the controller often lie near a structured, low-dimensional subspace embedded in the high-dimensional belief space. Finding a good approximation to the optimal value function for only this subspace can be much easier than computing the full value function. We introduce a new method for solving large-scale POMDPs by reducing the dimensionality of the belief space. We use Exponential family Principal Components Analysis (Collins, Dasgupta & Schapire, 2002) to represent sparse, high-dimensional belief spaces using small sets of learned features of the belief state. We then plan only in terms of the low-dimensional belief features. By planning in this low-dimensional space, we can find policies for POMDP models that are orders of magnitude larger than models that can be handled by conventional techniques. We demonstrate the use of this algorithm on a synthetic problem and on mobile robot navigation tasks.


A Mixed-Signal VLSI for Real-Time Generation of Edge-Based Image Vectors

Neural Information Processing Systems

A mixed-signal image filtering VLSI has been developed aiming at real-time generation of edge-based image vectors for robust image recognition. A four-stage asynchronous median detection architecture basedon analog digital mixed-signal circuits has been introduced todetermine the threshold value of edge detection, the key processing parameter in vector generation. As a result, a fully seamless pipeline processing from threshold detection to edge feature mapgeneration has been established. A prototype chip was designed in a 0.35-µm double-polysilicon three-metal-layer CMOS technology and the concept was verified by the fabricated chip. The chip generates a 64-dimension feature vector from a 64x64-pixel gray scale image every 80µsec.