Goto

Collaborating Authors

 Case-Based Reasoning


An upper bound on prototype set size for condensed nearest neighbor

arXiv.org Machine Learning

The nearest neighbor (NN) rule assigns to an unclassified point the class of a closest point from a set of prototypical points. The NN algorithm stores every training point as a prototypical point and classifies new points according to the NN rule. A nice property is that, for arbitrary class distributions, as the number of training points goes to infinity, the error of the rule produced by the NN algorithm converges to within twice the Bayes error [4]. Unfortunately, storing every training point as a prototypical point can be impractical for huge training sets in terms of both memory complexity and the time complexity of classifying according to the NN rule. As a result, many techniques exist for reducing the size of the set of prototypical points.


Convergence of Nearest Neighbor Pattern Classification with Selective Sampling

arXiv.org Machine Learning

In the panoply of pattern classification techniques, few enjoy the intuitive appeal and simplicity of the nearest neighbor rule: given a set of samples in some metric domain space whose value under some function is known, we estimate the function anywhere in the domain by giving the value of the nearest sample per the metric. More generally, one may use the modal value of the m nearest samples, where m is a fixed positive integer (although m=1 is known to be admissible in the sense that no larger value is asymptotically superior in terms of prediction error). The nearest neighbor rule is nonparametric and extremely general, requiring in principle only that the domain be a metric space. The classic paper on the technique, proving convergence under independent, identically-distributed (iid) sampling, is due to Cover and Hart (1967). Because taking samples is costly, there has been much research in recent years on selective sampling, in which each sample is selected from a pool of candidates ranked by a heuristic; the heuristic tries to guess which candidate would be the most "informative" sample. Lindenbaum et al. (2004) apply selective sampling to the nearest neighbor rule, but their approach sacrifices the austere generality of Cover and Hart; furthermore, their heuristic algorithm is complex and computationally expensive. Here we report recent results that enable selective sampling in the original Cover-Hart setting. Our results pose three selection heuristics and prove that their nearest neighbor rule predictions converge to the true pattern. Two of the algorithms are computationally cheap, with complexity growing linearly in the number of samples. We believe that these results constitute an important advance in the art.


A Case-Based Solution to the Cold-Start Problem in Group Recommenders

AAAI Conferences

In this paper we offer a potential solution to the cold-start problem in group recommender systems. To do so, we use information about previous group recommendation events and copy ratings from a user who played a similar role in some previous group event. We show that copying in this way, i.e. conditioned on groups, is superior to copying nothing and also superior to copying ratings from the most similar user known to the system.


Preference-Based CBR: General Ideas and Basic Principles

AAAI Conferences

Building on recent research on preference handling in artificial intelligence and related fields, our goal is to develop a coherent and generic methodological framework for case-based reasoning (CBR) on the basis of formal concepts and methods for knowledge representation and reasoning with preferences. A preference-based approach to CBR appears to be appealing for several reasons, notably because case-based experiences naturally lend themselves to representations in terms of preference or order relations. Moreover, the flexibility and expressiveness of a preference-based formalism well accommodate the uncertain and approximate nature of case-based problem solving. In this paper, we outline the basic ideas of preference-based CBR and sketch a formal framework for realizing these ideas.


Case Adaptation with Qualitative Algebras

AAAI Conferences

This paper proposes an approach for the adaptation of spatial or temporal cases in a case-based reasoning system. Qualitative algebras are used as spatial and temporal knowledge representation languages. The intuition behind this adaptation approach is to apply a substitution and then repair potential inconsistencies, thanks to belief revision on qualitative algebras. A temporal example from the cooking domain is given.


A Unified Approximate Nearest Neighbor Search Scheme by Combining Data Structure and Hashing

AAAI Conferences

Nowadays, Nearest Neighbor Search becomes more and more important when facing the challenge of big data. Traditionally, to solve this problem, researchers mainly focus on building effective data structures such as hierarchical k-means tree or using hashing methods to accelerate the query process. In this paper, we propose a novel unified approximate nearest neighbor search scheme to combine the advantages of both the effective data structure and the fast Hamming distance computation in hashing methods. In this way, the searching procedure can be further accelerated. Computational complexity analysis and extensive experiments have demonstrated the effectiveness of our proposed scheme.


Case-Based Meta-Prediction for Bioinformatics

AAAI Conferences

Before laboratory testing, bioinformatics problems often require a machine-learned predictor to identify the most likely choices among a wealth of possibilities. Researchers may advocate different predictors for the same problem, none of which is best in all situations. This paper introduces a case-based meta-predictor that combines a set of elaborate, pre-existing predictors to improve their accuracy on a difficult and important problem: protein-ligand docking. The method focuses on the reliability of its component predictors, and has broad potential applications in biology and chemistry. Despite noisy and biased input, the method outperforms its individual components on benchmark data. It provides a promising solution for the performance improvement of compound virtual screening, which would thereby reduce the time and cost of drug discovery.


Model-Lite Case-Based Planning

AAAI Conferences

There is increasing awareness in the planning community that depending on complete models impedes the applicability of planning technology in many real world domains where the burden of specifying complete domain models is too high. In this paper, we consider a novel solution for this challenge that combines generative planning on incomplete domain models with a library of plan cases that are known to be correct. While this was arguably the original motivation for case-based planning, most existing case-based planners assume (and depend on) from-scratch planners that work on complete domain models. In contrast, our approach views the plan generated with respect to the incomplete model as a ``skeletal plan'' and augments it with directed mining of plan fragments from library cases. We will present the details of our approach and present an empirical evaluation of our method in comparison to a state-of-the-art case-based planner that depends on complete domain models.


Reciprocal Hash Tables for Nearest Neighbor Search

AAAI Conferences

Recent years have witnessed the success of hashingtechniques in approximate nearest neighbor search. Inpractice, multiple hash tables are usually employed toretrieve more desired results from all hit buckets ofeach table. However, there are rare works studying theunified approach to constructing multiple informativehash tables except the widely used random way. In thispaper, we regard the table construction as a selectionproblem over a set of candidate hash functions. Withthe graph representation of the function set, we proposean efficient solution that sequentially applies normal-ized dominant set to finding the most informative andindependent hash functions for each table. To furtherreduce the redundancy between tables, we explore thereciprocal hash tables in a boosting manner, where thehash function graph is updated with high weights em-phasized on the misclassified neighbor pairs of previoushash tables. The construction method is general andcompatible with different types of hashing algorithmsusing different feature spaces and/or parameter settings.Extensive experiments on two large-scale benchmarksdemonstrate that the proposed method outperforms bothnaive construction method and state-of-the-art hashingalgorithms, with up to 65.93% accuracy gains.


Parameterized Complexity Results for Plan Reuse

AAAI Conferences

Planning is a notoriously difficult computational problem of high worst-case complexity. Researchers have been investing significant efforts to develop heuristics or restrictions to make planning practically feasible. Case-based planning is a heuristic approach where one tries to reuse previous experience when solving similar problems in order to avoid some of the planning effort. Plan reuse may offer an interesting alternative to plan generation in some settings. We provide theoretical results that identify situations in which plan reuse is provably tractable. We perform our analysis in the framework of parameterized complexity, which supports a rigorous worst-case complexity analysis that takes structural properties of the input into account in terms of parameters. A central notion of parameterized complexity is fixed-parameter tractability which extends the classical notion of polynomial-time tractability by utilizing the effect of parameters. We draw a detailed map of the parameterized complexity landscape of several variants of problems that arise in the context of case-based planning. In particular, we consider the problem of reusing an existing plan, imposing various restrictions in terms of parameters, such as the number of steps that can be added to the existing plan to turn it into a solution of the planning instance at hand.