Collaborating Authors

Case-Based Reasoning

COMET: An Application of Model-Based Reasoning to Accounting Systems

AI Magazine

An important problem faced by auditors is gauging how much reliance can be placed on the accounting systems that process millions of transactions to produce the numbers summarized in a company's financial statements. Accounting systems contain internal controls, procedures designed to detect and correct errors and irregularities that can occur in the processing of transactions. In a complex accounting system, it can be an extremely difficult task for the auditor to anticipate the possible errors that can occur and evaluate the effectiveness of the controls at detecting them. An accurate analysis must take into account the unique features of each company's business processes. To cope with this complexity and variability, the COMET system applies a model-based reasoning approach to the analysis of accounting systems and their controls.

CARMA: A Case-Based Rangeland Management Adviser

AI Magazine

CARMA is an advisory system for rangeland grasshopper infestations that demonstrates how AI technology can deliver expert advice to compensate for cutbacks in public services. CARMA uses two knowledge sources for the key task of predicting forage consumption by grasshoppers: (1) cases obtained by asking a group of experts to solve representative hypothetical problems and (2) a numeric model of rangeland ecosystems. These knowledge sources are integrated through the technique of model-based adaptation, in which case-based reasoning is used to find an approximate solution, and the model is used to adapt this approximate solution into a more precise solution. CARMA has been used in Wyoming counties since 1996. The combination of a simple interface, flexible control strategy, and integration of multiple knowledge sources makes CARMA accessible to inexperienced users and capable of producing advice comparable to that produced by human experts.

AI and Music: From Composition to Expressive Performance

AI Magazine

In this article, we first survey the three major types of computer music systems based on AI techniques: (1) compositional, (2) improvisational, and (3) performance systems. Representative examples of each type are briefly described. Then, we look in more detail at the problem of endowing the resulting performances with the expressiveness that characterizes human-generated music. This is one of the most challenging aspects of computer music that has been addressed just recently. The main problem in modeling expressiveness is to grasp the performer's "touch," that is, the knowledge applied when performing a score.

Playing with Cases: Rendering Expressive Music with Case-Based Reasoning

AI Magazine

Following a brief overview discussing why we prefer listening to expressive music instead of lifeless synthesized music, we examine a representative selection of well-known approaches to expressive computer music performance, with an emphasis on AI-related approaches. In the main part of the paper we focus on the existing CBR approaches to the problem of synthesizing expressive music, and particularly on TempoExpress, a case-based reasoning system developed at our Institute for applying musically acceptable tempo transformations to monophonic audio recordings of musical performances. Finally, we briefly describe an ongoing extension of our previous work that consists of complementing audio information with information about the gestures of the musician. Music is played through our bodies; therefore, capturing the gesture of the performer is a fundamental aspect that has to be taken into account in future expressive music renderings. This paper is based on the "2011 Robert S. Engelmore Memorial Lecture" given by the first author at AAAI/IAAI 2011.

Generalization through Memorization: Nearest Neighbor Language Models - Facebook Research


We introduce kNN-LMs, which extend a pre-trained neural language model (LM) by linearly interpolating it with a k-nearest neighbors (kNN) model. The nearest neighbors are computed according to distance in the pre-trained LM embedding space, and can be drawn from any text collection, including the original LM training data. Applying this augmentation to a strong WIKITEXT-103 LM, with neighbors drawn from the original training set, our kNN-LM achieves a new state-of-the-art perplexity of 15.79 – a 2.9 point improvement with no additional training. We also show that this approach has implications for efficiently scaling up to larger training sets and allows for effective domain adaptation, by simply varying the nearest neighbor datastore, again without further training. Qualitatively, the model is particularly helpful in predicting rare patterns, such as factual knowledge.
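The interpolation described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the toy datastore, the interpolation weight `lam`, and the use of a softmax over negative squared distances to weight neighbors are all assumptions made here for the example.

```python
import math

def knn_distribution(query, datastore, k, vocab_size):
    """Build a next-token distribution from the k nearest stored contexts.

    `datastore` is a list of (embedding, next_token_id) pairs (hypothetical layout).
    """
    # Squared Euclidean distance from the query to every stored key, smallest first.
    scored = sorted(
        (sum((q - x) ** 2 for q, x in zip(query, key)), token)
        for key, token in datastore
    )[:k]
    # Assumed weighting: softmax over negative distances (shifted for stability).
    d_min = scored[0][0]
    weights = [math.exp(-(d - d_min)) for d, _ in scored]
    total = sum(weights)
    probs = [0.0] * vocab_size
    for w, (_, token) in zip(weights, scored):
        probs[token] += w / total  # neighbors sharing a token pool their mass
    return probs

def interpolate(p_lm, p_knn, lam):
    """Linear interpolation: lam * p_kNN + (1 - lam) * p_LM."""
    return [lam * pk + (1.0 - lam) * pl for pl, pk in zip(p_lm, p_knn)]

# Toy example: 2-d "embeddings", vocabulary of 3 token ids.
datastore = [([0.0, 0.0], 1), ([1.0, 0.0], 2), ([5.0, 5.0], 0)]
p_lm = [0.5, 0.3, 0.2]
p_knn = knn_distribution([0.0, 0.0], datastore, k=2, vocab_size=3)
mixed = interpolate(p_lm, p_knn, lam=0.25)
```

Because both inputs are probability distributions, the mixture also sums to one; swapping the datastore changes `p_knn` without touching the language model, which is the property the abstract relies on for domain adaptation.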

k-Nearest Neighbour Classifiers -- 2nd Edition

Machine Learning

Perhaps the most straightforward classifier in the arsenal of machine learning techniques is the Nearest Neighbour Classifier -- classification is achieved by identifying the nearest neighbours to a query example and using those neighbours to determine the class of the query. This approach to classification is of particular importance because poor run-time performance is not such a problem these days, given the computational power that is available. This paper presents an overview of techniques for Nearest Neighbour classification, focusing on mechanisms for assessing similarity (distance), computational issues in identifying nearest neighbours, and mechanisms for reducing the dimension of the data. This paper is the second edition of a paper previously published as a technical report. Sections on similarity measures for time-series, retrieval speed-up and intrinsic dimensionality have been added. An Appendix is included providing access to Python code for the key methods.
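As a rough illustration of the basic idea (find the nearest neighbours of a query and let them vote), a minimal sketch might look like the following. It is not the code from the paper's appendix; the function name, the Euclidean distance, and the toy data are assumptions chosen for the example.

```python
import math
from collections import Counter

def knn_classify(query, examples, k=3):
    """Return the majority label among the k training examples closest to `query`.

    `examples` is a list of (feature_tuple, label) pairs.
    """
    # Brute-force retrieval: sort all examples by Euclidean distance to the query.
    nearest = sorted(examples, key=lambda ex: math.dist(query, ex[0]))[:k]
    # Majority vote over the labels of the retrieved neighbours.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

training = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
    ((4.0, 4.2), "b"), ((4.1, 3.9), "b"),
]
predicted = knn_classify((1.1, 1.0), training, k=3)
```

The brute-force sort costs time proportional to the size of the training set for every query, which is the computational issue that retrieval speed-up techniques aim to reduce.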

A new hashing based nearest neighbors selection technique for big datasets

Machine Learning

KNN has a reputation as the world's simplest yet efficient supervised learning algorithm, used for either classification or regression. KNN's prediction efficiency depends heavily on the size of its training data, but as the training data grows KNN becomes slow at making decisions, since it must search for nearest neighbors across the entire dataset for every decision. This paper proposes a new technique that enables the selection of nearest neighbors directly in the neighborhood of a given observation. The proposed approach consists of dividing the data space into subcells of a virtual grid built on top of the data space. The mapping between the data points and subcells is performed using hashing. To select the nearest neighbors of a given observation, we first identify the cell the observation belongs to by hashing, and then look for nearest neighbors in that central cell and the cells around it, layer by layer. In our experimental performance analysis on publicly available datasets, our algorithm outperforms the original KNN in time efficiency with a prediction quality as good as that of KNN. It also offers competitive performance with solutions like KDtree.
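The grid-and-hashing idea in this abstract can be sketched as below. This is only an illustration under stated assumptions, not the paper's algorithm: the cell hash (flooring each coordinate by a fixed `cell_size`), the stopping rule, and all function names are hypothetical choices made here.

```python
import math
from collections import defaultdict
from itertools import product

def cell_of(point, cell_size):
    """Hash a point to the integer coordinates of its grid cell."""
    return tuple(math.floor(x / cell_size) for x in point)

def build_grid(points, cell_size):
    """Map each occupied cell to the list of points that fall inside it."""
    grid = defaultdict(list)
    for p in points:
        grid[cell_of(p, cell_size)].append(p)
    return grid

def grid_knn(query, grid, cell_size, k, max_layers=50):
    """Search the query's cell first, then surrounding cells layer by layer."""
    center = cell_of(query, cell_size)
    candidates = []
    for layer in range(max_layers + 1):
        # Visit only the cells on the shell at Chebyshev offset `layer`.
        for offset in product(range(-layer, layer + 1), repeat=len(center)):
            if max(abs(o) for o in offset) != layer:
                continue
            cell = tuple(c + o for c, o in zip(center, offset))
            candidates.extend(grid.get(cell, ()))
        candidates.sort(key=lambda p: math.dist(query, p))
        # Points in unvisited layers are more than layer * cell_size away,
        # so once the k-th candidate is within that radius the result is exact.
        if len(candidates) >= k and math.dist(query, candidates[k - 1]) <= layer * cell_size:
            return candidates[:k]
    return candidates[:k]  # layer budget exhausted: best candidates found so far

points = [(0.2, 0.2), (0.9, 0.8), (2.5, 2.5), (3.1, 0.4), (0.4, 0.1), (5.0, 5.0)]
grid = build_grid(points, cell_size=1.0)
neighbors = grid_knn((0.3, 0.3), grid, cell_size=1.0, k=2)
```

The number of cells per layer grows exponentially with the dimension, so a sketch like this is only practical for low-dimensional data; the cell size trades the cost of scanning sparse layers against the number of points inspected per cell.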

Heed how AI is changing the business world - Ibiixo Technologies.


You can be addicted to your Artificial Intelligence (AI) software as much as your favored fortune. And being addicted to your AI will feel rewarding, because it replaces the extravagance, inefficiency, and endangerment associated with business operations. Tech Oracle if you ask? Employing AI reduces human error and mundane tasks, which in turn leaves more time for innovation. This means you make money with far less effort.

Rand-NSG: Fast Accurate Billion-point Nearest Neighbor Search on a Single Node

Neural Information Processing Systems

Current state-of-the-art approximate nearest neighbor search (ANNS) algorithms generate indices that must be stored in main memory for fast high-recall search. This makes them expensive and limits the size of the dataset. We present a new graph-based indexing and search system called DiskANN that can index, store, and search a billion point database on a single workstation with just 64GB RAM and an inexpensive solid-state drive (SSD). Contrary to current wisdom, we demonstrate that the SSD-based indices built by DiskANN can meet all three desiderata for large-scale ANNS: high-recall, low query latency and high density (points indexed per node). On the billion point SIFT1B bigann dataset, DiskANN serves 5000 queries a second with 3ms mean latency and 95% 1-recall@1 on a 16 core machine, where state-of-the-art billion-point ANNS algorithms with similar memory footprint like FAISS and IVFOADC+G+P plateau at around 50% 1-recall@1.

Rates of Convergence for Large-scale Nearest Neighbor Classification

Neural Information Processing Systems

Nearest neighbor is a popular class of classification methods with many desirable properties. For a large data set which cannot be loaded into the memory of a single machine due to computation, communication, privacy, or ownership limitations, we consider the divide and conquer scheme: the entire data set is divided into small subsamples, on which nearest neighbor predictions are made, and then a final decision is reached by aggregating the predictions on subsamples by majority voting. We name this method the big Nearest Neighbor (bigNN) classifier, and provide its rates of convergence under minimal assumptions, in terms of both the excess risk and the classification instability, which are proven to be the same rates as the oracle nearest neighbor classifier and cannot be improved. To significantly reduce the prediction time that is required for achieving the optimal rate, we also consider the pre-training acceleration technique applied to the bigNN method, with proven convergence rate. We find that in the distributed setting, the optimal choice of the neighbor k should scale with both the total sample size and the number of partitions, and there is a theoretical upper limit for the latter.
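The divide-and-conquer scheme described above (split the data, predict with nearest neighbors on each subsample, aggregate by majority voting) can be sketched as follows. This is a single-machine toy, not the authors' code: the function names are hypothetical, and the strided split assumes the examples are already in random order.

```python
import math
from collections import Counter

def knn_predict(query, examples, k):
    """Plain k-nearest-neighbor vote on one subsample."""
    nearest = sorted(examples, key=lambda ex: math.dist(query, ex[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def big_nn_predict(query, examples, num_partitions, k):
    """Split the data, run kNN on each subsample, then take a majority vote."""
    # Strided split (assumption: `examples` is already randomly ordered).
    partitions = [examples[i::num_partitions] for i in range(num_partitions)]
    # Each partition would live on its own machine; here we simply loop.
    local_votes = [knn_predict(query, part, k) for part in partitions if part]
    # Aggregate the per-partition predictions by majority voting.
    return Counter(local_votes).most_common(1)[0][0]

# Two well-separated toy clusters, six points each.
examples = [((i * 0.1, 0.0), "a") for i in range(6)]
examples += [((5.0 + i * 0.1, 5.0), "b") for i in range(6)]
label = big_nn_predict((0.2, 0.1), examples, num_partitions=3, k=3)
```

Only the per-partition labels cross partition boundaries, which is why the scheme suits data that cannot be pooled on one machine; how `k` should scale with the sample size and the number of partitions is the question the abstract's theory addresses and is not modeled here.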