Case-Based Reasoning

TAR 1.0 or TAR 2.0: Which method is best for you?


In Casepoint, for example, a user can begin a TAR 2.0 session by reviewing as few as 50 documents (although our recommended ranking threshold is every 100 documents), and at each ranking threshold, the model re-ranks the corpus automatically. Used in tandem with Casepoint's Dynamic Batching feature, this ensures that the user is always looking at the highest-ranked documents. The model strengthens faster as a result, because TAR 2.0 continues to present documents in batches until none of the documents presented are relevant. Another benefit of TAR 2.0 is the ability to run multiple sessions simultaneously, where each session represents a different legal topic or issue for which you are trying to find relevant documents. Being able to "bucket" groups of documents by relevant issues and have people dive into the review right away is a huge step forward.

COVID-19 prison problem as cases soar at California's San Quentin

Al Jazeera

The California state jail system has seen a staggering increase in coronavirus cases over the past week - with cases at the overcrowded San Quentin facility jumping from 100 to 539 - and inmate deaths across the state prison system totalling 20. Attorneys, advocates and former inmates say this increase suggests that lowering prison populations might be the only effective way to stop the pandemic's resurgence inside US penitentiaries. The state has seen 1,001 new COVID-19 cases in its prison system in the past 14 days, the California Department of Corrections and Rehabilitation (CDCR) said on Friday afternoon. This increase comes as the United States experiences record-setting spikes in coronavirus cases. San Quentin, California's only state prison with a death row, accounted for the majority, with 512 new cases as of Friday.

COMET: An Application of Model-Based Reasoning to Accounting Systems

AI Magazine

An important problem faced by auditors is gauging how much reliance can be placed on the accounting systems that process millions of transactions to produce the numbers summarized in a company's financial statements. Accounting systems contain internal controls, procedures designed to detect and correct errors and irregularities that can occur in the processing of transactions. In a complex accounting system, it can be an extremely difficult task for the auditor to anticipate the possible errors that can occur and evaluate the effectiveness of the controls at detecting them. An accurate analysis must take into account the unique features of each company's business processes. To cope with this complexity and variability, the COMET system applies a model-based reasoning approach to the analysis of accounting systems and their controls.

CARMA: A Case-Based Rangeland Management Adviser

AI Magazine

CARMA is an advisory system for rangeland grasshopper infestations that demonstrates how AI technology can deliver expert advice to compensate for cutbacks in public services. CARMA uses two knowledge sources for the key task of predicting forage consumption by grasshoppers: (1) cases obtained by asking a group of experts to solve representative hypothetical problems and (2) a numeric model of rangeland ecosystems. These knowledge sources are integrated through the technique of model-based adaptation, in which case-based reasoning is used to find an approximate solution, and the model is used to adapt this approximate solution into a more precise solution. CARMA has been used in Wyoming counties since 1996. The combination of a simple interface, flexible control strategy, and integration of multiple knowledge sources makes CARMA accessible to inexperienced users and capable of producing advice comparable to that produced by human experts.

AI and Music: From Composition to Expressive Performance

AI Magazine

In this article, we first survey the three major types of computer music systems based on AI techniques: (1) compositional, (2) improvisational, and (3) performance systems. Representative examples of each type are briefly described. Then, we look in more detail at the problem of endowing the resulting performances with the expressiveness that characterizes human-generated music. This is one of the most challenging aspects of computer music that has been addressed just recently. The main problem in modeling expressiveness is to grasp the performer's "touch," that is, the knowledge applied when performing a score.

Playing with Cases: Rendering Expressive Music with Case-Based Reasoning

AI Magazine

Following a brief overview discussing why we prefer listening to expressive music instead of lifeless synthesized music, we examine a representative selection of well-known approaches to expressive computer music performance with an emphasis on AI-related approaches. In the main part of the paper we focus on the existing CBR approaches to the problem of synthesizing expressive music, and particularly on TempoExpress, a case-based reasoning system developed at our Institute for applying musically acceptable tempo transformations to monophonic audio recordings of musical performances. Finally, we briefly describe an ongoing extension of our previous work that consists of complementing audio information with information about the gestures of the musician. Music is played through our bodies; therefore, capturing the gesture of the performer is a fundamental aspect that has to be taken into account in future expressive music renderings. This paper is based on the "2011 Robert S. Engelmore Memorial Lecture" given by the first author at AAAI/IAAI 2011.

Generalization through Memorization: Nearest Neighbor Language Models - Facebook Research


We introduce kNN-LMs, which extend a pre-trained neural language model (LM) by linearly interpolating it with a k-nearest neighbors (kNN) model. The nearest neighbors are computed according to distance in the pre-trained LM embedding space, and can be drawn from any text collection, including the original LM training data. Applying this augmentation to a strong WIKITEXT-103 LM, with neighbors drawn from the original training set, our kNN-LM achieves a new state-of-the-art perplexity of 15.79 – a 2.9 point improvement with no additional training. We also show that this approach has implications for efficiently scaling up to larger training sets and allows for effective domain adaptation, by simply varying the nearest neighbor datastore, again without further training. Qualitatively, the model is particularly helpful in predicting rare patterns, such as factual knowledge.
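The interpolation described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the authors' implementation: the datastore is a plain list of (embedding, next-token) pairs, neighbours are weighted by a softmax over negative squared distance, and the names `knn_distribution`, `interpolate`, `lam`, and `temperature` are hypothetical.

```python
import math


def knn_distribution(query, datastore, k, temperature=1.0):
    """Build a next-token distribution from the k nearest datastore entries.

    datastore is a list of (embedding, next_token) pairs. Weighting by
    exp(-squared distance / temperature) is an assumption of this sketch.
    """
    scored = sorted(
        ((sum((q - x) ** 2 for q, x in zip(query, key)), token)
         for key, token in datastore),
        key=lambda pair: pair[0],
    )[:k]
    weights = {}
    for dist, token in scored:
        weights[token] = weights.get(token, 0.0) + math.exp(-dist / temperature)
    total = sum(weights.values())
    return {token: w / total for token, w in weights.items()}


def interpolate(p_lm, p_knn, lam):
    """Linearly mix the two distributions: lam * p_knn + (1 - lam) * p_lm."""
    vocab = set(p_lm) | set(p_knn)
    return {t: lam * p_knn.get(t, 0.0) + (1 - lam) * p_lm.get(t, 0.0)
            for t in vocab}


# Toy data: two stored contexts near the query were followed by "Paris".
datastore = [((0.0, 0.0), "Paris"), ((0.1, 0.0), "Paris"), ((5.0, 5.0), "London")]
p_knn = knn_distribution((0.0, 0.0), datastore, k=2)
p_lm = {"Paris": 0.4, "London": 0.6}
p_final = interpolate(p_lm, p_knn, lam=0.25)
```

Because the datastore is consulted only at prediction time, swapping it for a different text collection changes the predictions without touching the LM weights, which is the property the abstract highlights for domain adaptation.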

RHOG: A Refinement-Operator Library for Directed Labeled Graphs

Artificial Intelligence

Intuitively, local finiteness means that the refinement operator is computable, completeness means we can generate, by refinement of a, any element of G related to a given element g1 by the order relation, and properness means that a refinement operator does not generate elements which are equivalent to the element being refined. When a refinement operator is locally finite, complete and proper, we say that it is ideal. Notice that all the subsumption relations presented above satisfy the reflexive and transitive properties. Therefore, the pair (G, ⊑), where G is the set of all DLGs given a set of labels L, and ⊑ is any of the subsumption relations defined above, is a quasi-ordered set. This opens the door to defining refinement operators for DLGs. Intuitively, a downward refinement operator for DLGs will generate refinements of a given DLG by adding vertices or edges, or by making some of the labels more specific, thus making the graph more specific. In the following subsections, we will introduce a collection of refinement operators for connected DLGs, and discuss their theoretical properties. A summary of these operators is shown in Table 1, where we show that under the object-identity constraint, all the refinement operators presented in this document are ideal. If we do not impose object-identity, then the operators are locally finite and complete, but not proper.

Learning under Concept Drift: A Review

Machine Learning

Concept drift describes unforeseeable changes in the underlying distribution of streaming data over time. Concept drift research involves the development of methodologies and techniques for drift detection, understanding and adaptation. Data analysis has revealed that machine learning in a concept drift environment will result in poor learning results if the drift is not addressed. To help researchers identify which research topics are significant and how to apply related techniques in data analysis tasks, it is necessary that a high quality, instructive review of current research developments and trends in the concept drift field is conducted. In addition, due to the rapid development of concept drift in recent years, the methodologies of learning under concept drift have become noticeably systematic, unveiling a framework which has not been mentioned in literature. This paper reviews over 130 high quality publications in concept drift related research areas, analyzes up-to-date developments in methodologies and techniques, and establishes a framework of learning under concept drift including three main components: concept drift detection, concept drift understanding, and concept drift adaptation. This paper lists and discusses 10 popular synthetic datasets and 14 publicly available benchmark datasets used for evaluating the performance of learning algorithms aiming at handling concept drift. Also, concept drift related research directions are covered and discussed. By providing state-of-the-art knowledge, this survey will directly support researchers in their understanding of research developments in the field of learning under concept drift.
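To make the detection and adaptation components concrete, here is a minimal sketch of a window-based detector. It is a generic illustration, not an algorithm taken from the review: it flags drift when the error rate in a recent window exceeds the error rate in a reference window by more than a threshold. The class name `WindowDriftDetector` and its `window` and `threshold` parameters are hypothetical.

```python
from collections import deque


class WindowDriftDetector:
    """Compare the error rate of a recent window against a reference window."""

    def __init__(self, window=50, threshold=0.2):
        self.reference = deque(maxlen=window)
        self.recent = deque(maxlen=window)
        self.threshold = threshold

    def update(self, error):
        """Feed 1 for a misprediction, 0 otherwise; return True on drift."""
        # The first `window` observations form the reference window.
        if len(self.reference) < self.reference.maxlen:
            self.reference.append(error)
            return False
        # Later observations go into a sliding recent window.
        self.recent.append(error)
        if len(self.recent) < self.recent.maxlen:
            return False
        gap = (sum(self.recent) / len(self.recent)
               - sum(self.reference) / len(self.reference))
        if gap > self.threshold:
            # Adaptation hook: treat the recent data as the new reference.
            self.reference = deque(self.recent, maxlen=self.recent.maxlen)
            self.recent.clear()
            return True
        return False


# A stream whose error rate jumps from 0 to 1 after 20 observations.
detector = WindowDriftDetector(window=10, threshold=0.3)
stream = [0] * 20 + [1] * 10
flags = [i for i, err in enumerate(stream) if detector.update(err)]
```

In a full pipeline, a True return would typically trigger retraining or replacement of the learner on recent data; this sketch only resets its own reference window.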

k-Nearest Neighbour Classifiers -- 2nd Edition

Machine Learning

Perhaps the most straightforward classifier in the arsenal of machine learning techniques is the Nearest Neighbour Classifier -- classification is achieved by identifying the nearest neighbours to a query example and using those neighbours to determine the class of the query. This approach to classification is of particular importance because issues of poor run-time performance are not such a problem these days with the computational power that is available. This paper presents an overview of techniques for Nearest Neighbour classification, focusing on mechanisms for assessing similarity (distance), computational issues in identifying nearest neighbours, and mechanisms for reducing the dimension of the data. This paper is the second edition of a paper previously published as a technical report. Sections on similarity measures for time-series, retrieval speed-up and intrinsic dimensionality have been added. An Appendix is included providing access to Python code for the key methods.
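The basic idea can be shown in a short sketch. This is not the code from the paper's appendix: it assumes Euclidean distance and a simple majority vote, and the function name `knn_classify` and the toy data are hypothetical.

```python
import math
from collections import Counter


def knn_classify(query, examples, k=3):
    """Return the majority label among the k examples nearest to query.

    examples is a list of (feature_tuple, label) pairs. Euclidean distance
    is an assumption of this sketch; other similarity measures fit the
    same structure.
    """
    neighbours = sorted(examples, key=lambda ex: math.dist(query, ex[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    # On a tied vote, Counter returns the label it encountered first,
    # which here is the label of the closer neighbour.
    return votes.most_common(1)[0][0]


training = [
    ((1.0, 1.0), "a"),
    ((1.2, 0.8), "a"),
    ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"),
    ((5.2, 4.9), "b"),
]
near_a = knn_classify((1.1, 1.0), training, k=3)
near_b = knn_classify((5.1, 5.0), training, k=3)
```

This brute-force version compares the query against every stored example, so its cost grows linearly with the training set; that is the run-time concern the abstract refers to.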