# Case-Based Reasoning

### The Nearest Neighbor Information Estimator is Adaptively Near Minimax Rate-Optimal

We analyze the Kozachenko–Leonenko (KL) fixed k-nearest neighbor estimator for the differential entropy. We obtain the first uniform upper bound on its performance for any fixed k over Hölder balls on a torus without assuming any conditions on how close the density can be to zero. Combined with a recent minimax lower bound over the Hölder ball, this shows that the KL estimator for any fixed k achieves the minimax rates up to logarithmic factors without knowledge of the smoothness parameter s of the Hölder ball for $s \in (0,2]$ and arbitrary dimension d, making it the first estimator that provably satisfies this property.
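To make the estimator concrete, here is a minimal sketch of one common form of the KL estimator, $\hat{H} = \psi(n) - \psi(k) + \log V_d + \frac{d}{n}\sum_i \log \varepsilon_i$, where $\varepsilon_i$ is the distance from sample $i$ to its k-th nearest neighbor and $V_d$ is the volume of the unit ball. The function names, the brute-force neighbor search, and the choice of the unit torus $[0,1)^d$ are assumptions made for illustration; the abstract does not specify an implementation, and the paper's exact normalization may differ by lower-order terms.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def digamma_int(m):
    """Digamma psi(m) for a positive integer m: -gamma + sum_{j=1}^{m-1} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, m))


def torus_distance(x, y):
    """Euclidean distance on the unit torus [0, 1)^d (coordinates wrap around)."""
    total = 0.0
    for a, b in zip(x, y):
        diff = abs(a - b)
        diff = min(diff, 1.0 - diff)  # take the shorter way around
        total += diff * diff
    return math.sqrt(total)


def kl_entropy(points, k=1):
    """Kozachenko-Leonenko entropy estimate in nats (brute-force O(n^2) sketch).

    Assumes all points are distinct; a duplicate would give log(0).
    """
    n = len(points)
    d = len(points[0])
    # log-volume of the d-dimensional unit ball: pi^(d/2) / Gamma(d/2 + 1)
    log_unit_ball = (d / 2.0) * math.log(math.pi) - math.lgamma(d / 2.0 + 1.0)
    log_radius_sum = 0.0
    for i, x in enumerate(points):
        dists = sorted(torus_distance(x, y) for j, y in enumerate(points) if j != i)
        log_radius_sum += math.log(dists[k - 1])  # k-th nearest neighbor distance
    return digamma_int(n) - digamma_int(k) + log_unit_ball + (d / n) * log_radius_sum


if __name__ == "__main__":
    random.seed(0)
    sample = [(random.random(), random.random()) for _ in range(300)]
    # The uniform density on the unit torus has differential entropy 0,
    # so the estimate should be close to zero.
    print(kl_entropy(sample, k=1))
```

Note that k stays fixed as n grows, which is the regime the abstract analyzes; nothing in the sketch depends on the smoothness parameter s.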

### A Theory-Based Evaluation of Nearest Neighbor Models Put Into Practice

In the $k$-nearest neighborhood model ($k$-NN), we are given a set of points $P$, and we shall answer queries $q$ by returning the $k$ nearest neighbors of $q$ in $P$ according to some metric. This concept is crucial in many areas of data analysis and data processing, e.g., computer vision, document retrieval and machine learning. Many $k$-NN algorithms have been published and implemented, but often the relation between parameters and accuracy of the computed $k$-NN is not explicit. We study property testing of $k$-NN graphs in theory and evaluate it empirically: given a point set $P \subset \mathbb{R}^\delta$ and a directed graph $G=(P,E)$, is $G$ a $k$-NN graph, i.e., every point $p \in P$ has outgoing edges to its $k$ nearest neighbors, or is it $\epsilon$-far from being a $k$-NN graph? Here, $\epsilon$-far means that one has to change more than an $\epsilon$-fraction of the edges in order to make $G$ a $k$-NN graph. We develop a randomized algorithm with one-sided error that decides this question, i.e., a property tester for the $k$-NN property, with complexity $O(\sqrt{n} k^2 / \epsilon^2)$ measured in terms of the number of vertices and edges it inspects, and we prove a lower bound of $\Omega(\sqrt{n / \epsilon k})$. We evaluate our tester empirically on the $k$-NN models computed by various algorithms and show that it can be used to detect $k$-NN models with bad accuracy in significantly less time than the building time of the $k$-NN model.
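The idea of a one-sided tester can be illustrated with a deliberately naive sketch: sample vertices at random and reject as soon as one sampled vertex's outgoing edges differ from its true $k$ nearest neighbors. This is not the paper's algorithm and does not achieve its $O(\sqrt{n} k^2 / \epsilon^2)$ complexity, since each brute-force check here inspects all $n$ points. All function names are hypothetical, and pairwise distances are assumed to be distinct so that the $k$ nearest neighbors are unique.

```python
import math
import random


def k_nearest(points, i, k):
    """Indices of the k points closest to points[i], by brute force."""
    order = sorted(
        (j for j in range(len(points)) if j != i),
        key=lambda j: math.dist(points[i], points[j]),
    )
    return set(order[:k])


def build_knn_graph(points, k):
    """Exact k-NN graph as a dict: vertex index -> set of out-neighbors."""
    return {i: k_nearest(points, i, k) for i in range(len(points))}


def naive_knn_tester(points, graph, k, samples, rng):
    """Accept (True) unless a sampled vertex has wrong out-edges.

    One-sided error: a correct k-NN graph is never rejected, while a
    graph with many wrong vertices is rejected with high probability.
    """
    for _ in range(samples):
        i = rng.randrange(len(points))
        if graph[i] != k_nearest(points, i, k):
            return False
    return True


if __name__ == "__main__":
    rng = random.Random(1)
    pts = [(rng.random(), rng.random()) for _ in range(60)]
    graph = build_knn_graph(pts, 3)
    print(naive_knn_tester(pts, graph, 3, samples=10, rng=rng))
```

The one-sided guarantee comes from the structure of the check: the tester only rejects when it holds a concrete witness, namely a vertex whose edge set is wrong.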

### Constructing Ontology-Based Cancer Treatment Decision Support System with Case-Based Reasoning

Decision support is a probabilistic and quantitative method designed for modeling problems in situations with ambiguity. Computer technology can be employed to provide clinical decision support and treatment recommendations. Natural language applications, however, lack formality, and their interpretation is inconsistent. Ontologies, by contrast, can capture the intended meaning and specify modeling primitives. The Disease Ontology (DO), which covers cancer's clinical stages and their corresponding information components, is used to improve the reasoning ability of a decision support system (DSS). The proposed DSS uses Case-Based Reasoning (CBR) to consider disease manifestations and provides physicians with treatment solutions from similar previous cases for reference. The proposed DSS also supports natural language processing (NLP) queries. With the help of the ontology, the DSS obtained 84.63% accuracy in disease classification.

### Predicting Destinations by Nearest Neighbor Search on Training Vessel Routes

The DEBS Grand Challenge 2018 is set in the context of maritime route prediction. Vessel routes are modeled as streams of Automatic Identification System (AIS) data points selected from real-world tracking data. The challenge requires correctly estimating the destination ports and arrival times of vessel trips as early as possible. Our proposed solution partitions the training vessel routes by reported destination port and uses a nearest neighbor search to find the training routes that are closest to the query AIS point. Further improvements are included as well, such as a way to avoid frequently changing the predicted port within one query route and automated parameter tuning with a genetic algorithm. These lead to significant improvements in the final score.
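The partition-then-search idea can be sketched as follows. This is an illustrative simplification, not the authors' system: it predicts the port whose training routes contain the single point closest to the query position. The haversine distance, the data layout (`routes_by_port` as a dict from port name to lists of `(lat, lon)` points), and the port names are assumptions; the prediction smoothing and genetic-algorithm tuning mentioned in the abstract are not modeled.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def predict_port(routes_by_port, query):
    """Return the destination port of the training route nearest to `query`.

    routes_by_port: dict mapping a port name to a list of routes,
    where each route is a list of (lat, lon) points.
    """
    best_port, best_dist = None, math.inf
    for port, routes in routes_by_port.items():
        for route in routes:
            for point in route:
                dist = haversine_km(point, query)
                if dist < best_dist:
                    best_port, best_dist = port, dist
    return best_port


if __name__ == "__main__":
    # Hypothetical training data: two ports with one short route each.
    training = {
        "PORT_A": [[(36.0, 14.0), (36.5, 14.5)]],
        "PORT_B": [[(40.0, 5.0), (41.0, 4.0)]],
    }
    print(predict_port(training, (36.4, 14.4)))
```

A linear scan keeps the sketch short; a spatial index would be the natural replacement for a stream of queries over large training sets.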

### Taking machine thinking out of the black box

Software applications provide people with many kinds of automated decisions, such as identifying what an individual's credit risk is, informing a recruiter of which job candidate to hire, or determining whether someone is a threat to the public. In recent years, news headlines have warned of a future in which machines operate in the background of society, deciding the course of human lives while using untrustworthy logic. Part of this fear is derived from the obscure way in which many machine learning models operate. Known as black-box models, they are defined as systems in which the journey from input to output is next to impossible for even their developers to comprehend. "As machine learning becomes ubiquitous and is used for applications with more serious consequences, there's a need for people to understand how it's making predictions so they'll trust it when it's doing more than serving up an advertisement," says Jonathan Su, a member of the technical staff in MIT Lincoln Laboratory's Informatics and Decision Support Group.

### Bayesian Patchworks: An Approach to Case-Based Reasoning

Doctors often rely on their past experience in order to diagnose patients. For a doctor with enough experience, almost every patient would have similarities to key cases seen in the past, and each new patient could be viewed as a mixture of these key past cases. Because doctors often tend to reason this way, an efficient computationally aided diagnostic tool that thinks in the same way might be helpful in locating key past cases of interest that could assist with diagnosis. This article develops a novel mathematical model to mimic the type of logical thinking that physicians use when considering past cases. The proposed model can also provide physicians with explanations that would be similar to the way they would naturally reason about cases. The proposed method is designed to yield predictive accuracy, computational efficiency, and insight into medical data. The key element is the insight into medical data: in some sense, we are automating a complicated process that physicians might perform manually. Finally, we implemented the result of this work on two publicly available healthcare datasets, for heart disease prediction and breast cancer prediction.