Case-Based Reasoning


A Theory-Based Evaluation of Nearest Neighbor Models Put Into Practice

Neural Information Processing Systems

In the $k$-nearest neighborhood model ($k$-NN), we are given a set of points $P$, and we shall answer queries $q$ by returning the $k$ nearest neighbors of $q$ in $P$ according to some metric. This concept is crucial in many areas of data analysis and data processing, e.g., computer vision, document retrieval and machine learning. Many $k$-NN algorithms have been published and implemented, but often the relation between parameters and accuracy of the computed $k$-NN is not explicit. We study property testing of $k$-NN graphs in theory and evaluate it empirically: given a point set $P \subset \mathbb{R}^\delta$ and a directed graph $G=(P,E)$, is $G$ a $k$-NN graph, i.e., every point $p \in P$ has outgoing edges to its $k$ nearest neighbors, or is it $\epsilon$-far from being a $k$-NN graph? Here, $\epsilon$-far means that one has to change more than an $\epsilon$-fraction of the edges in order to make $G$ a $k$-NN graph. We develop a randomized algorithm with one-sided error that decides this question, i.e., a property tester for the $k$-NN property, with complexity $O(\sqrt{n} k^2 / \epsilon^2)$ measured in terms of the number of vertices and edges it inspects, and we prove a lower bound of $\Omega(\sqrt{n / \epsilon k})$. We evaluate our tester empirically on the $k$-NN models computed by various algorithms and show that it can be used to detect $k$-NN models with bad accuracy in significantly less time than the building time of the $k$-NN model.


The Nearest Neighbor Information Estimator is Adaptively Near Minimax Rate-Optimal

Neural Information Processing Systems

We analyze the Kozachenko–Leonenko (KL) fixed k-nearest neighbor estimator for the differential entropy. We obtain the first uniform upper bound on its performance for any fixed k over H\"{o}lder balls on a torus without assuming any conditions on how close the density could be from zero. Accompanying a recent minimax lower bound over the H\"{o}lder ball, we show that the KL estimator for any fixed k is achieving the minimax rates up to logarithmic factors without cognizance of the smoothness parameter s of the H\"{o}lder ball for $s \in (0,2]$ and arbitrary dimension d, rendering it the first estimator that provably satisfies this property.


Constructing Ontology-Based Cancer Treatment Decision Support System with Case-Based Reasoning

arXiv.org Artificial Intelligence

Decision support is a probabilistic and quantitative method designed for modeling problems in situations with ambiguity. Computer technology can be employed to provide clinical decision support and treatment recommendations. The problem of natural language applications is that they lack formality and the interpretation is not consistent. Conversely, ontologies can capture the intended meaning and specify modeling primitives. Disease Ontology (DO) that pertains to cancer's clinical stages and their corresponding information components is utilized to improve the reasoning ability of a decision support system (DSS). The proposed DSS uses Case-Based Reasoning (CBR) to consider disease manifestations and provides physicians with treatment solutions from similar previous cases for reference. The proposed DSS supports natural language processing (NLP) queries. The DSS obtained 84.63% accuracy in disease classification with the help of the ontology.


Germany won't export arms to Saudi 'in current situation': Merkel

The Japan Times

BERLIN – German Chancellor Angela Merkel said Sunday that Berlin would not export arms to Saudi Arabia for now in the wake of dissident journalist Jamal Khashoggi's violent death. "I agree with all those who say when it comes to our already limited arms exports (to Saudi Arabia) that they cannot take place in the current situation," she told reporters at her party headquarters. Her foreign minister, Heiko Maas, had already said on Saturday that he currently saw "no basis for decisions in favor of arms exports to Saudi Arabia. Germany last month approved €416 million ($480 million) worth of arms exports to Saudi Arabia for 2018. In the past, military exports by Berlin to Riyadh have mostly consisted of patrol boats. Merkel reiterated that she condemned Khashoggi's killing "in the strongest terms" and saw an "urgent need to clear up" the case. "We are far from seeing everything on the table and the perpetrators being brought to justice," she said. Merkel added that she would continue to consult with international partners about a coordinated reaction to the case. Germany and Saudi Arabia only returned their ambassadors in September after 10 months of frosty relations following criticism from Berlin of what it said was Saudi interference in Lebanese affairs. The Khashoggi case has opened a serious new rift with European partners Britain, France and Germany saying in a joint statement earlier that Saudi Arabia must clarify how Khashoggi died inside its Istanbul consulate, and its account must "be backed by facts to be considered credible.


Jamal Khashoggi Is Dead. Here's What We Know.

NYT > Middle East

The bulk of evidence has come from leaks by Turkish authorities to pro-government news outlets. That information included the identification of 15 Saudis who Turkish officials said had been sent to assassinate Mr. Khashoggi and dispose of his body. Details in an audio recording from inside the consulate show that Saudi agents were waiting when Mr. Khashoggi walked into the building, and that he was dead within minutes, a senior Turkish official said. The audio revealed that Mr. Khashoggi was beheaded and dismembered, and his fingers severed, and that within two hours the killers were gone. The New York Times confirmed that at least nine of the 15 suspects identified by Turkish authorities worked for the Saudi security services, military or other government ministries.


A Theory-Based Evaluation of Nearest Neighbor Models Put Into Practice

arXiv.org Machine Learning

In the $k$-nearest neighborhood model ($k$-NN), we are given a set of points $P$, and we shall answer queries $q$ by returning the $k$ nearest neighbors of $q$ in $P$ according to some metric. This concept is crucial in many areas of data analysis and data processing, e.g., computer vision, document retrieval and machine learning. Many $k$-NN algorithms have been published and implemented, but often the relation between parameters and accuracy of the computed $k$-NN is not explicit. We study property testing of $k$-NN graphs in theory and evaluate it empirically: given a point set $P \subset \mathbb{R}^\delta$ and a directed graph $G=(P,E)$, is $G$ a $k$-NN graph, i.e., every point $p \in P$ has outgoing edges to its $k$ nearest neighbors, or is it $\epsilon$-far from being a $k$-NN graph? Here, $\epsilon$-far means that one has to change more than an $\epsilon$-fraction of the edges in order to make $G$ a $k$-NN graph. We develop a randomized algorithm with one-sided error that decides this question, i.e., a property tester for the $k$-NN property, with complexity $O(\sqrt{n} k^2 / \epsilon^2)$ measured in terms of the number of vertices and edges it inspects, and we prove a lower bound of $\Omega(\sqrt{n / \epsilon k})$. We evaluate our tester empirically on the $k$-NN models computed by various algorithms and show that it can be used to detect $k$-NN models with bad accuracy in significantly less time than the building time of the $k$-NN model.


Predicting Destinations by Nearest Neighbor Search on Training Vessel Routes

arXiv.org Machine Learning

The DEBS Grand Challenge 2018 is set in the context of maritime route prediction. Vessel routes are modeled as streams of Automatic Identification System (AIS) data points selected from real-world tracking data. The challenge requires to correctly estimate the destination ports and arrival times of vessel trips, as early as possible. Our proposed solution partitions the training vessel routes by reported destination port and uses a nearest neighbor search to find the training routes that are closer to the query AIS point. Particular improvements have been included as well, such as a way to avoid changing the predicted ports frequently within one query route and automating the parameters tuning by the use of a genetic algorithm. This leads to significant improvements on the final score.


Taking machine thinking out of the black box

#artificialintelligence

Software applications provide people with many kinds of automated decisions, such as identifying what an individual's credit risk is, informing a recruiter of which job candidate to hire, or determining whether someone is a threat to the public. In recent years, news headlines have warned of a future in which machines operate in the background of society, deciding the course of human lives while using untrustworthy logic. Part of this fear is derived from the obscure way in which many machine learning models operate. Known as black-box models, they are defined as systems in which the journey from input to output is next to impossible for even their developers to comprehend. "As machine learning becomes ubiquitous and is used for applications with more serious consequences, there's a need for people to understand how it's making predictions so they'll trust it when it's doing more than serving up an advertisement," says Jonathan Su, a member of the technical staff in MIT Lincoln Laboratory's Informatics and Decision Support Group.


Taking machine thinking out of the black box

MIT News

Software applications provide people with many kinds of automated decisions, such as identifying what an individual's credit risk is, informing a recruiter of which job candidate to hire, or determining whether someone is a threat to the public. In recent years, news headlines have warned of a future in which machines operate in the background of society, deciding the course of human lives while using untrustworthy logic. Part of this fear is derived from the obscure way in which many machine learning models operate. Known as black-box models, they are defined as systems in which the journey from input to output is next to impossible for even their developers to comprehend. "As machine learning becomes ubiquitous and is used for applications with more serious consequences, there's a need for people to understand how it's making predictions so they'll trust it when it's doing more than serving up an advertisement," says Jonathan Su, a member of the technical staff in MIT Lincoln Laboratory's Informatics and Decision Support Group.


Bayesian Patchworks: An Approach to Case-Based Reasoning

arXiv.org Machine Learning

Doctors often rely on their past experience in order to diagnose patients. For a doctor with enough experience, almost every patient would have similarities to key cases seen in the past, and each new patient could be viewed as a mixture of these key past cases. Because doctors often tend to reason this way, an efficient computationally aided diagnostic tool that thinks in the same way might be helpful in locating key past cases of interest that could assist with diagnosis. This article develops a novel mathematical model to mimic the type of logical thinking that physicians use when considering past cases. The proposed model can also provide physicians with explanations that would be similar to the way they would naturally reason about cases. The proposed method is designed to yield predictive accuracy, computational efficiency, and insight into medical data; the key element is the insight into medical data, in some sense we are automating a complicated process that physicians might perform manually. We finally implemented the result of this work on two publicly available healthcare datasets, for heart disease prediction and breast cancer prediction.