Case-Based Reasoning


Efficient Autotuning of Hyperparameters in Approximate Nearest Neighbor Search

arXiv.org Machine Learning

Approximate nearest neighbor algorithms are used to speed up nearest neighbor search in a wide array of applications. However, current indexing methods feature several hyperparameters that need to be tuned to reach an acceptable accuracy--speed trade-off. A grid search in the parameter space is often impractically slow due to a time-consuming index-building procedure. Therefore, we propose an algorithm for automatically tuning the hyperparameters of indexing methods based on randomized space-partitioning trees. In particular, we present results using randomized k-d trees, random projection trees and randomized PCA trees. The tuning algorithm adds minimal overhead to the index-building process but is able to find the optimal hyperparameters accurately. We demonstrate that the algorithm is significantly faster than existing approaches, and that the indexing methods used are competitive with the state-of-the-art methods in query time while being faster to build.


Constructing Ontology-Based Cancer Treatment Decision Support System with Case-Based Reasoning

arXiv.org Artificial Intelligence

Decision support is a probabilistic and quantitative method designed for modeling problems in situations with ambiguity. Computer technology can be employed to provide clinical decision support and treatment recommendations. The problem with natural language applications is that they lack formality and their interpretation is inconsistent. Conversely, ontologies can capture the intended meaning and specify modeling primitives. The Disease Ontology (DO), which covers cancer's clinical stages and their corresponding information components, is used to improve the reasoning ability of a decision support system (DSS). The proposed DSS uses Case-Based Reasoning (CBR) to consider disease manifestations and provides physicians with treatment solutions from similar previous cases for reference. The proposed DSS supports natural language processing (NLP) queries. With the help of the ontology, the DSS obtained 84.63% accuracy in disease classification.


A Theory-Based Evaluation of Nearest Neighbor Models Put Into Practice

arXiv.org Machine Learning

In the $k$-nearest neighborhood model ($k$-NN), we are given a set of points $P$, and we shall answer queries $q$ by returning the $k$ nearest neighbors of $q$ in $P$ according to some metric. This concept is crucial in many areas of data analysis and data processing, e.g., computer vision, document retrieval and machine learning. Many $k$-NN algorithms have been published and implemented, but often the relation between parameters and accuracy of the computed $k$-NN is not explicit. We study property testing of $k$-NN graphs in theory and evaluate it empirically: given a point set $P \subset \mathbb{R}^\delta$ and a directed graph $G=(P,E)$, is $G$ a $k$-NN graph, i.e., every point $p \in P$ has outgoing edges to its $k$ nearest neighbors, or is it $\epsilon$-far from being a $k$-NN graph? Here, $\epsilon$-far means that one has to change more than an $\epsilon$-fraction of the edges in order to make $G$ a $k$-NN graph. We develop a randomized algorithm with one-sided error that decides this question, i.e., a property tester for the $k$-NN property, with complexity $O(\sqrt{n} k^2 / \epsilon^2)$ measured in terms of the number of vertices and edges it inspects, and we prove a lower bound of $\Omega(\sqrt{n / \epsilon k})$. We evaluate our tester empirically on the $k$-NN models computed by various algorithms and show that it can be used to detect $k$-NN models with bad accuracy in significantly less time than the building time of the $k$-NN model.
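The $k$-NN graph property described above can be illustrated with a small sketch. It is not the paper's tester: it is a naive sampling check that scans all $n$ points for each sampled vertex, so it does not reach the sublinear complexity stated in the abstract. It does share the one-sided error behaviour, since a correct $k$-NN graph is never rejected (assuming no distance ties). The function names `sq_dists`, `knn_graph` and `sample_check` are hypothetical, and the points are random toy data.

```python
import numpy as np

def sq_dists(points, i):
    """Squared Euclidean distances from point i to all points (inf to itself)."""
    d = np.sum((points - points[i]) ** 2, axis=1)
    d[i] = np.inf  # a point is not its own neighbor
    return d

def knn_graph(points, k):
    """Exact k-NN graph by brute force: row i holds the k nearest neighbors of i."""
    return np.array([np.argsort(sq_dists(points, i))[:k]
                     for i in range(len(points))])

def sample_check(points, graph, k, samples, rng):
    """Naive check: sample vertices and verify that every out-edge leads to
    one of the k nearest neighbors. Returns False on the first violation."""
    n = len(points)
    for i in rng.choice(n, size=min(samples, n), replace=False):
        d = sq_dists(points, i)
        kth = np.partition(d, k - 1)[k - 1]  # distance to the k-th nearest neighbor
        if np.any(d[graph[i]] > kth):
            return False
    return True

rng = np.random.default_rng(0)
pts = rng.normal(size=(200, 3))   # toy point set
g = knn_graph(pts, 5)
print(sample_check(pts, g, 5, 50, rng))
```

The more vertices are sampled, the more likely a graph that is far from a $k$-NN graph is caught, at the cost of more distance computations.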


Predicting Destinations by Nearest Neighbor Search on Training Vessel Routes

arXiv.org Machine Learning

The DEBS Grand Challenge 2018 is set in the context of maritime route prediction. Vessel routes are modeled as streams of Automatic Identification System (AIS) data points selected from real-world tracking data. The challenge requires correctly estimating the destination ports and arrival times of vessel trips as early as possible. Our proposed solution partitions the training vessel routes by reported destination port and uses a nearest neighbor search to find the training routes that are closest to the query AIS point. Further improvements are included as well, such as a way to avoid changing the predicted port frequently within one query route and automated parameter tuning with a genetic algorithm. This leads to significant improvements in the final score.
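The partition-then-search idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each AIS point is reduced to a (latitude, longitude) pair, distance is the great-circle (haversine) distance, and the port names and coordinates are made-up toy values. The helpers `haversine_km` and `predict_port` are hypothetical.

```python
import math

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))  # 6371 km: mean Earth radius

def predict_port(routes_by_port, query):
    """Return the port whose training points contain the one nearest to the query."""
    best_port, best_dist = None, math.inf
    for port, points in routes_by_port.items():
        for p in points:
            d = haversine_km(p, query)
            if d < best_dist:
                best_port, best_dist = port, d
    return best_port, best_dist

# Toy training data, partitioned by reported destination port (made-up values).
routes_by_port = {
    "PORT_A": [(36.0, 14.0), (36.5, 14.5), (37.0, 15.0)],
    "PORT_B": [(40.0, 5.0), (41.0, 4.0), (42.0, 3.0)],
}
port, dist = predict_port(routes_by_port, (40.6, 4.3))
print(port)
```

A linear scan is enough for toy data; a real system would likely use a spatial index and would need extra logic to keep the predicted port stable along a route, as the abstract notes.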


Bayesian Patchworks: An Approach to Case-Based Reasoning

arXiv.org Machine Learning

Doctors often rely on their past experience in order to diagnose patients. For a doctor with enough experience, almost every patient would have similarities to key cases seen in the past, and each new patient could be viewed as a mixture of these key past cases. Because doctors often tend to reason this way, an efficient computationally aided diagnostic tool that thinks in the same way might be helpful in locating key past cases of interest that could assist with diagnosis. This article develops a novel mathematical model to mimic the type of logical thinking that physicians use when considering past cases. The proposed model can also provide physicians with explanations that would be similar to the way they would naturally reason about cases. The proposed method is designed to yield predictive accuracy, computational efficiency, and insight into medical data. The key element is the insight into medical data: in some sense we are automating a complicated process that physicians might perform manually. Finally, we applied the method to two publicly available healthcare datasets, for (1) heart disease prediction and (2) breast cancer prediction.


Net neutrality activists, state officials are taking the FCC to court. Here's how they'll argue the case.

Washington Post - Technology News

Opponents of the Federal Communications Commission have outlined their chief arguments on net neutrality to a federal appeals court in Washington, in hopes of undoing the FCC's move last year to repeal its own rules for Internet service providers. The legal briefs reflect a widening front in the multipronged campaign by consumer groups and tech companies to rescue the ISP regulations, which originally barred providers from blocking websites or slowing them. With the FCC's changes, Internet providers may legally manipulate Internet traffic as it travels over their infrastructure, as long as they disclose their practices to consumers. The FCC's decision last year to repeal the rules was "arbitrary and capricious," said officials from the state of New York, the California Public Utilities Commission and others in court documents Monday -- asking the U.S. Court of Appeals for the District of Columbia Circuit to overrule the agency. The FCC was too credulous in accepting industry promises "to refrain from harmful practices," the officials said, "notwithstanding substantial record evidence showing that [Internet] providers have abused and will abuse their gatekeeper roles in ways that harm consumers and threaten public safety."


How Complex is your classification problem? A survey on measuring classification complexity

arXiv.org Machine Learning

Extracting characteristics from the training datasets of classification problems has proven effective in a number of meta-analyses. Among them, measures of classification complexity can estimate the difficulty of separating the data points into their expected classes. Descriptors of the spatial distribution of the data and estimates of the shape and size of the decision boundary are among the existing measures for this characterization. This information can support the formulation of new data-driven pre-processing and pattern recognition techniques, which can in turn be focused on challenging characteristics of the problems. This paper surveys and analyzes measures which can be extracted from the training datasets in order to characterize the complexity of the respective classification problems. Their use in recent literature is also reviewed and discussed, which helps to identify opportunities for future work in the area. Finally, the paper describes a publicly available R package named Extended Complexity Library (ECoL) that implements a set of complexity measures.


An Efficient Approach to Learning Chinese Judgment Document Similarity Based on Knowledge Summarization

arXiv.org Artificial Intelligence

In common law systems, a similar previous case can be used as a reference for the current case so that identical situations are treated similarly in every case. However, current approaches to computing judgment document similarity fail to capture the core semantics of judgment documents and therefore suffer from lower accuracy and higher computational complexity. In this paper, a machine learning approach based on knowledge block summarization is proposed to compute the semantic similarity of Chinese judgment documents. By utilizing domain ontologies for judgment documents, the core semantics of Chinese judgment documents are summarized based on knowledge blocks. Then the WMD algorithm is used to calculate the similarity between knowledge blocks. Finally, experiments show that our approach achieves higher accuracy and faster computation than traditional approaches.


Harvey Weinstein seeks to dismiss case based on accuser's emails

BBC News

Hollywood producer Harvey Weinstein is seeking to get the criminal case against him thrown out of court. On Friday, his lawyers filed a defence motion citing dozens of "warm" emails they say Mr Weinstein received from one of his accusers after an alleged rape. His team argue prosecutors should have shared the evidence with the Grand Jury that indicted him. Mr Weinstein has pleaded not guilty to six charges involving three different women. The accuser in question has retained her anonymity.


10 Ways To Improve Cloud ERP With AI & Machine Learning

Forbes - Tech

Capitalizing on new digital business models and the growth opportunities they provide is forcing companies to re-evaluate ERP's role. Made inflexible by years of customization, legacy ERP systems aren't delivering what digital business models need today to scale and grow. Legacy ERP systems were purpose-built to excel at production consistency first, at the expense of flexibility and responsiveness to customers' changing requirements. By taking a business case-based approach to integrating Artificial Intelligence (AI) and machine learning into their platforms, Cloud ERP providers can fill the gap legacy ERP systems can't. For new digital business models to succeed, companies need to respond quickly and with smart decisions to unexpected, unfamiliar and unforeseen dilemmas.