Case-Based Reasoning
[P] New benchmarks for approximate nearest neighbors • r/MachineLearning
Then you compare a number of ANN algorithms on that on a number of data-sets (each comes with a pre-selected distance metric for the exhaustive nearest neighbour search, while the approximative algorithms may or may not use the same metric but are compared to it). I have indeed not seen many empirical tests comparing the approximate nearest neighbour to the actual precomputed nearest neighbour. What I have seen however are empirical tests where you use nearest neighbour search as a subroutine in some classification algorithm or other and where the classification performance results of the algorithm with (impractical) exhaustive search are tested against a variant with approximative search. Do you think testing only ANN against exhaustive 1NN is sufficient no matter what the techniques get used for? Apparently, you don't think just testing it through the outcome in the application scenario is sufficient, but maybe both then?
Product Quantized Translation for Fast Nearest Neighbor Search
Hwang, Yoonho (Pohang University of Science and Technology (POSTECH)) | Baek, Mooyeol (Pohang University of Science and Technology (POSTECH)) | Kim, Saehoon (Pohang University of Science and Technology (POSTECH)) | Han, Bohyung (Pohang University of Science and Technology (POSTECH)) | Ahn, Hee-Kap (Pohang University of Science and Technology (POSTECH))
This paper proposes a simple nearest neighbor search algorithm, which provides the exact solution in terms of the Euclidean distance efficiently. Especially, we present an interesting approach to improve the speed of nearest neighbor search by proper translations of data and query although the task is inherently invariant to the Euclidean transformations. The proposed algorithm aims to eliminate nearest neighbor candidates effectively using their distance lower bounds in nonlinear embedded spaces, and further improves the lower bounds by transforming data and query through product quantized translations. Although our framework is composed of simple operations only, it achieves the state-of-the-art performance compared to existing nearest neighbor search techniques, which is illustrated quantitatively using various large-scale benchmark datasets in different sizes and dimensions.
Fast Approximate Nearest Neighbor Search via k-Diverse Nearest Neighbor Graph
Xiao, Yan (University of Chinese Academy of Sciences; Institute of Computing Technology, Chinese Academy of Sciences ) | Guo, Jiafeng (University of Chinese Academy of Sciences; Institute of Computing Technology, Chinese Academy of Sciences ) | Lan, Yanyan (University of Chinese Academy of Sciences; Institute of Computing Technology, Chinese Academy of Sciences) | Xu, Jun (University of Chinese Academy of Sciences; Institute of Computing Technology, Chinese Academy of Sciences) | Cheng, Xueqi (University of Chinese Academy of Sciences; Institute of Computing Technology, Chinese Academy of Sciences)
Approximate nearest neighbor search is a fundamental problem and has been studied for a few decades. Recently graph-based indexing methods have demonstrated their great efficiency, whose main idea is to construct neighborhood graph offline and perform a greedy search starting from some sampled points of the graph online. Most existing graph-based methods focus on either the precise k-nearest neighbor (k-NN) graph which has good exploitation ability, or the diverse graph which has good exploration ability. In this paper, we propose the k-diverse nearest neighbor (k-DNN) graph, which balances the precision and diversity of the graph, leading to good exploitation and exploration abilities simultaneously. We introduce an efficient indexing algorithm for the construction of the k-DNN graph inspired by a well-known diverse ranking algorithm in information retrieval (IR). Experimental results show that our method can outperform both state-of-the-art precise graph and diverse graph methods.
Doubly Approximate Nearest Neighbor Classification
Liu, Weiwei (The University of New South Wales) | Liu, Zhuanghua (University of Technology Sydney) | Tsang, Ivor W. (University of Technology Sydney) | Zhang, Wenjie (The University of New South Wales) | Lin, Xuemin (The University of New South Wales)
Nonparametric classification models, such as K-Nearest Neighbor (KNN), have become particularly powerful tools in machine learning and data mining, due to their simplicity and flexibility. However, the testing time of the KNN classifier becomes unacceptable and the KNN's performance deteriorates significantly when applied to data sets with millions of dimensions. We observe that state-of-the-art approximate nearest neighbor (ANN) methods aim to either reduce the number of distance comparisons based on tree structure or decrease the cost of distance computation by dimension reduction methods. In this paper, we propose a doubly approximate nearest neighbor classification strategy, which marries the two branches which compress the dimensions for decreasing distance computation cost as well as reduce the number of distance comparison instead of full scan. Under this strategy, we build a compressed dimensional tree (CD-Tree) to avoid unnecessary distance calculations. In each decision node, we propose a novel feature selection paradigm by optimizing the feature selection vector as well as the separator (indicator variables for splitting instances) with the maximum margin. An efficient algorithm is then developed to find the globally optimal solution with convergence guarantee. Furthermore, we also provide a data-dependent generalization error bound for our model, which reveals a new insight for the design of ANN classification algorithms. Our empirical studies show that our algorithm consistently obtains competitive or better classification results on all data sets, yet we can also achieve three orders of magnitude faster than state-of-the-art libraries on very high dimensions.
Learning to Rank Based on Analogical Reasoning
Fahandar, Mohsen Ahmadi (Paderborn University) | Hüllermeier, Eyke (Paderborn University)
Object ranking or "learning to rank" is an important problem in the realm of preference learning. On the basis of training data in the form of a set of rankings of objects represented as feature vectors, the goal is to learn a ranking function that predicts a linear order of any new set of objects. In this paper, we propose a new approach to object ranking based on principles of analogical reasoning. More specifically, our inference pattern is formalized in terms of so-called analogical proportions and can be summarized as follows: Given objects A,B,C,D, if object A is known to be preferred to B, and C relates to D as A relates to B, then C is (supposedly) preferred to D. Our method applies this pattern as a main building block and combines it with ideas and techniques from instance-based learning and rank aggregation. Based on first experimental results for data sets from various domains (sports, education, tourism, etc.), we conclude that our approach is highly competitive. It appears to be specifically interesting in situations in which the objects are coming from different subdomains, and which hence require a kind of knowledge transfer.
Designing human-shaped artificial intelligence - CBR
Addressing human needs must be top priorities to design, develop and implement successful AI technology. With a rising amount of media attention and a sevenfold increase of investment, artificial intelligence (AI) is on the agenda of businesses in all industries, who are identifying the massive potential with this kind of technology. Design sits at the forefront of the drive to develop innovative everyday AI solutions that people find intuitive and easy to use. To pave the way for successful AI implementation, companies must consider creating experiences with AI that are less artificial and more intelligent, and most importantly, those that make AI more human-shaped. This is where design comes in. Designers possess the skills to create a world where everything is designed around real human needs.
Anatomical labeling of brain CT scan anomalies using multi-context nearest neighbor relation networks
Varadarajan, Srikrishna, Srivastava, Muktabh Mayank, Grewal, Monika, Kumar, Pulkit
This work is an endeavor to develop a deep learning methodology for automated anatomical labeling of a given region of interest (ROI) in brain computed tomography (CT) scans. We combine both local and global context to obtain a representation of the ROI. We then use Relation Networks (RNs) to predict the corresponding anatomy of the ROI based on its relationship score for each class. Further, we propose a novel strategy employing nearest neighbors approach for training RNs. We train RNs to learn the relationship of the target ROI with the joint representation of its nearest neighbors in each class instead of all data-points in each class. The proposed strategy leads to better training of RNs along with increased performance as compared to training baseline RN network.
Reports
The IJCAI-09 Workshop on Learning Structural Knowledge from Observations (STRUCK-09) took place as part of the International Joint Conference on Artificial Intelligence (IJCAI-09) on July 12 in Pasadena, California. The workshop program included paper presentations, discussion sessions about those papers, group discussions about two selected topics, and a joint discussion. As a result, many cognitive architectures use structural models to represent relations between knowledge of different complexity. Structural modeling has led to a number of representation and reasoning formalisms including frames, schemas, abstractions, hierarchical task networks (HTNs), and goal graphs among others. These formalisms have in common the use of certain kinds of constructs (for example, objects, goals, skills, and tasks) that represent knowledge of varying degrees of complexity and that are connected through structural relations.
The 1996 Simon Newcomb Award
His proofs are ingenious, cleverly argued, quite convincing to many of his contemporaries, and utterly wrong. The Simon Newcomb Award is given annually for the silliest published argument attacking AI. Our subject may be unique in the virulence and frequency with which it is attacked, both in the popular media and among the cultured intelligentsia. Recent articles have argued that the very idea of AI reflects a cancer in the heart of our culture and have proven (yet again) that it is impossible. While many of these attacks are cited widely, most of them are ridiculous to anyone with an appropriate technical education.
1058
Case-based reasoning (CBR) is becoming a viable real-world technology. First, it fragments each CBR system across many chapters, making it difficult to get the big picture of how the system works and obscuring the interrelatedness of the system's parts. In addition, having each chapter draw its examples from multiple systems adds a certain context-switching overhead: Each time a system is introduced (or reintroduced), the book must set the context anew, and the reader must recall the details of the system. A second drawback to the unified framework is that although it has fairly broad coverage, it is still biased toward those systems that fit it best. As a result, important work sometimes gets only a cursory mention in the book.