 Case-Based Reasoning


Machine Learning Patentability in 2019: 5 Cases Analyzed and Lessons Learned Part 1 JD Supra

#artificialintelligence

Claims 1 and 8 as recited are not practically performed in the human mind. As discussed above, the claims recite monitoring operation of machines using neural networks, logic decision trees, confidence assessments, fuzzy logic, smart agent profiling, and case-based reasoning. . . .


10. Introduction to Learning, Nearest Neighbors

#artificialintelligence

Instructor: Patrick Winston. This lecture begins with a high-level view of learning, then covers nearest neighbors using several graphical examples. We then discuss how to learn motor skills such as bouncing a tennis ball, and consider the effects of sleep deprivation.


Learning similarity measures from data

arXiv.org Machine Learning

Progress in Artificial Intelligence manuscript.
Abstract: Defining similarity measures is a requirement for some machine learning methods. One such method is case-based reasoning (CBR), where the similarity measure is used to retrieve the stored case or set of cases most similar to the query case. Describing a similarity measure analytically is challenging, even for domain experts working with CBR experts. However, data sets are typically gathered as part of constructing a CBR or machine learning system. These data sets are assumed to contain the features that correctly identify the solution from the problem features; thus they may also contain the knowledge needed to construct or learn such a similarity measure. The main motivation for this work is to automate the construction of similarity measures using machine learning. Additionally, we would like to do this while keeping training time as low as possible. Working towards this, our objective is to investigate how to apply machine learning to effectively learn a similarity measure. Such a learned similarity measure could be used for CBR systems, but also for clustering data in semi-supervised learning, or for one-shot learning tasks. Recent work has advanced towards this goal, but relies on either very long training times or manually modeling parts of the similarity measure. We created a framework to help us analyze current methods for learning similarity measures. This analysis resulted in two novel similarity measure designs. Both similarity measures were evaluated on 14 different data sets. The evaluation shows that using a classifier as the basis for a similarity measure gives state-of-the-art performance. Finally, the evaluation shows that our fully data-driven similarity measure design outperforms state-of-the-art methods while keeping training time low.
Keywords: Similarity Measure, Data Science, Neural Networks, Data Analytics, Case-Based Reasoning, Similarity Function, Siamese Networks, Similarity Metrics, Distance Metrics.
This work was supported by the Research Council of Norway through the EXPOSED project (grant number 302002390) and the Norwegian Open AI Lab.
1 Introduction: Many artificial intelligence and machine learning (ML) methods, such as k-nearest neighbors (k-NN), rely on a similarity (or distance) measure [21] between data points. In case-based reasoning (CBR), a simple k-NN or a more complex similarity function is used to retrieve the stored cases that are most similar to the current query case.
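
The retrieval step described in this snippet can be sketched as follows. This is a minimal illustration, not the paper's method: the similarity function (a transform of Euclidean distance), the toy case base, and the names `retrieve` and `reuse` are all assumptions made for the example. A learned similarity measure could be passed in through the `similarity` parameter instead.

```python
import math
from collections import Counter

def euclidean_similarity(a, b):
    """Map Euclidean distance to a similarity in (0, 1]; an illustrative choice."""
    return 1.0 / (1.0 + math.dist(a, b))

def retrieve(case_base, query, k=3, similarity=euclidean_similarity):
    """Return the k stored cases most similar to the query, best first."""
    ranked = sorted(case_base, key=lambda case: similarity(case[0], query), reverse=True)
    return ranked[:k]

def reuse(retrieved):
    """Propose a solution by majority vote over the retrieved cases."""
    return Counter(solution for _, solution in retrieved).most_common(1)[0][0]

# Hypothetical case base: (problem features, solution label).
case_base = [
    ((1.0, 1.0), "A"),
    ((1.2, 0.9), "A"),
    ((5.0, 5.0), "B"),
    ((5.5, 4.8), "B"),
    ((0.8, 1.1), "A"),
]

nearest = retrieve(case_base, (1.1, 1.0), k=3)
proposed = reuse(nearest)
```

Keeping the similarity function as a parameter means the same retrieval code works whether the measure is hand-modeled or learned from data.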


From Shallow to Deep Interactions Between Knowledge Representation, Reasoning and Machine Learning (Kay R. Amel group)

arXiv.org Artificial Intelligence

This paper proposes a tentative and original survey of meeting points between Knowledge Representation and Reasoning (KRR) and Machine Learning (ML), two areas which have been developing quite separately in the last three decades. Some common concerns are identified and discussed, such as the types of representation used, the roles of knowledge and data, the lack or the excess of information, or the need for explanations and causal understanding. Then some methodologies combining reasoning and learning are reviewed (such as inductive logic programming, neuro-symbolic reasoning, formal concept analysis, rule-based representations and ML, uncertainty in ML, or case-based reasoning and analogical reasoning), before discussing examples of synergies between KRR and ML (including topics such as belief functions on regression, EM algorithm versus revision, the semantic description of vector representations, the combination of deep learning with high-level inference, knowledge graph completion, declarative frameworks for data mining, or preferences and recommendation). This paper is the first step of a work in progress aiming at a better mutual understanding of research in KRR and ML, and of how they could cooperate.


Adaptive Nearest Neighbor: A General Framework for Distance Metric Learning

arXiv.org Machine Learning

The $K$-NN classifier is one of the most famous classification algorithms, and its performance is crucially dependent on the distance metric. When we consider the distance metric as a parameter of $K$-NN, learning an appropriate distance metric for $K$-NN can be seen as minimizing the empirical risk of $K$-NN. In this paper, we design a new type of continuous decision function for the $K$-NN classification rule, which can be used to construct a continuous empirical risk function for $K$-NN. By minimizing this continuous empirical risk function, we obtain a novel distance metric learning algorithm named adaptive nearest neighbor (ANN). We prove that current algorithms such as large margin nearest neighbor (LMNN), neighbourhood components analysis (NCA), and the pairwise constraint methods are special cases of the proposed ANN, obtained by setting its parameters to different values. Compared with the LMNN, NCA, and pairwise constraint methods, our method has a broader search space, which may contain better solutions. Finally, extensive experiments on various data sets are conducted to demonstrate the effectiveness and efficiency of the proposed method.


Finding the most similar textual documents using Case-Based Reasoning

arXiv.org Machine Learning

In recent years, the huge amount of unstructured textual data on the Internet has made it difficult for AI algorithms to provide the best recommendations for users and their search queries. Since the Internet became widespread, a lot of research has been done in the field of Natural Language Processing (NLP) and machine learning. Almost every solution transforms documents into Vector Space Models (VSM) in order to apply AI algorithms over them. One such approach is based on Case-Based Reasoning (CBR). Therefore, the most important part of those systems is to compute the similarity between numerical data points. In 2016, the new TS-SS similarity metric was proposed, which showed state-of-the-art results in the field of text mining for unsupervised learning. However, no one has previously investigated its performance for supervised learning (classification tasks). In this work, we devised a CBR system capable of finding the most similar documents for a given query, aiming to investigate the performance of the new state-of-the-art metric, TS-SS, in addition to two other geometrical similarity measures, Euclidean distance and Cosine similarity, that showed the best predictive results over several benchmark corpora. The results show that the TS-SS measure is surprisingly inappropriate for high-dimensional features.


r/MachineLearning - [Project] pgANN Fast Approximate Nearest Neighbor (ANN) searches with a PostgreSQL database.

#artificialintelligence

Hi, we did experiment with ES, using range queries on the vectors and boolean querying them, and also tried using LSH/MinHash to save a signature for each vector. Did you have a different approach in mind? Also, you're correct about L1 & L2 distances being poor metrics in this dimensionality, but our goal was to fetch a subset of (say) a few thousand "good enough" results - from a pool of tens of millions - that can then be re-ranked with cosine or a similar metric. Unfortunately, there are no easy wins in ANN and this works well enough for us. We hope others can benefit as well.
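
The two-stage idea in this post (fetch a cheap "good enough" subset, then re-rank it with cosine) might look roughly like the sketch below. It is not pgANN's code: the in-memory L1 filter stands in for the database query, and the function names, pool, and parameters are assumptions for illustration only.

```python
import math

def l1(a, b):
    """Cheap first-stage distance (stand-in for an indexed database query)."""
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine(a, b):
    """Cosine similarity used for the second-stage re-ranking."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def search(pool, query, n_candidates, top_k):
    """Stage 1: keep the n_candidates closest by L1. Stage 2: re-rank by cosine."""
    candidates = sorted(pool, key=lambda item: l1(item[1], query))[:n_candidates]
    reranked = sorted(candidates, key=lambda item: cosine(item[1], query), reverse=True)
    return reranked[:top_k]

# Hypothetical pool of (id, vector) pairs.
pool = [
    ("a", (1.0, 0.0, 0.0)),
    ("b", (0.9, 0.1, 0.0)),
    ("c", (0.0, 1.0, 0.0)),
    ("d", (0.5, 0.5, 0.0)),
    ("e", (0.0, 0.0, 1.0)),
]

results = search(pool, (1.0, 0.05, 0.0), n_candidates=3, top_k=2)
result_ids = [item_id for item_id, _ in results]
```

The first stage only needs to avoid discarding the true best matches; the more expensive metric is applied to a few candidates rather than the whole pool.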


AI-based Analytics: The key to business-led eDiscovery Casepoint

#artificialintelligence

Another common eDiscovery pitfall is the use of standard approaches for every case. Rather than dig in and discern data minimization and cost estimates for each case, many practitioners use generic formulas. Dubious tenets like "every stage of large cases goes to law firms" or "law firms always manage review for us" still rule the day. Teams automatically slap project planning formulas like 0 to 6 months for ECA, 6 to 12 months for full-blown eDiscovery and 12 to 24 months to finish eDiscovery, motions and trial preparations onto every eDiscovery project.


Supervised Learning Approach to Approximate Nearest Neighbor Search

arXiv.org Machine Learning

Approximate nearest neighbor search is a classic algorithmic problem where the goal is to design an efficient index structure for fast approximate nearest neighbor queries. We show that it can be framed as a classification problem and solved by training a suitable multi-label classifier and using it as an index. Compared to the existing algorithms, this supervised learning approach has several advantages: it enables adapting an index to the query distribution when the query distribution and the corpus distribution differ; it allows using training sets larger than the corpus; and in principle it enables using any multi-label classifier for approximate nearest neighbor search. We demonstrate these advantages on multiple synthetic and real-world data sets by using a random forest and an ensemble of random projection trees as the base classifiers.
Introduction: In k-nearest neighbor (k-nn) search, the k points that are nearest to the query point are retrieved from the corpus. Approximate nearest neighbor search is used to speed up k-nn search in applications where fast response times are critical, such as computer vision, robotics, and recommendation systems. Traditionally, approximate nearest neighbor search is approached as a problem in algorithms and data structures. Space-partitioning methods (trees, hashing, and quantization) divide the space according to a geometric criterion. For instance, k-d trees (Bentley 1975) and principal component trees (McNames 2001) are grown by hierarchically partitioning the space along the maximum variance directions of the corpus.
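
A toy sketch of the "classifier as index" framing follows. It is not the paper's implementation: a fixed grid cell stands in for the multi-label classifier (the paper uses random forests and random projection trees), and the corpus, training queries, and all function names are assumptions. The labels for each training query are its true nearest neighbors, and the index stores, per cell, the corpus points that appeared as labels there.

```python
import math
from collections import defaultdict

def exact_knn(corpus, query, k):
    """Indices of the k corpus points nearest to the query (brute force)."""
    return sorted(range(len(corpus)), key=lambda i: math.dist(corpus[i], query))[:k]

def cell_of(point, width=1.0):
    """Toy partition: assign a point to a grid cell (assumed stand-in for a trained classifier)."""
    return tuple(math.floor(x / width) for x in point)

def build_index(corpus, training_queries, k):
    """For each cell, collect corpus points that were true neighbors of training queries in it."""
    index = defaultdict(set)
    for q in training_queries:
        index[cell_of(q)].update(exact_knn(corpus, q, k))
    return index

def approximate_knn(index, corpus, query, k):
    """Search only the candidates stored for the query's cell; fall back to brute force."""
    candidates = index.get(cell_of(query))
    if not candidates:
        return exact_knn(corpus, query, k)
    return sorted(candidates, key=lambda i: math.dist(corpus[i], query))[:k]

# Hypothetical corpus and training queries.
corpus = [(0.1, 0.1), (0.4, 0.3), (0.9, 0.8), (3.2, 3.1), (3.6, 3.9), (3.9, 3.3)]
training_queries = [(0.2, 0.2), (0.7, 0.6), (3.3, 3.3), (3.8, 3.7)]

index = build_index(corpus, training_queries, k=2)
approx = approximate_knn(index, corpus, (0.5, 0.5), k=2)
exact = exact_knn(corpus, (0.5, 0.5), k=2)
```

Because the index is built from training queries rather than from the corpus geometry alone, it adapts to wherever queries actually fall, which is the advantage the abstract highlights.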