AITopics

Country: North America > United States (0.34)

Industry: Law > Intellectual Property & Technology Law (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning (0.55)

#artificialintelligence Jan-20-2020, 08:22:58 GMT

10. Introduction to Learning, Nearest Neighbors

Instructor: Patrick Winston. This lecture begins with a high-level view of learning, then covers nearest neighbors using several graphical examples. We then discuss how to learn motor skills such as bouncing a tennis ball, and consider the effects of sleep deprivation.
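The nearest-neighbor idea covered in the lecture can be sketched as a plain k-NN classifier. This is a minimal illustration, not code from the lecture: the function names `euclidean` and `knn_classify`, the toy 2-D points, and the choice of Euclidean distance with majority voting are all assumptions.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(train, query, k=3):
    """Predict a label for `query` by majority vote of its k nearest neighbors.

    `train` is a list of (feature_vector, label) pairs. Among equally common
    labels, the one seen first among the nearest neighbors wins.
    """
    neighbors = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two well-separated clusters in 2-D
train = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
         ((5.0, 5.0), "B"), ((5.2, 4.9), "B"), ((4.8, 5.1), "B")]
print(knn_classify(train, (1.1, 1.0)))  # A
print(knn_classify(train, (4.9, 5.0)))  # B
```

There is no training step: all the work happens at query time, which is why the choice of distance matters so much for nearest-neighbor methods.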

artificial intelligence, nearest neighbor, social media

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.38)

Industry: Leisure & Entertainment > Sports > Tennis (0.76)

Technology:

Information Technology > Communications > Social Media (0.76)
Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (0.71)

Mathisen, Bjørn Magnus, Aamodt, Agnar, Bach, Kerstin, Langseth, Helge

Learning similarity measures from data

arXiv.org Machine Learning Jan-15-2020

Progress in Artificial Intelligence

Abstract: Defining similarity measures is a requirement for some machine learning methods. One such method is case-based reasoning (CBR), where the similarity measure is used to retrieve the stored case or set of cases most similar to the query case. Describing a similarity measure analytically is challenging, even for domain experts working with CBR experts. However, datasets are typically gathered as part of constructing a CBR or machine learning system. These datasets are assumed to contain the features that correctly identify the solution from the problem features; thus, they may also contain the knowledge needed to construct or learn such a similarity measure. The main motivation for this work is to automate the construction of similarity measures using machine learning, while keeping training time as low as possible. Working towards this, our objective is to investigate how to apply machine learning to effectively learn a similarity measure. Such a learned similarity measure could be used for CBR systems, but also for clustering data in semi-supervised learning, or for one-shot learning tasks. Recent work has advanced towards this goal but relies on either very long training times or manually modeling parts of the similarity measure. We created a framework to help us analyze current methods for learning similarity measures. This analysis resulted in two novel similarity measure designs, both of which were evaluated on 14 different datasets. The evaluation shows that using a classifier as the basis for a similarity measure gives state-of-the-art performance. Finally, the evaluation shows that our fully data-driven similarity measure design outperforms state-of-the-art methods while keeping training time low.
Keywords: Similarity Measure, Data Science, Neural Networks, Data Analytics, Case-Based Reasoning, Similarity Function, Siamese Networks, Similarity Metrics, Distance Metrics

This work was supported by the Research Council of Norway through the EXPOSED project (grant number 302002390) and the Norwegian Open AI Lab.

1 Introduction

Many artificial intelligence and machine learning (ML) methods, such as k-nearest neighbors (k-NN), rely on a similarity (or distance) measure [21] between data points. In case-based reasoning (CBR), a simple k-NN or a more complex similarity function is used to retrieve the stored cases that are most similar to the current query case.
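For context, a hand-modelled CBR similarity measure of the kind the paper aims to learn from data often combines per-feature (local) similarities into a weighted global score, and retrieval simply ranks stored cases by that score. The sketch below assumes linear local similarities over known feature ranges and hand-picked weights; the case base, ranges, weights, and function names are hypothetical and not taken from the paper.

```python
def local_similarity(q, c, value_range):
    """Linear local similarity: 1.0 for equal values, 0.0 at a full range apart."""
    return max(0.0, 1.0 - abs(q - c) / value_range)

def global_similarity(query, case, ranges, weights):
    """Weighted average of the per-feature local similarities, in [0, 1]."""
    total = sum(w * local_similarity(q, c, r)
                for q, c, r, w in zip(query, case, ranges, weights))
    return total / sum(weights)

def retrieve(case_base, query, ranges, weights, k=1):
    """Return the k (score, problem, solution) triples most similar to the query."""
    scored = [(global_similarity(query, problem, ranges, weights), problem, solution)
              for problem, solution in case_base]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]

# Hypothetical case base: (problem features, stored solution)
case_base = [((20.0, 3.0), "solution-1"),
             ((35.0, 1.0), "solution-2"),
             ((22.0, 1.0), "solution-3")]
ranges = (50.0, 4.0)   # assumed value range of each feature
weights = (1.0, 2.0)   # assumed importance of each feature
for score, problem, solution in retrieve(case_base, (21.0, 1.5), ranges, weights, k=2):
    print(f"{solution}: {score:.2f}")
# solution-3: 0.91
# solution-2: 0.82
```

With the query `(21.0, 1.5)`, the third case scores (0.98 + 2 × 0.875) / 3 = 0.91 and is retrieved first. Choosing these ranges and weights by hand is the modelling effort the paper proposes to replace with a learned measure.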

dataset, neural network, similarity measure

doi: 10.1007/s13748-019-00201-2

2001.05312

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > San Mateo County > Menlo Park (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)
Europe > Norway > Central Norway > Trøndelag > Trondheim (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Bouraoui, Zied, Cornuéjols, Antoine, Denœux, Thierry, Destercke, Sébastien, Dubois, Didier, Guillaume, Romain, Marques-Silva, João, Mengin, Jérôme, Prade, Henri, Schockaert, Steven, Serrurier, Mathieu, Vrain, Christel

From Shallow to Deep Interactions Between Knowledge Representation, Reasoning and Machine Learning (Kay R. Amel group)

arXiv.org Artificial Intelligence Dec-13-2019

This paper proposes a tentative and original survey of meeting points between Knowledge Representation and Reasoning (KRR) and Machine Learning (ML), two areas that have developed quite separately over the last three decades. Some common concerns are identified and discussed, such as the types of representation used, the roles of knowledge and data, the lack or excess of information, and the need for explanations and causal understanding. Then some methodologies combining reasoning and learning are reviewed (such as inductive logic programming, neuro-symbolic reasoning, formal concept analysis, rule-based representations and ML, uncertainty in ML, and case-based and analogical reasoning), before examples of synergies between KRR and ML are discussed (including belief functions on regression, the EM algorithm versus revision, the semantic description of vector representations, the combination of deep learning with high-level inference, knowledge graph completion, declarative frameworks for data mining, and preferences and recommendation). This paper is the first step of a work in progress aiming at a better mutual understanding of research in KRR and ML, and of how they could cooperate.

artificial intelligence, neural network, representation

1912.06612

Country:

North America > United States > California > San Francisco County > San Francisco (0.27)
North America > United States > California > Los Angeles County > Long Beach (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)

Genre:

Overview (1.00)
Research Report (0.64)

Industry:

Education (0.67)
Information Technology (0.45)
Transportation > Ground (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)

arXiv.org Machine Learning Nov-22-2019

Adaptive Nearest Neighbor: A General Framework for Distance Metric Learning

Song, Kun

The $K$-NN classifier is one of the best-known classification algorithms, and its performance depends crucially on the distance metric. If we treat the distance metric as a parameter of $K$-NN, learning an appropriate distance metric for $K$-NN can be seen as minimizing the empirical risk of $K$-NN. In this paper, we design a new type of continuous decision function for the $K$-NN classification rule, which can be used to construct a continuous empirical risk function for $K$-NN. By minimizing this continuous empirical risk function, we obtain a novel distance metric learning algorithm named adaptive nearest neighbor (ANN). We prove that existing algorithms such as large margin nearest neighbor (LMNN), neighbourhood components analysis (NCA), and pairwise constraint methods are special cases of the proposed ANN, obtained by setting its parameter to different values. Compared with LMNN, NCA, and pairwise constraint methods, our method has a broader search space, which may contain better solutions. Finally, extensive experiments on various data sets demonstrate the effectiveness and efficiency of the proposed method.
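The abstract does not give the ANN decision function itself, but it names neighbourhood components analysis (NCA) as a special case. The sketch below computes an NCA-style continuous leave-one-out risk for a linear map `A`, which shows how replacing the hard $K$-NN vote with a soft one makes the empirical risk a continuous function of the metric. The function names, the toy data, and the two candidate matrices are hypothetical; this is not the ANN algorithm.

```python
import math

def transform(A, x):
    """Apply the linear map A (a list of rows) to the vector x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def soft_nn_risk(A, X, y):
    """NCA-style continuous leave-one-out risk of a soft nearest-neighbor rule.

    Point i picks a neighbor j != i with probability proportional to
    exp(-||A x_i - A x_j||^2); the risk is 1 minus the average probability
    that the picked neighbor has the same label as i.
    """
    Z = [transform(A, x) for x in X]
    total = 0.0
    for i, zi in enumerate(Z):
        others = [(sum((a - b) ** 2 for a, b in zip(zi, zj)), y[j])
                  for j, zj in enumerate(Z) if j != i]
        nearest = min(d2 for d2, _ in others)  # shift exponents to avoid underflow
        weights = [(math.exp(-(d2 - nearest)), label) for d2, label in others]
        norm = sum(w for w, _ in weights)
        total += sum(w for w, label in weights if label == y[i]) / norm
    return 1.0 - total / len(Z)

X = [(0.0, 0.0), (0.2, 3.0), (1.0, 0.1), (1.2, 3.1)]
y = ["a", "a", "b", "b"]
identity = [[1.0, 0.0], [0.0, 1.0]]
stretch_first = [[3.0, 0.0], [0.0, 0.0]]  # hypothetical: amplify feature 0, drop feature 1
print(soft_nn_risk(identity, X, y))       # close to 1
print(soft_nn_risk(stretch_first, X, y))  # below 0.01
```

On this toy data the second feature is large-scale noise, so the identity metric gives a risk close to 1, while a map that stretches the first feature and drops the second drives the risk below 0.01. A metric learning algorithm would search over `A` (for example, by gradient descent) instead of comparing two fixed choices.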

metric learning, nearest neighbor, neighbor

1911.10674

Country:

Asia > China > Shaanxi Province > Xi'an (0.04)
North America > United States > Michigan (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (0.83)

Mihajlovic, Marko, Xiong, Ning

Finding the most similar textual documents using Case-Based Reasoning

arXiv.org Machine Learning Nov-1-2019

In recent years, the huge amount of unstructured textual data on the Internet has made it difficult for AI algorithms to provide the best recommendations for users and their search queries. Since the Internet became widespread, a lot of research has been done in the fields of Natural Language Processing (NLP) and machine learning. Almost every solution transforms documents into Vector Space Models (VSM) in order to apply AI algorithms to them. One such approach is based on Case-Based Reasoning (CBR), in which the most important part of the system is computing the similarity between numerical data points. In 2016, a new similarity metric, TS-SS, was proposed, which showed state-of-the-art results in textual mining for unsupervised learning. However, its performance for supervised learning (classification tasks) had not been investigated before. In this work, we devised a CBR system capable of finding the most similar documents for a given query, in order to investigate the performance of the new state-of-the-art metric, TS-SS, alongside two other geometric similarity measures, Euclidean distance and cosine similarity, which showed the best predictive results over several benchmark corpora. The results show that, surprisingly, the TS-SS measure is ill-suited to high-dimensional features.
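The three measures compared in the paper can be sketched as follows. The TS-SS function follows the 2016 definition as we understand it (triangle area similarity multiplied by sector area similarity, with the angle between the vectors widened by 10 degrees); treat those details as an assumption to check against the original proposal. The toy term-count vectors and the `most_similar` helper are hypothetical. Cosine is a similarity (higher is closer), while Euclidean distance and TS-SS are dissimilarities (lower is closer).

```python
import math

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine(a, b):
    """Cosine of the angle between two non-zero vectors."""
    return sum(x * y for x, y in zip(a, b)) / (norm(a) * norm(b))

def ts_ss(a, b):
    """TS-SS dissimilarity (assumed formulation); 0 for identical vectors."""
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    theta_deg = math.degrees(math.acos(max(-1.0, min(1.0, cosine(a, b))))) + 10.0
    theta = math.radians(theta_deg)
    triangle = norm(a) * norm(b) * math.sin(theta) / 2       # TS: triangle area
    magnitude_diff = abs(norm(a) - norm(b))                   # MD
    sector = math.pi * (euclidean(a, b) + magnitude_diff) ** 2 * theta_deg / 360  # SS
    return triangle * sector

def most_similar(query, docs, score, higher_is_better):
    """Return the key of the document that scores best against the query."""
    pick = max if higher_is_better else min
    return pick(docs, key=lambda name: score(query, docs[name]))

docs = {"d1": [3, 0, 1], "d2": [0, 2, 2], "d3": [6, 0, 2]}
query = [3, 0, 1]
print(most_similar(query, docs, euclidean, higher_is_better=False))  # d1
print(most_similar(query, docs, ts_ss, higher_is_better=False))      # d1
print(cosine(query, docs["d3"]))  # approximately 1.0
```

Cosine similarity cannot tell `d1` from `d3` because it ignores vector length (`d3` points the same way and is twice as long), whereas Euclidean distance and TS-SS both rank the identical document `d1` first.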

dataset, feature vector, similarity metric

1911.00262

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > New Jersey > Mercer County > Princeton (0.04)
Europe > Sweden (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning (1.00)

#artificialintelligence Oct-22-2019, 23:07:45 GMT

r/MachineLearning - [Project] pgANN Fast Approximate Nearest Neighbor (ANN) searches with a PostgreSQL database.

Hi, we did experiment with ES, using range queries on the vectors and Boolean queries over them, and we also tried using LSH/MinHash to store a signature for each vector. Did you have a different approach in mind? Also, you're correct that L1 and L2 distances are poor metrics at this dimensionality, but our goal was to fetch a subset of (say) a few thousand "good enough" results from a pool of tens of millions, which can then be re-ranked with cosine or a similar metric. Unfortunately, there are no easy wins in ANN, and this works well enough for us. We hope others can benefit as well.
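The strategy described in the post, fetching a few thousand "good enough" candidates with a cheap distance and then re-ranking them with cosine similarity, can be sketched in pure Python. This is not pgANN's code: in the post the first stage runs inside PostgreSQL, whereas here both stages are brute force over an in-memory dict, and the function names and data are hypothetical.

```python
import heapq
import math

def l1(a, b):
    """Manhattan (L1) distance, used as the cheap first-stage filter."""
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine(a, b):
    """Cosine similarity, used to re-rank the shortlist (assumes non-zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def two_stage_search(corpus, query, shortlist=1000, k=10):
    """Shortlist by L1 distance, then return the k best keys by cosine similarity."""
    candidates = heapq.nsmallest(shortlist, corpus.items(),
                                 key=lambda item: l1(item[1], query))
    reranked = sorted(candidates, key=lambda item: cosine(item[1], query), reverse=True)
    return [key for key, _ in reranked[:k]]

corpus = {"x": [1.0, 0.0], "y": [10.0, 1.0], "z": [0.0, 1.0], "w": [0.9, 0.1]}
print(two_stage_search(corpus, [1.0, 0.0], shortlist=3, k=2))  # ['x', 'w']
print(two_stage_search(corpus, [1.0, 0.0], shortlist=4, k=2))  # ['x', 'y']
```

With a shortlist of 3, `y` never reaches the re-ranking stage even though its cosine similarity to the query is higher than that of `w`; widening the shortlist recovers it. That is the "good enough" trade-off the post describes.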

machinelearning, pgann fast approximate nearest neighbor, postgresql database

Industry: Media > News (0.40)

Technology:

Information Technology > Communications > Social Media (0.76)
Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (0.40)

#artificialintelligence Oct-22-2019, 19:11:06 GMT

AI-based Analytics: The key to business-led eDiscovery Casepoint

Another common eDiscovery pitfall is using standard approaches for every case. Rather than digging in to determine data minimization and cost estimates for each case, many practitioners use generic formulas. Dubious tenets like "every stage of large cases goes to law firms" or "law firms always manage review for us" still rule the day. Teams automatically slap project planning formulas like 0 to 6 months for ECA, 6 to 12 months for full-blown eDiscovery, and 12 to 24 months to finish eDiscovery, motions, and trial preparation onto every eDiscovery project.

ai-based analytic, business-led ediscovery casepoint, formula

Industry: Law > Litigation (1.00)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (0.40)

Hyvönen, Ville, Jääsaari, Elias, Roos, Teemu

Supervised Learning Approach to Approximate Nearest Neighbor Search

arXiv.org Machine Learning Oct-18-2019

Approximate nearest neighbor search is a classic algorithmic problem where the goal is to design an efficient index structure for fast approximate nearest neighbor queries. We show that it can be framed as a classification problem and solved by training a suitable multi-label classifier and using it as an index. Compared to existing algorithms, this supervised learning approach has several advantages: it enables adapting an index to the query distribution when the query distribution and the corpus distribution differ; it allows using training sets larger than the corpus; and, in principle, it enables using any multi-label classifier for approximate nearest neighbor search. We demonstrate these advantages on multiple synthetic and real-world data sets by using a random forest and an ensemble of random projection trees as the base classifiers.

Introduction

In k-nearest neighbor (k-nn) search, the k points nearest to the query point are retrieved from the corpus. Approximate nearest neighbor search is used to speed up k-nn search in applications where fast response times are critical, such as computer vision, robotics, and recommendation systems. Traditionally, approximate nearest neighbor search has been approached as a problem in algorithms and data structures. Space-partitioning methods (trees, hashing, and quantization) divide the space according to a geometric criterion. For instance, k-d trees (Bentley 1975) and principal component trees (McNames 2001) are grown by hierarchically partitioning the space along the maximum-variance directions of the corpus.
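The space-partitioning idea in the introduction can be sketched as a simplified k-d tree that, at each node, splits on the coordinate axis with the largest variance at the median point, and answers a query by descending to a single leaf and searching only there (no backtracking). This is a hypothetical illustration of tree-based approximate search, not the paper's supervised, classifier-based index; the function names, `leaf_size`, and the points are assumptions.

```python
import math
from statistics import pvariance

def build(points, leaf_size=2):
    """Recursively split on the max-variance coordinate axis at the median point."""
    if len(points) <= leaf_size:
        return {"leaf": points}
    axis = max(range(len(points[0])), key=lambda d: pvariance([p[d] for p in points]))
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return {"axis": axis, "split": ordered[mid][axis],
            "left": build(ordered[:mid], leaf_size),
            "right": build(ordered[mid:], leaf_size)}

def approx_nn(tree, q):
    """Descend to one leaf and return the closest point in it (may miss the true NN)."""
    node = tree
    while "leaf" not in node:
        node = node["left"] if q[node["axis"]] < node["split"] else node["right"]
    return min(node["leaf"], key=lambda p: math.dist(p, q))

points = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11), (5, 5), (6, 5)]
tree = build(points, leaf_size=2)
print(approx_nn(tree, (0.1, 0.1)))    # (0, 0)
print(approx_nn(tree, (10.2, 10.1)))  # (11, 10)
print(min(points, key=lambda p: math.dist(p, (10.2, 10.1))))  # exact: (10, 10)
```

For the query (10.2, 10.1) the tree answers (11, 10), while the exact nearest neighbor is (10, 10): the median split put (10, 10) in a sibling leaf. Accepting such misses in exchange for searching a single leaf is what makes the search approximate; a full k-d tree search would backtrack into neighboring cells to correct them.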

algorithm, nearest neighbor, nearest neighbor search

1910.08322

Country:

Europe > Finland > Uusimaa > Helsinki (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.90)