AITopics | Nearest Neighbor Methods

Nearest Neighbor Methods

RIPML: A Restricted Isometry Property-Based Approach to Multilabel Learning

Soni, Akshay (Yahoo! Research) | Mehdad, Yashar (Airbnb)

AAAI Conferences May-16-2017

The multilabel learning problem with a large number of labels, features, and data points has generated tremendous interest recently. A recurring theme of these problems is that only a few labels are active in any given data point compared to the total number of labels. However, only a small number of existing works take direct advantage of this inherent extreme sparsity in the label space. By virtue of the Restricted Isometry Property (RIP), satisfied by many random ensembles, we propose a novel procedure for multilabel learning known as RIPML. During the training phase of RIPML, labels are projected onto a random low-dimensional subspace, and a least-squares problem is then solved in this subspace. Inference is done by a k-nearest neighbor (kNN) based approach. We demonstrate the effectiveness of RIPML by conducting extensive simulations and comparing results with state-of-the-art linear dimensionality reduction based approaches.
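To illustrate the property this abstract relies on, here is a minimal sketch of how a random Gaussian projection roughly preserves the norm of a sparse label vector. It is not the authors' RIPML implementation: the sizes, the seed, and the helper names `random_projection`, `project`, and `sq_norm` are assumptions chosen for illustration.

```python
import math
import random


def random_projection(m, n, rng):
    """Return an m x n matrix with i.i.d. N(0, 1/m) entries (hypothetical helper)."""
    scale = 1.0 / math.sqrt(m)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n)] for _ in range(m)]


def project(matrix, vector):
    """Multiply the projection matrix by a label vector."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def sq_norm(vector):
    """Squared Euclidean norm."""
    return sum(x * x for x in vector)


rng = random.Random(0)
n_labels, n_active, m = 1000, 5, 64  # assumed sizes: 1000 labels, 5 active

# A sparse 0/1 label vector with only a few active labels.
labels = [0.0] * n_labels
for idx in rng.sample(range(n_labels), n_active):
    labels[idx] = 1.0

A = random_projection(m, n_labels, rng)
ratio = sq_norm(project(A, labels)) / sq_norm(labels)
print(f"squared-norm ratio after projection: {ratio:.3f}")
```

The ratio concentrates around 1 as the projected dimension `m` grows, which is the intuition behind compressing a sparse label space before solving the regression problem.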

multilabel learning, restricted isometry property-based approach, ripml

The Thirtieth International FLAIRS Conference

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.53)

Extreme Gradient Boosting and Preprocessing in Machine Learning – Addendum to predicting flu outcome with R

#artificialintelligence May-11-2017, 10:00:14 GMT

In last week's post I explored whether machine learning models can be applied to predict flu deaths from the 2013 outbreak of influenza A H7N9 in China. There, I compared random forests, elastic-net regularized generalized linear models, k-nearest neighbors, penalized discriminant analysis, stabilized linear discriminant analysis, nearest shrunken centroids, single C5.0 tree and partial least squares. Extreme gradient boosting (XGBoost) is a faster and improved implementation of gradient boosting for supervised learning and has recently been very successfully applied in Kaggle competitions. Because I've heard XGBoost's praise being sung everywhere lately, I wanted to get my feet wet with it too. So this week I want to compare the prediction success of gradient boosting with the same dataset.
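For readers unfamiliar with the underlying idea, the following is a minimal sketch of plain gradient boosting for squared loss, using one-split decision stumps on a single feature. It is not XGBoost and omits XGBoost's regularization and engineering optimizations; the toy data, the learning rate, and the helper names `fit_stump`, `boost`, and `mse` are assumptions for illustration.

```python
def fit_stump(xs, residuals):
    """Find the threshold split on a 1-D feature that best fits the residuals."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the max so both sides are non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = sum((r - left_mean) ** 2 for r in left) + sum(
            (r - right_mean) ** 2 for r in right
        )
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def boost(xs, ys, rounds=20, learning_rate=0.5):
    """Fit an additive model by repeatedly fitting stumps to the residuals."""
    base = sum(ys) / len(ys)
    stumps = []

    def predict(x):
        return base + learning_rate * sum(stump(x) for stump in stumps)

    for _ in range(rounds):
        # For squared loss, the residuals are the negative gradient.
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return predict


def mse(predict, xs, ys):
    """Mean squared error of a predictor on the data."""
    return sum((y - predict(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ys = [0.1, 0.0, 0.2, 0.1, 1.0, 1.1, 0.9, 1.0]

baseline = mse(lambda x: sum(ys) / len(ys), xs, ys)
model = boost(xs, ys)
print(f"MSE of mean predictor: {baseline:.4f}")
print(f"MSE after boosting:    {mse(model, xs, ys):.4f}")
```

Each round fits a weak learner to what the current ensemble still gets wrong, so the training error cannot increase from one round to the next under this setup.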

artificial intelligence, gradient, machine learning, (17 more...)

Country: Asia > China (0.25)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.56)

K-Nearest Neighbor classification using python

@machinelearnbot May-10-2017, 14:35:08 GMT

A number of open-source communities are using Python to make artificial intelligence and machine learning packages and libraries available. In this blog I will use libraries from scikit-learn. The scikit-learn project is a machine learning project in Python. It has a good collection of algorithms for some of the well-known data mining and data analysis tasks, such as classification, regression, clustering, dimensionality reduction, and model selection. These algorithms are built on a stack of NumPy, SciPy, and matplotlib.

algorithm, artificial intelligence, machine learning, (17 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (1.00)

Handling imbalanced dataset in supervised learning using family of SMOTE algorithm.

#artificialintelligence Apr-26-2017, 02:45:02 GMT

The algorithm adaptively updates the distribution, and no assumptions are made about the underlying distribution of the data. The algorithm uses Euclidean distance for its k-nearest neighbor (KNN) step. The key difference between ADASYN and SMOTE is that the former uses a density distribution as a criterion to automatically decide the number of synthetic samples to generate for each minority sample, adaptively changing the weights of the different minority samples to compensate for the skewed distribution. The latter generates the same number of synthetic samples for each original minority sample.
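The difference in how the two methods allocate synthetic samples can be sketched as follows. This is a simplified illustration, not a full SMOTE or ADASYN implementation: it only computes how many synthetic samples each minority point would receive, and the toy points, `k`, and the function names are assumptions.

```python
import math


def k_nearest(point, candidates, k):
    """Return the k (point, label) candidates closest in Euclidean distance."""
    return sorted(candidates, key=lambda c: math.dist(point, c[0]))[:k]


def smote_allocation(minority, total_synthetic):
    """SMOTE-style: every minority sample gets the same share."""
    return [total_synthetic // len(minority)] * len(minority)


def adasyn_allocation(minority, majority, total_synthetic, k=3):
    """ADASYN-style: the share grows with the fraction of majority neighbours."""
    labelled = [(p, 1) for p in minority] + [(p, 0) for p in majority]
    ratios = []
    for point in minority:
        others = [c for c in labelled if c[0] is not point]
        neighbours = k_nearest(point, others, k)
        ratios.append(sum(1 for _, label in neighbours if label == 0) / k)
    total = sum(ratios)
    if total == 0:  # no minority sample has majority neighbours
        return smote_allocation(minority, total_synthetic)
    # Normalise the ratios into a density distribution, then scale.
    return [round(total_synthetic * r / total) for r in ratios]


minority = [(0.0, 0.0), (0.2, 0.1), (3.0, 3.0)]
majority = [(3.1, 3.0), (2.9, 3.2), (3.2, 2.8), (5.0, 5.0)]

print("SMOTE :", smote_allocation(minority, 10))
print("ADASYN:", adasyn_allocation(minority, majority, 10))
```

In this toy data the minority point at (3.0, 3.0) sits among majority points, so the ADASYN-style rule assigns it a larger share than the two minority points that are close to each other.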

artificial intelligence, handling imbalanced dataset, machine learning, (7 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.31)

Faiss: A library for efficient similarity search

#artificialintelligence Apr-23-2017, 07:00:01 GMT

This month, we released Facebook AI Similarity Search (Faiss), a library that allows us to quickly search for multimedia documents that are similar to each other -- a challenge where traditional query search engines fall short. We've built nearest-neighbor search implementations for billion-scale data sets that are some 8.5x faster than the previously reported state of the art, along with the fastest k-selection algorithm on the GPU known in the literature. This lets us break some records, including the first k-nearest-neighbor graph constructed on 1 billion high-dimensional vectors. Traditional databases are made up of structured tables containing symbolic information. For example, an image collection would be represented as a table with one row per indexed photo.
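As a point of reference for what a similarity search returns, here is a brute-force sketch in pure Python. It does not use the Faiss API and makes no attempt at its speed; it only shows exact k-nearest-neighbor search over vectors, with the k-selection step done by `heapq.nsmallest`. The toy vectors and the names `squared_l2` and `knn_search` are assumptions.

```python
import heapq


def squared_l2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_search(database, query, k):
    """Exact k-NN: score every vector, then select the k smallest distances."""
    scored = ((squared_l2(vec, query), idx) for idx, vec in enumerate(database))
    # k-selection: keep only the k best (distance, index) pairs.
    return heapq.nsmallest(k, scored)


# Each row stands in for the feature vector of one indexed item.
database = [
    (0.0, 0.0, 1.0),
    (0.9, 0.1, 0.0),
    (1.0, 0.0, 0.0),
    (0.0, 1.0, 0.0),
]
query = (1.0, 0.2, 0.0)

for distance, idx in knn_search(database, query, k=2):
    print(idx, round(distance, 4))
```

The cost of this baseline grows linearly with the number of stored vectors, which is why billion-scale collections call for specialized indexing.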

information retrieval, machine learning, natural language, (21 more...)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.90)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.54)

A to Z of Analytics

@machinelearnbot Apr-18-2017, 17:25:53 GMT

Artificial Intelligence: AI is the capability of a machine to imitate intelligent human behavior. BMW, Tesla, and Google are using AI for self-driving cars. AI should be used to solve tough real-world problems, from climate modeling to disease analysis, and for the betterment of humanity.

Boosting and Bagging: techniques used to generate more accurate models by ensembling multiple models together.

CRISP-DM: the Cross-Industry Standard Process for Data Mining. It was developed by a consortium of companies including SPSS, Teradata, Daimler, and NCR Corporation in 1997 to bring order to the development of analytics models.

artificial intelligence, data mining, machine learning, (19 more...)

Industry:

Automobiles & Trucks (0.56)
Information Technology (0.36)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.56)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.36)
(2 more...)

Adaptive Neighboring Selection Algorithm Based on Curvature Prediction in Manifold Learning

Ma, Lin, Zhou, Caifa, Liu, Xi, Xu, Yubin

arXiv.org Machine Learning Apr-13-2017

Recently, manifold learning algorithms for dimensionality reduction have attracted more and more interest, and various linear and nonlinear, global and local algorithms have been proposed. The key step of a manifold learning algorithm is the selection of the neighboring region. However, to the best of our knowledge, few references propose a generally accepted algorithm for selecting the neighboring region well. In this paper, we therefore propose an adaptive neighboring selection algorithm and apply it successfully to the LLE and ISOMAP algorithms in our tests. The algorithm finds the optimal K nearest neighbors of the data points on the manifold. Its theoretical basis is the approximated curvature of the manifold at each data point. Based on Riemannian geometry, the Jacobian matrix is a suitable mathematical tool for predicting the approximated curvature. We verify the proposed algorithm by embedding the Swiss roll from R^3 to R^2 with the LLE and ISOMAP algorithms; the simulation results show that the proposed adaptive neighboring selection algorithm is feasible and able to find the optimal value of K, yielding a relatively small residual variance and better visualization of the results. In quantitative terms, the embedding quality measured by residual variance improves by 45.45% after using the proposed algorithm in LLE.
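The abstract does not define residual variance. A definition commonly used with ISOMAP is 1 - R^2, where R is the linear correlation between pairwise distances on the manifold and pairwise distances in the embedding. The sketch below assumes that definition; the distance lists are made-up toy values and the function names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Linear (Pearson) correlation coefficient of two equal-length lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def residual_variance(manifold_dists, embedded_dists):
    """Assumed definition: 1 - R^2 between the two sets of pairwise distances."""
    return 1.0 - pearson(manifold_dists, embedded_dists) ** 2


# Toy pairwise distances: one list per representation, same pair ordering.
manifold = [1.0, 2.0, 3.0, 4.0]
good_embedding = [0.5, 1.0, 1.5, 2.0]  # distances preserved up to scale
poor_embedding = [2.0, 0.5, 3.0, 1.0]  # distances scrambled

print(f"good: {residual_variance(manifold, good_embedding):.4f}")
print(f"poor: {residual_variance(manifold, poor_embedding):.4f}")
```

Under this definition a lower value means the embedding preserves the manifold's distance structure better, which is how a choice of K can be compared across runs.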

algorithm, artificial intelligence, machine learning, (13 more...)

1704.0405

Country: Asia > China (0.51)

Genre: Research Report > New Finding (0.34)

Industry: Education (0.88)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.56)

Machine Learning in R for beginners

#artificialintelligence Apr-12-2017, 08:40:28 GMT

Machine learning is a branch of computer science that studies the design of algorithms that can learn. Typical machine learning tasks are concept learning, function learning or "predictive modeling", clustering, and finding predictive patterns. These tasks are learned from available data that were observed through experiences or instructions, for example. The hope is that incorporating this experience will eventually improve the learning. The ultimate goal is to improve the learning in such a way that it becomes automatic, so that humans like ourselves don't need to interfere any more. This small tutorial is meant to introduce you to the basics of machine learning in R: more specifically, it will show you how to use R to work with the well-known machine learning algorithm called "KNN" or k-nearest neighbors. Additionally, this tutorial covers how to use caret to do machine learning in R. If you're interested in following a course, consider checking out our Introduction to Machine Learning with R or DataCamp's Unsupervised Learning in R course! The KNN or k-nearest neighbors algorithm is one of the simplest machine learning algorithms and is an example of instance-based learning, where new data are classified based on stored, labeled instances.
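The instance-based idea described above can be sketched in a few lines. The tutorial itself works in R; this illustration is in Python, and the toy points, labels, and the function name `knn_predict` are assumptions rather than anything taken from the tutorial.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest stored instances."""
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Stored, labelled instances: (feature vector, class label).
train = [
    ((1.0, 1.1), "a"),
    ((1.2, 0.9), "a"),
    ((0.8, 1.0), "a"),
    ((3.0, 3.2), "b"),
    ((3.1, 2.9), "b"),
    ((2.9, 3.0), "b"),
]

print(knn_predict(train, (1.1, 1.0)))
print(knn_predict(train, (3.0, 3.0)))
```

There is no separate training step: the stored instances are the model, and all the work happens when a new point is classified.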

algorithm, artificial intelligence, machine learning, (14 more...)

Genre: Instructional Material > Course Syllabus & Notes (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (1.00)

K-Nearest Neighbors on Wisconsin Breast Cancer Data

#artificialintelligence Apr-9-2017, 04:33:11 GMT

First, let's set things up in R by loading the necessary package and importing the data. The class package will be used to run the k-nearest neighbors algorithm. We will also use a specific seed so that you can reproduce this in R yourself. After importing the data, let's take a look at the basic structure of the dataset. The first variable, id, is there simply as a unique identifier for each observation. We will take it out for the purposes of our analysis.

artificial intelligence, machine learning, wisconsin breast cancer data, (6 more...)

Country: North America > United States > Wisconsin (0.40)

Industry: Health & Medicine > Therapeutic Area > Oncology > Breast Cancer (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (1.00)

Fast k-Nearest Neighbour Search via Dynamic Continuous Indexing

Li, Ke, Malik, Jitendra

arXiv.org Artificial Intelligence Apr-6-2017

Existing methods for retrieving k-nearest neighbours suffer from the curse of dimensionality. We argue this is caused in part by inherent deficiencies of space partitioning, which is the underlying strategy used by most existing methods. We devise a new strategy that avoids partitioning the vector space and present a novel randomized algorithm that runs in time linear in the dimensionality of the space and sub-linear in the intrinsic dimensionality and the size of the dataset, and takes space constant in the dimensionality of the space and linear in the size of the dataset. The proposed algorithm allows fine-grained control over accuracy and speed on a per-query basis, automatically adapts to variations in data density, supports dynamic updates to the dataset, and is easy to implement. We show appealing theoretical properties and demonstrate empirically that the proposed algorithm outperforms locality-sensitive hashing (LSH) in terms of approximation quality, speed, and space efficiency.

artificial intelligence, machine learning, probability, (16 more...)

1512.00442

Country:

North America > United States > California > Alameda County > Berkeley (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > Canada (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.90)
