Nearest Neighbor Methods
Developing and Improving Risk Models using Machine-learning Based Algorithms
The objective of this study is to develop a good risk model for classifying business delinquency by simultaneously exploring several machine-learning-based methods, including regularization, hyper-parameter optimization, and model ensembling algorithms. The rationale underlying the analyses is first to obtain good base binary classifiers (including Logistic Regression ($LR$), K-Nearest Neighbors ($KNN$), Decision Tree ($DT$), and Artificial Neural Networks ($ANN$)) via regularization and appropriate hyper-parameter settings. Two model ensembling algorithms, bagging and boosting, are then applied to the good base classifiers for further model improvement. The models are evaluated on accuracy, Area Under the Receiver Operating Characteristic Curve (AUC of ROC), recall, and F1 score by repeating 10-fold cross-validation 10 times. The results show that the optimal base classifiers and their hyper-parameter settings are $LR$ without regularization, $KNN$ with 9 nearest neighbors, $DT$ with a maximum tree depth of 7, and $ANN$ with three hidden layers. Bagging on $KNN$ with $K = 9$ is the best risk-classification model obtained, reaching average accuracy, AUC, recall, and F1 scores of 0.90, 0.93, 0.82, and 0.89, respectively.
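A minimal sketch of the evaluation protocol this abstract describes, using scikit-learn: a 9-NN base classifier wrapped in bagging and scored with 10-fold cross-validation repeated 10 times. The study's business-delinquency data is not public, so a synthetic dataset stands in and the resulting scores are illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

# Synthetic stand-in for the (non-public) delinquency dataset
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Bagging on a 9-NN base classifier (the abstract's optimal model);
# the estimator is passed positionally for sklearn-version portability
model = BaggingClassifier(
    KNeighborsClassifier(n_neighbors=9),
    n_estimators=50,
    random_state=0,
)

# 10-fold CV repeated 10 times, scored by AUC as in the paper
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
print(len(scores), round(scores.mean(), 3))
```

The same `cross_val_score` call with `scoring="accuracy"`, `"recall"`, or `"f1"` reproduces the other three metrics in the abstract's evaluation.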
Travel time prediction for congested freeways with a dynamic linear model
Kwak, Semin, Geroliminis, Nikolas
Accurate prediction of travel time is an essential feature to support Intelligent Transportation Systems (ITS). The non-linearity of traffic states, however, makes this prediction a challenging task. Here we propose to use dynamic linear models (DLMs) to approximate the non-linear traffic states. Unlike a static linear regression model, DLMs assume that their parameters change across time. We design a DLM with model parameters defined at each time unit to describe the spatio-temporal characteristics of time-series traffic data. Based on our DLM and its model parameters, analytically trained on historical data, we suggest an optimal linear predictor in the minimum mean square error (MMSE) sense. We compare the travel-time prediction accuracy of our model for freeways in California (I210-E and I5-S) under highly congested traffic conditions with that of other methods: the instantaneous travel time, k-nearest neighbor, support vector regression, and artificial neural network. We show significant improvements in accuracy, especially for short-term prediction.
Learn Data Science, Deep Learning, Machine Learning, NLP & R
What you'll learn:
- Mathematics and statistics behind machine learning, deep learning, and artificial intelligence
- Python programming from scratch, including its libraries NumPy, Pandas, Matplotlib, and Scikit-Learn
- Natural language processing, including tokenization
- Applying R packages and Python libraries to different data sets
- Machine learning and deep learning algorithms and models, such as k-Nearest Neighbors and Naive Bayes
- Supervised and unsupervised learning

Description: Data science continues to evolve as one of the most promising and in-demand career paths for skilled professionals. Today, successful data professionals understand that they must advance past the traditional skills of analyzing large amounts of data, data mining, and programming. What does a data scientist do? In the past decade, data scientists have become necessary assets and are present in almost all organizations. These professionals are well-rounded, data-driven individuals with high-level technical skills, capable of building complex quantitative algorithms that organize and synthesize large amounts of information used to answer questions and drive strategy in their organizations.
An Intelligent CNN-VAE Text Representation Technology Based on Text Semantics for Comprehensive Big Data
Liu, Genggeng, Guo, Canyang, Xie, Lin, Liu, Wenxi, Xiong, Naixue, Chen, Guolong
In the era of big data, the large volume of text data generated by the Internet has given birth to a variety of text representation methods. In natural language processing (NLP), text representation transforms text into vectors that can be processed by a computer without losing the original semantic information. However, existing methods struggle to effectively extract the semantic features among words and to distinguish polysemy in language. Therefore, a text feature representation model based on a convolutional neural network (CNN) and a variational autoencoder (VAE) is proposed to extract text features and apply the resulting representation to text classification tasks. The CNN extracts features from the text vectors to capture the semantics among words, and the VAE is introduced to make the text feature space more consistent with a Gaussian distribution. In addition, the output of an improved word2vec model is employed as the input of the proposed model to distinguish different meanings of the same word in different contexts. The experimental results show that the proposed model outperforms the baselines under k-nearest neighbor (KNN), random forest (RF), and support vector machine (SVM) classification algorithms.
Neural Neighborhood Encoding for Classification
Sinha, Kaushik, Ram, Parikshit
Inspired by the fruit-fly olfactory circuit, the Fly Bloom Filter [Dasgupta et al., 2018] is able to efficiently summarize the data with a single pass and has been used for novelty detection. We propose a new classifier (for binary and multi-class classification) that effectively encodes the different local neighborhoods for each class with a per-class Fly Bloom Filter. The inference on test data requires an efficient {\tt FlyHash} [Dasgupta et al., 2017] operation followed by a high-dimensional, but {\em sparse}, dot product with the per-class Bloom Filters. The learning is trivially parallelizable. On the theoretical side, we establish conditions under which the prediction of our proposed classifier on any test example agrees with the prediction of the nearest neighbor classifier with high probability. We extensively evaluate our proposed scheme with over $50$ data sets of varied data dimensionality to demonstrate that the predictive performance of our proposed neuroscience-inspired classifier is competitive with the nearest-neighbor classifiers and other single-pass classifiers.
A Formally Robust Time Series Distance Metric
Toller, Maximilian, Geiger, Bernhard C., Kern, Roman
Distance-based classification is among the most competitive classification methods for time series data. The most critical component of distance-based classification is the selected distance function. Past research has proposed various distance metrics and measures dedicated to particular aspects of real-world time series data, yet one important aspect has not been considered so far: robustness against arbitrary data contamination. In this work, we propose a novel distance metric that is robust against arbitrarily "bad" contamination and has a worst-case computational complexity of $\mathcal{O}(n\log n)$. We formally argue why our proposed metric is robust, and demonstrate in an empirical evaluation that the metric yields competitive classification accuracy when applied in k-Nearest Neighbor time series classification.
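The paper's metric itself is not reproduced here, but the following sketch shows how any custom, contamination-tolerant distance can be plugged into k-Nearest Neighbor time series classification with scikit-learn. The median-of-absolute-differences function is an illustrative stand-in that damps single corrupted time points; it is not the metric proposed in the paper.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def median_abs_distance(a, b):
    # Illustrative robust stand-in: the median of pointwise absolute
    # differences ignores a few extreme (contaminated) time points.
    # NOT the paper's proposed metric.
    return float(np.median(np.abs(a - b)))

rng = np.random.default_rng(0)
# Two toy classes of length-50 "time series": flat noise vs. rising trend
X_flat = rng.normal(0.0, 0.1, size=(20, 50))
X_rise = np.linspace(0, 1, 50) + rng.normal(0.0, 0.1, size=(20, 50))
X = np.vstack([X_flat, X_rise])
y = np.array([0] * 20 + [1] * 20)

# Contaminate one training series with an extreme outlier point
X[0, 25] = 1e6

# A callable metric forces brute-force neighbor search in scikit-learn
clf = KNeighborsClassifier(n_neighbors=1, metric=median_abs_distance)
clf.fit(X, y)
print(clf.predict([np.zeros(50)]))  # flat query: still matches class 0
```

Despite the injected contamination, the median-based distance between the flat query and the corrupted flat series stays small, so the 1-NN vote is unaffected; a Euclidean metric would have pushed that series arbitrarily far away.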
Machine Learning Algorithms
Arthur Samuel (1959): "Field of study that gives computers the ability to learn without being explicitly programmed". Tom Mitchell (1997): "A computer program is said to learn if its performance at a task T, as measured by a performance measure P, improves with experience E". Selecting the right machine-learning algorithm depends on several factors, including the size, quality, and nature of the data. Choosing the right algorithm is a combination of business needs, specifications, experimentation, and available time. Here we will explore different machine learning algorithms. In supervised learning, we provide a known dataset that includes inputs and desired outputs.
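A minimal illustration of that supervised setup, using a hypothetical toy dataset with scikit-learn: known inputs paired with desired outputs, a model fit on them, then a prediction on an unseen input.

```python
from sklearn.linear_model import LogisticRegression

X = [[0.0], [1.0], [2.0], [3.0]]   # inputs (features)
y = [0, 0, 1, 1]                   # desired outputs (labels)

# Fit on the known input/output pairs, then predict for unseen inputs
model = LogisticRegression().fit(X, y)
print(model.predict([[2.5]]))      # falls on the class-1 side
print(model.predict([[0.5]]))      # falls on the class-0 side
```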
K-Nearest Neighbors Algorithm
KNN is a non-parametric and lazy learning algorithm. Non-parametric means there is no assumption about the underlying data distribution; in other words, the model structure is determined from the dataset. This is very helpful in practice, where most real-world datasets do not follow theoretical mathematical assumptions. KNN is one of the simplest and most traditional non-parametric techniques for classifying samples. Given an input vector, KNN computes its distance to every training vector and assigns the not-yet-labeled point to the majority class among its K nearest neighbors. Lazy means the algorithm builds no explicit model during a training phase; instead, all training data are stored and used in the testing phase.
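The procedure just described can be sketched in a few lines of Python. This is a toy illustration on made-up 2-D points, not tied to any particular library:

```python
import numpy as np

def knn_predict(X_train, y_train, x_query, k=3):
    # Lazy learning: no training step, just distances to every stored sample
    dists = np.linalg.norm(X_train - x_query, axis=1)
    # Labels of the k closest training samples
    nearest = y_train[np.argsort(dists)[:k]]
    # Majority vote among those k neighbors
    values, counts = np.unique(nearest, return_counts=True)
    return values[np.argmax(counts)]

# Two well-separated toy clusters
X_train = np.array([[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]])
y_train = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X_train, y_train, np.array([2, 2]), k=3))  # → 0
```

Because every prediction scans the full training set, inference cost grows with the data size, which is exactly the trade-off the "lazy" label implies.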
Heterogeneous Swarms for Maritime Dynamic Target Search and Tracking
Kwa, Hian Lee, Tokić, Grgur, Bouffanais, Roland, Yue, Dick K. P.
Current strategies employed for maritime target search and tracking are primarily based on the use of agents following a predetermined path to perform a systematic sweep of a search area. Recently, dynamic Particle Swarm Optimization (PSO) algorithms have been used together with swarming multi-robot systems (MRS), giving search and tracking solutions the added properties of robustness, scalability, and flexibility. Swarming MRS also give the end-user the opportunity to incrementally upgrade the robotic system, inevitably leading to the use of heterogeneous swarming MRS. However, such systems have not been well studied, and incorporating upgraded agents into a swarm may result in degraded mission performance. In this paper, we propose a PSO-based strategy using a topological k-nearest neighbor graph with tunable exploration and exploitation dynamics and an adaptive repulsion parameter. This strategy is implemented within a simulated swarm of 50 agents with varying proportions of fast agents tracking a target represented by a fictitious binary function. Through these simulations, we demonstrate an increase in the swarm's collective response level and target tracking performance by substituting in a proportion of fast buoys.
Innovative Platform for Designing Hybrid Collaborative & Context-Aware Data Mining Scenarios
Avram, Anca, Matei, Oliviu, Pintea, Camelia, Anton, Carmen
The process of knowledge discovery nowadays involves a large number of techniques, of which Context-Aware Data Mining (CADM) and Collaborative Data Mining (CDM) are among the most recent. The current research proposes a new hybrid and efficient tool for designing prediction models, called Scenarios Platform-Collaborative & Context-Aware Data Mining (SP-CCADM). Both the CADM and CDM approaches are included in the new platform in a flexible manner; SP-CCADM allows multiple configurable data-mining scenarios to be set up and tested at once. The introduced platform was successfully tested and validated on real-life scenarios, providing better results than each standalone technique (CADM and CDM). Moreover, SP-CCADM was validated with various machine learning algorithms: k-Nearest Neighbour (k-NN), Deep Learning (DL), Gradient Boosted Trees (GBT), and Decision Trees (DT). SP-CCADM makes a step forward when confronting complex data, properly approaching data contexts and collaboration between data. Numerical experiments and statistics illustrate in detail the potential of the proposed platform.