Goto

Collaborating Authors

 Nearest Neighbor Methods


k-Nearest Neighbor Optimization via Randomized Hyperstructure Convex Hull

arXiv.org Machine Learning

In the k-nearest neighbor algorithm (k-NN), the determination of classes for test instances is usually performed via a majority vote system, which may ignore the similarities among data. In this research, the researcher proposes an approach to fine-tune the selection of neighbors to be passed to the majority vote system through the construction of a random n-dimensional hyperstructure around the test instance by introducing a new threshold parameter. The accuracy of the proposed k-NN algorithm is 85.71%, while the accuracy of the conventional k-NN algorithm is 80.95% when performed on the Haberman's Cancer Survival dataset, and 94.44% for the proposed k-NN algorithm, compared to the conventional's 88.89% accuracy score on the Seeds dataset. The proposed k-NN algorithm is also on par with the conventional support vector machine algorithm accuracy, even on the Banknote Authentication and Iris datasets, even surpassing the accuracy of support vector machine on the Seeds dataset.


Deep Reinforcement Learning with Discrete Normalized Advantage Functions for Resource Management in Network Slicing

arXiv.org Machine Learning

Network slicing promises to provision diversified services with distinct requirements in one infrastructure. Deep reinforcement learning (e.g., deep $\mathcal{Q}$-learning, DQL) is assumed to be an appropriate algorithm to solve the demand-aware inter-slice resource management issue in network slicing by regarding the varying demands and the allocated bandwidth as the environment state and the action, respectively. However, allocating bandwidth in a finer resolution usually implies larger action space, and unfortunately DQL fails to quickly converge in this case. In this paper, we introduce discrete normalized advantage functions (DNAF) into DQL, by separating the $\mathcal{Q}$-value function as a state-value function term and an advantage term and exploiting a deterministic policy gradient descent (DPGD) algorithm to avoid the unnecessary calculation of $\mathcal{Q}$-value for every state-action pair. Furthermore, as DPGD only works in continuous action space, we embed a k-nearest neighbor algorithm into DQL to quickly find a valid action in the discrete space nearest to the DPGD output. Finally, we verify the faster convergence of the DNAF-based DQL through extensive simulations.


Evaluating the Robustness of Nearest Neighbor Classifiers: A Primal-Dual Perspective

arXiv.org Machine Learning

We study the problem of computing the minimum adversarial perturbation of the Nearest Neighbor (NN) classifiers. Previous attempts either conduct attacks on continuous approximations of NN models or search for the perturbation by some heuristic methods. In this paper, we propose the first algorithm that is able to compute the minimum adversarial perturbation. The main idea is to formulate the problem as a list of convex quadratic programming (QP) problems that can be efficiently solved by the proposed algorithms for 1-NN models. Furthermore, we show that dual solutions for these QP problems could give us a valid lower bound of the adversarial perturbation that can be used for formal robustness verification, giving us a nice view of attack/verification for NN models. For $K$-NN models with larger $K$, we show that the same formulation can help us efficiently compute the upper and lower bounds of the minimum adversarial perturbation, which can be used for attack and verification.


CCMI : Classifier based Conditional Mutual Information Estimation

arXiv.org Machine Learning

Conditional Mutual Information (CMI) is a measure of conditional dependence between random variables X and Y, given another random variable Z. It can be used to quantify conditional dependence among variables in many data-driven inference problems such as graphical models, causal learning, feature selection and time-series analysis. While k-nearest neighbor (kNN) based estimators as well as kernel-based methods have been widely used for CMI estimation, they suffer severely from the curse of dimensionality. In this paper, we leverage advances in classifiers and generative models to design methods for CMI estimation. Specifically, we introduce an estimator for KL-Divergence based on the likelihood ratio by training a classifier to distinguish the observed joint distribution from the product distribution. We then show how to construct several CMI estimators using this basic divergence estimator by drawing ideas from conditional generative models. We demonstrate that the estimates from our proposed approaches do not degrade in performance with increasing dimension and obtain significant improvement over the widely used KSG estimator. Finally, as an application of accurate CMI estimation, we use our best estimator for conditional independence testing and achieve superior performance than the state-of-the-art tester on both simulated and real data-sets.


Machine Learning in R for beginners

#artificialintelligence

You see that the model makes reasonably accurate predictions, with the exception of one wrong classification in row 29, where "Versicolor" was predicted while the test label is "Virginica". This is already some indication of your model's performance, but you might want to go even deeper into your analysis.


Prototype Propagation Networks (PPN) for Weakly-supervised Few-shot Learning on Category Graph

arXiv.org Machine Learning

A variety of machine learning applications expect to achieve rapid learning from a limited number of labeled data. However, the success of most current models is the result of heavy training on big data. Meta-learning addresses this problem by extracting common knowledge across different tasks that can be quickly adapted to new tasks. However, they do not fully explore weakly-supervised information, which is usually free or cheap to collect. In this paper, we show that weakly-labeled data can significantly improve the performance of meta-learning on few-shot classification. We propose prototype propagation network (PPN) trained on few-shot tasks together with data annotated by coarse-label. Given a category graph of the targeted fine-classes and some weakly-labeled coarse-classes, PPN learns an attention mechanism which propagates the prototype of one class to another on the graph, so that the K-nearest neighbor (KNN) classifier defined on the propagated prototypes results in high accuracy across different few-shot tasks. The training tasks are generated by subgraph sampling, and the training objective is obtained by accumulating the level-wise classification loss on the subgraph. The resulting graph of prototypes can be continually re-used and updated for new tasks and classes. We also introduce two practical test/inference settings which differ according to whether the test task can leverage any weakly-supervised information as in training. On two benchmarks, PPN significantly outperforms most recent few-shot learning methods in different settings, even when they are also allowed to train on weakly-labeled data.


An adaptive nearest neighbor rule for classification

arXiv.org Artificial Intelligence

We introduce a variant of the $k$-nearest neighbor classifier in which $k$ is chosen adaptively for each query, rather than supplied as a parameter. The choice of $k$ depends on properties of each neighborhood, and therefore may significantly vary between different points. (For example, the algorithm will use larger $k$ for predicting the labels of points in noisy regions.) We provide theory and experiments that demonstrate that the algorithm performs comparably to, and sometimes better than, $k$-NN with an optimal choice of $k$. In particular, we derive bounds on the convergence rates of our classifier that depend on a local quantity we call the `advantage' which is significantly weaker than the Lipschitz conditions used in previous convergence rate proofs. These generalization bounds hinge on a variant of the seminal Uniform Convergence Theorem due to Vapnik and Chervonenkis; this variant concerns conditional probabilities and may be of independent interest.


Introduction to Anomaly Detection using Machine Learning with a Case Study

#artificialintelligence

A common need when you are analyzing real-world data-sets is determining which data point stand out as being different to all others data points. Such data points are known as anomalies. This article was originally published on Medium by Davis David. In this article, you will learn a couple of Machine Learning-Based Approaches for Anomaly Detection and then show how to apply one of these approaches to solve a specific use case for anomaly detection (Credit Fraud detection) in part two. A common need when you analyzing real-world data-sets is determining which data point stand out as being different to all others data points.


Machine Learning: Building Recommender Systems

#artificialintelligence

The scikit-learnthe library has functions that enable us to build these pipelines by concatenating various modules together. We just need to specify the modules along with the corresponding parameters. It will then build a pipeline using these modules that processes the data and trains the system. The pipeline can include modules that perform various functions like feature selection, preprocessing, random forests, clustering, and so on. In this section, we will see how to build a pipeline to select the top K features from an input data point and then classify them using an Extremely Random Forest classifier.


Ignorance-Aware Approaches and Algorithms for Prototype Selection in Machine Learning

arXiv.org Machine Learning

Operating with ignorance is an important concern of the Machine Learning research, especially when the objective is to discover knowledge from the imperfect data. Data mining (driven by appropriate knowledge discovery tools) is about processing available (observed, known and understood) samples of data aiming to build a model (e.g., a classifier) to handle data samples, which are not yet observed, known or understood. These tools traditionally take samples of the available data (known facts) as an input for learning. We want to challenge the indispensability of this approach and we suggest considering the things the other way around. What if the task would be as follows: how to learn a model based on our ignorance, i.e. by processing the shape of 'voids' within the available data space? Can we improve traditional classification by modeling also the ignorance? In this paper, we provide some algorithms for the discovery and visualizing of the ignorance zones in two-dimensional data spaces and design two ignorance-aware smart prototype selection techniques (incremental and adversarial) to improve the performance of the nearest neighbor classifiers. We present experiments with artificial and real datasets to test the concept of the usefulness of ignorance discovery in machine learning.