Nearest Neighbor Methods
Efficient anomaly detection using bipartite k-NN graphs
Learning minimum volume sets of an underlying nominal distribution is a very effective approach to anomaly detection. Several approaches to learning minimum volume sets have been proposed in the literature, including the K-point nearest neighbor graph (K-kNNG) algorithm based on the geometric entropy minimization (GEM) principle [4]. The K-kNNG detector, while possessing several desirable characteristics, suffers from high computational complexity, and in [4] a simpler heuristic approximation, the leave-one-out kNNG (L1O-kNNG), was proposed. In this paper, we propose a novel bipartite k-nearest neighbor graph (BP-kNNG) anomaly detection scheme for estimating minimum volume sets. Our bipartite estimator retains all the desirable theoretical properties of the K-kNNG, while being computationally simpler than both the K-kNNG and the surrogate L1O-kNNG detectors.
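The abstract does not spell out the bipartite construction, so the following is only a hedged sketch of one bipartite kNN scheme under our own assumptions. The nominal sample is split at random into two disjoint halves: one half (S_N) serves as the reference set for k-nearest-neighbor distances, and the other half (S_M) supplies an empirical distribution of those distances. A test point's kNN distance to S_N is ranked against that distribution, giving an estimated p-value; flagging points whose p-value falls below a level alpha approximately corresponds to lying outside a minimum volume set of mass 1 - alpha. The function names and the even split are hypothetical.

```python
import math
import random


def knn_distance(x, reference, k):
    """Distance from x to its k-th nearest neighbour in `reference`."""
    dists = sorted(math.dist(x, r) for r in reference)
    return dists[k - 1]


def bipartite_knn_detector(nominal, k, seed=0):
    """Hypothetical bipartite kNN detector: returns a p-value function.

    The nominal sample is shuffled and split into two disjoint halves:
    S_N is the kNN reference set, S_M provides calibration distances.
    """
    rng = random.Random(seed)
    data = list(nominal)
    rng.shuffle(data)
    half = len(data) // 2
    s_n, s_m = data[:half], data[half:]
    # kNN distances of the held-out nominal points to the reference half
    calib = [knn_distance(y, s_n, k) for y in s_m]

    def p_value(x):
        # Fraction of held-out nominal points whose kNN distance is at least x's
        d = knn_distance(x, s_n, k)
        return sum(c >= d for c in calib) / len(calib)

    return p_value


def is_anomaly(p_value, x, alpha=0.05):
    """Flag x when its estimated p-value is below the significance level alpha."""
    return p_value(x) < alpha
```

Note the design: because the two halves are disjoint, no leave-one-out bookkeeping is needed, and scoring a new point only requires its distances to S_N.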
Phase transition in the family of p-resistances
We study the family of p-resistances on graphs for $p \ge 1$. We prove that for any fixed graph, for $p = 1$ the p-resistance coincides with the shortest path distance, for $p = 2$ it coincides with the standard resistance distance, and for $p \to \infty$ it converges to the inverse of the minimal s-t-cut in the graph. Secondly, we consider the special case of random geometric graphs (such as k-nearest neighbor graphs) when the number n of vertices in the graph tends to infinity. We prove that an interesting phase transition takes place. There exist two critical thresholds $p^*$ and $p^{**}$ such that if $p < p^*$, then the p-resistance depends on meaningful global properties of the graph, whereas if $p > p^{**}$, it only depends on trivial local quantities and does not convey any useful information. We can explicitly compute the critical values: $p^* = 1 + 1/(d-1)$ and $p^{**} = 1 + 1/(d-2)$, where d is the dimension of the underlying space (we believe that the small gap between $p^*$ and $p^{**}$ is an artifact of our proofs).
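To make the two fixed-graph endpoints and the thresholds concrete, here is a small pure-Python sketch, assuming unit resistance on every edge. It computes the $p = 1$ case as a breadth-first shortest-path length and the $p = 2$ case as the standard effective resistance, obtained by grounding node t and solving the reduced Laplacian system $L' v = e_s$ (then $R_{st} = v_s$). General p would require solving a convex flow optimization, which is not attempted here. The function names are our own.

```python
from collections import deque


def shortest_path_length(adj, s, t):
    """BFS hop distance: the p = 1 p-resistance with unit edge resistances."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist[t]


def effective_resistance(adj, s, t):
    """Standard resistance distance (p = 2) with unit edge resistances.

    Grounds node t, solves the reduced Laplacian system L' v = e_s by
    Gauss-Jordan elimination and returns the potential v_s.
    """
    nodes = [u for u in adj if u != t]
    index = {u: i for i, u in enumerate(nodes)}
    n = len(nodes)
    a = [[0.0] * (n + 1) for _ in range(n)]  # augmented matrix [L' | e_s]
    for u in nodes:
        i = index[u]
        a[i][i] = float(len(adj[u]))  # degree on the diagonal
        for v in adj[u]:
            if v != t:
                a[i][index[v]] -= 1.0
    a[index[s]][n] = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                factor = a[r][col] / a[col][col]
                for c in range(col, n + 1):
                    a[r][c] -= factor * a[col][c]
    i = index[s]
    return a[i][n] / a[i][i]


def critical_thresholds(d):
    """Return (p*, p**) = (1 + 1/(d-1), 1 + 1/(d-2)) for dimension d >= 3."""
    if d < 3:
        raise ValueError("p** = 1 + 1/(d-2) requires d >= 3")
    return 1 + 1 / (d - 1), 1 + 1 / (d - 2)
```

On a 4-cycle, opposite vertices are 2 hops apart but connected by two parallel 2-edge paths, so the resistance distance is 1; on a tree the two quantities coincide. For $d = 3$ the thresholds are $p^* = 1.5$ and $p^{**} = 2$.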
Diffusion Decision Making for Adaptive k-Nearest Neighbor Classification
We show that conventional k-nearest neighbor classification can be viewed as a special case of the diffusion decision model in the asymptotic regime. Applying the optimal strategy associated with the diffusion decision model, we develop an adaptive rule for determining appropriate values of k in k-nearest neighbor classification. Making use of the sequential probability ratio test (SPRT) and Bayesian analysis, we propose five different criteria for adaptively acquiring nearest neighbors. Experiments with both synthetic and real datasets demonstrate the effectiveness of our classification criteria.
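To illustrate adaptively acquiring neighbors, here is a minimal binary-class sketch in the spirit of an SPRT; it is not claimed to be any of the paper's five criteria. If each neighbor's label is modelled as Bernoulli with probability θ of class 1 under H1 and 1 - θ under H0 (θ > 1/2), the log-likelihood ratio after n1 class-1 and n0 class-0 neighbors is (n1 - n0)·log(θ/(1 - θ)). An SPRT with symmetric bounds therefore reduces to stopping once the vote margin |n1 - n0| reaches a threshold. The `threshold`, `k_max` and the tie-break rule are hypothetical choices.

```python
import math


def adaptive_knn_predict(x, train_x, train_y, threshold=3, k_max=15):
    """Binary kNN that acquires neighbours one at a time (labels 0 or 1).

    Stops as soon as the vote margin reaches `threshold` (an SPRT-style
    bound) or k_max neighbours have been used. Returns (label, k_used).
    """
    order = sorted(range(len(train_x)), key=lambda i: math.dist(x, train_x[i]))
    margin = 0  # (#class-1) - (#class-0) neighbours: a discrete random walk
    k = 0
    for i in order[:k_max]:
        k += 1
        margin += 1 if train_y[i] == 1 else -1
        if abs(margin) >= threshold:
            break
    if margin == 0:
        # Hypothetical tie-break: fall back to the nearest neighbour's label
        return train_y[order[0]], k
    return (1 if margin > 0 else 0), k
```

Points deep inside a class region stop after only `threshold` neighbors, while ambiguous points keep acquiring neighbors, which is the adaptive-k behavior the abstract describes.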
Using LazyPredict for Evaluating ML Algorithms
Evaluating machine learning algorithms is a common task for data scientists. While a data scientist needs to know which types of machine learning algorithms suit which types of problems, it is nevertheless paramount to put the different algorithms to work on their specific dataset. Only by doing so will they get a better sense of which algorithm to use to train the model and how to perform hyperparameter tuning after that. However, choosing the right algorithm is a time-consuming and exhausting process. Ideally, there should be an automated process where you just supply your data and the most suitable machine learning algorithm is chosen for you. One answer to this is LazyPredict.
Mask-Free Video Instance Segmentation
Ke, Lei, Danelljan, Martin, Ding, Henghui, Tai, Yu-Wing, Tang, Chi-Keung, Yu, Fisher
The recent advancement in Video Instance Segmentation (VIS) has largely been driven by the use of deeper and increasingly data-hungry transformer-based models. However, video masks are tedious and expensive to annotate, limiting the scale and diversity of existing VIS datasets. In this work, we aim to remove the mask-annotation requirement. We propose MaskFreeVIS, achieving highly competitive VIS performance, while only using bounding box annotations for the object state. We leverage the rich temporal mask consistency constraints in videos by introducing the Temporal KNN-patch Loss (TK-Loss), providing strong mask supervision without any labels. Our TK-Loss finds one-to-many matches across frames, through an efficient patch-matching step followed by a K-nearest neighbor selection. A consistency loss is then enforced on the found matches. Our mask-free objective is simple to implement, has no trainable parameters, is computationally efficient, yet outperforms baselines employing, e.g., state-of-the-art optical flow to enforce temporal mask consistency. We validate MaskFreeVIS on the YouTube-VIS 2019/2021, OVIS and BDD100K MOTS benchmarks. The results clearly demonstrate the efficacy of our method by drastically narrowing the gap between fully and weakly-supervised VIS performance. Our code and trained models are available at https://github.com/SysCV/MaskFreeVis.
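The patch-matching and K-nearest-neighbor steps described above can be sketched as follows. This is a simplified, hypothetical two-frame version in pure Python, not the released MaskFreeVIS code: frames are small 2D lists of intensities, masks hold predicted foreground probabilities, and the search `radius`, `k`, the distance cut-off `max_dist`, and the log-agreement consistency term are our assumptions. Like the TK-Loss described in the abstract, it has no trainable parameters and uses no labels.

```python
import math


def patch(frame, y, x, r=1):
    """Flatten the (2r+1)x(2r+1) patch centred at (y, x), replicating borders."""
    h, w = len(frame), len(frame[0])
    return [frame[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
            for dy in range(-r, r + 1) for dx in range(-r, r + 1)]


def patch_distance(a, b):
    """Euclidean distance between two flattened patches."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def tk_loss(frame_a, frame_b, mask_a, mask_b, radius=1, k=2, max_dist=0.5):
    """Simplified temporal KNN-patch consistency loss between two frames.

    For each pixel of frame_a: match its patch against candidate patches in
    frame_b within `radius`, keep the k nearest whose distance is at most
    `max_dist`, and penalise mask disagreement on each kept match.
    """
    h, w = len(frame_a), len(frame_a[0])
    total, count = 0.0, 0
    for y in range(h):
        for x in range(w):
            pa = patch(frame_a, y, x)
            cands = []
            for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                    cands.append((patch_distance(pa, patch(frame_b, yy, xx)), yy, xx))
            cands.sort()  # one-to-many matching: nearest patches first
            for d, yy, xx in cands[:k]:
                if d > max_dist:
                    continue  # reject unreliable matches
                ma, mb = mask_a[y][x], mask_b[yy][xx]
                # Probability that both pixels get the same (fg/bg) label
                agree = ma * mb + (1 - ma) * (1 - mb)
                total += -math.log(max(agree, 1e-8))
                count += 1
    return total / count if count else 0.0
```

Matched pixels whose predicted masks agree incur a small loss, while contradictory predictions on visually matching patches are penalised heavily.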
Smart Home Environment Modelled with a Multi-Agent System
Rasras, Mohammad, Marin, Iuliana, Radu, Serban
A smart home can be considered a residence in which automated technology manages appliances and systems to help with day-to-day life. This paper describes a prototype that simulates a context-aware environment in a designed smart home. The smart home environment is simulated using three agents and five locations in a house. The context-aware agents behave according to predefined rules designed for daily activities. Our proposal aims to reduce the operational cost of running devices. In future work, monitoring the health of home residents could help sustain their healthy daily life.
Keywords: smart home, multi-agent system, K-Nearest Neighbor algorithm, K-Means Clustering algorithm
1. Introduction
A smart home, also known as an intelligent house, incorporates special devices that manage house features.
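The abstract does not list the agents' rules, so the following is a purely hypothetical sketch of how rule-driven, context-aware agents could be simulated. Each agent maps a (location, time-of-day) context to an appliance state and falls back to "off" when no rule matches; the number of steps during which appliances are on serves as a crude proxy for operational cost. The agent names, the five location names, the rules, and the cost proxy are all illustrative assumptions.

```python
from dataclasses import dataclass, field

# Hypothetical five locations in the simulated house
LOCATIONS = ["kitchen", "living_room", "bedroom", "bathroom", "hallway"]


@dataclass
class RuleAgent:
    """Hypothetical context-aware agent controlling one appliance."""
    name: str
    rules: dict = field(default_factory=dict)  # (location, period) -> state
    state: str = "off"

    def act(self, location, period):
        # Apply the matching predefined rule; otherwise switch off to save energy
        self.state = self.rules.get((location, period), "off")
        return self.state


def simulate(agents, trace):
    """Replay (location, period) observations; count appliance-on steps."""
    on_steps = 0
    for location, period in trace:
        if location not in LOCATIONS:
            raise ValueError(f"unknown location: {location}")
        for agent in agents:
            if agent.act(location, period) == "on":
                on_steps += 1
    return on_steps
```

With three such agents (for example lighting, heating, and a coffee maker), comparing `simulate` counts under different rule sets gives a rough way to reason about the operational-cost goal stated in the abstract.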
A Random Projection k Nearest Neighbours Ensemble for Classification via Extended Neighbourhood Rule
Ali, Amjad, Hamraz, Muhammad, Khan, Dost Muhammad, Deebani, Wajdan, Khan, Zardad
Ensembles based on k nearest neighbours (kNN) combine a large number of base learners, each constructed on a sample drawn from the given training data. Typical kNN-based ensembles predict the class of a test point from the k closest observations in the training data, which lie within a spherical region around that point. In this paper, a novel random projection extended neighbourhood rule (RPExNRule) ensemble is proposed, in which bootstrap samples from the given training data are randomly projected into lower dimensions to add randomness to the base models while preserving feature information. It uses the extended neighbourhood rule (ExNRule) to fit kNN base learners on the randomly projected bootstrap samples.
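A hedged pure-Python sketch of the ensemble: each base model draws a bootstrap sample, projects it with a Gaussian random matrix, and classifies with an extended-neighbourhood chain in which the first neighbour is the training point closest to the test point and each subsequent neighbour is the remaining point closest to the previously chosen one, followed by a majority vote. The Gaussian projection with 1/sqrt(out_dim) scaling, `n_models`, `out_dim`, and `k` are illustrative assumptions rather than the paper's settings.

```python
import math
import random
from collections import Counter


def random_projection(points, matrix):
    """Project each point with a random matrix given as a list of rows."""
    return [tuple(sum(r * p for r, p in zip(row, pt)) for row in matrix) for pt in points]


def exnrule_predict(x, train_x, train_y, k):
    """Extended neighbourhood rule: chain k neighbours step by step, then vote."""
    remaining = list(range(len(train_x)))
    current, chosen = x, []
    for _ in range(min(k, len(remaining))):
        # Next neighbour: closest remaining point to the *last* point chosen
        j = min(remaining, key=lambda i: math.dist(current, train_x[i]))
        remaining.remove(j)
        chosen.append(j)
        current = train_x[j]
    return Counter(train_y[i] for i in chosen).most_common(1)[0][0]


def rpexnrule_predict(x, train_x, train_y, n_models=15, out_dim=2, k=3, seed=0):
    """Majority vote over ExNRule base learners on projected bootstrap samples."""
    rng = random.Random(seed)
    n, d = len(train_x), len(train_x[0])
    votes = Counter()
    for _ in range(n_models):
        boot = [rng.randrange(n) for _ in range(n)]  # bootstrap sample indices
        matrix = [[rng.gauss(0, 1) / math.sqrt(out_dim) for _ in range(d)]
                  for _ in range(out_dim)]  # Gaussian random projection
        px = random_projection([train_x[i] for i in boot], matrix)
        py = [train_y[i] for i in boot]
        z = random_projection([x], matrix)[0]  # project the test point too
        votes[exnrule_predict(z, px, py, k)] += 1
    return votes.most_common(1)[0][0]
```

The test point must be projected with the same matrix as its base model's bootstrap sample; otherwise distances in the projected space would be meaningless.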
Improving Uncertainty Quantification of Deep Classifiers via Neighborhood Conformal Prediction: Novel Algorithm and Theoretical Analysis
Ghosh, Subhankar, Belkhouja, Taha, Yan, Yan, Doppa, Janardhan Rao
Safe deployment of deep neural networks in high-stakes real-world applications requires theoretically sound uncertainty quantification. Conformal prediction (CP) is a principled framework for uncertainty quantification of deep models in the form of prediction sets for classification tasks with a user-specified coverage (i.e., the true class label is contained with high probability). This paper proposes a novel algorithm referred to as Neighborhood Conformal Prediction (NCP) to improve the efficiency of uncertainty quantification from CP for deep classifiers (i.e., to reduce prediction set size). The key idea behind NCP is to use the learned representation of the neural network to identify the k nearest-neighbor calibration examples for a given test input and to assign them importance weights based on their distance, creating adaptive prediction sets. We theoretically show that if the learned data representation of the neural network satisfies some mild conditions, NCP will produce smaller prediction sets than traditional CP algorithms. Our comprehensive experiments on the CIFAR-10, CIFAR-100, and ImageNet datasets using diverse deep neural networks demonstrate that NCP leads to a significant reduction in prediction set size over prior CP methods.
Neighborhood Averaging for Improving Outlier Detectors
Yang, Jiawei, Rahardja, Susanto, Fränti, Pasi
We hypothesize that similar objects should have similar outlier scores. To our knowledge, all existing outlier detectors calculate the outlier score for each object independently, regardless of the outlier scores of the other objects. Therefore, they do not guarantee that similar objects have similar outlier scores. To verify this hypothesis, we propose an outlier score post-processing technique for outlier detectors, called neighborhood averaging (NA), which takes into account objects and their neighbors and guarantees that they have more similar outlier scores than their original scores. Given an object and its outlier score from any outlier detector, NA modifies its outlier score by combining it with its k nearest neighbors' scores. We demonstrate the effectiveness of NA using the well-known k-nearest neighbors (k-NN). Experimental results show that NA improves all 10 tested baseline detectors by 13% on average (from 0.70 to 0.79 AUC), evaluated on nine real-world datasets. Moreover, even outlier detectors that are already based on k-NN are improved. The experiments also show that in some applications, the choice of detector is no longer significant when detectors are used jointly with NA, which may challenge the generally held idea that the data model is the most important factor.
Outliers are objects that significantly deviate from other objects. Outliers can indicate useful information, which can be applied in applications such as fraud detection [1, 2], abnormal time series [3, 4], and traffic patterns [5, 6]. Outliers can also be harmful because they are generally unwanted, can be considered errors, and may bias statistical analysis in applications like clustering [7, 8]. Recently, outlier detection has also been applied to manufacturing data [9] and industrial applications [10]. For these reasons, outliers need to be detected.
Most outlier detectors calculate a so-called outlier score for every object independently, then apply a threshold to label the objects whose scores deviate significantly from the others as outliers [11].
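A small pure-Python sketch of neighborhood averaging applied on top of the classic k-NN detector (outlier score = distance to the k-th nearest neighbor). It assumes that NA's combination is a plain mean of an object's own score and its k nearest neighbors' scores, and it reuses the same k for the detector and for NA; both are simplifications for illustration.

```python
import math


def knn_indices(points, k):
    """Indices of each point's k nearest neighbours (excluding itself)."""
    result = []
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: math.dist(p, points[j]))
        result.append(others[:k])
    return result


def knn_outlier_scores(points, k):
    """Baseline k-NN detector: distance to the k-th nearest neighbour."""
    return [math.dist(points[i], points[nbrs[-1]])
            for i, nbrs in enumerate(knn_indices(points, k))]


def neighborhood_averaging(points, scores, k):
    """Post-process any detector's scores: mean of own and k neighbours' scores."""
    return [(scores[i] + sum(scores[j] for j in nbrs)) / (k + 1)
            for i, nbrs in enumerate(knn_indices(points, k))]
```

For the 1-D points 0, 1, 2 and 10 with k = 1, the baseline scores are 1, 1, 1 and 8; after NA they become 1, 1, 1 and 4.5, so similar objects keep similar scores and the isolated point still ranks highest before any threshold is applied.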
Machine learning based biomedical image processing for echocardiographic images
Heena, Ayesha, Biradar, Nagashettappa, Maroof, Najmuddin M., Bhatia, Surbhi, Agarwal, Rashmi, Prasad, Kanta
The popularity of artificial intelligence and machine learning has prompted researchers to apply them in recent research. The proposed method uses the K-Nearest Neighbor (KNN) algorithm to segment medical images and extract image features, which are then analyzed by classifying the data with neural networks. Classification is very important in medical imaging, and KNN is a suitable algorithm for it: it is simple both conceptually and computationally and provides very good accuracy. KNN is a user-friendly approach with a wide range of applications in machine learning, and it is widely used in image processing tasks including classification, segmentation and regression. The proposed system uses gray level co-occurrence matrix (GLCM) features. The trained neural network was tested successfully on a group of echocardiographic images, and the errors were compared using a regression plot. The results of the algorithm were evaluated with various quantitative and qualitative metrics and shown to outperform current state-of-the-art methods in the related area on both kinds of metrics. The regression analysis performed to evaluate the trained neural network showed a good correlation.
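The abstract mentions gray level co-occurrence matrix (GLCM) features without giving details. As a hedged illustration, the standard construction counts how often grey level i occurs next to grey level j at a fixed pixel offset, normalizes the counts into probabilities P(i, j), and summarizes them with Haralick-style statistics: contrast ΣP(i,j)(i - j)², angular second moment (ASM) ΣP(i,j)², and homogeneity ΣP(i,j)/(1 + (i - j)²). The offset (one pixel to the right), the number of grey levels, and the choice of features are assumptions, not the paper's settings.

```python
def glcm(image, levels, dy=0, dx=1):
    """Normalised grey-level co-occurrence matrix for pixel offset (dy, dx).

    `image` is a 2D list of integer grey levels in [0, levels).
    """
    counts = [[0] * levels for _ in range(levels)]
    h, w = len(image), len(image[0])
    for y in range(h):
        for x in range(w):
            yy, xx = y + dy, x + dx
            if 0 <= yy < h and 0 <= xx < w:  # skip pairs that leave the image
                counts[image[y][x]][image[yy][xx]] += 1
    total = sum(map(sum, counts))
    return [[c / total for c in row] for row in counts]


def glcm_features(p):
    """Contrast, angular second moment and homogeneity of a normalised GLCM."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    return {
        "contrast": sum(v * (i - j) ** 2 for i, j, v in cells),
        "asm": sum(v * v for _, _, v in cells),
        "homogeneity": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
    }
```

For the 3x3 image [[0, 0, 1], [0, 1, 1], [2, 2, 2]] with three grey levels, the six horizontal pairs give contrast 1/3, ASM 5/18 and homogeneity 5/6. Feature vectors of this kind are what a KNN or neural-network classifier would consume in a pipeline like the one described.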