Nearest Neighbor Methods


Categorising Products in an Online Marketplace: An Ensemble Approach

arXiv.org Artificial Intelligence

In recent years, product categorisation has been a common issue for e-commerce companies, which have used machine learning to categorise their products automatically. In this study, we propose an ensemble approach that uses a combination of different models to separately predict each product's category, subcategory, and colour before combining the resulting predictions for each product. With this approach, we show that an average F1-score of 0.82 can be achieved using a combination of XGBoost and k-nearest neighbours to predict these features.
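A minimal sketch of the per-attribute ensemble idea, assuming scikit-learn and xgboost, TF-IDF features over product titles, and a toy in-memory dataset; the feature choice and the XGBoost/kNN split across attributes are illustrative assumptions, not the paper's exact configuration.

```python
# Separate models predict each attribute; per-attribute predictions are then
# combined into one record per product, mirroring the ensemble described above.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier
from xgboost import XGBClassifier

titles = ["red cotton t-shirt", "blue denim jeans", "red leather wallet"]
labels = {
    "category": [0, 0, 1],   # e.g. 0 = clothing, 1 = accessories (toy labels)
    "colour":   [0, 1, 0],   # e.g. 0 = red, 1 = blue
}

X = TfidfVectorizer().fit_transform(titles)

# One model per attribute: XGBoost for category, kNN for colour (illustrative split).
models = {
    "category": XGBClassifier(n_estimators=50).fit(X, labels["category"]),
    "colour":   KNeighborsClassifier(n_neighbors=1).fit(X, labels["colour"]),
}

# Combine the per-attribute predictions into one record per product.
combined = [
    {attr: int(m.predict(X[i])[0]) for attr, m in models.items()}
    for i in range(X.shape[0])
]
print(combined)
```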


Decouple Non-parametric Knowledge Distillation For End-to-end Speech Translation

arXiv.org Artificial Intelligence

Existing techniques often attempt to transfer knowledge from a powerful machine translation (MT) model to a speech translation (ST) model with elaborate techniques, which often require transcriptions as extra input during training. However, transcriptions are not always available, and how to improve ST model performance without transcription, i.e., data efficiency, has rarely been studied in the literature. In this paper, we propose Decoupled Non-parametric Knowledge Distillation (DNKD) from the data perspective to improve data efficiency. Our method follows the knowledge distillation paradigm; however, instead of obtaining the teacher distribution from a sophisticated MT model, we construct it from a non-parametric datastore via k-Nearest-Neighbor (kNN) retrieval, which removes the dependence on transcriptions and the MT model. We then decouple the classic knowledge distillation loss into target and non-target distillation to enhance the effect of the knowledge among non-target logits, which is the prominent "dark knowledge". Experiments on the MuST-C corpus show that the proposed method achieves consistent improvements over a strong baseline without requiring any transcription.
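A hedged PyTorch sketch of the two ingredients named in the abstract: a kNN-retrieved "teacher" distribution built from a datastore of (hidden state, target token) pairs, and a distillation loss split into target and non-target parts. Tensor shapes, temperatures, and weightings are illustrative assumptions, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def knn_teacher(query, keys, values, vocab_size, k=8, temp=10.0):
    """Teacher distribution from kNN retrieval over a datastore (no MT model)."""
    d = torch.cdist(query.unsqueeze(0), keys).squeeze(0)   # distances to datastore keys
    dist, idx = torch.topk(-d, k)                           # k nearest neighbours
    w = F.softmax(dist / temp, dim=-1)                      # closer neighbours weigh more
    return torch.zeros(vocab_size).scatter_add_(0, values[idx], w)

def decoupled_kd(student_logits, teacher_probs, target, eps=1e-8):
    """Split the distillation loss into a target and a non-target component."""
    p_s = F.softmax(student_logits, dim=-1)
    # Target part: match the probability mass on the gold token.
    target_loss = F.binary_cross_entropy(p_s[target], teacher_probs[target])
    # Non-target part: match the renormalised distribution over all other tokens
    # (the "dark knowledge" the abstract refers to).
    mask = torch.ones_like(p_s, dtype=torch.bool)
    mask[target] = False
    q_s = p_s[mask] / (p_s[mask].sum() + eps)
    q_t = teacher_probs[mask] / (teacher_probs[mask].sum() + eps)
    non_target_loss = F.kl_div((q_s + eps).log(), q_t, reduction="sum")
    return target_loss + non_target_loss
```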


Clustering-based Imputation for Dropout Buyers in Large-scale Online Experimentation

arXiv.org Artificial Intelligence

In online experimentation, appropriate metrics (e.g., purchase) provide strong evidence to support hypotheses and enhance the decision-making process. However, incomplete metrics frequently occur in online experimentation, making the available data much smaller than planned for the online experiments (e.g., A/B testing). In this work, we introduce the concept of dropout buyers and categorize users with incomplete metric values into two groups: visitors and dropout buyers. For the analysis of incomplete metrics, we propose a clustering-based imputation method using $k$-nearest neighbors. Our proposed imputation method considers both the experiment-specific features and users' activities along their shopping paths, allowing different imputation values for different users. To facilitate efficient imputation of large-scale data sets in online experimentation, the proposed method uses a combination of stratification and clustering. The performance of the proposed method is compared to several conventional methods in both simulation studies and a real online experiment at eBay.
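A rough sketch of the imputation idea, assuming numpy/scikit-learn and synthetic data: users are first grouped (here a simple KMeans clustering on activity features stands in for the paper's stratification-plus-clustering step), and a kNN imputer then fills the missing metric within each group. Feature columns and parameters are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.impute import KNNImputer

rng = np.random.default_rng(0)
# Columns: [sessions, items_viewed, purchase_amount]; NaN marks a dropout buyer.
X = rng.normal(size=(200, 3))
X[rng.random(200) < 0.2, 2] = np.nan

# Group users by their (fully observed) activity features.
clusters = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X[:, :2])

X_imputed = X.copy()
for c in np.unique(clusters):
    rows = clusters == c
    # Impute the purchase metric from the k nearest users in the same cluster,
    # so different users can receive different imputed values.
    X_imputed[rows] = KNNImputer(n_neighbors=5).fit_transform(X[rows])

print(np.isnan(X_imputed).sum())  # 0 once every cluster has enough complete rows
```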


Neural Net and Traditional Classifiers

Neural Information Processing Systems

Previous work on nets with continuous-valued inputs led to generative procedures to construct convex decision regions with two-layer perceptrons (one hidden layer) and arbitrary decision regions with three-layer perceptrons (two hidden layers). Here we demonstrate that two-layer perceptron classifiers trained with back propagation can form both convex and disjoint decision regions. Such classifiers are robust, train rapidly, and provide good performance with simple decision regions. When complex decision regions are required, however, convergence time can be excessively long and performance is often no better than that of k-nearest neighbor classifiers. Three neural net classifiers are presented that provide more rapid training under such situations. Two use fixed weights in the first one or two layers and are similar to classifiers that estimate probability density functions using histograms. A third "feature map classifier" uses both unsupervised and supervised training. It provides good performance with little supervised training in situations such as speech recognition where much unlabeled training data is available. The architecture of this classifier can be used to implement a neural net k-nearest neighbor classifier.
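A loose sketch, in the spirit of the "feature map classifier" described above, assuming numpy/scikit-learn: an unsupervised stage learns prototype vectors (KMeans standing in for a feature map), a small labelled set then labels the prototypes, and queries are classified by their nearest prototype, i.e. nearest-neighbour classification over prototypes. The data and the prototype-labelling rule are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Plentiful unlabeled data from two loose clusters.
X_unlabeled = rng.normal(size=(1000, 2)) + np.repeat([[0, 0], [4, 4]], 500, axis=0)

# Unsupervised stage: learn prototypes from the unlabeled data.
km = KMeans(n_clusters=10, n_init=10, random_state=0).fit(X_unlabeled)

# Supervised stage: a handful of labelled examples label each prototype
# (each prototype takes the label of its nearest labelled example).
X_lab = np.array([[0.0, 0.0], [0.5, -0.5], [4.0, 4.0], [3.5, 4.5]])
y_lab = np.array([0, 0, 1, 1])
proto_labels = y_lab[np.argmin(
    np.linalg.norm(km.cluster_centers_[:, None] - X_lab[None], axis=2), axis=1)]

def predict(x):
    # Classification: label of the nearest prototype.
    return proto_labels[np.argmin(np.linalg.norm(km.cluster_centers_ - x, axis=1))]

print(predict(np.array([3.8, 4.1])))   # expected: 1
```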


Practical Characteristics of Neural Network and Conventional Pattern Classifiers on Artificial and Speech Problems

Neural Information Processing Systems

Eight neural net and conventional pattern classifiers (Bayesian-unimodal Gaussian, k-nearest neighbor, standard back-propagation, adaptive-stepsize back-propagation, hypersphere, feature-map, learning vector quantizer, and binary decision tree) were implemented on a serial computer and compared using two speech recognition and two artificial tasks. Error rates were statistically equivalent on almost all tasks, but classifiers differed by orders of magnitude in memory requirements, training time, classification time, and ease of adaptivity. Nearest-neighbor classifiers trained rapidly but required the most memory. Tree classifiers provided rapid classification but were complex to adapt. Back-propagation classifiers typically required long training times and had intermediate memory requirements.


Asymptotic slowing down of the nearest-neighbor classifier

Neural Information Processing Systems

Although the value of the coefficient a depends upon the underlying probability distributions, the exponent of M is largely distribution free. We thus obtain a concise relation between a classifier's ability to generalize from a finite reference sample and the dimensionality of the feature space, as well as an analytic validation of Bellman's well-known "curse of dimensionality."
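The abstract alludes to a relation it does not restate. As a hedged reconstruction (an assumption based on the abstract's wording, not a quotation), with M the reference-sample size and d the feature-space dimensionality, the expansion it refers to can be read as:

```latex
% Hedged reconstruction (not quoted from the paper): expected finite-sample
% error of the nearest-neighbor rule versus reference-sample size M in a
% d-dimensional feature space; the coefficient a is distribution dependent,
% while the exponent of M is largely not.
\[
  \mathbb{E}\!\left[P_M\right] \;\simeq\; P_\infty + a\,M^{-2/d},
  \qquad M \to \infty .
\]
```

On this reading, holding the excess error fixed requires M to grow like (a/\epsilon)^{d/2}, i.e. exponentially in d, which is the "slowing down" of the title.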


Locally Adaptive Nearest Neighbor Algorithms

Neural Information Processing Systems

Four versions of a k-nearest neighbor algorithm with locally adaptive k are introduced and compared to the basic k-nearest neighbor algorithm (kNN). Locally adaptive kNN algorithms choose the value of k that should be used to classify a query by consulting the results of cross-validation computations in the local neighborhood of the query. Local kNN methods are shown to perform similarly to kNN in experiments with twelve commonly used data sets. Encouraging results in three constructed tasks show that local methods can significantly outperform kNN in specific applications. Local methods can be recommended for on-line learning and for applications where different regions of the input space are covered by patterns solving different sub-tasks.
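A minimal numpy sketch of the locally adaptive idea: candidate values of k are scored by leave-one-out votes inside the query's local neighbourhood, and the best-scoring k is then used to classify the query. This illustrates the principle only; the neighbourhood size, candidate set, and scoring rule are assumptions, not any of the paper's four variants.

```python
import numpy as np
from collections import Counter

def locally_adaptive_knn(X_train, y_train, x, ks=(1, 3, 5, 7), m=20):
    d = np.linalg.norm(X_train - x, axis=1)
    local = np.argsort(d)[:m]                  # local neighbourhood of the query
    best_k, best_acc = ks[0], -1.0
    for k in ks:
        hits = 0
        for i in local:                        # leave-one-out inside the neighbourhood
            di = np.linalg.norm(X_train - X_train[i], axis=1)
            di[i] = np.inf
            nn = np.argsort(di)[:k]
            hits += Counter(y_train[nn]).most_common(1)[0][0] == y_train[i]
        acc = hits / len(local)
        if acc > best_acc:
            best_k, best_acc = k, acc
    nn = np.argsort(d)[:best_k]                # classify the query with the chosen k
    return Counter(y_train[nn]).most_common(1)[0][0], best_k
```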


An Incremental Nearest Neighbor Algorithm with Queries

Neural Information Processing Systems

We consider the general problem of learning multi-category classification from labeled examples. We present experimental results for a nearest neighbor algorithm which actively selects samples from different pattern classes according to a querying rule instead of the a priori class probabilities. The amount of improvement of this query-based approach over the passive batch approach depends on the complexity of the Bayes rule. The principle on which this algorithm is based is general enough to be used in any learning algorithm which permits a model-selection criterion and for which the error rate of the classifier is calculable in terms of the complexity of the model.
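A generic sketch of query-based (active) sample selection for a nearest-neighbour classifier, assuming numpy and a pool of candidate points. The querying rule used here, picking the pool point whose two nearest labelled neighbours disagree and are nearly equidistant, is an illustrative stand-in and not the specific rule analysed in the paper.

```python
import numpy as np

def query_index(X_pool, X_lab, y_lab):
    """Return the index of the pool point the current 1-NN rule is least sure about."""
    scores = []
    for x in X_pool:
        d = np.linalg.norm(X_lab - x, axis=1)
        i, j = np.argsort(d)[:2]
        # Uncertain if the two closest labelled points disagree and are almost tied.
        scores.append(d[j] - d[i] if y_lab[i] != y_lab[j] else np.inf)
    return int(np.argmin(scores))
```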


K-Local Hyperplane and Convex Distance Nearest Neighbor Algorithms

Neural Information Processing Systems

Guided by an initial idea of building a complex (non-linear) decision surface with maximal local margin in input space, we give a possible geometrical intuition as to why K-Nearest Neighbor (KNN) algorithms often perform more poorly than SVMs on classification tasks. We then propose modified K-Nearest Neighbor algorithms to overcome the perceived problem. The approach is similar in spirit to Tangent Distance, but with invariances inferred from the local neighborhood rather than prior knowledge. Experimental results on real-world classification tasks suggest that the modified KNN algorithms often give a dramatic improvement over standard KNN and perform as well or better than SVMs.
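A hedged numpy sketch of the local-hyperplane idea: for each class, take the K nearest neighbours of the query, fit the local affine subspace they span, and classify by the smallest query-to-subspace distance. The regularised and convex-hull variants discussed in the paper are not reproduced here.

```python
import numpy as np

def hknn_predict(X, y, query, K=5):
    best_cls, best_dist = None, np.inf
    for cls in np.unique(y):
        Xc = X[y == cls]
        idx = np.argsort(np.linalg.norm(Xc - query, axis=1))[:K]
        N = Xc[idx]
        ref = N.mean(axis=0)
        # Project (query - ref) onto the span of the centred neighbours.
        V = (N - ref).T                                 # d x K basis (possibly rank-deficient)
        coef, *_ = np.linalg.lstsq(V, query - ref, rcond=None)
        dist = np.linalg.norm(query - ref - V @ coef)   # distance to the local hyperplane
        if dist < best_dist:
            best_cls, best_dist = cls, dist
    return best_cls
```

The least-squares projection is one simple way to measure the distance to the neighbours' affine span; it degenerates to ordinary 1-NN when K = 1.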


New Algorithms for Efficient High Dimensional Non-parametric Classification

Neural Information Processing Systems

This paper is about non-approximate acceleration of high-dimensional non-parametric operations such as k-nearest-neighbor classifiers and the prediction phase of Support Vector Machine classifiers. We attempt to exploit the fact that even if we want exact answers to non-parametric queries, we usually do not need to explicitly find the datapoints close to the query, but merely need to ask questions about the properties of that set of datapoints. This offers a small amount of computational leeway, and we investigate how much that leeway can be exploited. For clarity, this paper concentrates on pure k-NN classification and the prediction phase of SVMs. We introduce new ball-tree algorithms that, on real-world datasets, give accelerations of 2-fold up to 100-fold compared against highly optimized traditional ball-tree-based k-NN.
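A small sketch of ball-tree-backed exact k-NN, assuming scikit-learn's BallTree rather than the paper's own algorithms: the tree prunes whole regions of the data by their bounding balls, so exact neighbours are found without a linear scan, and classification only needs a property of the neighbour set (the majority label), not the points themselves.

```python
import numpy as np
from sklearn.neighbors import BallTree

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 20))          # moderately high-dimensional training set
y = (X[:, 0] > 0).astype(int)

tree = BallTree(X, leaf_size=40)           # ball tree over the training points
query = rng.normal(size=(1, 20))

dist, idx = tree.query(query, k=15)        # exact k nearest neighbours
# Only the majority label of the neighbour set is needed for classification,
# which is the kind of "leeway" the abstract says exact accelerations can exploit.
pred = np.bincount(y[idx[0]]).argmax()
print(pred)
```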