We introduce kNN-LMs, which extend a pre-trained neural language model (LM) by linearly interpolating it with a k-nearest neighbors (kNN) model. The nearest neighbors are computed according to distance in the pre-trained LM embedding space, and can be drawn from any text collection, including the original LM training data. Applying this augmentation to a strong WIKITEXT-103 LM, with neighbors drawn from the original training set, our kNN-LM achieves a new state-of-the-art perplexity of 15.79 – a 2.9 point improvement with no additional training. We also show that this approach has implications for efficiently scaling up to larger training sets and allows for effective domain adaptation, by simply varying the nearest neighbor datastore, again without further training. Qualitatively, the model is particularly helpful in predicting rare patterns, such as factual knowledge.
Intuitively, locally finiteness means that the refinement operator is computable, completeness means we can generate, by refinement of a, any element of G related to a given element g 1 by the order relation, and properness means that a refinement operator does not generate elements which are equivalent to the element being refined. When a refinement operator is locally finite, complete and proper, we say that it is ideal. Notice that all the subsumption relations presented above satisfy the reflexive 2 and transitive 3 properties. Therefore, the pair (G,), where G is the set of all DLGs given a set of labels L, and is any of the subsumption relations defined above is a quasi-ordered set. Thus, this opens the door to defining refinement operators for DLGs. Intuitively, a downward refinement operator for DLGs will generate refinements of a given DLG by either adding vertices, edges, or by making some of the labels more specific, thus making the graph more specific. In the following subsections, we will introduce a collection of refinement operators for connected DLGs, and discuss their theoretical properties. A summary of these operators is shown in Table 1, where we show that under the object-identity constraint, all the refinement operators presented in this document are ideal. If we do not impose object-identity, then the operators are locally complete and complete, but not proper.
Concept drift describes unforeseeable changes in the underlying distribution of streaming data over time. Concept drift research involves the development of methodologies and techniques for drift detection, understanding and adaptation. Data analysis has revealed that machine learning in a concept drift environment will result in poor learning results if the drift is not addressed. To help researchers identify which research topics are significant and how to apply related techniques in data analysis tasks, it is necessary that a high quality, instructive review of current research developments and trends in the concept drift field is conducted. In addition, due to the rapid development of concept drift in recent years, the methodologies of learning under concept drift have become noticeably systematic, unveiling a framework which has not been mentioned in literature. This paper reviews over 130 high quality publications in concept drift related research areas, analyzes up-to-date developments in methodologies and techniques, and establishes a framework of learning under concept drift including three main components: concept drift detection, concept drift understanding, and concept drift adaptation. This paper lists and discusses 10 popular synthetic datasets and 14 publicly available benchmark datasets used for evaluating the performance of learning algorithms aiming at handling concept drift. Also, concept drift related research directions are covered and discussed. By providing state-of-the-art knowledge, this survey will directly support researchers in their understanding of research developments in the field of learning under concept drift.
Perhaps the most straightforward classifier in the arsenal or machine learning techniques is the Nearest Neighbour Classifier -- classification is achieved by identifying the nearest neighbours to a query example and using those neighbours to determine the class of the query. This approach to classification is of particular importance because issues of poor run-time performance is not such a problem these days with the computational power that is available. This paper presents an overview of techniques for Nearest Neighbour classification focusing on; mechanisms for assessing similarity (distance), computational issues in identifying nearest neighbours and mechanisms for reducing the dimension of the data. This paper is the second edition of a paper previously published as a technical report. Sections on similarity measures for time-series, retrieval speed-up and intrinsic dimensionality have been added. An Appendix is included providing access to Python code for the key methods.
KNN has the reputation to be the word simplest but efficient supervised learning algorithm used for either classification or regression. KNN prediction efficiency highly depends on the size of its training data but when this training data grows KNN suffers from slowness in making decisions since it needs to search nearest neighbors within the entire dataset at each decision making. This paper proposes a new technique that enables the selection of nearest neighbors directly in the neighborhood of a given observation. The proposed approach consists of dividing the data space into subcells of a virtual grid built on top of data space. The mapping between the data points and subcells is performed using hashing. When it comes to select the nearest neighbors of a given observation, we firstly identify the cell the observation belongs by using hashing, and then we look for nearest neighbors from that central cell and cells around it layer by layer. From our experiment performance analysis on publicly available datasets, our algorithm outperforms the original KNN in time efficiency with a prediction quality as good as that of KNN it also offers competitive performance with solutions like KDtree
Label noise is increasingly prevalent in datasets acquired from noisy channels. Existing approaches that detect and remove label noise generally rely on some form of supervision, which is not scalable and error-prone. In this paper, we propose NoiseRank, for unsupervised label noise reduction using Markov Random Fields (MRF). We construct a dependence model to estimate the posterior probability of an instance being incorrectly labeled given the dataset, and rank instances based on their estimated probabilities. Our method 1) Does not require supervision from ground-truth labels, or priors on label or noise distribution. 2) It is interpretable by design, enabling transparency in label noise removal. 3) It is agnostic to classifier architecture/optimization framework and content modality. These advantages enable wide applicability in real noise settings, unlike prior works constrained by one or more conditions. NoiseRank improves state-of-the-art classification on Food101-N (~20% noise), and is effective on high noise Clothing-1M (~40% noise).
A crime is a punishable offence that is harmful for an individual and his society. It is obvious to comprehend the patterns of criminal activity to prevent them. Research can help society to prevent and solve crime activates. Study shows that only 10 percent offenders commits 50 percent of the total offences. The enforcement team can respond faster if they have early information and pre-knowledge about crime activities of the different points of a city. In this paper, supervised learning technique is used to predict crimes with better accuracy. The proposed system predicts crimes by analyzing data-set that contains records of previously committed crimes and their patterns. The system stands on two main algorithms - i) decision tree, and ii) k-nearest neighbor. Random Forest algorithm and Adaboost are used to increase the accuracy of the prediction. Finally, oversampling is used for better accuracy. The proposed system is feed with a criminal-activity data set of twelve years of San Francisco city.
Corporations today face increasing demands for the timely and effective delivery of customer service. This creates the need for a robust and accurate automated solution to what is formally known as the ticket routing problem. This task is to match each unresolved service incident, or "ticket", to the right group of service experts. Existing studies divide the task into two independent subproblems - initial group assignment and inter-group transfer. However, our study addresses both subproblems jointly using an end-to-end modeling approach. We first performed a preliminary analysis of half a million archived tickets to uncover relevant features. Then, we devised the UFTR, a Unified Framework for Ticket Routing using four types of features (derived from tickets, groups, and their interactions). In our experiments, we implemented two ranking models with the UFTR. Our models outperform baselines on three routing metrics. Furthermore, a post-hoc analysis reveals that this superior performance can largely be attributed to the features that capture the associations between ticket assignment and group assignment. In short, our results demonstrate that the UFTR is a superior solution to the ticket routing problem because it takes into account previously unexploited interrelationships between the group assignment and group transfer problems.
The notions of distance and similarity play a key role in many machine learning approaches, and artificial intelligence (AI) in general, since they can serve as an organizing principle by which individuals classify objects, form concepts and make generalizations. While distance functions for propositional representations have been thoroughly studied, work on distance functions for structured representations, such as graphs, frames or logical clauses, has been carried out in different communities and is much less understood. Specifically, a significant amount of work that requires the use of a distance or similarity function for structured representations of data usually employs ad-hoc functions for specific applications. Therefore, the goal of this paper is to provide an overview of this work to identify connections between the work carried out in different areas and point out directions for future work.
We show that conventional k-nearest neighbor classification can be viewed as a special problem of the diffusion decision model in the asymptotic situation. Applying the optimal strategy associated with the diffusion decision model, an adaptive rule is developed for determining appropriate values of k in k-nearest neighbor classification. Making use of the sequential probability ratio test (SPRT) and Bayesian analysis, we propose five different criteria for adaptively acquiring nearest neighbors. Experiments with both synthetic and real datasets demonstrate the effectivness of our classification criteria. Papers published at the Neural Information Processing Systems Conference.