Sequential Classification with Empirically Observed Statistics

Haghifam, Mahdi, Tan, Vincent Y. F., Khisti, Ashish

arXiv.org Machine Learning 

F. Tan, and Ashish Khisti Abstract Motivated by real-world machine learning applications, we consider a statistical classification task in a sequential setting where test samples arrive sequentially. In addition, the generating distributions are unknown and only a set of empirically sampled sequences are available to a decision maker. The decision maker is tasked to classify a test sequence which is known to be generated according to either one of the distributions. In particular, for the binary case, the decision maker wishes to perform the classification task with minimum number of the test samples, so, at each step, she declares that either hypothesis 1 is true, hypothesis 2 is true, or she requests for an additional test sample. We propose a classifier and analyze the type-I and type-II error probabilities. We demonstrate the significant advantage of our sequential scheme compared to an existing non-sequential classifier proposed by Gutman. Finally, we extend our setup and results to the multi-class classification scenario and again demonstrate that the variable-length nature of the problem affords significant advantages as one can achieve the same set of exponents as Gutman's fixed-length setting but without having the rejection option. Index T erms Sequential classification, Empirically sampled sequences, Error exponents, V ariable-length I. I NTRODUCTION Quick and accurate classification is crucial in many real-life applications. For instance, to diagnose haematologic diseases based on blood test results, a physician wishes to detect the pattern, deviations, and relations in the blood samples of a patient as quickly as possible to make treatment plans. Similar challenges can be found in a broad range of applications such as genomics analysis, finance, and abnormal detection where there is an inherent tradeoff between speed and accuracy. In many real-world applications, classical hypothesis testing is infeasible due to the fact that the probability distributions of the sources are unknown. In practice, one often encounters classification problems in which one has access to training samples and is required to classify a set of test samples according to which distribution this set is generated from. To incorporate the real-life requirement of classifying the test samples as quickly as possible, one can consider the sequential statistical classification setup. This setup addresses the problem of classifying test samples given training samples with the additional requirement that the decision maker is required to make his/her decision based on as few tests samples as possible; it is however, known that all the test samples originate from the same distribution. The problem of classification using empirically observed statistics has been studied in many prior works.

Duplicate Docs Excel Report

Title
None found

Similar Docs  Excel Report  more

TitleSimilaritySource
None found