On pattern classification with weighted dimensions

Mollah, Ayatullah Faruk

arXiv.org Artificial Intelligence 

Studies on various facets of pattern classification are often imperative when working with multi-dimensional samples from diverse application scenarios. In this context, weighted dimension-based distance measures have been a vital consideration in pattern analysis, as they reflect the degree of similarity between samples. Though the matter is often presumed settled by the pervasive use of Euclidean distance, a plethora of issues often surface. In this paper, we present (a) a detailed analysis of the impact of distance measure norms and dimension weights, along with visualization, (b) a novel weighting scheme for each dimension, (c) incorporation of this dimensional weighting scheme into a KNN classifier, and (d) pattern classification on a variety of synthetic as well as realistic datasets with the developed model. It has performed well across diverse experiments in comparison to the traditional KNN under the same experimental setups. Specifically, for gene expression datasets, it yields a significant and consistent gain in classification accuracy (around 10%) in all cross-validation experiments with different values of k. As such datasets contain a limited number of high-dimensional samples, meaningful selection of nearest neighbours is desirable, and this requirement is reasonably met by regulating the shape and size of the region enclosing the k reference samples with the developed weighting scheme and an appropriate norm. It therefore stands as an important generalization of the KNN classifier, powered by weighted Minkowski distance with the present weighting scheme.
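To make the role of the norm and the dimension weights concrete, the following is a minimal sketch of a KNN classifier driven by a weighted Minkowski distance. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not give the weighting scheme, so the weights `w` are supplied by hand here, and the distance is assumed to take the common form d(x, y) = (sum_i w_i |x_i - y_i|^p)^(1/p). The function names `weighted_minkowski` and `knn_predict` are hypothetical.

```python
from collections import Counter


def weighted_minkowski(x, y, w, p=2.0):
    """Weighted Minkowski distance: (sum_i w_i * |x_i - y_i|**p) ** (1/p).

    Assumed form; the weights multiply each dimension's p-th power term.
    With all weights equal to 1 and p = 2 this is the Euclidean distance.
    """
    return sum(wi * abs(xi - yi) ** p for xi, yi, wi in zip(x, y, w)) ** (1.0 / p)


def knn_predict(X_train, y_train, x, w, k=3, p=2.0):
    """Majority vote among the k training samples nearest to x."""
    # Rank training indices by weighted distance to the query sample.
    order = sorted(
        range(len(X_train)),
        key=lambda i: weighted_minkowski(X_train[i], x, w, p),
    )
    votes = Counter(y_train[i] for i in order[:k])
    # On ties, most_common keeps first-seen order, i.e. the nearer neighbour.
    return votes.most_common(1)[0][0]


# Toy data (made up for illustration): dimension 0 separates the classes,
# dimension 1 is a large-scale nuisance dimension.
X_train = [[0.0, 0.0], [0.0, 10.0], [1.0, 5.0], [1.0, -5.0]]
y_train = ["A", "A", "B", "B"]
query = [0.1, 5.0]

uniform = knn_predict(X_train, y_train, query, w=[1.0, 1.0], k=1)
reweighted = knn_predict(X_train, y_train, query, w=[1.0, 0.001], k=1)
print(uniform, reweighted)
```

In this toy setup, down-weighting the nuisance dimension changes which training sample counts as nearest, which is the sense in which weights and the norm p regulate the shape and size of the neighbourhood region. How the weights should be chosen is the subject of the paper's scheme and is not reproduced here.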