Simple coarse graining and sampling strategies for image recognition

Whitelam, Stephen Machine Learning 

A conceptually simple way to recognize images is to directly compare test-set data and training-set data. The accuracy of this approach is limited by the method of comparison used, and by the extent to which the training-set data covers the required configuration space. Here we show that this coverage can be substantially increased using simple strategies of coarse graining (replacing groups of images by their centroids) and sampling (using distinct sets of centroids in combination). We use the MNIST data set to show that coarse graining can be used to convert a subset of training images into about an order of magnitude fewer image centroids, with no loss of accuracy of classification of test-set images by direct (nearest-neighbor) classification. Distinct batches of centroids can be used in combination as a means of sampling configuration space, and can classify test-set data more accurately than can the unaltered training set. The approach works most naturally with multiple processors in parallel.
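The two strategies described above can be illustrated with a minimal sketch. It rests on assumptions not stated in the abstract: images are flattened to vectors and compared by squared Euclidean distance, groups within each class are formed by a few k-means (Lloyd) iterations, and distinct batches are combined by pooling their centroids into a single nearest-neighbor search. The function names `coarse_grain`, `nearest_neighbor_predict`, and `predict_with_batches` are hypothetical and chosen for illustration only.

```python
import numpy as np


def coarse_grain(images, labels, n_centroids, n_iter=10, seed=0):
    """Replace the images of each class by at most `n_centroids` centroids.

    Assumption: groups are formed by a few Lloyd (k-means) iterations per
    class; the source does not specify the grouping rule.
    """
    images = np.asarray(images, dtype=float)
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    centroids, centroid_labels = [], []
    for c in np.unique(labels):
        x = images[labels == c]
        k = min(n_centroids, len(x))
        # Start from k distinct images of this class.
        centers = x[rng.choice(len(x), size=k, replace=False)].copy()
        for _ in range(n_iter):
            # Assign each image to its nearest current centroid.
            d = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            assign = d.argmin(axis=1)
            # Move each centroid to the mean of its group (skip empty groups).
            for j in range(k):
                members = x[assign == j]
                if len(members) > 0:
                    centers[j] = members.mean(axis=0)
        centroids.append(centers)
        centroid_labels.append(np.full(k, c))
    return np.concatenate(centroids), np.concatenate(centroid_labels)


def nearest_neighbor_predict(test_images, ref_images, ref_labels):
    """Give each test image the label of its nearest reference image."""
    test_images = np.asarray(test_images, dtype=float)
    d = ((test_images[:, None, :] - ref_images[None, :, :]) ** 2).sum(axis=2)
    return ref_labels[d.argmin(axis=1)]


def predict_with_batches(test_images, batches):
    """Use several (centroids, labels) batches in combination.

    Assumption: batches are combined by pooling all centroids and taking
    the single nearest one; other combination rules are possible.
    """
    refs = np.concatenate([b[0] for b in batches])
    ref_labels = np.concatenate([b[1] for b in batches])
    return nearest_neighbor_predict(test_images, refs, ref_labels)
```

In this sketch each batch would be built by calling `coarse_grain` on a different subset of the training images, which can be done independently on separate processors; only the final pooled nearest-neighbor search needs all batches together.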