The Leaf Classification playground competition ran on Kaggle from August 2016 to February 2017. Over 1,500 Kagglers competed to accurately identify 99 different species of plants based on a dataset of leaf images and pre-extracted features. Because our playground competitions are designed using publicly available datasets, the real winners in this competition were the authors of impressive kernels. Because the Leaf Classification dataset is small, I wanted a script that could run a bunch of different classifiers in a single pass.
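The kind of "many classifiers in a single pass" script described above can be sketched roughly as follows. This is a hypothetical illustration, not the kernel author's actual code: the classifier choices, the synthetic data, and the use of log loss (the Leaf Classification metric) are assumptions.

```python
# Sketch: fit several scikit-learn classifiers on one dataset in a single
# pass and compare their log loss. Synthetic data stands in for the real
# leaf features; the classifier list is illustrative.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import log_loss
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=20, n_classes=4,
                           n_informative=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

classifiers = {
    "logreg": LogisticRegression(max_iter=1000),
    "knn": KNeighborsClassifier(n_neighbors=5),
    "rf": RandomForestClassifier(n_estimators=100, random_state=0),
}

scores = {}
for name, clf in classifiers.items():
    clf.fit(X_tr, y_tr)
    proba = clf.predict_proba(X_te)
    # Pass the fitted class order so columns line up with the labels.
    scores[name] = log_loss(y_te, proba, labels=clf.classes_)
    print(f"{name}: log loss = {scores[name]:.3f}")
```

Extending the dictionary with more estimators is all it takes to add another model to the comparison.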
The Dogs versus Cats Redux: Kernels Edition playground competition revived one of our favorite "for fun" image classification challenges from 2013, Dogs versus Cats. This time Kaggle brought Kernels, the best way to share and learn from code, to the table while competitors tackled the problem with a refreshed arsenal including TensorFlow and a few years of deep learning advancements. In this winner's interview, Kaggler Bojan Tunguz shares his 4th place approach based on deep convolutional neural networks and model blending. I am a theoretical physicist by training, and have worked in academia for many years. A few years ago I came across some really cool online machine learning courses, and fell in love with the field.
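The simplest form of the model blending mentioned above is averaging the predicted probabilities of several models. The sketch below assumes three already-trained models whose outputs are stood in for by random numbers; the names and shapes are illustrative, not the winner's actual pipeline.

```python
# Sketch: blend per-image "dog" probabilities from three models by taking
# their arithmetic mean. Random arrays stand in for real model outputs.
import numpy as np

rng = np.random.default_rng(0)
# Pretend each array holds one model's probabilities for 10 test images.
preds = [rng.uniform(size=10) for _ in range(3)]

# A plain mean is the simplest blend; weighted or rank-based averages
# are common refinements when models differ in strength.
blend = np.mean(preds, axis=0)
print(blend)
```

Because each probability stays in [0, 1], the blend does too, and averaging tends to cancel out the individual models' uncorrelated errors.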
In this tutorial, we will present a few simple yet effective methods that you can use to build a powerful image classifier using only very few training examples: just a few hundred or a few thousand pictures from each class you want to be able to recognize. To acquire a few hundred or thousand training images belonging to the classes you are interested in, one possibility would be to use the Flickr API to download pictures matching a given tag, under a friendly license. In our examples we will use two sets of pictures, which we got from Kaggle: 1000 cats and 1000 dogs (although the original dataset had 12,500 cats and 12,500 dogs, we just took the first 1000 images for each class). We also use 400 additional samples from each class as validation data to evaluate our models. That is very few examples to learn from, for a classification problem that is far from simple.
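One key technique for learning from so few examples is data augmentation: generating extra training samples by applying label-preserving transforms such as flips. In Keras this is typically handled by built-in preprocessing utilities, but the idea itself is tiny; the sketch below shows it in plain NumPy on fake images (array shapes and the flip probability are assumptions for illustration).

```python
# Sketch: augment a tiny image set with random horizontal flips, a
# label-preserving transform for cats-vs-dogs style photos.
import numpy as np

rng = np.random.default_rng(0)
images = rng.uniform(size=(8, 32, 32, 3))  # 8 fake 32x32 RGB "photos"

def augment(batch, rng):
    """Return a copy of `batch` with each image flipped left-right
    with probability 0.5."""
    out = batch.copy()
    flips = rng.uniform(size=len(batch)) < 0.5
    out[flips] = out[flips][:, :, ::-1, :]  # reverse the width axis
    return out

augmented = augment(images, rng)
```

Applying a fresh random transform each epoch means the network never sees exactly the same picture twice, which helps a small dataset go further.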
Herein, we present a system for hyperspectral image segmentation that utilizes multiple efficiently trained class-based denoising autoencoders. Moreover, we present a novel hyperspectral data augmentation method for labelled HSI data using linear mixtures of pixels from each class, which helps the system handle edge pixels, which are almost always mixed pixels. Finally, we utilize a deep neural network and morphological hole-filling to provide robust image classification. Results on the Salinas dataset verify the high performance of the proposed algorithm.
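The linear-mixture augmentation described above can be sketched as a convex combination of two pixels drawn from the same class. This is a hedged reconstruction of the idea, not the paper's implementation: the function name, band count, and sampling scheme are assumptions.

```python
# Sketch: synthesize new labelled hyperspectral pixels as convex
# combinations of two pixels from the same class, mimicking the mixed
# pixels that occur at class boundaries.
import numpy as np

rng = np.random.default_rng(0)
n_bands = 200                                   # spectral bands per pixel
class_pixels = rng.uniform(size=(50, n_bands))  # 50 pixels of one class

def mix_same_class(pixels, n_new, rng):
    """Draw n_new mixtures a*x_i + (1-a)*x_j with x_i, x_j in `pixels`."""
    i = rng.integers(len(pixels), size=n_new)
    j = rng.integers(len(pixels), size=n_new)
    alpha = rng.uniform(size=(n_new, 1))
    # A convex combination stays within the class's spectral range.
    return alpha * pixels[i] + (1 - alpha) * pixels[j]

new_pixels = mix_same_class(class_pixels, n_new=20, rng=rng)
```

Because the mixtures keep the source label, they can be appended directly to the labelled training set for that class.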
There are a number of popular evaluation metrics for classification other than accuracy, such as recall, precision, AUC, and F-scores. Instead of listing them all here, I think it is best to point you towards some interesting resources that can kick-start your search for answers. Although you might not be using scikit, the metrics remain relevant. It also lists the differences between the binary and multi-class classification settings.
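As a quick taste of the metrics named above, here is a hedged toy example using scikit-learn; the labels are invented, and "macro" averaging (each class weighted equally) is just one of the averaging options available in the multi-class setting.

```python
# Sketch: precision, recall, and F1 on a toy multi-class prediction,
# using macro averaging (each class contributes equally).
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [0, 1, 2, 2, 1, 0, 2, 1]
y_pred = [0, 1, 2, 1, 1, 0, 2, 2]

prec = precision_score(y_true, y_pred, average="macro")
rec = recall_score(y_true, y_pred, average="macro")
f1 = f1_score(y_true, y_pred, average="macro")
print(f"precision={prec:.3f} recall={rec:.3f} f1={f1:.3f}")
```

Switching `average` to `"micro"` or `"weighted"` changes how the per-class scores are pooled, which is exactly where the binary and multi-class settings diverge.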