Deep neural networks for image classification typically consists of a convolutional feature extractor followed by a fully connected classifier network. The predicted and the ground truth labels are represented as one hot vectors. Such a representation assumes that all classes are equally dissimilar. However, classes have visual similarities and often form a hierarchy. Learning this latent hierarchy explicitly in the architecture could provide invaluable insights. We propose an alternate architecture to the classifier network called the Latent Hierarchy (LH) Classifier and an end to end learned Class2Str mapping which discovers a latent hierarchy of the classes. We show that for some of the best performing architectures on CIFAR and Imagenet datasets, the proposed replacement and training by LH classifier recovers the accuracy, with a fraction of the number of parameters in the classifier part. Compared to the previous work of HDCNN, which also learns a 2 level hierarchy, we are able to learn a hierarchy at an arbitrary number of levels as well as obtain an accuracy improvement on the Imagenet classification task over them. We also verify that many visually similar classes are grouped together, under the learnt hierarchy.
Image classification has been studied extensively, but there has been limited work in using unconventional, external guidance other than traditional image-label pairs for training. We present a set of methods for leveraging information about the semantic hierarchy embedded in class labels. We first inject label-hierarchy knowledge into an arbitrary CNN-based classifier and empirically show that availability of such external semantic information in conjunction with the visual semantics from images boosts overall performance. Taking a step further in this direction, we model more explicitly the label-label and label-image interactions using order-preserving embeddings governed by both Euclidean and hyperbolic geometries, prevalent in natural language, and tailor them to hierarchical image classification and representation learning. We empirically validate all the models on the hierarchical ETHEC dataset.
Hierarchical Classification (HC) is a supervised learning problem where unlabeled instances are classified into a taxonomy of classes. Several methods that utilize the hierarchical structure have been developed to improve the HC performance. However, in most cases apriori defined hierarchical structure by domain experts is inconsistent; as a consequence performance improvement is not noticeable in comparison to flat classification methods. We propose a scalable data-driven filter based rewiring approach to modify an expert-defined hierarchy. Experimental comparisons of top-down HC with our modified hierarchy, on a wide range of datasets shows classification performance improvement over the baseline hierarchy (i:e:, defined by expert), clustered hierarchy and flattening based hierarchy modification approaches. In comparison to existing rewiring approaches, our developed method (rewHier) is computationally efficient, enabling it to scale to datasets with large numbers of classes, instances and features. We also show that our modified hierarchy leads to improved classification performance for classes with few training samples in comparison to flat and state-of-the-art HC approaches.
I work predominantly in NLP for the last three months at work. It's been a long time I work on the image data. Hence, I decided to build a unique image classifier model as part of my personal project and learning. One thing I am really missing in the current pandemic is traveling. These days I used to see a lot of travel vlogs and travel pictures on Instagram, wondering when we will go back to the normal world. This strikes me to create an image classifier model with five classes like Mountain, Beach, Desert, Lake, and Museum.
Considering i have images with localized sections like whale faces (say similar to right whale recognition kaggle dataset) and i want to remove the background water (this is just an example and not representative of the actual dataset) and just extract the whale faces. I would like to create the bounding boxes over the whale faces and crop those. The blog of the runner up of right whale recognition had the coordinates x, y, width and height for the bounding boxes which he used later to train a simple classifier(i'm not totally sure of the procedure) and generate bounding boxes on test images. What if i don't have these coordinates for my dataset. Do i have to make it manually and then create bounding boxes?