Image classification has been studied extensively, but there has been limited work in using unconventional, external guidance other than traditional image-label pairs for training. We present a set of methods for leveraging information about the semantic hierarchy embedded in class labels. We first inject label-hierarchy knowledge into an arbitrary CNN-based classifier and empirically show that availability of such external semantic information in conjunction with the visual semantics from images boosts overall performance. Taking a step further in this direction, we model more explicitly the label-label and label-image interactions using order-preserving embeddings governed by both Euclidean and hyperbolic geometries, prevalent in natural language, and tailor them to hierarchical image classification and representation learning. We empirically validate all the models on the hierarchical ETHEC dataset.
Different from the traditional classification tasks which assume mutual exclusion of labels, hierarchical multi-label classification (HMLC) aims to assign multiple labels to every instance with the labels organized under hierarchical relations. In fact, linguistic ontologies are intrinsic hierarchies. Besides the labels, the conceptual relations between words can also form hierarchical structures. Thus it can be a challenge to learn mappings from the word space to the label space, and vice versa. We propose to model the word and label hierarchies by embedding them jointly in the hyperbolic space. The main reason is that the tree-likeness of the hyperbolic space matches the complexity of symbolic data with hierarchical structures. A new hyperbolic interaction model (HyperIM) is designed to learn the label-aware document representations and make predictions for HMLC. Extensive experiments are conducted on three benchmark datasets. The results have demonstrated that the new model can realistically capture the complex data structures and further improve the performance for HMLC comparing with the state-of-the-art methods. To facilitate future research, our code is publicly available.
Deep neural networks trained for classification have been found to learn powerful image representations, which are also often used for other tasks such as comparing images w.r.t. their visual similarity. However, visual similarity does not imply semantic similarity. In order to learn semantically discriminative features, we propose to map images onto class centroids whose pair-wise dot products correspond to a measure of semantic similarity between classes. Such an embedding would not only improve image retrieval results, but could also facilitate integrating semantics for other tasks, e.g., novelty detection or few-shot learning. We introduce a deterministic algorithm for computing the class centroids directly based on prior world-knowledge encoded in a hierarchy of classes such as WordNet. Experiments on CIFAR-100 and ImageNet show that our learned semantic image embeddings improve the semantic consistency of image retrieval results by a large margin.
Neural entity typing models typically represent entity types as vectors in a high-dimensional space, but such spaces are not well-suited to modeling these types' complex interdependencies. We study the ability of box embeddings, which represent entity types as d-dimensional hyperrectangles, to represent hierarchies of fine-grained entity type labels even when these relationships are not defined explicitly in the ontology. Our model represents both types and entity mentions as boxes. Each mention and its context are fed into a BERT-based model to embed that mention in our box space; essentially, this model leverages typological clues present in the surface text to hypothesize a type representation for the mention. Soft box containment can then be used to derive probabilities, both the posterior probability of a mention exhibiting a given type and the conditional probability relations between types themselves. We compare our approach with a strong vector-based typing model, and observe state-of-the-art performance on several entity typing benchmarks. In addition to competitive typing performance, our box-based model shows better performance in prediction consistency (predicting a supertype and a subtype together) and confidence (i.e., calibration), implying that the box-based model captures the latent type hierarchies better than the vector-based model does.
Zero-shot learning transfers knowledge from seen classes to novel unseen classes to reduce human labor of labelling data for building new classifiers. Much effort on zero-shot learning however has focused on the standard multi-class setting, the more challenging multi-label zero-shot problem has received limited attention. In this paper we propose a transfer-aware embedding projection approach to tackle multi-label zero-shot learning. The approach projects the label embedding vectors into a low-dimensional space to induce better inter-label relationships and explicitly facilitate information transfer from seen labels to unseen labels, while simultaneously learning a max-margin multi-label classifier with the projected label embeddings. Auxiliary information can be conveniently incorporated to guide the label embedding projection to further improve label relation structures for zero-shot knowledge transfer. We conduct experiments for zero-shot multi-label image classification. The results demonstrate the efficacy of the proposed approach.