Deep neural networks trained using a softmax layer at the top and the cross-entropy loss are ubiquitous tools for image classification. Yet this setup naturally enforces neither intra-class similarity nor an inter-class margin in the learned deep representations. To simultaneously achieve these two goals, different solutions have been proposed in the literature, such as the pairwise or triplet losses. However, such solutions carry the extra task of selecting pairs or triplets, and the extra computational burden of computing and learning for many combinations of them. In this paper, we propose a plug-and-play loss term for deep networks that explicitly reduces intra-class variance and enforces inter-class margin simultaneously, in a simple and elegant geometric manner. For each class, the deep features are collapsed into a learned linear subspace, or a union of them, and inter-class subspaces are pushed to be as orthogonal as possible. Our proposed Orthogonal Low-rank Embedding (OLÉ) does not require carefully crafting pairs or triplets of samples for training, and works standalone as a classification loss, being the first reported deep metric learning framework of its kind. Because of the improved margin between features of different classes, the resulting deep networks generalize better, are more discriminative, and are more robust. We demonstrate improved classification performance in general object recognition, plugging the proposed loss term into existing off-the-shelf architectures. In particular, we show the advantage of the proposed loss in the small data/model scenario, and we significantly advance the state-of-the-art on the Stanford STL-10 benchmark.
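As a rough sketch of how such a loss term could be written, assume (the abstract above does not state this) that the rank of each class's feature matrix is relaxed to its nuclear norm. Let X_c denote the matrix of deep features for class c in a batch, X the matrix of all features in the batch, C the number of classes, and Δ a hypothetical lower bound that keeps the features from collapsing to zero. One candidate objective is then:

```latex
\mathcal{L}_{o}(X) \;=\; \sum_{c=1}^{C} \max\!\left(\Delta,\ \lVert X_c \rVert_{*}\right) \;-\; \lVert X \rVert_{*}
```

Under these assumptions, the first term is small when each class occupies a low-dimensional subspace. Because the nuclear norm of the concatenated matrix satisfies \lVert X \rVert_{*} \le \sum_c \lVert X_c \rVert_{*}, with equality when the class subspaces are mutually orthogonal, subtracting \lVert X \rVert_{*} rewards orthogonality between classes. As a plug-and-play term, it would be added to the usual cross-entropy loss with a weighting factor.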
This paper presents a neural network-based end-to-end clustering framework. We design a novel strategy that uses contrastive criteria to push data to form clusters directly from raw data, in addition to learning a feature embedding suitable for such clustering. The network is trained with weak labels, specifically partial pairwise relationships between data instances. The cluster assignments and their probabilities are then obtained at the output layer by feed-forwarding the data. The framework has the interesting characteristic that no cluster centers need to be explicitly specified; the resulting cluster distribution is therefore purely data-driven, and no distance metrics need to be predefined. The experiments show that the proposed approach beats the conventional two-stage method (feature embedding with k-means) by a significant margin. It also compares favorably to the performance of the standard cross-entropy loss for classification. Robustness analysis also shows that the method is largely insensitive to the number of clusters. Specifically, we show that the number of dominant clusters is close to the true number of clusters even when a large k is used for clustering.
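A minimal sketch of a pairwise contrastive criterion applied to cluster-assignment probabilities follows. The abstract above does not specify the exact form, so the use of KL divergence, the hinge with a `margin` parameter, and the function names are all illustrative assumptions rather than the authors' formulation.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists of floats."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def pairwise_cluster_loss(p, q, similar, margin=2.0):
    """Assumed contrastive loss on two cluster-probability vectors.

    Similar pairs are penalized by their symmetric KL divergence;
    dissimilar pairs are penalized only while their divergence in
    each direction is below `margin` (hinge). `margin` is hypothetical.
    """
    def one_sided(a, b):
        d = kl_divergence(a, b)
        return d if similar else max(0.0, margin - d)
    return one_sided(p, q) + one_sided(q, p)

def cluster_assignment(p):
    """Hard cluster index taken from the output-layer probabilities."""
    return max(range(len(p)), key=lambda k: p[k])
```

With a loss of this shape, no cluster centers appear anywhere: only pairs of output distributions are compared, and the hard assignment is read off the output layer.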
Despite the breakthroughs achieved by deep learning models in conventional supervised learning scenarios, their dependence on sufficient labeled training data in each class prevents effective application of these models in situations where labeled training instances for a subset of novel classes are very sparse; in the extreme case, only one instance is available for each class. To tackle this natural and important challenge, one-shot learning, which aims to exploit a set of well-labeled base classes to build classifiers for new target classes that have only one observed instance per class, has recently received increasing attention from the research community. In this paper we propose a novel end-to-end deep triplet ranking network to perform one-shot learning. The proposed approach learns class-universal image embeddings on the well-labeled base classes under a triplet ranking loss, such that instances from new classes can be categorized based on their similarity to the one-shot instances in the learned embedding space. Moreover, our approach can naturally incorporate the available one-shot instances from the new classes into the embedding learning process to improve the triplet ranking model. We conduct experiments on two popular datasets for one-shot learning. The results show that the proposed approach achieves better performance than the state-of-the-art comparison methods.
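The two ingredients described above, a triplet ranking loss and classification by similarity to the one-shot instances, can be sketched as follows. This assumes embeddings have already been computed by some network (not shown), uses Euclidean distance and a hinge with a hypothetical `margin`, and the function names are illustrative only.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_ranking_loss(anchor, positive, negative, margin=1.0):
    """Hinge loss that is zero once the negative is farther from the
    anchor than the positive by at least `margin` (assumed value)."""
    return max(0.0, margin + euclidean(anchor, positive) - euclidean(anchor, negative))

def one_shot_classify(query, support):
    """Assign `query` to the class whose single support embedding is closest.

    `support` maps each new class label to the embedding of its one
    labeled instance.
    """
    return min(support, key=lambda label: euclidean(query, support[label]))
```

For example, with `support = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}`, a query embedding near `[1.0, 0.0]` is assigned to `"cat"`.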
Margin-based softmax loss functions greatly enhance intra-class compactness and perform well on face recognition and object classification. Their advantage, however, depends on careful hyperparameter selection. Moreover, the hard angle restriction increases the risk of overfitting. In this paper, we propose an angular loss, derived by maximizing the angular gradient, that promotes intra-class compactness while avoiding overfitting. In addition, our method has only one adjustable constant for controlling intra-class compactness. We define three metrics to measure inter-class separability and intra-class compactness. In experiments, we test our method, along with other methods, on many well-known datasets. The results show that our method is superior in accuracy, discriminative information, and time consumption.
At present, object recognition studies are mostly conducted in a closed lab setting, where the classes in the test phase have typically been seen in the training phase. However, the real-world problem is far more challenging because: i) new classes unseen in the training phase can appear at prediction time; ii) discriminative features need to evolve when new classes emerge in real time; and iii) instances in new classes may not follow the "independent and identically distributed" (iid) assumption. Most existing work only aims to detect unknown classes and is incapable of continuing to learn newer classes. Although a few methods consider both detecting and including new classes, all are based on predefined handcrafted features that cannot evolve and are out of date for characterizing emerging classes. Thus, to address the above challenges, we propose a novel generic end-to-end framework consisting of a dynamic cascade of classifiers that incrementally learn their dynamic and inherent features. The proposed method injects dynamic elements into the system by detecting instances from unknown classes, while at the same time incrementally updating the model to include the new classes. The resulting cascade tree grows by adding a new leaf node classifier once a new class is detected, and the discriminative features are updated via an end-to-end learning strategy. Experiments on two real-world datasets demonstrate that our proposed method outperforms existing state-of-the-art methods.