Mu, Xin (Nanjing University) | Zhu, Feida (Singapore Management University) | Du, Juan (Singapore Management University) | Lim, Ee-Peng (Singapore Management University) | Zhou, Zhi-Hua (Nanjing University)
Streaming classification with emerging new class is an important problem of great research challenge and practical value. In many real applications, the task often needs to handle large matrices issues such as textual data in the bag-of-words model and large-scale image analysis. However, the methodologies and approaches adopted by the existing solutions, most of which involve massive distance calculation, have so far fallen short of successfully addressing a real-time requested task. In this paper, the proposed method dynamically maintains two low-dimensional matrix sketches to 1) detect emerging new classes; 2) classify known classes; and 3) update the model in the data stream. The update efficiency is superior to the existing methods. The empirical evaluation shows the proposed method not only receives the comparable performance but also strengthens modelling on large-scale data sets.
Category models for objects or activities typically rely on supervised learning requiring sufficiently large training sets. Transferring knowledge from known categories to novel classes with no or only a few labels however is far less researched even though it is a common scenario. In this work, we extend transfer learning with semi-supervised learning to exploit unlabeled instances of (novel) categories with no or only a few labeled instances. Our proposed approach Propagated Semantic Transfer combines three main ingredients. First, we transfer information from known to novel categories by incorporating external knowledge, such as linguistic or expert-specified information, e.g., by a mid-level layer of semantic attributes.
Machine learning classifiers are often trained to recognize a set of pre-defined classes. However, in many applications, it is often desirable to have the flexibility of learning additional concepts, with limited data and without re-training on the full training set. This paper addresses this problem, incremental few-shot learning, where a regular classification network has already been trained to recognize a set of base classes, and several extra novel classes are being considered, each with only a few labeled examples. After learning the novel classes, the model is then evaluated on the overall classification performance on both base and novel classes. To this end, we propose a meta-learning model, the Attention Attractor Network, which regularizes the learning of novel classes.
As the proliferation of constant data feeds increases from social media, embedded sensors, and other sources, the capability to provide predictive concept labels to these data streams will become ever more important and lucrative. However, the dynamic, non-stationary nature, and effectively infinite length of data streams pose additional challenges for stream data mining algorithms. The sparse quantity of training data also limits the use of algorithms that are heavily dependent on supervised training. To address all these issues, we propose an incremental semi-supervised method that provides general concept class label predictions, but it also tracks concept clusters within the feature space using an innovative new online clustering algorithm. Each concept cluster contains an embedded stream classifier, creating a diverse ensemble for data instance classification within the generative model used for detecting emerging concepts in the stream. Unlike other recent novel class detection methods, our method goes beyond detecting, and continues to differentiate and track the emerging concepts. We show the effectiveness of our method on several synthetic and real world data sets, and we compare the results against other leading baseline methods.
Most approaches to classifying data streams either divide the stream into fixed-size chunks or use gradual forgetting. Due to evolving nature of data streams, finding a proper size or choosing a forgetting rate without prior knowledge about time-scale of change is not a trivial task. These approaches hence suffer from a trade-off between performance and sensitivity. Existing dynamic sliding window based approaches address this problem by tracking changes in classifier error rate, but are supervised in nature. We propose an efficient semi-supervised framework in this paper which uses change detection on classifier confidence to detect concept drifts, and to determine chunk boundaries dynamically. It also addresses concept evolution problem by detecting outliers having strong cohesion among themselves. Experiment results on benchmark and synthetic data sets show effectiveness of the proposed approach.