Xiao, Yanshan (University of Technology, Sydney) | Liu, Bo (South China University of Technology) | Yin, Jie (CSIRO ICT Centre) | Cao, Longbing (University of Technology, Sydney) | Zhang, Chengqi (University of Technology, Sydney) | Hao, Zhifeng (Guangdong University of Technology)
Positive and unlabelled learning (PU learning) has been investigated to deal with the situation where only the positive examples and the unlabelled examples are available. Most of the previous works focus on identifying some negative examples from the unlabelled data, so that the supervised learning methods can be applied to build a classifier. However, for the remaining unlabelled data, which can not be explicitly identified as positive or negative (we call them ambiguous examples), they either exclude them from the training phase or simply enforce them to either class. Consequently, their performance may be constrained. This paper proposes a novel approach, called similarity-based PU learning (SPUL) method, by associating the ambiguous examples with two similarity weights, which indicate the similarity of an ambiguous example towards the positive class and the negative class, respectively.
Jointing visual-semantic embeddings (VSE) have become a research hotpot for the task of image annotation, which suffers from the issue of semantic gap, i.e., the gap between images' visual features (low-level) and labels' semantic features (high-level). This issue will be even more challenging if visual features cannot be retrieved from images, that is, when images are only denoted by numerical IDs as given in some real datasets. The typical way of existing VSE methods is to perform a uniform sampling method for negative examples that violate the ranking order against positive examples, which requires a time-consuming search in the whole label space. In this paper, we propose a fast adaptive negative sampler that can work well in the settings of no figure pixels available. Our sampling strategy is to choose the negative examples that are most likely to meet the requirements of violation according to the latent factors of images. In this way, our approach can linearly scale up to large datasets. The experiments demonstrate that our approach converges 5.02x faster than the state-of-the-art approaches on OpenImages, 2.5x on IAPR-TCI2 and 2.06x on NUS-WIDE datasets, as well as better ranking accuracy across datasets.
In the aviation safety research domain, cause identification refers to the task of identifying the possible causes responsible for the incident describedin an aviation safety incident report. This task presents a number of challenges, including the scarcity of labeled data and the difficulties in finding the relevant portions of the text. We investigate the use of annotator rationales to overcome these challenges, proposing several new ways of utilizing rationales and showing that through judicious use of the rationales, it is possible to achieve significant improvement over a unigram SVM baseline.
When labeling objects via Internet-based outsourcing systems, the labelers may have bias, because they lack expertise, dedication and personal preference. These reasons cause Imbalanced Multiple Noisy Labeling. To deal with the imbalance labeling issue, we propose an agnostic algorithm PLAT (Positive LAbel frequency Threshold) which does not need any information about quality of labelers and underlying class distribution. Simulations on eight real-world datasets with different underlying class distributions demonstrate that PLAT not only effectively deals with the imbalanced multiple noisy labeling problem that off-the-shelf agnostic methods cannot cope with, but also performs nearly the same as majority voting under the circumstances that labelers have no bias.
Multi-label learning deals with data associated with multiple labels simultaneously. Previous work on multi-label learning assumes that for each instance, the “full” label set associated with each training instance is given by users. In many applications, however, to get the full label set for each instance is difficult and only a “partial” set of labels is available. In such cases, the appearance of a label means that the instance is associated with this label, while the absence of a label does not imply that this label is not proper for the instance. We call this kind of problem “weak label” problem. In this paper, we propose the WELL (WEak Label Learning) method to solve the weak label problem. We consider that the classification boundary for each label should go across low density regions, and that each label generally has much smaller number of positive examples than negative examples. The objective is formulated as a convex optimization problem which can be solved efficiently. Moreover, we exploit the correlation between labels by assuming that there is a group of low-rank base similarities, and the appropriate similarities between instances for different labels can be derived from these base similarities. Experiments validate the performance of WELL.