Goto

Collaborating Authors

 Ishida, Takashi


Learning with Complementary Labels Revisited: A Consistent Approach via Negative-Unlabeled Learning

arXiv.org Artificial Intelligence

Deep learning and its applications have achieved great success in recent years. However, achieving good performance requires large amounts of training data with accurate labels, a requirement that may not be met in some real-world scenarios. Because they reduce the cost and effort of labeling while maintaining comparable performance, various weakly supervised learning problems have been investigated in recent years, including semi-supervised learning [Berthelot et al., 2019], noisy-label learning [Patrini et al., 2017], programmatic weak supervision [Zhang et al., 2021a], positive-unlabeled learning [Bekker and Davis, 2020], similarity-based classification [Hsu et al., 2019], and partial-label learning [Wang et al., 2022]. Complementary-label learning is another weakly supervised learning problem that has received a lot of attention recently [Ishida et al., 2017]. In complementary-label learning, we are given training data associated with complementary labels that specify the classes to which the examples do not belong. The task is to learn a multi-class classifier that assigns correct labels to ordinarily labeled test data. Collecting training data with complementary labels is much easier and cheaper than collecting ordinarily labeled data. For example, when asking workers on crowdsourcing platforms to annotate training data, we only need to randomly select a candidate label and then ask them whether the example belongs to that class or not.
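
As an illustration of the annotation protocol described above, the sketch below simulates how a complementary label could be generated when the true label is known: a candidate class is drawn uniformly at random, and if the (simulated) annotator answers "no", that class becomes the complementary label. The function name and the redraw scheme are illustrative assumptions, not the paper's exact protocol.

```python
import random

def draw_complementary_label(true_label: int, num_classes: int) -> int:
    """Simulate complementary-label annotation via crowdsourcing.

    A candidate class is drawn uniformly at random and the annotator is
    asked whether the example belongs to it.  If the answer is "no", the
    candidate becomes the complementary label; otherwise we redraw.
    Name and redraw scheme are illustrative assumptions.
    """
    while True:
        candidate = random.randrange(num_classes)
        if candidate != true_label:      # annotator answers "no"
            return candidate

# Example: an image whose true class is 2, annotated over 5 classes
print(draw_complementary_label(true_label=2, num_classes=5))
```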


Flooding Regularization for Stable Training of Generative Adversarial Networks

arXiv.org Artificial Intelligence

Generative Adversarial Networks (GANs) have shown remarkable performance in image generation. However, GAN training suffers from instability. One of the main approaches to addressing this problem is to modify the loss function, often by adding regularization terms or changing the type of adversarial loss. This paper focuses on directly regularizing the adversarial loss function. We propose applying flooding, an overfitting-suppression method from supervised learning, to GANs in order to directly prevent the discriminator's loss from becoming excessively low. Flooding requires tuning the flood level, but when it is applied to GANs, we show that the appropriate range of flood-level settings is determined by the adversarial loss function, supported by a theoretical analysis of GANs using the binary cross-entropy loss. We experimentally verify that flooding stabilizes GAN training and can be combined with other stabilization techniques. We also reveal that by restricting the discriminator's loss to be no greater than the flood level, training proceeds stably even when the flood level is somewhat high.
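
A minimal sketch of how flooding could be applied to a binary cross-entropy discriminator objective, assuming the standard flooding transform |L - b| + b from the original flooding formulation; the PyTorch setup, variable names, and the exact way the transform is composed with the real/fake terms are assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def flooded_discriminator_loss(d_real_logits: torch.Tensor,
                               d_fake_logits: torch.Tensor,
                               flood_level: float) -> torch.Tensor:
    """Binary cross-entropy discriminator loss with flooding.

    The flooding transform |L - b| + b keeps the usual gradient direction
    while the loss is above the flood level b and reverses it once the
    loss drops below b, preventing the discriminator's loss from becoming
    excessively low.  This composition is an illustrative assumption.
    """
    real_loss = F.binary_cross_entropy_with_logits(
        d_real_logits, torch.ones_like(d_real_logits))
    fake_loss = F.binary_cross_entropy_with_logits(
        d_fake_logits, torch.zeros_like(d_fake_logits))
    loss = real_loss + fake_loss
    return (loss - flood_level).abs() + flood_level  # flooding transform
```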


Is the Performance of My Deep Network Too Good to Be True? A Direct Approach to Estimating the Bayes Error in Binary Classification

arXiv.org Machine Learning

There is a fundamental limit to the prediction performance that a machine learning model can achieve, due to the inevitable uncertainty of the prediction target. In classification problems, this can be characterized by the Bayes error, which is the best achievable error with any classifier. The Bayes error can be used as a criterion to evaluate classifiers with state-of-the-art performance and to detect test set overfitting. We propose a simple and direct Bayes error estimator, in which we simply take the mean of the labels that express the uncertainty of the classes. Our flexible approach enables us to perform Bayes error estimation even for weakly supervised data. In contrast to other approaches, our method is model-free and even instance-free. Moreover, it has no hyperparameters and gives a more accurate estimate of the Bayes error than classifier-based baselines. Experiments using our method suggest that a recently proposed classifier, the Vision Transformer, may have already reached the Bayes error for certain benchmark datasets.
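
Concretely, in binary classification the Bayes error is E_x[min(eta(x), 1 - eta(x))] with eta(x) = p(y = +1 | x); if each instance comes with a soft label c_i that approximates eta(x_i), a direct estimate is simply the mean of min(c_i, 1 - c_i). The sketch below makes that assumption about the available soft labels; it is not the paper's exact estimator for the weakly supervised variants.

```python
import numpy as np

def bayes_error_estimate(soft_labels: np.ndarray) -> float:
    """Direct Bayes error estimate from soft labels.

    `soft_labels[i]` is assumed to approximate p(y = +1 | x_i) for the
    i-th instance; the estimate is the mean of min(c, 1 - c), i.e. the
    average irreducible uncertainty.  Model-free, instance-free, and
    hyperparameter-free, as described in the abstract.
    """
    c = np.asarray(soft_labels, dtype=float)
    return float(np.mean(np.minimum(c, 1.0 - c)))

# Example: confident soft labels imply a small Bayes error estimate
print(bayes_error_estimate([0.95, 0.02, 0.88, 0.10]))  # ~0.0725
```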


LocalDrop: A Hybrid Regularization for Deep Neural Networks

arXiv.org Artificial Intelligence

In neural networks, developing regularization algorithms to address overfitting is one of the major study areas. We propose a new approach for the regularization of neural networks by the local Rademacher complexity, called LocalDrop. A new regularization function for both fully-connected networks (FCNs) and convolutional neural networks (CNNs), including drop rates and weight matrices, has been developed based on the proposed upper bound of the local Rademacher complexity through strict mathematical deduction. The analyses of dropout in FCNs and DropBlock in CNNs with keep-rate matrices in different layers are also included in the complexity analyses. With the new regularization function, we establish a two-stage procedure to obtain the optimal keep-rate matrix and weight matrix for the whole training model. Extensive experiments have been conducted to demonstrate the effectiveness of LocalDrop in different models by comparing it with several algorithms, as well as the effects of different hyperparameters on the final performance. Neural networks have lately shown impressive performance in sophisticated real-world situations, including image classification [1], object recognition [2] and image captioning [3]. Low-, middle- and high-level features are integrated into deep neural networks, which are usually trained in an end-to-end manner.


Binary Classification from Positive-Confidence Data

Neural Information Processing Systems

Can we learn a binary classifier from only positive data, without any negative data or unlabeled data? We show that if one can equip positive data with confidence (positive-confidence), one can successfully learn a binary classifier, which we name positive-confidence (Pconf) classification. Our work is related to one-class classification, which aims at "describing" the positive class by clustering-related methods; however, one-class classification does not allow hyper-parameter tuning, and its aim is not to "discriminate" between the positive and negative classes. For the Pconf classification problem, we provide a simple empirical risk minimization framework that is model-independent and optimization-independent. We theoretically establish consistency and an estimation error bound, and demonstrate the usefulness of the proposed method for training deep neural networks through experiments.
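
A minimal sketch of the kind of empirical risk that Pconf classification minimizes, assuming the known rewrite R(f) proportional to E_+[l(f(x)) + ((1 - r(x)) / r(x)) l(-f(x))] with positive-confidence r(x) = p(y = +1 | x) and a logistic loss; the linear scorer and variable names are illustrative, not the paper's exact implementation.

```python
import numpy as np

def logistic_loss(z: np.ndarray) -> np.ndarray:
    # log(1 + exp(-z)), written in a numerically stable form
    return np.logaddexp(0.0, -z)

def pconf_empirical_risk(w: np.ndarray, X_pos: np.ndarray, r: np.ndarray) -> float:
    """Empirical Pconf risk for a linear scorer f(x) = <w, x>.

    X_pos : positive instances only, shape (n, d)
    r     : positive-confidence r(x_i) = p(y = +1 | x_i), shape (n,)
    The negative part of the risk is recovered from positives alone via
    the weight (1 - r) / r, so no negative or unlabeled data is needed.
    """
    f = X_pos @ w
    return float(np.mean(logistic_loss(f) + (1.0 - r) / r * logistic_loss(-f)))
```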


Complementary-Label Learning for Arbitrary Losses and Models

arXiv.org Machine Learning

In contrast to the standard classification paradigm, where the true (or possibly noisy) class is given for each training pattern, complementary-label learning uses training patterns each equipped only with a complementary label, which specifies one of the classes that the pattern does not belong to. The seminal paper on complementary-label learning proposed an unbiased estimator of the classification risk that can be computed only from complementarily labeled data. However, it required a restrictive condition on the loss functions, making it impossible to use popular losses such as the softmax cross-entropy loss. Recently, another formulation with the softmax cross-entropy loss was proposed with a consistency guarantee. However, that formulation does not explicitly involve a risk estimator, so model and hyper-parameter selection by cross-validation is not possible; we would need additional ordinarily labeled data for validation, which is not available in the current setup. In this paper, we give a novel general framework of complementary-label learning and derive an unbiased risk estimator for arbitrary losses and models. We further improve the risk estimator by a non-negative correction and demonstrate its superiority through experiments.
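
A minimal sketch of the general unbiased risk estimator under the uniform complementary-label assumption: for each complementarily labeled example, sum the loss over all K classes and subtract (K - 1) times the loss on the complementary label, so that the expectation equals the ordinary classification risk. It is shown here with the softmax cross-entropy loss, which this framework permits; the PyTorch setup and names are assumptions, and the abstract's non-negative correction is omitted.

```python
import torch
import torch.nn.functional as F

def unbiased_cl_risk(logits: torch.Tensor, comp_labels: torch.Tensor) -> torch.Tensor:
    """Unbiased risk estimator from complementary labels (uniform assumption).

    logits      : (n, K) model outputs
    comp_labels : (n,) long tensor of complementary labels, i.e. classes
                  the examples do NOT belong to
    Per example: sum_k loss(x, k) - (K - 1) * loss(x, complementary label).
    Any loss and any model can be plugged in; softmax cross-entropy is used here.
    """
    n, K = logits.shape
    loss_all = -F.log_softmax(logits, dim=1)                     # (n, K) per-class CE losses
    loss_comp = loss_all.gather(1, comp_labels.view(-1, 1)).squeeze(1)
    per_example = loss_all.sum(dim=1) - (K - 1) * loss_comp
    return per_example.mean()
```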


Binary Classification from Positive-Confidence Data

arXiv.org Machine Learning

Reducing labeling costs in supervised learning is a critical issue in many practical machine learning applications. In this paper, we consider positive-confidence (Pconf) classification, the problem of training a binary classifier only from positive data equipped with confidence. Pconf classification can be regarded as a discriminative extension of one-class classification (which aims at "describing" the positive class by clustering-related methods), with the ability to tune hyper-parameters for "classifying" positive and negative samples. Pconf classification is also related to positive-unlabeled (PU) classification (which uses hard-labeled positive data and unlabeled data), but the difference is that it enables us to avoid estimating the class priors, which is a critical bottleneck in typical PU classification methods. For the Pconf classification problem, we provide a simple empirical risk minimization framework and give a formulation for linear-in-parameter models that can be implemented easily and computationally efficiently. We also theoretically establish consistency and an estimation error bound for Pconf classification, and demonstrate the practical usefulness of the proposed method for deep neural networks through experiments.
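
To make the prior-free property concrete, the following sketch reproduces the standard risk rewrite underlying this setting, assuming r(x) = p(y = +1 | x) > 0 on the positive support; the class prior appears only as a multiplicative constant, so it need not be estimated for minimization.

```latex
% Risk rewrite for Pconf classification (sketch under the stated assumption)
\begin{align*}
R(f) &= \mathbb{E}_{p(x,y)}\!\left[\ell\big(y f(x)\big)\right]
      = \pi_{+}\,\mathbb{E}_{p(x \mid y=+1)}\!\left[\ell\big(f(x)\big)\right]
      + \mathbb{E}_{p(x)}\!\left[p(y=-1 \mid x)\,\ell\big(-f(x)\big)\right] \\
     &= \pi_{+}\,\mathbb{E}_{p(x \mid y=+1)}\!\left[\ell\big(f(x)\big)
      + \frac{1 - r(x)}{r(x)}\,\ell\big(-f(x)\big)\right],
      \qquad r(x) = p(y = +1 \mid x),
\end{align*}
% so minimizing R(f) over f does not require the class prior \pi_{+},
% and the expectation over positives can be replaced by a sample average.
```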


Learning from Complementary Labels

Neural Information Processing Systems

Collecting labeled data is costly and is thus a critical bottleneck in real-world classification tasks. To mitigate this problem, we propose a novel setting, namely learning from complementary labels for multi-class classification. A complementary label specifies a class that a pattern does not belong to. Collecting complementary labels would be less laborious than collecting ordinary labels, since users do not have to carefully choose the correct class from a long list of candidate classes. However, complementary labels are less informative than ordinary labels, and thus a suitable approach is needed to learn from them effectively. In this paper, we show that an unbiased estimator of the classification risk can be obtained only from complementarily labeled data if the loss function satisfies a particular symmetric condition. We derive estimation error bounds for the proposed method and prove that the optimal parametric convergence rate is achieved. We further show that learning from complementary labels can easily be combined with learning from ordinary labels (i.e., ordinary supervised learning), providing a highly practical implementation of the proposed method. Finally, we experimentally demonstrate the usefulness of the proposed methods.
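
As an illustration of the kind of symmetric condition referred to here (assumed to be the condition l(z) + l(-z) = 1 on the binary loss used inside the multi-class formulation), the sigmoid loss satisfies it, whereas the softmax cross-entropy loss does not, which is what motivates the arbitrary-loss framework listed above.

```latex
% Symmetric condition on a binary loss: \ell(z) + \ell(-z) = 1 (assumed form)
% Example: the sigmoid loss satisfies it,
\ell_{\mathrm{S}}(z) = \frac{1}{1 + e^{z}}
\quad\Rightarrow\quad
\ell_{\mathrm{S}}(z) + \ell_{\mathrm{S}}(-z)
  = \frac{1}{1 + e^{z}} + \frac{e^{z}}{1 + e^{z}} = 1,
% and the ramp loss \ell_{\mathrm{R}}(z) = \tfrac{1}{2}\max\!\big(0, \min(2, 1 - z)\big)
% satisfies it as well, while the softmax cross-entropy loss does not.
```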


Learning from Complementary Labels

arXiv.org Machine Learning

Collecting labeled data is costly and is thus a critical bottleneck in real-world classification tasks. To mitigate this problem, we propose a novel setting, namely learning from complementary labels for multi-class classification. A complementary label specifies a class that a pattern does not belong to. Collecting complementary labels would be less laborious than collecting ordinary labels, since users do not have to carefully choose the correct class from a long list of candidate classes. However, complementary labels are less informative than ordinary labels, and thus a suitable approach is needed to learn from them effectively. In this paper, we show that an unbiased estimator of the classification risk can be obtained only from complementarily labeled data if the loss function satisfies a particular symmetric condition. We derive estimation error bounds for the proposed method and prove that the optimal parametric convergence rate is achieved. We further show that learning from complementary labels can easily be combined with learning from ordinary labels (i.e., ordinary supervised learning), providing a highly practical implementation of the proposed method. Finally, we experimentally demonstrate the usefulness of the proposed methods.