Theoretical Comparison
Theoretical Comparisons of Positive-Unlabeled Learning against Positive-Negative Learning
In PU learning, a binary classifier is trained from positive (P) and unlabeled (U) data without negative (N) data. Although N data is missing, PU learning sometimes outperforms PN learning (i.e., ordinary supervised learning). Hitherto, neither theoretical nor experimental analysis has been given to explain this phenomenon. In this paper, we theoretically compare PU (and NU) learning against PN learning based on upper bounds on the estimation errors. We identify simple conditions under which PU and NU learning are likely to outperform PN learning, and we prove that, in terms of these upper bounds, either PU or NU learning (depending on the class-prior probability and the sizes of the P and N data) will improve on PN learning given infinite U data. Our theoretical findings agree well with experimental results on artificial and benchmark data, even when the experimental setup does not match the theoretical assumptions exactly.
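The abstract's central device, estimating a classifier's risk from P and U data alone, can be sketched in a few lines. The following is a minimal NumPy sketch of the unbiased PU risk rewriting that such comparisons build on; the function names and the logistic surrogate loss are illustrative choices, not the paper's code:

```python
import numpy as np

def logistic_loss(z):
    # smooth surrogate for the 0-1 loss
    return np.log1p(np.exp(-z))

def pu_risk(scores_p, scores_u, prior, loss=logistic_loss):
    """Unbiased PU risk: pi*E_P[l(g)] - pi*E_P[l(-g)] + E_U[l(-g)].
    Uses (1 - pi) p_N(x) = p(x) - pi p_P(x) to rewrite the
    negative-class risk in terms of P and U data only."""
    return (prior * loss(scores_p).mean()
            - prior * loss(-scores_p).mean()
            + loss(-scores_u).mean())

def pn_risk(scores_p, scores_n, prior, loss=logistic_loss):
    """Ordinary PN risk: pi*E_P[l(g)] + (1 - pi)*E_N[l(-g)]."""
    return (prior * loss(scores_p).mean()
            + (1 - prior) * loss(-scores_n).mean())

# Sanity check: if U is an exact 50/50 mixture of the P and N samples
# and the prior is 0.5, the two risk estimates coincide.
g_p = np.array([1.0, 2.0])         # classifier scores on P data
g_n = np.array([-1.0, -3.0])       # classifier scores on N data
g_u = np.concatenate([g_p, g_n])   # scores on the U mixture
assert np.isclose(pu_risk(g_p, g_u, 0.5), pn_risk(g_p, g_n, 0.5))
```

The PU estimator touches the P sample twice (once with each sign of the loss), which is one intuition for why its estimation-error bound depends on the class prior and sample sizes differently than the PN bound.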
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Reviews: Theoretical Comparisons of Positive-Unlabeled Learning against Positive-Negative Learning
The basic problem studied in the paper concerns learning from data which is only partially labelled, but nonetheless doing better than with fully labelled data. In the particular scenario where one has a pool of unlabelled data in lieu of one of the classes, the paper seeks to quantify the impact this has on the estimation error of the learned classifier. Determining the degradation, or lack thereof, when learning from unlabelled data is interesting, and thus the paper seems well motivated. The machinery used to illustrate its messages is fairly standard -- the key quantities, namely the estimation errors for each scenario, are derived from a simple Rademacher analysis -- however, the final results appear novel, with implications worked through in various scenarios. The paper hinges on the simple facts that (a) the different risks (for PN/PU/NU learning) may be seen as employing different weightings on the individual risks for the positive and negative classes, and (b) these weightings are reflected in the corresponding terms of the estimation error.
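To make point (a) concrete, the three risks can be written with explicit class weightings. Writing pi for the class prior, l for a loss, and R_P^{+/-}(g) = E_P[l(+/- g(X))] (with N and U defined analogously), the PU and NU forms follow from the mixture identity p(x) = pi p_P(x) + (1 - pi) p_N(x):

  R_PN(g) = pi R_P^+(g) + (1 - pi) R_N^-(g)
  R_PU(g) = pi R_P^+(g) - pi R_P^-(g) + R_U^-(g)
  R_NU(g) = R_U^+(g) - (1 - pi) R_N^+(g) + (1 - pi) R_N^-(g)

The three lines weight the per-class empirical risks differently, and those weights (together with the P, N, and U sample sizes) are what surface in the respective estimation-error bounds.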
Look Ahead or Look Around? A Theoretical Comparison Between Autoregressive and Masked Pretraining
Zhang, Qi, Du, Tianqi, Huang, Haotian, Wang, Yifei, Wang, Yisen
In recent years, generative self-supervised learning (SSL) paradigms have exhibited impressive performance across visual, language, and multi-modal domains. While the varied designs of generative SSL objectives lead to distinct properties in downstream tasks, a theoretical understanding of these differences remains largely unexplored. In this paper, we establish the first theoretical comparison between two leading generative SSL paradigms: autoregressive SSL and masked SSL. By establishing theoretical frameworks, we elucidate the strengths and limitations of autoregressive and masked SSL within the primary evaluation tasks of classification and content generation. Our findings demonstrate that in classification tasks, the flexibility of target tokens in masked SSL fosters more inter-sample connections than the fixed position of target tokens in autoregressive SSL, which yields superior clustering performance. In content generation tasks, the misalignment between the flexible lengths of test samples and the fixed length of unmasked texts in masked SSL (vs. the flexible lengths of conditional texts in autoregressive SSL) hinders its generation performance. To leverage each paradigm's strengths and mitigate its weaknesses, we propose diversity-enhanced autoregressive and variable-length masked objectives, which substantially improve the classification performance of autoregressive SSL and the generation performance of masked SSL. Code is available at https://github.com/PKU-ML/LookAheadLookAround.
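The "flexibility of target tokens" contrast can be illustrated with a toy counting sketch (this is not the paper's formal framework; the function names and pairing scheme are illustrative): autoregressive SSL fixes the target as the next token after each prefix, while masked SSL may target any subset of positions, so one sequence induces many more (context, target) pairs.

```python
import itertools

def autoregressive_pairs(tokens):
    """Autoregressive SSL: the target is always the token immediately
    after a prefix, so the target position is fixed given the context."""
    return [(tuple(tokens[:t]), tokens[t]) for t in range(1, len(tokens))]

def masked_pairs(tokens, n_masked=1):
    """Masked SSL: any subset of positions may serve as targets, so one
    sequence yields combinatorially many (context, targets) pairs --
    the flexibility the abstract links to richer inter-sample connections."""
    pairs = []
    for masked in itertools.combinations(range(len(tokens)), n_masked):
        context = tuple(tok for i, tok in enumerate(tokens) if i not in masked)
        targets = tuple(tokens[i] for i in masked)
        pairs.append((context, targets))
    return pairs

seq = ["a", "b", "c", "d"]
assert len(autoregressive_pairs(seq)) == 3  # one pair per prefix length
assert len(masked_pairs(seq, 2)) == 6       # C(4, 2) choices of targets
```

The same counting view also hints at the generation-side drawback: masked contexts here always have a fixed length (sequence length minus the mask size), whereas autoregressive contexts range over all prefix lengths.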
Multiclass Learning Approaches: A Theoretical Comparison with Implications
Daniely, Amit, Sabato, Sivan, Shalev-Shwartz, Shai
We theoretically analyze and compare the following five popular multiclass classification methods: One vs. All, All Pairs, Tree-based classifiers, Error Correcting Output Codes (ECOC) with randomly generated code matrices, and Multiclass SVM. In the first four methods, the classification is based on a reduction to binary classification. We consider the case where the binary classifier comes from a class of VC dimension $d$, and in particular from the class of halfspaces over $\mathbb{R}^d$. We analyze both the estimation error and the approximation error of these methods. Our analysis reveals interesting conclusions of practical relevance regarding the success of the different approaches under various conditions.
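The first reduction the abstract lists, One vs. All, is simple enough to sketch directly: k-class learning becomes k binary problems, each separating one class from the union of the others. This is a minimal sketch with a toy centroid-based scorer standing in for the binary learner from the VC class; all names are illustrative:

```python
import numpy as np

def centroid_scorer(X, y_pm):
    # toy binary learner: score by projection onto the class-mean
    # difference, centered at the midpoint between the two means
    mu_pos = X[y_pm == 1].mean(axis=0)
    mu_neg = X[y_pm == -1].mean(axis=0)
    w, mid = mu_pos - mu_neg, (mu_pos + mu_neg) / 2
    return lambda Z: (Z - mid) @ w

def train_one_vs_all(X, y, n_classes, fit_binary=centroid_scorer):
    # One vs. All: train one binary scorer per class against the rest,
    # relabeling class c as +1 and everything else as -1
    return [fit_binary(X, np.where(y == c, 1, -1)) for c in range(n_classes)]

def predict_one_vs_all(models, X):
    # predict the class whose binary scorer is most confident
    return np.column_stack([m(X) for m in models]).argmax(axis=1)

# three well-separated clusters, two points each
X = np.array([[0., 0.], [0., 1.], [5., 5.], [5., 6.], [10., 0.], [10., 1.]])
y = np.array([0, 0, 1, 1, 2, 2])
models = train_one_vs_all(X, y, 3)
assert (predict_one_vs_all(models, X) == y).all()
```

The other reductions differ mainly in which binary problems are posed (all pairs, tree splits, random code-matrix columns), which is exactly where the estimation/approximation trade-offs analyzed in the paper come from.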
Theoretical Comparisons of Positive-Unlabeled Learning against Positive-Negative Learning
Niu, Gang, du Plessis, Marthinus Christoffel, Sakai, Tomoya, Ma, Yao, Sugiyama, Masashi
A Theoretical Comparison of the Bounds of MM, NBS, and GBFHS
Alcazar, Vidal (RIKEN AIP) | Barley, Mike (University of Auckland) | Riddle, Pat (University of Auckland)
Recent work in bidirectional front-to-end heuristic search has led to the development of novel algorithms that have advanced the state of the art after many years without major developments. In particular, three different algorithms with very different behavior have been proposed: MM, NBS and GBFHS. In this paper we perform a theoretical comparison of these algorithms, defining lower and upper bounds for each of them and analyzing why a given algorithm displays beneficial characteristics that the others lack. With this information, we propose a simple and intuitive near-optimal algorithm to be used as a baseline for comparison in bidirectional front-to-end heuristic search.
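For readers new to the vocabulary above (two frontiers, a best meeting point, a termination bound), here is a generic bidirectional uniform-cost search sketch; it is not MM, NBS, or GBFHS, just a hypothetical minimal baseline illustrating the structure these algorithms refine:

```python
import heapq

def bidirectional_ucs(graph, start, goal):
    """Bidirectional uniform-cost search on an undirected graph given as
    node -> {neighbor: cost}. Expands the cheaper frontier and stops once
    no pair of remaining partial paths can beat the best meeting point."""
    if start == goal:
        return 0
    dist = [{start: 0}, {goal: 0}]          # settled costs, per direction
    frontier = [[(0, start)], [(0, goal)]]  # priority queues, per direction
    best = float("inf")                     # cheapest known meeting cost
    while frontier[0] and frontier[1]:
        # expand the direction whose cheapest frontier node is cheaper
        d = 0 if frontier[0][0][0] <= frontier[1][0][0] else 1
        g, node = heapq.heappop(frontier[d])
        if g > dist[d].get(node, float("inf")):
            continue  # stale queue entry
        # termination bound: fronts can no longer improve on `best`
        if frontier[1 - d] and g + frontier[1 - d][0][0] >= best:
            return best
        for nbr, cost in graph[node].items():
            ng = g + cost
            if ng < dist[d].get(nbr, float("inf")):
                dist[d][nbr] = ng
                heapq.heappush(frontier[d], (ng, nbr))
                if nbr in dist[1 - d]:  # the searches meet here
                    best = min(best, ng + dist[1 - d][nbr])
    return best

graph = {"a": {"b": 1, "c": 4}, "b": {"a": 1, "c": 1},
         "c": {"b": 1, "a": 4, "d": 1}, "d": {"c": 1}}
assert bidirectional_ucs(graph, "a", "d") == 3  # a -> b -> c -> d
```

The algorithms compared in the paper differ precisely in how they choose which frontier to expand and how tight a termination bound they can prove, which is what the derived lower and upper bounds capture.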