class relationship
- North America > United States > Ohio (0.04)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- Research Report > New Finding (0.93)
- Research Report > Experimental Study (0.93)
Extracting Certainty from Uncertainty: Transductive Pairwise Classification from Pairwise Similarities
In this work, we study the problem of transductive pairwise classification from pairwise similarities~\footnote{The pairwise similarities are usually derived from some side information rather than the underlying class labels.}. The goal of transductive pairwise classification from pairwise similarities is to infer the pairwise class relationships, which we refer to as pairwise labels, between all examples, given a subset of class relationships for a small set of examples, which we refer to as labeled examples. We propose a simple yet effective algorithm consisting of two steps: the first step completes the sub-matrix corresponding to the labeled examples, and the second step reconstructs the label matrix from the completed sub-matrix and the provided similarity matrix. Our analysis shows that, under several mild preconditions, we can recover the label matrix with a small error, provided that the top eigen-space corresponding to the largest eigenvalues of the similarity matrix covers the column space of the label matrix well and has low coherence, and that the number of observed pairwise labels is sufficiently large. We demonstrate the effectiveness of the proposed algorithm through several experiments.
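The two-step procedure can be sketched in NumPy; the completion step is simplified here to a least-squares fit in the top eigen-space, and the function names and rank parameter r are illustrative, not taken from the paper:

```python
import numpy as np

def reconstruct_pairwise_labels(S, labeled_idx, A_sub, r):
    """Recover the full pairwise label matrix from a similarity matrix S
    and the (completed) labeled sub-matrix A_sub over labeled_idx.

    Step 1 (completing A_sub) is assumed done; step 2 fits A_sub in the
    top-r eigen-space of S and extrapolates to all pairs."""
    # Top-r eigenvectors of the symmetric similarity matrix.
    vals, vecs = np.linalg.eigh(S)
    U = vecs[:, np.argsort(vals)[::-1][:r]]         # n x r
    U_L = U[labeled_idx]                            # m x r
    # Least-squares fit: find Z with U_L @ Z @ U_L.T ~ A_sub.
    P = np.linalg.pinv(U_L)                         # r x m
    Z = P @ A_sub @ P.T
    # Extrapolate to all pairs and round to {0, 1}.
    A_hat = U @ Z @ U.T
    return (A_hat > 0.5).astype(int)

# Toy example: two classes with noiseless similarities.
labels = np.array([0, 0, 0, 1, 1, 1])
A_true = (labels[:, None] == labels[None, :]).astype(float)
S = A_true + 0.01 * np.eye(6)          # similarity mirrors class structure
labeled = np.array([0, 1, 3, 4])
A_hat = reconstruct_pairwise_labels(S, labeled,
                                    A_true[np.ix_(labeled, labeled)], r=2)
```

In this noiseless toy case the top eigen-space of S spans the column space of the label matrix exactly, so the reconstruction is perfect.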
Fine-Tuning is Fine, if Calibrated
Mai, Zheda, Chowdhury, Arpita, Zhang, Ping, Tu, Cheng-Hao, Chen, Hong-You, Pahuja, Vardaan, Berger-Wolf, Tanya, Gao, Song, Stewart, Charles, Su, Yu, Chao, Wei-Lun
Fine-tuning is arguably the most straightforward way to tailor a pre-trained model (e.g., a foundation model) to downstream applications, but it also comes with the risk of losing valuable knowledge the model had learned in pre-training. For example, fine-tuning a pre-trained classifier capable of recognizing a large number of classes to master a subset of classes at hand is shown to drastically degrade the model's accuracy in the other classes it had previously learned. As such, it is hard to further use the fine-tuned model when it encounters classes beyond the fine-tuning data. In this paper, we systematically dissect the issue, aiming to answer the fundamental question, "What has been damaged in the fine-tuned model?" To our surprise, we find that the fine-tuned model neither forgets the relationship among the other classes nor degrades the features to recognize these classes. Instead, the fine-tuned model often produces more discriminative features for these other classes, even if they were missing during fine-tuning! What really hurts the accuracy is the discrepant logit scales between the fine-tuning classes and the other classes, implying that a simple post-processing calibration would bring back the pre-trained model's capability and at the same time unveil the feature improvement over all classes. We conduct an extensive empirical study to demonstrate the robustness of our findings and provide preliminary explanations underlying them, suggesting new directions for future theoretical analysis.
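The kind of post-processing calibration the abstract describes can be sketched as adding a constant offset to the logits of classes absent from fine-tuning; the offset value and how it would be chosen (e.g., on validation data) are illustrative assumptions here:

```python
import numpy as np

def calibrate_logits(logits, finetune_classes, gamma):
    """Add a constant bias gamma to the logits of classes that were
    absent from fine-tuning, compensating for the scale discrepancy."""
    out = logits.copy()
    absent = np.setdiff1d(np.arange(logits.shape[1]), finetune_classes)
    out[:, absent] += gamma
    return out

# Toy example: classes 0/1 were fine-tuned, classes 2/3 were not, so
# their logits come out systematically smaller.
logits = np.array([[5.0, 1.0, 3.9, 0.5],
                   [1.0, 5.2, 0.3, 4.8]])
cal = calibrate_logits(logits, finetune_classes=[0, 1], gamma=1.5)
pred = cal.argmax(axis=1)
```

Before calibration both examples are pulled toward the fine-tuning classes; after the offset, the absent classes can win the argmax when their (already discriminative) features support them.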
- North America > United States > Ohio (0.04)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Understanding the Role of Invariance in Transfer Learning
Speicher, Till, Nanda, Vedant, Gummadi, Krishna P.
Transfer learning is a powerful technique for knowledge-sharing between different tasks. Recent work has found that the representations of models with certain invariances, such as to adversarial input perturbations, achieve higher performance on downstream tasks. These findings suggest that invariance may be an important property in the context of transfer learning. However, the relationship of invariance with transfer performance is not fully understood yet and a number of questions remain. For instance, how important is invariance compared to other factors of the pretraining task? How transferable is learned invariance? In this work, we systematically investigate the importance of representational invariance for transfer learning, as well as how it interacts with other parameters during pretraining. To do so, we introduce a family of synthetic datasets that allow us to precisely control factors of variation both in training and test data. Using these datasets, we a) show that for learning representations with high transfer performance, invariance to the right transformations is as, or often more, important than most other factors such as the number of training samples, the model architecture and the identity of the pretraining classes, b) show conditions under which invariance can harm the ability to transfer representations and c) explore how transferable invariance is between tasks.
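One simple way to make representational invariance concrete (a proxy measure, not the paper's exact metric) is the mean cosine similarity between representations of inputs and their transformed versions:

```python
import numpy as np

def invariance_score(features, transformed_features):
    """Mean cosine similarity between representations of original inputs
    and their transformed versions; 1.0 means fully invariant."""
    a = features / np.linalg.norm(features, axis=1, keepdims=True)
    b = transformed_features / np.linalg.norm(transformed_features,
                                              axis=1, keepdims=True)
    return float((a * b).sum(axis=1).mean())

def encoder(z):
    # A toy encoder that keeps only the first coordinate, and is
    # therefore invariant to changes in the second coordinate.
    return z[:, :1]

x   = np.array([[1.0, 0.0], [2.0, 3.0]])
t_x = np.array([[1.0, 9.0], [2.0, 7.0]])   # transform perturbs dim 1 only
score = invariance_score(encoder(x), encoder(t_x))
```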
Fine-grained Classes and How to Find Them
Grcić, Matej, Gadetsky, Artyom, Brbić, Maria
In many practical applications, coarse-grained labels are readily available compared to fine-grained labels that reflect subtle differences between classes. However, existing methods cannot leverage coarse labels to infer fine-grained labels in an unsupervised manner. To bridge this gap, we propose FALCON, a method that discovers fine-grained classes from coarsely labeled data without any supervision at the fine-grained level. FALCON simultaneously infers unknown fine-grained classes and underlying relationships between coarse and fine-grained classes. Moreover, FALCON is a modular method that can effectively learn from multiple datasets labeled with different strategies. We evaluate FALCON on eight image classification tasks and a single-cell classification task. FALCON outperforms baselines by a large margin, achieving 22% improvement over the best baseline on the tieredImageNet dataset with over 600 fine-grained classes.
- North America > Canada > Ontario > Toronto (0.14)
- Europe > Austria > Vienna (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (2 more...)
- Transportation > Ground > Road (0.46)
- Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.46)
Understanding and Improving Knowledge Distillation
Tang, Jiaxi, Shivanna, Rakesh, Zhao, Zhe, Lin, Dong, Singh, Anima, Chi, Ed H., Jain, Sagar
Knowledge distillation is a model-agnostic technique for improving model quality under a fixed capacity budget. It is commonly used for model compression, where a higher-capacity teacher model with better quality is used to train a more compact student model with better inference efficiency. Through distillation, one hopes to benefit from the student's compactness without sacrificing too much model quality. Despite the large success of knowledge distillation, how it benefits the student model's training dynamics remains under-explored. In this paper, we dissect the effects of knowledge distillation into three main factors: (1) benefits inherited from label smoothing, (2) example re-weighting based on the teacher's confidence in the ground truth, and (3) prior knowledge of the optimal output (logit) layer geometry. Using extensive systematic analyses and empirical studies on synthetic and real-world datasets, we confirm that these three factors play a major role in knowledge distillation. Furthermore, based on our findings, we propose a simple yet effective technique to empirically improve knowledge distillation.
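The standard distillation objective being analyzed can be sketched as a weighted sum of the cross-entropy on ground-truth labels and a temperature-scaled KL term; at high temperature the teacher term acts like label smoothing (factor 1), and the teacher's per-example confidence effectively re-weights examples (factor 2). The hyperparameter names are the conventional ones, not specific to this paper:

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """(1 - alpha) * CE(student, labels)
       + alpha * T^2 * KL(teacher || student) at temperature T."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = (p_t * (np.log(p_t) - np.log(p_s))).sum(axis=-1).mean()
    ce = -np.log(softmax(student_logits)[np.arange(len(labels)), labels]).mean()
    return (1 - alpha) * ce + alpha * (T ** 2) * kl

logits = np.array([[2.0, 0.0], [0.0, 2.0]])
labels = np.array([0, 1])
loss = distillation_loss(logits, logits, labels)   # student == teacher
```

With identical student and teacher logits the KL term vanishes and only the (scaled) cross-entropy remains.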
- North America > United States (0.04)
- North America > Canada (0.04)
- Asia > Middle East > Republic of Türkiye > Karaman Province > Karaman (0.04)
Adversarial Domain Adaptation Being Aware of Class Relationships
Wang, Zeya, Jing, Baoyu, Ni, Yang, Dong, Nanqing, Xie, Pengtao, Xing, Eric P.
Adversarial training is a useful approach for promoting the learning of transferable representations across source and target domains, and it has been widely applied to domain adaptation (DA) tasks based on deep neural networks. Until very recently, existing adversarial domain adaptation (ADA) methods ignored the useful information in the label space, which is an important factor behind the complicated data distributions associated with different semantic classes. In particular, inter-class semantic relationships have rarely been considered or discussed in current transfer learning work. In this paper, we propose a novel relationship-aware adversarial domain adaptation (RADA) algorithm, which first utilizes a single multi-class domain discriminator to enforce the learning of an inter-class dependency structure during domain-adversarial training, and then aligns this structure with the inter-class dependencies characterized by training the label predictor on the source domain. Specifically, we impose a regularization term that penalizes the structural discrepancy between the inter-class dependencies estimated from the domain discriminator and the label predictor, respectively. Through this alignment, our proposed method makes ADA aware of class relationships. Empirical studies show that incorporating class relationships significantly improves performance on benchmark datasets.
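The shape of such a structure-discrepancy regularizer can be sketched as follows; the paper's actual estimator of inter-class dependencies differs, so the correlation-based estimate and the Frobenius distance here are illustrative assumptions:

```python
import numpy as np

def class_dependency(probs):
    """Estimate an inter-class dependency structure as the correlation
    of per-class probability outputs across a batch."""
    centered = probs - probs.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / len(probs)
    d = np.sqrt(np.diag(cov))
    return cov / np.outer(d, d)

def structure_discrepancy(probs_discriminator, probs_predictor):
    """Regularization term: Frobenius distance between the dependency
    structures induced by the domain discriminator and label predictor."""
    D1 = class_dependency(probs_discriminator)
    D2 = class_dependency(probs_predictor)
    return float(np.linalg.norm(D1 - D2))

# Toy batches of per-class probabilities from the two heads.
probs_a = np.array([[0.7, 0.2, 0.1],
                    [0.1, 0.8, 0.1],
                    [0.2, 0.2, 0.6],
                    [0.5, 0.3, 0.2]])
probs_c = np.array([[0.3, 0.3, 0.4],
                    [0.6, 0.2, 0.2],
                    [0.1, 0.7, 0.2],
                    [0.2, 0.2, 0.6]])
reg = structure_discrepancy(probs_a, probs_c)
```

Minimizing this term pushes the discriminator's learned class structure toward the predictor's, which is the alignment the abstract describes.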
- Asia > Middle East > Jordan (0.04)
- North America > United States > Texas (0.04)
Implementing Evidential Reasoning in Expert Systems
However, the theory has not been implemented for reasoning in expert systems due to its difficulty in dealing with uncertain rules. More recently, several extensions to the theory have been proposed to overcome this difficulty [Yen, 1986a] [Liu, 1986]. Based on Yen's extended DS theory, we have implemented a prototype expert system, named GERTIS (General Evidential Reasoning Tool for Intelligent Systems), that diagnoses rheumatoid arthritis. We chose unspecified polyarthritis as the area of our medical consultation system because the diagnoses form a disease hierarchy, which fits Dempster-Shafer based reasoning best. GERTIS uses the knowledge base of OADIAG-2, a medical expert system developed by Peter Adlassnig [Adlassnig, 1985a,b]. Through the use of OADIAG-2's knowledge base, relevant evidence and rules had already been identified for the area of arthritis. To suit the needs of our model, however, the rules of OADIAG-2 were modified and reorganized.
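The core of Dempster-Shafer reasoning over such a diagnosis hierarchy is Dempster's rule of combination, sketched below; the frame of discernment and mass values are a made-up toy, not GERTIS's actual knowledge base:

```python
from itertools import product

def combine(m1, m2):
    """Dempster's rule of combination for two mass functions whose
    focal elements are frozensets over the frame of discernment."""
    combined = {}
    conflict = 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + ma * mb
        else:
            conflict += ma * mb
    # Normalize by the non-conflicting mass.
    return {s: v / (1 - conflict) for s, v in combined.items()}

# Toy frame: {RA (rheumatoid arthritis), OA, Other}. One piece of
# evidence supports the arthritis subtree {RA, OA}; another supports
# RA specifically; unassigned mass goes to the whole frame.
theta = frozenset({"RA", "OA", "Other"})
m1 = {frozenset({"RA", "OA"}): 0.8, theta: 0.2}
m2 = {frozenset({"RA"}): 0.6, theta: 0.4}
m = combine(m1, m2)
```

Because the focal elements here are nested subsets of the hierarchy, the two bodies of evidence combine without conflict, and the combined belief concentrates on the more specific diagnosis.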
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
- North America > United States > California > Alameda County > Berkeley (0.04)
- North America > United States > Texas > Travis County > Austin (0.04)
- (3 more...)