noisy student
An Empirical Investigation of Pre-trained Model Selection for Out-of-Distribution Generalization and Calibration
Naganuma, Hiroki, Hataya, Ryuichiro, Mitliagkas, Ioannis
In out-of-distribution (OOD) generalization tasks, fine-tuning pre-trained models has become a prevalent strategy. Unlike most prior work, which has focused on advancing learning algorithms, we systematically examine how pre-trained model size, pre-training data scale, and training strategies affect downstream generalization and uncertainty calibration. We evaluated 97 models across diverse pre-trained model sizes, five pre-training datasets, and five data augmentations through extensive experiments on four distribution shift datasets totaling over 100,000 GPU hours. Our results demonstrate the significant impact of pre-trained model selection, with optimal choices substantially improving OOD accuracy over algorithm improvement alone. We find that larger models and bigger pre-training data improve OOD performance and calibration, in contrast to some prior studies that found modern deep networks to calibrate worse than classical shallow models. Our work underscores the overlooked importance of pre-trained model selection for out-of-distribution generalization and calibration.
Intriguing properties of generative classifiers
Jaini, Priyank, Clark, Kevin, Geirhos, Robert
What is the best paradigm to recognize objects -- discriminative inference (fast but potentially prone to shortcut learning) or using a generative model (slow but potentially more robust)? We build on recent advances in generative modeling that turn text-to-image models into classifiers. This allows us to study their behavior and to compare them against discriminative models and human psychophysical data. We report four intriguing emergent properties of generative classifiers: they show a record-breaking human-like shape bias (99% for Imagen), near human-level out-of-distribution accuracy, state-of-the-art alignment with human classification errors, and they understand certain perceptual illusions. Our results indicate that while the current dominant paradigm for modeling human object recognition is discriminative inference, zero-shot generative models approximate human object recognition data surprisingly well.
Utilizing Deep Learning for Detecting Hotel Scenes
You must have heard this line often, but it's not 100% correct. We often judge something by the first thing we see, and that includes choosing a place to stay. Tiket.com wants to give users the best experience by letting them easily see what a hotel looks like. That means we should use a building or bedroom image as the main image instead of a bathroom image. Given the examples above, we can see that the hotel building is set as the main image on a particular hotel's detail page. Typically, these images are placed at the top of the hotel detail page on tiket.com.
Are You Ready for Vision Transformer (ViT)?
It applies not only to creatures but also to technologies. Technologies in data science have been full of hype and biased success stories. Having said that, some technologies have genuinely driven the growth of data science, such as the Convolutional Neural Network (CNN). Since AlexNet in 2012, different CNN architectures have made a tremendous contribution to real business operations and academic research. Residual Networks (ResNet), introduced by Microsoft Research in 2015, were a real breakthrough for building "deep" CNNs; however, an honorable retirement of this technology may be approaching.
Self-training with Noisy Student improves ImageNet classification
Xie, Qizhe, Hovy, Eduard, Luong, Minh-Thang, Le, Quoc V.
We present a simple self-training method that achieves 87.4% top-1 accuracy on ImageNet, which is 1.0% better than the state-of-the-art model that requires 3.5B weakly labeled Instagram images. On robustness test sets, it improves ImageNet-A top-1 accuracy from 16.6% to 74.2%, reduces ImageNet-C mean corruption error from 45.7 to 31.2, and reduces ImageNet-P mean flip rate from 27.8 to 16.1. To achieve this result, we first train an EfficientNet model on labeled ImageNet images and use it as a teacher to generate pseudo labels on 300M unlabeled images. We then train a larger EfficientNet as a student model on the combination of labeled and pseudo-labeled images. We iterate this process by putting back the student as the teacher. During the generation of the pseudo labels, the teacher is not noised, so that the pseudo labels are as good as possible. During the learning of the student, however, we inject noise such as data augmentation, dropout, and stochastic depth into the student, so that the noised student is forced to learn harder from the pseudo labels.
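The teacher/student loop described in this abstract can be sketched on toy data. The code below is only an illustrative sketch under several assumptions, not the authors' implementation: a small NumPy softmax classifier stands in for EfficientNet, the student has the same capacity as the teacher (the abstract uses a larger student), Gaussian input noise and feature dropout stand in for data augmentation, dropout, and stochastic depth, and soft pseudo labels are used. All function names (`train_linear`, `predict_proba`, `noisy_student`) and hyperparameters are hypothetical.

```python
import numpy as np


def softmax(z):
    # Numerically stable row-wise softmax.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def train_linear(X, Y, rng, noise=False, epochs=200, lr=0.1, drop=0.2, sigma=0.1):
    """Fit a softmax classifier to (possibly soft) targets Y by gradient descent.

    With noise=True, each step sees inputs perturbed by Gaussian noise and
    feature dropout (a stand-in for the noise sources named in the abstract).
    """
    n, d = X.shape
    W = np.zeros((d, Y.shape[1]))
    b = np.zeros(Y.shape[1])
    for _ in range(epochs):
        X_in = X
        if noise:
            keep = rng.random(X.shape) >= drop
            X_in = (X + rng.normal(0.0, sigma, X.shape)) * keep / (1.0 - drop)
        grad = (softmax(X_in @ W + b) - Y) / n  # cross-entropy gradient w.r.t. logits
        W -= lr * X_in.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b


def predict_proba(model, X):
    # Inference is always noise-free.
    W, b = model
    return softmax(X @ W + b)


def noisy_student(X_lab, y_lab, X_unlab, num_classes, iterations=3, seed=0):
    rng = np.random.default_rng(seed)
    Y_lab = np.eye(num_classes)[y_lab]
    # 1. Train the teacher on labeled data only.
    teacher = train_linear(X_lab, Y_lab, rng, noise=False)
    for _ in range(iterations):
        # 2. The un-noised teacher generates pseudo labels for unlabeled data.
        pseudo = predict_proba(teacher, X_unlab)
        # 3. Train a noised student on labeled plus pseudo-labeled data.
        X_all = np.vstack([X_lab, X_unlab])
        Y_all = np.vstack([Y_lab, pseudo])
        student = train_linear(X_all, Y_all, rng, noise=True)
        # 4. Put the student back as the teacher and repeat.
        teacher = student
    return teacher
```

A possible usage: call `noisy_student(X_lab, y_lab, X_unlab, num_classes=2)` with a small labeled array, integer class labels, and a larger unlabeled array, then evaluate the returned model with `predict_proba`. The key design point carried over from the abstract is the asymmetry: pseudo labels come from noise-free predictions, while the student is trained under noise.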