Brief Review -- LiT: Zero-Shot Transfer with Locked-image text Tuning

Mar-1-2023, 12:40:15 GMT–#artificialintelligence

LiT outperforms the previous state-of-the-art methods at ImageNet zero-shot classification, improving on CLIP and ALIGN by 8.3% and 8.1%, respectively. With a pre-trained image model, the setup also converges significantly faster than the standard from-scratch setups reported in the literature, and it offers a way to reuse models that have already been pre-trained. Locking the image tower almost always works best, and using a pre-trained image tower helps significantly across the board. A pre-trained text tower, by contrast, improves performance only marginally, and locking the text tower does not work well.
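To make the two-tower idea concrete, here is a minimal sketch in plain Python, not the authors' implementation. It assumes the image and text towers have already produced embedding vectors. The function names, the toy vectors, and the temperature of 0.1 are all illustrative assumptions. In a locked-image setup the image embeddings would come from a frozen pre-trained model, and a loss of this kind would be used to update only the text tower. The training step itself is not shown.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def contrastive_loss(image_embs, text_embs, temperature=0.1):
    """Symmetric image-text contrastive loss over a batch of matched pairs.

    image_embs[k] and text_embs[k] are assumed to describe the same example.
    The temperature value is an illustrative assumption.
    """
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)
    # logits[i][j]: scaled similarity between image i and text j.
    logits = [[dot(i, t) / temperature for t in txts] for i in imgs]

    def cross_entropy(rows):
        # Softmax cross-entropy where the correct class of row k is index k.
        total = 0.0
        for k, row in enumerate(rows):
            m = max(row)
            log_z = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_z - row[k]
        return total / n

    columns = [list(col) for col in zip(*logits)]
    # Average the image-to-text and text-to-image directions.
    return 0.5 * (cross_entropy(logits) + cross_entropy(columns))


def zero_shot_classify(image_emb, class_text_embs):
    """Return the class whose text embedding is most similar to the image."""
    img = l2_normalize(image_emb)
    scores = {
        name: dot(img, l2_normalize(emb))
        for name, emb in class_text_embs.items()
    }
    return max(scores, key=scores.get)
```

At inference time, zero-shot classification needs no labeled images for the target classes: each class name is embedded by the text tower and the image is assigned to the closest one, which is what `zero_shot_classify` does with the toy embeddings passed to it.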

brief review, locked-image text tuning, zero-shot transfer, (3 more...)

News Web Page

Genre:
- Research Report (0.47)
- Overview (0.40)

Technology:
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.67)