LoTLIP: Improving Language-Image Pre-training for Long Text Understanding
Wei Wu

Neural Information Processing Systems 

Understanding long text is in great demand in practice, yet it lies beyond the reach of most language-image pre-training (LIP) models.