Dynamic Embedding of Hierarchical Visual Features for Efficient Vision-Language Fine-Tuning

Open in new window