Distilling Large Vision-Language Model with Out-of-Distribution Generalizability
