Appendix of GLIPv2: Unifying Localization and Vision-Language Understanding
