Appendix: Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners

Zhenhailong Wang