Appendix: Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners