Appendix for QVHighlights: Detecting Moments and Highlights in Videos via Natural Language Queries