Review for NeurIPS paper: COBE: Contextualized Object Embeddings from Narrated Instructional Video