COBE: Contextualized Object Embeddings from Narrated Instructional Video (Supplementary Materials)
Neural Information Processing Systems
Our supplementary materials consist of:
1. Implementation Details. We train our model for 10 epochs with an initial learning rate of 0.001, a linear warmup of 500 steps, and a momentum of 0.9. For multi-scale training, we randomly resize the shorter side of each frame to between 400 and 800 pixels. Training is distributed across 64 GPUs, with each GPU holding a single frame. We initialize the model from a Faster R-CNN pretrained on COCO for object detection.
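The training settings above can be sketched in plain Python as follows. This is a minimal illustration, not the authors' code: the function names `warmup_lr` and `sample_frame_size` are hypothetical, and three details are assumptions that the text does not state. The warmup is assumed to ramp from 0, the learning rate is assumed to stay constant after warmup, and the shorter side is assumed to be sampled uniformly with the aspect ratio preserved.

```python
import random

# Values taken from the implementation details in the text.
BASE_LR = 0.001          # initial learning rate
WARMUP_STEPS = 500       # length of the linear warmup
MOMENTUM = 0.9           # optimizer momentum
NUM_GPUS = 64            # distributed training across 64 GPUs
FRAMES_PER_GPU = 1       # each GPU holds a single frame
MIN_SIDE, MAX_SIDE = 400, 800  # range for the shorter frame side, in pixels

# One frame per GPU gives 64 frames per optimization step.
EFFECTIVE_BATCH = NUM_GPUS * FRAMES_PER_GPU


def warmup_lr(step, base_lr=BASE_LR, warmup_steps=WARMUP_STEPS):
    """Learning rate at a given step.

    Assumption: the rate ramps linearly from 0 to base_lr over
    warmup_steps and is held constant afterwards. The text does not
    describe the starting value or any later decay.
    """
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr


def sample_frame_size(width, height, rng=random):
    """Pick a random multi-scale training size for one frame.

    Assumption: the shorter side is drawn uniformly from
    [MIN_SIDE, MAX_SIDE] and the aspect ratio is preserved.
    """
    target = rng.randint(MIN_SIDE, MAX_SIDE)
    scale = target / min(width, height)
    return round(width * scale), round(height * scale)


if __name__ == "__main__":
    rng = random.Random(0)
    for step in (0, 250, 500, 1000):
        print(step, warmup_lr(step))
    print(sample_frame_size(1280, 720, rng), EFFECTIVE_BATCH)
```

In a real pipeline the returned size would be passed to an image resizing routine and the learning rate to the optimizer at every step; both are omitted here to keep the sketch free of framework dependencies.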
Jan-27-2025, 13:40:47 GMT