A Simple and Effective Temporal Grounding Pipeline for Basketball Broadcast Footage
arXiv.org Artificial Intelligence
We present a reliable temporal grounding pipeline for video-to-analytic alignment of basketball broadcast footage. Given a sequence of frames as input, our method quickly and accurately extracts the time-remaining and quarter values shown in basketball broadcast scenes. Our work is intended to expedite the development of large, multi-modal video datasets for training data-hungry video models in the sports action recognition domain. Our method aligns a pre-labeled corpus of dense play-by-play event annotations to video frames, enabling quick retrieval of labeled video segments. Unlike previous methods, we do not localize the game clock as a separate step; instead, we fine-tune an out-of-the-box object detector to find semantic text regions directly. This end-to-end approach improves the generality of our work. Additionally, interpolation and parallelization techniques prepare our pipeline for deployment in a large computing cluster. All code is made publicly available.
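The alignment step described in the abstract can be illustrated with a minimal sketch. It is not the authors' released code: the function names (`interpolate_clock`, `find_frame`), the data layout, and the use of linear interpolation are assumptions made for illustration, since the abstract names interpolation without specifying the scheme. The sketch assumes clock readings (quarter, seconds remaining) have already been extracted for a sparse subset of frames, fills the frames in between, and then looks up the frame closest to a play-by-play event time.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: (quarter, seconds remaining in that quarter).
Clock = Tuple[int, float]


def interpolate_clock(readings: Dict[int, Clock], num_frames: int) -> List[Optional[Clock]]:
    """Fill frames between two same-quarter readings by linear interpolation.

    `readings` maps a frame index to the clock value read on that frame.
    Frames that cannot be filled safely stay None.
    """
    timeline: List[Optional[Clock]] = [None] * num_frames
    known = sorted(readings)
    for f in known:
        timeline[f] = readings[f]
    for left, right in zip(known, known[1:]):
        (q0, t0), (q1, t1) = readings[left], readings[right]
        # Skip gaps that span a quarter change or where the clock runs
        # backwards (for example a replay or a camera cut).
        if q0 != q1 or t1 > t0:
            continue
        for f in range(left + 1, right):
            frac = (f - left) / (right - left)
            timeline[f] = (q0, t0 + frac * (t1 - t0))
    return timeline


def find_frame(timeline: List[Optional[Clock]], quarter: int,
               seconds_remaining: float) -> Optional[int]:
    """Return the frame whose clock is closest to the event time, or None."""
    best, best_gap = None, float("inf")
    for f, clock in enumerate(timeline):
        if clock is None or clock[0] != quarter:
            continue
        gap = abs(clock[1] - seconds_remaining)
        if gap < best_gap:
            best, best_gap = f, gap
    return best


# Usage: clock read on frames 0, 4 and 6 only; the event occurs at 717 s in Q1.
readings = {0: (1, 720.0), 4: (1, 716.0), 6: (2, 720.0)}
timeline = interpolate_clock(readings, num_frames=7)
print(find_frame(timeline, quarter=1, seconds_remaining=717.0))  # 3
```

Leaving gaps unfilled across quarter changes is a conservative choice in this sketch: it avoids assigning a clock value to frames shown during breaks, at the cost of leaving some frames unlabeled.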
Oct-30-2024
- Country:
  - North America > United States > North Carolina (0.04)
- Genre:
  - Research Report (0.40)
- Industry:
  - Leisure & Entertainment > Sports (0.48)
- Technology:
  - Information Technology > Artificial Intelligence
    - Machine Learning (1.00)
    - Natural Language > Text Processing (0.57)
    - Vision (0.93)