The Solution for Temporal Action Localisation Task of Perception Test Challenge 2024

Yinan Han, Qingyuan Jiang, Hongming Mei, Yang Yang, Jinhui Tang

Oct-7-2024–arXiv.org Artificial Intelligence

This report presents our method for Temporal Action Localisation (TAL), which focuses on identifying and classifying actions within specific time intervals throughout a video sequence. We employ a data augmentation technique by expanding the training dataset using overlapping labels from the Something-SomethingV2 dataset, enhancing the model's ability to generalize across various action classes. For feature extraction, we utilize state-of-the-art models, including UMT and VideoMAEv2 for video features, and BEATs and CAV-MAE for audio features. Our approach involves ...

Each action is represented by start and end timestamps along with its corresponding class label, as illustrated in Figure 1. This task is critical for various applications, including video surveillance, content analysis, and human-computer interaction. The dataset provided for this challenge is derived from the Perception Test, comprising high-resolution videos (up to 35 seconds long, 30 fps, and a maximum resolution of 1080p). Each video contains multiple action segment annotations. To facilitate experimentation, both video and audio features are provided, along with detailed annotations for the training and validation phases.
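As a rough illustration of the task format, the sketch below models one annotated action as a start timestamp, an end timestamp, and a class label, and computes the temporal intersection-over-union (IoU) between two such segments. Temporal IoU is a common way to compare a predicted segment with a ground-truth one in TAL work; this excerpt does not state which metric the challenge uses, so treat it as an assumption. The names `ActionSegment` and `temporal_iou`, the label "pouring", and the timestamps are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActionSegment:
    """One annotated action: a time interval in seconds plus a class label."""
    start: float
    end: float
    label: str


def temporal_iou(a: ActionSegment, b: ActionSegment) -> float:
    """Intersection-over-union of two time intervals (0.0 if they are disjoint)."""
    intersection = max(0.0, min(a.end, b.end) - max(a.start, b.start))
    union = (a.end - a.start) + (b.end - b.start) - intersection
    return intersection / union if union > 0 else 0.0


# Hypothetical ground-truth and predicted segments for a single video.
ground_truth = ActionSegment(start=2.0, end=6.0, label="pouring")
prediction = ActionSegment(start=4.0, end=8.0, label="pouring")

# The segments overlap for 2 s out of a combined span of 6 s.
iou = temporal_iou(ground_truth, prediction)
```

In a setup like this, a prediction would typically count as correct only if its label matches and its IoU with a ground-truth segment exceeds a chosen threshold; the threshold itself is another assumption not given in the text.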


