AITopics | perception test challenge 2023

The Solution for the ICCV 2023 Perception Test Challenge 2023 -- Task 6 -- Grounded videoQA

Zhang, Hailiang, Chao, Dian, Guan, Zhihao, Yang, Yang

arXiv.org Artificial Intelligence, Jul-1-2024

In this paper, we introduce a grounded video question-answering solution. Our research reveals that the fixed official baseline method for video question answering involves two main steps: visual grounding and object tracking. However, a significant challenge emerges in the first step, where the selected frames may lack clearly identifiable target objects. Furthermore, single images cannot address questions like "Track the container from which the person pours the first time." To tackle this issue, we propose an alternative two-stage approach: (1) First, we leverage the VALOR model to answer questions based on video information. (2) Then we concatenate the questions with their respective answers. Finally, we employ TubeDETR to generate bounding boxes for the targets.
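The flow described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: `answer_fn` and `ground_fn` are hypothetical callables standing in for VALOR and TubeDETR, the `Box` format is assumed, and joining question and answer with a single space is only one plausible reading of "concatenate".

```python
from typing import Any, Callable, List, Tuple

# Hypothetical box format: (x1, y1, x2, y2). The abstract does not specify one.
Box = Tuple[float, float, float, float]


def build_grounding_query(question: str, answer: str) -> str:
    """Join a question and its answer into one text query.

    Assumption: a single space is used as the separator.
    """
    return f"{question.strip()} {answer.strip()}"


def grounded_video_qa(
    video: Any,
    question: str,
    answer_fn: Callable[[Any, str], str],
    ground_fn: Callable[[Any, str], List[Box]],
) -> Tuple[str, List[Box]]:
    """Answer the question first, then ground the question plus answer.

    answer_fn stands in for a video QA model and ground_fn for a
    spatio-temporal grounding model; both are placeholders.
    """
    answer = answer_fn(video, question)
    query = build_grounding_query(question, answer)
    boxes = ground_fn(video, query)
    return answer, boxes
```

Passing the two models in as callables keeps the sketch independent of any particular model API, which the abstract does not describe.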

perception test challenge 2023, video, yang yang, (12 more...)

arXiv:2407.01907

Country: Asia > China > Jiangsu Province > Nanjing (0.05)

Genre: Research Report > New Finding (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

The Solution for Temporal Sound Localisation Task of ICCV 1st Perception Test Challenge 2023

Huang, Yurui, Yang, Yang, Chen, Shou, Wu, Xiangyu, Chen, Qingguo, Lu, Jianfeng

arXiv.org Artificial Intelligence, Jul-1-2024

In this paper, we propose a solution for improving the quality of temporal sound localization. We employ a multimodal fusion approach to combine visual and audio features. High-quality visual features are extracted using a state-of-the-art self-supervised pre-training network, resulting in efficient video feature representations. At the same time, audio features serve as complementary information to help the model better localize the start and end of sounds. The fused features are then used to train a multi-scale Transformer. On the final test dataset, we achieved a mean average precision (mAP) of 0.33, the second-best performance in this track.
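As a rough illustration of combining visual and audio features, the sketch below fuses two per-timestep feature sequences by concatenation. The abstract does not say which fusion operator is used, so concatenation is an assumption here, and the function name `fuse_features` is hypothetical.

```python
from typing import List

Vector = List[float]


def fuse_features(visual: List[Vector], audio: List[Vector]) -> List[Vector]:
    """Concatenate visual and audio feature vectors at each timestep.

    Assumption: both sequences are already aligned to the same number
    of timesteps. Concatenation is one common fusion choice and is not
    confirmed by the source.
    """
    if len(visual) != len(audio):
        raise ValueError("visual and audio sequences must have equal length")
    return [v + a for v, a in zip(visual, audio)]
```

For example, a 2-dimensional visual vector and a 1-dimensional audio vector at the same timestep yield one 3-dimensional fused vector, which a downstream temporal model could then consume.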

audio feature, computer vision, localization, (13 more...)

arXiv:2407.02318

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > California > Los Angeles County > Long Beach (0.05)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.05)
(9 more...)

Genre: Research Report (0.84)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.48)

Solution for Point Tracking Task of ICCV 1st Perception Test Challenge 2023

Pan, Hongpeng, Yang, Yang, Fu, Zhongtian, Zhang, Yuxuan, Du, Shian, Xu, Yi, Ji, Xiangyang

arXiv.org Artificial Intelligence, Mar-26-2024

This report proposes an improved method for the Tracking Any Point (TAP) task, which tracks any physical surface point through a video. Several existing approaches have explored TAP by considering temporal relationships to obtain smooth point motion trajectories. However, they still suffer from the cumulative error caused by temporal prediction. To address this issue, we propose a simple yet effective approach called TAP with confident static points (TAPIR+), which focuses on rectifying the tracking of static points in videos shot by a static camera. Our approach contains two key components: (1) Multi-granularity Camera Motion Detection, which identifies video sequences shot by a static camera. (2) CMR-based point trajectory prediction, with a moving object segmentation approach to isolate static points from moving objects. Our approach ranked first in the final test with a score of 0.46.
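The idea of rectifying static points can be sketched as below. This is an illustration under stated assumptions, not the TAPIR+ implementation: the camera-motion flag and the per-point moving-object mask are taken as given inputs, and "rectifying" is assumed to mean pinning a static point to its query position in every frame. All names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def rectify_static_points(
    tracks: List[List[Point]],
    query_points: List[Point],
    camera_is_static: bool,
    on_moving_object: List[bool],
) -> List[List[Point]]:
    """Pin static points to their query position when the camera is static.

    tracks[i] is the predicted trajectory of point i, query_points[i] its
    initial position, and on_moving_object[i] says whether it lies on a
    segmented moving object. Points on moving objects, and all points in
    moving-camera videos, keep their predicted trajectories.
    """
    if not camera_is_static:
        return [list(track) for track in tracks]
    rectified = []
    for track, query, moving in zip(tracks, query_points, on_moving_object):
        if moving:
            rectified.append(list(track))
        else:
            # Assumed rule: a static point under a static camera does not move.
            rectified.append([query] * len(track))
    return rectified
```

Because a pinned point never drifts, this rule removes the accumulated temporal prediction error for exactly the points it applies to, which is the motivation the abstract gives.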

prediction, trajectory prediction, video, (12 more...)

arXiv:2403.17994

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > China > Liaoning Province > Dalian (0.04)
Asia > China > Jiangsu Province > Nanjing (0.04)

Genre: Research Report (0.83)

Technology: Information Technology > Artificial Intelligence > Vision (1.00)
