Supplementary Material for Revisit Weakly-Supervised Audio-Visual Video Parsing from the Language Perspective

Oct-8-2025, 23:42:01 GMT–Neural Information Processing Systems

In Sec. B, we provide more examples of the similarity distribution with/without the event and visualize [...]

To investigate the flexibility of our approach, we combine LSLD with different SOTA methods for the AVVP task. The experiments show that our denoised labels are indeed influential and can be properly employed on different SOTA methods.

Effectiveness of modifying class names in prompts. In Table 2, we can see that the segment-level visual metric improves by 1.7 points when we add "playing" [...] As we transform objects like Accordion into human behavior (i.e. [...]

Table 2: Study the impact of varying class names to make the prompt more contextual.
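The class-name modification described above (turning an object such as Accordion into a human action by adding "playing") can be illustrated with a small sketch. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the lowercasing, the set of instrument classes, and the prompt template are all hypothetical, since the excerpt does not give the actual prompt wording or the full class list.

```python
# Hypothetical sketch: make class names more contextual before building prompts.
# Only "Accordion" is named in the source text; any other entry would be an assumption.
INSTRUMENT_CLASSES = {"Accordion"}


def contextual_class_name(name, instruments=INSTRUMENT_CLASSES):
    """Rewrite an object-like class name as a human action by adding 'playing'."""
    if name in instruments:
        return f"playing {name.lower()}"
    # Other class names are left unchanged apart from lowercasing (an assumption).
    return name.lower()


def build_prompt(name, template="a photo of {}"):
    """Fill a prompt template; the default template is an assumed placeholder."""
    return template.format(contextual_class_name(name))


if __name__ == "__main__":
    for cls in ["Accordion", "Speech"]:
        print(cls, "->", build_prompt(cls))
```

Keeping the name rewriting separate from the template makes it easy to compare prompts with and without the added "playing", which is the kind of comparison Table 2 reports.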
