Appendix of Prophet Attention

Oct-2-2025, 04:30:34 GMT–Neural Information Processing Systems

... CIDEr-c40, which is the default ranking score on the leaderboard, and rank 1st. Compared with image captioning, the target of video captioning is the video clip, i.e., an ordered ... The dataset contains 10,000 video clips, and each video is paired with 20 annotated sentences. We use the official splits to report our results. CIDEr, which is built upon n-gram matching, is used in our tests for performance evaluation. All re-implementations and our experiments were run on V100 GPUs.
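To make the idea of n-gram matching concrete, the following is a minimal sketch of a clipped n-gram overlap between a candidate caption and its reference sentences. It is not CIDEr itself: the actual metric additionally weights n-grams by TF-IDF and compares the weighted vectors with cosine similarity. The function names (ngrams, ngram_precision, mean_ngram_overlap), the whitespace tokenization, and the example sentences are assumptions made for illustration only, and the reference list is assumed to be non-empty.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_precision(candidate, reference, n):
    """Clipped n-gram precision of a candidate against one reference.

    Tokenization is plain lowercased whitespace splitting (an assumption).
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    # Each candidate n-gram is credited at most as often as it occurs
    # in the reference.
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / total


def mean_ngram_overlap(candidate, references, max_n=4):
    """Average the clipped precision over references and over n = 1..max_n.

    Simplified stand-in for illustration; no TF-IDF weighting is applied.
    """
    scores = []
    for n in range(1, max_n + 1):
        per_ref = [ngram_precision(candidate, r, n) for r in references]
        scores.append(sum(per_ref) / len(per_ref))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical sentences; each video in the dataset has 20 references.
    refs = ["a man is playing a guitar", "a person plays the guitar"]
    print(mean_ngram_overlap("a man plays a guitar", refs))
```

A caption that repeats a reference verbatim scores 1.0 under this sketch, and one sharing no words with any reference scores 0.0; the TF-IDF weighting in CIDEr further discounts n-grams that are common across the whole dataset.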




Then, for the image captioning task, common practice [1, 13, 15, 28, 30] further adopts a CIDEr-based training objective using reinforcement training [24] to improve the performance of image captioning
