Artemis: Towards Referential Understanding in Complex Videos
Neural Information Processing Systems
Videos carry rich visual information including object description, action, interaction, etc., but the existing multimodal large language models (MLLMs) fell short
Nov-20-2025, 04:21:35 GMT
- Country:
  - Asia > China (0.04)
  - Europe > Netherlands
    - North Holland > Amsterdam (0.04)
- Genre:
  - Research Report
    - Experimental Study (0.93)
    - New Finding (0.67)
- Industry:
  - Information Technology (0.46)
- Technology:
  - Information Technology > Artificial Intelligence
    - Machine Learning (1.00)
    - Natural Language > Large Language Model (1.00)
    - Representation & Reasoning (0.93)
    - Vision (1.00)