Artemis: Towards Referential Understanding in Complex Videos