DiscoSG: Towards Discourse-Level Text Scene Graph Parsing through Iterative Graph Refinement