SPRING: Situated Conversation Agent Pretrained with Multimodal Questions from Incremental Layout Graph