Pretraining Vision-Language Model for Difference Visual Question Answering in Longitudinal Chest X-rays