VIPER: Visual Perception and Explainable Reasoning for Sequential Decision-Making