Learning Robust Decision Policies from Observational Data