Robust Learning for Optimal Dynamic Treatment Regimes with Observational Data