Robust temporal difference learning for critical domains