Assessing AI system performance: thinking beyond models to deployment contexts - Microsoft Research