On Evaluating and Comparing Open Domain Dialog Systems