A Video-grounded Dialogue Dataset and Metric for Event-driven Activities