Multi-Faceted Evaluation of Tool-Augmented Dialogue Systems