Towards Robust Metrics for Concept Representation Evaluation