Grounding and Distinguishing Conceptual Vocabulary Through Similarity Learning in Embodied Simulations