On the Pitfalls of Measuring Emergent Communication