Learning Representations for Pixel-based Control: What Matters and Why?