Learning Fused State Representations for Control from Multi-View Observations