The Power of the Senses: Generalizable Manipulation from Vision and Touch through Masked Multimodal Learning