Many-Shot In-Context Learning in Multimodal Foundation Models