MEDMAX: Mixed-Modal Instruction Tuning for Training Biomedical Assistants
Neural Information Processing Systems
Recent advances in mixed-modal generative models have opened new avenues for developing unified biomedical assistants that can analyze biomedical images, answer complex questions about them, and generate multimodal patient reports. However, existing datasets are limited by their small size, narrow coverage of biomedical tasks and domains, and reliance on a few narrow sources. To address these gaps, we present MEDMAX, a large-scale multimodal biomedical instruction-tuning dataset for mixed-modal foundation models. With 1.47 million instances, MEDMAX covers a diverse range of tasks, including interleaved image-text generation, biomedical image captioning and generation, visual chat, and report understanding. These tasks draw on knowledge from diverse biomedical domains, including radiology and histopathology, and are grounded in medical papers and YouTube videos.
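To make the idea of a mixed-modal instruction-tuning instance more concrete, here is a minimal sketch of one way such records could be stored, where prompts and responses are sequences of text and image segments. This is an illustration only, assuming a simple in-memory layout: the class names, field names, task labels, and image paths are all hypothetical and are not taken from MEDMAX's actual release format. The task and domain labels only echo the categories named in the abstract.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Literal


# Hypothetical segment type: one piece of an interleaved sequence,
# either a text span or a reference to an image file.
@dataclass
class Segment:
    kind: Literal["text", "image"]
    value: str  # text content, or an image path when kind == "image"


# Hypothetical instance layout; field names are illustrative, not MEDMAX's schema.
@dataclass
class Instance:
    task: str    # e.g. "image_captioning", "image_generation", "visual_chat"
    domain: str  # e.g. "radiology", "histopathology"
    source: str  # e.g. "paper", "youtube"
    prompt: List[Segment] = field(default_factory=list)
    response: List[Segment] = field(default_factory=list)

    def is_interleaved_output(self) -> bool:
        """Return True if the response mixes text and image segments."""
        kinds = {seg.kind for seg in self.response}
        return kinds == {"text", "image"}


def task_counts(instances: List[Instance]) -> Counter:
    """Tally instances per task label to inspect a dataset's task mix."""
    return Counter(inst.task for inst in instances)


# Toy records with placeholder text and hypothetical image paths.
examples = [
    Instance("image_captioning", "radiology", "paper",
             prompt=[Segment("image", "img/example_1.png"),
                     Segment("text", "Describe this image.")],
             response=[Segment("text", "<caption>")]),
    Instance("image_generation", "histopathology", "paper",
             prompt=[Segment("text", "<image description>")],
             response=[Segment("image", "img/example_2.png")]),
    Instance("interleaved_generation", "radiology", "youtube",
             prompt=[Segment("text", "<question>")],
             response=[Segment("text", "<explanation>"),
                       Segment("image", "img/example_3.png")]),
]

print(task_counts(examples))
print([ex.is_interleaved_output() for ex in examples])
```

Keeping images as path references rather than raw pixels keeps each record lightweight. A flat segment list lets the same structure cover captioning (image in, text out), generation (text in, image out), and interleaved outputs.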
Jun-20-2026, 11:13:34 GMT