
Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT

Neural Information Processing Systems

Lumina-T2X is a nascent family of Flow-based Large Diffusion Transformers (Flag-DiT) that establishes a unified framework for transforming noise into various modalities, such as images and videos, conditioned on text instructions.
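The snippet describes flow-based models that turn noise into data. The sketch below illustrates generic flow matching on scalars, under stated assumptions. It uses a linear path from noise `x0` (at t=0) to data `x1` (at t=1), the path's constant velocity `x1 - x0` as the regression target, and Euler integration of a velocity field from t=0 to t=1. The function names and the closed-form toy velocity field for a single data point are hypothetical. This is not the Lumina-T2X or Flag-DiT implementation, which learns the velocity with a text-conditioned transformer.

```python
import random


def interpolate(x0: float, x1: float, t: float) -> float:
    """Point on the (assumed) linear path between noise x0 (t=0) and data x1 (t=1)."""
    return (1.0 - t) * x0 + t * x1


def target_velocity(x0: float, x1: float) -> float:
    """Time derivative of the linear path; a flow-matching model regresses onto this."""
    return x1 - x0


def toy_velocity(x: float, t: float, target: float) -> float:
    """Hypothetical exact velocity field when the data distribution is one point `target`.

    Stands in for a trained network; valid for t < 1.
    """
    return (target - x) / (1.0 - t)


def euler_sample(velocity_fn, x0: float, num_steps: int = 100) -> float:
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with fixed Euler steps."""
    x = x0
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt  # t never reaches 1.0, so toy_velocity stays finite
        x = x + dt * velocity_fn(x, t)
    return x


if __name__ == "__main__":
    random.seed(0)
    noise = random.gauss(0.0, 1.0)  # start from Gaussian noise
    sample = euler_sample(lambda x, t: toy_velocity(x, t, 3.0), noise)
    print(sample)  # close to the single data point 3.0
```

Because the toy path is linear in time, the Euler steps follow it almost exactly and land on the target. With a learned velocity field, more steps or a higher-order solver would typically be needed.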




VLM4Bio: A Benchmark Dataset to Evaluate Pretrained Vision-Language Models for Trait Discovery from Biological Images

Neural Information Processing Systems

Hence, we consider asking a VLM to provide the scientific name of the organism shown in a given image. There are two types of questions that we consider for this task. First, we consider open-ended questions, where we do not provide any answer choices (or options) to the VLM in the input prompt.