Octopus: A Multi-modal LLM with Parallel Recognition and Sequential Understanding

Neural Information Processing Systems 

A mainstream of Multi-modal Large Language Models (MLLMs) have two essential functions, i.e., visual recognition (e.g., grounding) and understanding (e.g.,
