LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing

Chen, Wei-Ge; Spiridonova, Irina; Yang, Jianwei; Gao, Jianfeng; Li, Chunyuan

Nov-1-2023 – arXiv.org Artificial Intelligence

LLaVA-Interactive is a research prototype for multimodal human-AI interaction. The system can hold multi-turn dialogues with human users, taking multimodal user inputs and generating multimodal responses. Importantly, LLaVA-Interactive goes beyond language prompts: it also accepts visual prompts, which help align the system with human intents during the interaction. The development of LLaVA-Interactive is extremely cost-efficient, as the system combines three multimodal skills of pre-built AI models without additional model training: visual chat from LLaVA, image segmentation from SEEM, and image generation and editing from GLIGEN. A diverse set of application scenarios is presented to demonstrate the promise of LLaVA-Interactive and to inspire future research in multimodal interactive systems.
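The abstract describes composing three pre-built models into one interactive system without retraining any of them. The sketch below illustrates only that composition pattern, under stated assumptions: a simple dispatcher routes each user turn to one "skill" and keeps a shared session so that an edited image becomes the context for later turns. Every name here (`Session`, `handle_turn`, the `*_skill` stubs, the bounding-box visual prompt) is hypothetical, and the stubs return strings instead of calling LLaVA, SEEM, or GLIGEN. It is not the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical visual prompt: a bounding box (x0, y0, x1, y1) drawn by the user.
Box = Tuple[int, int, int, int]


@dataclass
class Session:
    """Shared multi-turn state passed to every skill."""
    image: str  # placeholder image handle, not pixel data
    history: List[str] = field(default_factory=list)


# Hypothetical stand-ins for frozen, pre-built models; none call a real model API.
def chat_skill(s: Session, text: str, box: Optional[Box]) -> str:
    # Visual chat stand-in: answers a question about the current image.
    return f"answer about {s.image}: {text}"


def segment_skill(s: Session, text: str, box: Optional[Box]) -> str:
    # Segmentation stand-in: a visual prompt (box) takes priority over text.
    target = f"region {box}" if box is not None else repr(text)
    return f"mask of {target} in {s.image}"


def edit_skill(s: Session, text: str, box: Optional[Box]) -> str:
    # Editing stand-in: the edited image replaces the current one,
    # so later turns in the dialogue see the new context.
    s.image = f"{s.image}+edit({text})"
    return f"new image {s.image}"


SKILLS: Dict[str, Callable[[Session, str, Optional[Box]], str]] = {
    "chat": chat_skill,
    "segment": segment_skill,
    "edit": edit_skill,
}


def handle_turn(s: Session, skill: str, text: str, box: Optional[Box] = None) -> str:
    """Route one user turn to a skill and log it; no model is trained here."""
    if skill not in SKILLS:
        raise ValueError(f"unknown skill: {skill}")
    reply = SKILLS[skill](s, text, box)
    s.history.append(f"{skill}: {reply}")
    return reply


if __name__ == "__main__":
    s = Session(image="photo.jpg")
    print(handle_turn(s, "segment", "", box=(10, 20, 110, 220)))
    print(handle_turn(s, "edit", "replace the dog with a cat"))
    print(handle_turn(s, "chat", "What animal is in the picture now?"))
```

Keeping the models behind one uniform function signature is what makes the "no additional training" claim plausible in such a design: adding or swapping a skill only changes the dispatch table, while the shared session carries the multi-turn, multimodal context.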

artificial intelligence, generation and editing, llava-interactive

Genre:
- Research Report (0.40)

Technology:
- Information Technology
  - Artificial Intelligence (0.73)
  - Sensing and Signal Processing > Image Processing (0.53)
