Visual Perception by Large Language Model's Weights

Neural Information Processing Systems 

In this way, the input of LLM does not require visual tokens, which reduces the length of the input sequence and greatly improves efficiency. Following this paradigm, we propose VLoRA with the perceptual weights generator.

Similar Docs  Excel Report  more

TitleSimilaritySource
None found