CogVLM: Visual Expert for Pretrained Language Models
–Neural Information Processing Systems
We introduce CogVLM, a powerful open-source visual language foundation model. Unlike the popular shallow alignment method, which maps image features into the input space of the language model, CogVLM bridges the gap between the frozen pretrained language model and the image encoder with a trainable visual expert module in the attention and FFN layers. As a result, CogVLM enables deep fusion of vision-language features without sacrificing any performance on NLP tasks.
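The abstract does not spell out how the visual expert module is wired in, so the following is only a minimal sketch under stated assumptions: each token is routed by modality, image tokens pass through a separate trainable weight, and text tokens keep using the frozen language-model weight. A single linear projection stands in for the attention and FFN weights, and all names and values (`routed_projection`, `W_TEXT`, `W_IMAGE`) are hypothetical, not taken from the CogVLM code.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply matrix w (rows x cols) by vector x (length cols)."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def routed_projection(
    tokens: List[Vector],
    is_image: List[bool],
    w_text: Matrix,
    w_image: Matrix,
) -> List[Vector]:
    """Project each token with the weight that matches its modality.

    Assumption: routing is a hard per-token switch between two weights.
    """
    return [
        matvec(w_image if img else w_text, tok)
        for tok, img in zip(tokens, is_image)
    ]


# Toy 2-dimensional weights (hypothetical values, for illustration only).
W_TEXT = [[1.0, 0.0], [0.0, 1.0]]   # stands in for a frozen language-model weight
W_IMAGE = [[2.0, 0.0], [0.0, 3.0]]  # stands in for a trainable visual-expert weight

tokens = [[1.0, 1.0], [1.0, 2.0], [0.5, 0.5]]

# Mixed sequence: image token, text token, image token.
mixed = routed_projection(tokens, [True, False, True], W_TEXT, W_IMAGE)

# Text-only sequence: the visual-expert weight is never used.
text_only = routed_projection(tokens, [False, False, False], W_TEXT, W_IMAGE)
frozen_baseline = [matvec(W_TEXT, t) for t in tokens]
```

In this sketch a text-only input yields exactly the frozen model's output, which is one way to read the claim that NLP performance is preserved: the language-model weights are untouched, and the added parameters only act on image tokens.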
Jun-2-2025, 03:09:27 GMT
- Country:
- Europe > Switzerland > Zürich > Zürich (0.14)
- Genre:
- Research Report
- Experimental Study (0.93)
- New Finding (1.00)
- Industry:
- Education (0.46)
- Health & Medicine (0.46)
- Technology: