Learning Multimodal LLMs without Text-only Forgetting

Oct-9-2025, 23:25:57 GMT–Neural Information Processing Systems

The LoRRA mirrors the structure of attention but utilizes low-rank connections to ensure efficiency. Initially, image and text inputs are aligned with visual learners operating alongside the main attention, balancing focus on visual elements.
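The idea of an attention-shaped block built from low-rank connections can be illustrated with a minimal sketch. This is not the paper's implementation: the names `LowRankLinear`, `low_rank_learner`, and `add_residual`, the choice of rank, and the use of raw visual features as values are all assumptions made for illustration. It only shows how factoring each d x d projection into (d x r) and (r x d) matrices reduces parameters, and how a learner's output could be added alongside the main attention output.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Random matrix with entries in [-scale, scale]."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class LowRankLinear:
    """Hypothetical helper: a d x d projection factored as (d x r) @ (r x d)."""

    def __init__(self, dim, rank, rng):
        self.down = rand_matrix(dim, rank, rng)
        self.up = rand_matrix(rank, dim, rng)

    def __call__(self, x):
        return matmul(matmul(x, self.down), self.up)

    def num_params(self):
        return len(self.down) * len(self.down[0]) + len(self.up) * len(self.up[0])


def low_rank_learner(hidden, visual, q_proj, k_proj, o_proj):
    """Assumed form: hidden states attend to visual features via low-rank projections."""
    dim = len(hidden[0])
    q = q_proj(hidden)
    k = k_proj(visual)
    k_t = [list(col) for col in zip(*k)]  # transpose keys
    scores = matmul(q, k_t)
    weights = [softmax([s / math.sqrt(dim) for s in row]) for row in scores]
    attended = matmul(weights, visual)  # values are the raw visual features here
    return o_proj(attended)


def add_residual(main_out, learner_out):
    """Add the learner output to the main attention output element-wise."""
    return [[m + l for m, l in zip(mr, lr)] for mr, lr in zip(main_out, learner_out)]


rng = random.Random(0)
dim, rank = 8, 2
hidden = rand_matrix(3, dim, rng, scale=1.0)    # 3 text-side token states
visual = rand_matrix(5, dim, rng, scale=1.0)    # 5 visual token features
main_out = rand_matrix(3, dim, rng, scale=1.0)  # stand-in for the main attention output

q_proj, k_proj, o_proj = (LowRankLinear(dim, rank, rng) for _ in range(3))
learner_out = low_rank_learner(hidden, visual, q_proj, k_proj, o_proj)
combined = add_residual(main_out, learner_out)

full_params = dim * dim               # one dense d x d projection
low_rank_params = q_proj.num_params() # 2 * d * r for the factored version
```

With `dim = 8` and `rank = 2`, each factored projection holds 32 parameters instead of 64; the saving grows as the hidden size increases relative to the rank.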

Country:
- North America > United States
  - Missouri > St. Louis County
    - St. Louis (0.04)
  - California > Santa Clara County
    - Palo Alto (0.04)
- Asia > China
  - Jiangsu Province > Nanjing (0.04)

Genre:
- Research Report > Experimental Study (0.93)

Industry:
- Education (0.67)

Technology:
- Information Technology > Artificial Intelligence
  - Vision (1.00)
  - Representation & Reasoning (1.00)
  - Natural Language
    - Large Language Model (1.00)
    - Chatbot (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)