Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning
Alex Jinpeng Wang, Min Li
Neural Information Processing Systems
Training models with longer in-context lengths is a significant challenge for multimodal machine learning due to substantial GPU memory and computational costs. This exploratory study does not present state-of-the-art models; rather, it introduces an innovative method for efficiently increasing in-context text length in multi-modality large language models (MLLMs). We present Visualized In-Context Text Processing (VisInContext), which processes long in-context text using visual tokens. This technique significantly reduces GPU memory usage and floating point operations (FLOPs) during both training and inference. For instance, our method expands the pre-training in-context text length from 256 to 2048 tokens with nearly the same FLOPs for a 56 billion parameter MoE model. Experimental results demonstrate that models trained with VisInContext deliver superior performance on common downstream benchmarks for in-context few-shot evaluation. Additionally, VisInContext is complementary to existing methods for increasing in-context text length and enhances document understanding capabilities, showing great potential in document QA tasks and sequential document retrieval. The code is available at https://github.com/showlab/VisInContext.
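As a rough illustration of the core idea (rendering long in-context text as an image so that a vision encoder can consume it as a fixed budget of visual tokens), the sketch below rasterizes a text passage onto a canvas. This is a minimal sketch, not the authors' implementation; the function name, canvas size, and font handling are all assumptions for illustration.

```python
# Minimal sketch of the VisInContext idea: render long in-context text
# as an image so it can be fed to a vision encoder as visual tokens
# instead of text tokens. Names and sizes here are illustrative
# assumptions, not the paper's actual API.
from PIL import Image, ImageDraw, ImageFont

def render_text_to_image(text: str, width: int = 448, height: int = 448,
                         line_height: int = 14) -> Image.Image:
    """Rasterize in-context text onto a fixed-size white canvas."""
    image = Image.new("RGB", (width, height), color="white")
    draw = ImageDraw.Draw(image)
    font = ImageFont.load_default()
    # Naive word wrapping; a real implementation would control the
    # layout (font, spacing, multi-page splitting) much more carefully.
    words, lines, line = text.split(), [], ""
    for word in words:
        candidate = (line + " " + word).strip()
        if draw.textlength(candidate, font=font) <= width - 10:
            line = candidate
        else:
            lines.append(line)
            line = word
    lines.append(line)
    for i, ln in enumerate(lines):
        draw.text((5, 5 + i * line_height), ln, fill="black", font=font)
    return image

# Usage: the rendered image is patchified by a vision encoder (e.g., a
# ViT) into a fixed number of visual tokens, so a long passage costs
# roughly the compute of one image rather than thousands of text tokens.
passage = "Few-shot exemplar text " * 100
img = render_text_to_image(passage)
```

The design point this sketch highlights is that the visual-token budget is fixed by the image resolution, decoupling compute and memory from the raw text length, which is what allows the in-context text length to grow at nearly constant FLOPs.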