Mimicking or Reasoning: Rethinking Multi-Modal In-Context Learning in Vision-Language Models

Huang, Chengyue, Zhu, Yuchen, Zhu, Sichen, Xiao, Jingyun, Andrade, Moises, Chopra, Shivang, Kira, Zsolt

arXiv.org Artificial Intelligence

Vision-language models (VLMs) are widely assumed to exhibit in-context learning (ICL), a property similar to that of their language-only counterparts. While recent work suggests VLMs can perform multimodal ICL (MM-ICL), studies show they often rely on shallow heuristics -- such as copying or majority voting -- rather than true task understanding. We revisit this assumption by evaluating VLMs under distribution shifts, where support examples come from a dataset different from that of the query. Surprisingly, performance often degrades with more demonstrations, and models tend to copy answers rather than learn from them. To investigate further, we propose a new MM-ICL with Reasoning pipeline that augments each demonstration with a generated rationale alongside the answer. We conduct extensive experiments on both perception- and reasoning-oriented datasets with open-source VLMs ranging from 3B to 72B and proprietary models such as Gemini 2.0. We conduct controlled studies varying shot count, retrieval method, rationale quality, and distribution. Our results show limited performance sensitivity across these factors, suggesting that current VLMs do not effectively utilize demonstration-level information as intended in MM-ICL.
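The rationale-augmented pipeline described above can be made concrete with a short sketch. The following Python snippet is a minimal illustration under assumed field names (`image_token`, `question`, `rationale`, `answer`), not the authors' code: each demonstration is packed into the few-shot prompt together with its generated rationale before the query is appended.

```python
# Minimal sketch (not the paper's implementation) of rationale-augmented MM-ICL
# prompt construction. Field names and the prompt template are illustrative.

from dataclasses import dataclass
from typing import List


@dataclass
class Demonstration:
    image_token: str   # placeholder the VLM maps to visual features, e.g. "<image_1>"
    question: str
    rationale: str     # generated explanation accompanying the gold answer
    answer: str


def build_reasoning_prompt(demos: List[Demonstration],
                           query_image_token: str,
                           query_question: str) -> str:
    """Concatenate rationale-augmented demonstrations followed by the query."""
    parts = ["You are given solved examples. Answer the final question in the same way.\n"]
    for i, d in enumerate(demos, 1):
        parts.append(
            f"Example {i}:\n{d.image_token}\nQuestion: {d.question}\n"
            f"Reasoning: {d.rationale}\nAnswer: {d.answer}\n"
        )
    parts.append(f"{query_image_token}\nQuestion: {query_question}\nReasoning:")
    return "\n".join(parts)


if __name__ == "__main__":
    demos = [Demonstration("<image_1>", "How many dogs are visible?",
                           "There is one dog on the left and one on the right.", "2")]
    print(build_reasoning_prompt(demos, "<image_2>", "How many cats are visible?"))
```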


What Factors Affect Multi-Modal In-Context Learning? An In-Depth Exploration

Qin, Libo, Chen, Qiguang, Fei, Hao, Chen, Zhi, Li, Min, Che, Wanxiang

arXiv.org Artificial Intelligence

Recently, rapid advancements in Multi-Modal In-Context Learning (MM-ICL) have achieved notable success, enabling superior performance across various tasks without requiring additional parameter tuning. However, the underlying rules governing the effectiveness of MM-ICL remain under-explored. To fill this gap, this work investigates the research question: "What factors affect the performance of MM-ICL?" To this end, we conduct extensive experiments on the three core steps of MM-ICL, namely demonstration retrieval, demonstration ordering, and prompt construction, using 6 vision large language models and 20 strategies. Our findings highlight (1) the necessity of a multi-modal retriever for demonstration retrieval, (2) the importance of intra-demonstration ordering over inter-demonstration ordering, and (3) the enhancement of task comprehension through introductory instructions in prompts. We hope this study can serve as a foundational guide for optimizing MM-ICL strategies in future research.
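To make the three steps studied above concrete, the sketch below (a minimal Python illustration, not the paper's implementation) separates demonstration retrieval with a multi-modal similarity score, inter-/intra-demonstration ordering, and prompt construction with an introductory instruction. Embeddings are assumed to come from some joint image-text encoder such as CLIP; all function and field names are illustrative.

```python
# Minimal sketch of the three MM-ICL steps the study varies:
# (1) retrieve demonstrations with a multi-modal similarity score,
# (2) order them, (3) build the prompt with an introductory instruction.
# Embeddings are plain numpy vectors here, assumed to come from a
# joint image+text encoder (e.g. CLIP).

import numpy as np


def retrieve(query_emb: np.ndarray, pool_embs: np.ndarray, k: int) -> list:
    """Step 1: pick the k demonstrations whose multimodal embedding is closest to the query."""
    sims = pool_embs @ query_emb / (
        np.linalg.norm(pool_embs, axis=1) * np.linalg.norm(query_emb) + 1e-8)
    return list(np.argsort(-sims)[:k])


def order(indices: list, most_similar_last: bool = True) -> list:
    """Step 2: inter-demonstration ordering -- optionally put the most similar
    demonstration right before the query instead of first."""
    return indices[::-1] if most_similar_last else indices


def build_prompt(demos: list, query: dict, instruction: str) -> str:
    """Step 3: prompt construction with an introductory task instruction."""
    lines = [instruction]
    for d in demos:
        # Intra-demonstration ordering: image placeholder, then question, then answer.
        lines.append(f"{d['image']}\nQ: {d['question']}\nA: {d['answer']}")
    lines.append(f"{query['image']}\nQ: {query['question']}\nA:")
    return "\n\n".join(lines)
```

Under this framing, swapping the retriever (text-only vs. multi-modal embeddings), the ordering rule, or the instruction string corresponds to the strategy axes the paper compares.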


MM-Narrator: Narrating Long-form Videos with Multimodal In-Context Learning

Zhang, Chaoyi, Lin, Kevin, Yang, Zhengyuan, Wang, Jianfeng, Li, Linjie, Lin, Chung-Ching, Liu, Zicheng, Wang, Lijuan

arXiv.org Artificial Intelligence

We present MM-Narrator, a novel system leveraging GPT-4 with multimodal in-context learning for the generation of audio descriptions (AD). Unlike previous methods that primarily focused on downstream fine-tuning with short video clips, MM-Narrator excels at generating precise audio descriptions for videos of extensive length, even beyond hours, in an autoregressive manner. This capability is made possible by the proposed memory-augmented generation process, which effectively utilizes both short-term textual context and long-term visual memory through an efficient register-and-recall mechanism. These contextual memories compile pertinent past information, including storylines and character identities, ensuring accurate tracking and depiction of story-coherent and character-centric audio descriptions. Maintaining the training-free design of MM-Narrator, we further propose a complexity-based demonstration selection strategy to substantially enhance its multi-step reasoning capability via few-shot multimodal in-context learning (MM-ICL). Experimental results on the MAD-eval dataset demonstrate that MM-Narrator consistently outperforms both existing fine-tuning-based approaches and LLM-based approaches in most scenarios, as measured by standard evaluation metrics. Additionally, we introduce the first segment-based evaluator for recurrent text generation. Empowered by GPT-4, this evaluator comprehensively reasons about and scores AD generation performance along various extensible dimensions.
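The register-and-recall mechanism described in the abstract can be illustrated with a small, hypothetical memory class. The code below is a minimal Python sketch under assumed names (`VisualMemory`, `register`, `recall`), not MM-Narrator's actual API: each finished clip is registered with its audio description, and for a new clip the most similar long-term entries plus the recent short-term descriptions are recalled to build the next AD prompt autoregressively.

```python
# Hypothetical sketch of a register-and-recall memory: short-term textual
# context (last few ADs) plus a long-term visual memory queried by similarity.
# Clip embeddings are assumed to come from some visual encoder.

from collections import deque
import numpy as np


class VisualMemory:
    def __init__(self, short_term_len: int = 5):
        self.entries = []                         # list of (clip embedding, past AD text)
        self.short_term = deque(maxlen=short_term_len)

    def register(self, clip_emb: np.ndarray, ad_text: str) -> None:
        """Store the finished clip's embedding and its audio description."""
        self.entries.append((clip_emb, ad_text))
        self.short_term.append(ad_text)

    def recall(self, query_emb: np.ndarray, k: int = 3) -> list:
        """Return the k most similar long-term memories for the current clip."""
        if not self.entries:
            return []
        embs = np.stack([e for e, _ in self.entries])
        sims = embs @ query_emb / (
            np.linalg.norm(embs, axis=1) * np.linalg.norm(query_emb) + 1e-8)
        top = np.argsort(-sims)[:k]
        return [self.entries[i][1] for i in top]


def build_ad_prompt(memory: VisualMemory, clip_emb: np.ndarray, clip_caption: str) -> str:
    """Assemble the context an LLM (e.g. GPT-4) would see when writing the next AD."""
    recalled = memory.recall(clip_emb)
    return (
        "Relevant earlier events:\n" + "\n".join(recalled) + "\n\n"
        "Recent descriptions:\n" + "\n".join(memory.short_term) + "\n\n"
        f"Current clip: {clip_caption}\nNext audio description:"
    )
```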