DeepMEL: A Multi-Agent Collaboration Framework for Multimodal Entity Linking

Fang Wang, Tianwei Yan, Zonghao Yang, Minghao Hu, Jun Zhang, Zhunchen Luo, Xiaoying Bai

arXiv.org Artificial Intelligence 

Entity linking is a fundamental task in knowledge graph (KG) construction Hofer et al. (2024), aiming to link mentions to their corresponding entities in a target knowledge base (KB). It underpins many downstream natural language processing (NLP) tasks, such as question answering systems Sequeda et al. (2024) and intelligent recommendation systems Chaudhari et al. (2017).

Recently, the explosive growth of multimodal data on the Internet has raised new challenges: the quality of online information is often inconsistent, many mentions are ambiguous, and contextual information is frequently incomplete. Under such conditions, relying on a single modality (such as pure text) is often insufficient to resolve reference ambiguity Gan et al. (2021), whereas integrating textual and visual modalities can significantly improve the precision and efficiency of disambiguation Gella et al. (2017). Consequently, multimodal entity linking, which combines textual and visual information to link real-world mentions to their corresponding entities in a multimodal knowledge graph (MMKG), has become a critical research task. For example, as shown in Figure 1, the mention "Apple" is difficult to disambiguate from text alone, as it could refer to several entities, such as Apple Inc. or the apple (fruit). By considering both textual and visual information, however, the mention "Apple" can be accurately linked to the entity "apple (fruit of the apple tree)."

Current multimodal entity linking models are primarily built on deep learning frameworks, employing cross-attention mechanisms Lu and Elhamifar (2024) and visual feature encoding techniques Mokssit et al. (2023) to fuse textual mentions with visual information.
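To make the multimodal disambiguation idea concrete, the following is a minimal sketch (not the paper's method) of scoring candidate entities by a weighted combination of text-embedding and image-embedding similarity. All embeddings and the `alpha` weight are hypothetical toy values chosen so that the text evidence alone is ambiguous but the image evidence resolves it.

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def link_mention(text_emb, image_emb, candidates, alpha=0.5):
    """Score each candidate entity by a weighted sum of text and image
    similarity and return the best-scoring entity name.
    `candidates` maps entity name -> (text embedding, image embedding)."""
    best, best_score = None, float("-inf")
    for name, (cand_text, cand_image) in candidates.items():
        score = (alpha * cosine(text_emb, cand_text)
                 + (1 - alpha) * cosine(image_emb, cand_image))
        if score > best_score:
            best, best_score = name, score
    return best

# Toy 3-d embeddings (hypothetical). Both candidates match the text
# "Apple" equally well; the attached photo clearly shows a fruit.
candidates = {
    "Apple Inc.":    ([1.0, 0.0, 0.0], [0.9, 0.1, 0.0]),
    "apple (fruit)": ([0.8, 0.6, 0.0], [0.0, 0.2, 1.0]),
}
mention_text = [0.9, 0.3, 0.0]   # ambiguous textual mention
mention_image = [0.0, 0.1, 1.0]  # visual context: a fruit

print(link_mention(mention_text, mention_image, candidates))
# → apple (fruit)
```

Real systems replace the toy vectors with learned encoders and the fixed weighted sum with cross-attention fusion, but the core decision, ranking KB candidates by joint text-image compatibility, is the same.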