Gao, Mengxi
Explainable and Interpretable Multimodal Large Language Models: A Comprehensive Survey
Dang, Yunkai, Huang, Kaichen, Huo, Jiahao, Yan, Yibo, Huang, Sirui, Liu, Dongrui, Gao, Mengxi, Zhang, Jie, Qian, Chen, Wang, Kun, Liu, Yong, Shao, Jing, Xiong, Hui, Hu, Xuming
The rapid development of Artificial Intelligence (AI) has revolutionized numerous fields, with large language models (LLMs) and computer vision (CV) systems driving advancements in natural language understanding and visual processing, respectively. The convergence of these technologies has catalyzed the rise of multimodal AI, enabling richer, cross-modal understanding that spans text, vision, audio, and video modalities. Multimodal large language models (MLLMs), in particular, have emerged as a powerful framework, demonstrating impressive capabilities in tasks like image-text generation, visual question answering, and cross-modal retrieval. Despite these advancements, the complexity and scale of MLLMs introduce significant challenges in interpretability and explainability, which are essential for establishing transparency, trustworthiness, and reliability in high-stakes applications. This paper provides a comprehensive survey on the interpretability and explainability of MLLMs, proposing a novel framework that categorizes existing research across three perspectives: (I) Data, (II) Model, and (III) Training & Inference. We systematically analyze interpretability from token-level to embedding-level representations, assess approaches related to both architecture analysis and design, and explore training and inference strategies that enhance transparency. By comparing various methodologies, we identify their strengths and limitations and propose future research directions to address unresolved challenges in multimodal explainability. This survey offers a foundational resource for advancing interpretability and transparency in MLLMs, guiding researchers and practitioners toward developing more accountable and robust multimodal AI systems.
Exploring Response Uncertainty in MLLMs: An Empirical Evaluation under Misleading Scenarios
Dang, Yunkai, Gao, Mengxi, Yan, Yibo, Zou, Xin, Gu, Yanggan, Liu, Aiwei, Hu, Xuming
Ensuring that Multimodal Large Language Models (MLLMs) maintain consistency in their responses is essential for developing trustworthy multimodal intelligence. However, existing benchmarks include many samples where all MLLMs exhibit high response uncertainty when encountering misleading information, requiring as many as 5-15 response attempts per sample to assess uncertainty effectively. Therefore, we propose a two-stage pipeline: first, we collect MLLMs' responses without misleading information, and then gather their responses under specific misleading instructions. Eventually, we establish a Multimodal Uncertainty Benchmark (MUB) that employs both explicit and implicit misleading instructions to comprehensively assess the vulnerability of MLLMs across diverse domains. Our experiments reveal that all open-source and closed-source MLLMs are highly susceptible to misleading instructions, with an average misleading rate exceeding 86%. To enhance the robustness of MLLMs, we further fine-tune all ...
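To make the two-stage evaluation concrete, the sketch below illustrates how a misleading rate could be computed: query the model once without misleading information, then again under a misleading instruction, and count how often an initially correct answer flips. This is a minimal illustration, not the paper's released code; the `ask` and `mislead` callables, the exact-match comparison, and the per-sample flow are all assumptions for exposition.

```python
from typing import Callable, List


def misleading_rate(
    questions: List[str],
    gold_answers: List[str],
    ask: Callable[[str], str],          # hypothetical: query the MLLM with a prompt, return its answer
    mislead: Callable[[str, str], str],  # hypothetical: wrap a question with a misleading instruction
) -> float:
    """Fraction of initially correct answers that flip under a misleading instruction."""
    correct_before, flipped = 0, 0
    for question, gold in zip(questions, gold_answers):
        # Stage 1: collect the response without misleading information.
        if ask(question).strip().lower() != gold.strip().lower():
            continue  # only samples answered correctly at first are eligible to "flip"
        correct_before += 1
        # Stage 2: collect the response under an explicit or implicit misleading instruction.
        if ask(mislead(question, gold)).strip().lower() != gold.strip().lower():
            flipped += 1
    return flipped / correct_before if correct_before else 0.0
```

In use, `ask` would wrap an MLLM API call and `mislead` would inject the benchmark's explicit or implicit misleading phrasing; aggregating this rate over domains gives the kind of per-model vulnerability figure the abstract reports.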