Appendix A Related Work
A.1 Multimodal Large Language Models
A.2 Trustworthiness of LLMs
–Neural Information Processing Systems
A.1 Multimodal Large Language Models

Building on the capabilities of Large Language Models (LLMs) such as GPT [3], PaLM [6], Mistral [49], and LLaMA [108], which excel in language understanding and reasoning, recent work has integrated these models with other modalities, especially vision, leading to Multimodal Large Language Models (MLLMs). MLLMs jointly process visual and textual data and show greater versatility on both traditional vision tasks [21, 40, 42, 133] and complex multimodal challenges [34, 70, 136]. Among MLLMs, proprietary models consistently perform well. OpenAI's GPT-4-Vision [82] pioneered this space by handling both text and image content. Anthropic's Claude 3 series [7] integrates advanced vision capabilities and multilingual support, broadening its use across diverse cognitive and real-time tasks.
May-24-2025, 14:32:12 GMT
- Country:
- Africa > Middle East
- Egypt (0.14)
- Asia > China (0.28)
- Europe > Italy (0.27)
- North America > United States (0.45)
- Genre:
- Overview (0.92)
- Research Report (1.00)
- Industry:
- Energy > Renewable (0.67)
- Government (0.93)
- Health & Medicine (1.00)
- Information Technology > Security & Privacy (1.00)
- Law (0.92)
- Leisure & Entertainment > Sports (0.92)
- Transportation (0.92)
- Technology: