
Epistemic Integrity in Large Language Models

arXiv.org Artificial Intelligence

Large language models are increasingly relied upon as sources of information, but their propensity for generating false or misleading statements with high confidence poses risks for users and society. In this paper, we confront the critical problem of epistemic miscalibration, in which a model's linguistic assertiveness fails to reflect its true internal certainty. We introduce a new human-labeled dataset and a novel method for measuring the linguistic assertiveness of Large Language Models (LLMs), which cuts error rates by over 50% relative to previous benchmarks. Validated across multiple datasets, our method reveals a stark misalignment between how confidently models linguistically present information and their actual accuracy. Further human evaluations confirm the severity of this miscalibration. This evidence underscores the urgent risk that the overstated certainty of LLMs may mislead users on a massive scale. Our framework provides a crucial step forward in diagnosing this miscalibration, offering a path toward correcting it and toward more trustworthy AI across domains.

Large Language Models (LLMs) have markedly transformed how humans seek and consume information, becoming integral across diverse fields such as public health (Ali et al., 2023), coding (Zambrano et al., 2023), and education (Whalen et al., 2023). Despite their growing influence, LLMs are not without shortcomings. One notable issue is their potential to generate responses that, while convincing, may be inaccurate or nonsensical, a long-standing phenomenon often referred to as "hallucinations" (Jo, 2023; Huang et al., 2023; Zhou et al., 2024b). This raises concerns about the reliability and trustworthiness of these models. A critical aspect of trustworthiness in LLMs is epistemic calibration: the alignment between a model's internal confidence in its outputs and the way it expresses that confidence through natural language. Misalignment between internal certainty and external expression can leave users misled by overconfident or underconfident statements, posing significant risks in high-stakes domains such as legal advice, medical diagnosis, and misinformation detection. While of great normative concern, how LLMs express linguistic uncertainty has received relatively little attention to date (Sileo & Moens, 2023; Belem et al., 2024). Figures 1 and 5 illustrate the issue of epistemic calibration, providing insight into how certainty operates in human interactions with LLMs.

Distinct Roles of Certainty: Internal certainty and linguistic assertiveness have distinct functions within LLM interactions that shape individual beliefs.

Human access to LLM certainty: Linguistic assertiveness plays a critical role as the primary form of certainty available to users.
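To make the notion of epistemic miscalibration concrete, the sketch below contrasts a model's internal confidence (derived from token log-probabilities) with a toy linguistic assertiveness score for the same answer. This is only an illustration of the gap being measured: the lexicon-based scorer, the example sentence, and the log-probabilities are invented stand-ins, not the human-validated assertiveness method introduced in the paper.

```python
# Illustrative sketch (not the paper's method): contrast a model's internal
# confidence with the assertiveness of its wording for the same answer.
import math

# Hypothetical record: an answer's text and its token log-probabilities.
answer_text = "The capital of Australia is definitely Sydney."
answer_token_logprobs = [-0.05, -0.02, -0.10, -0.01, -0.03, -0.90, -1.20]

# Internal certainty: geometric-mean token probability of the generated answer.
internal_confidence = math.exp(sum(answer_token_logprobs) / len(answer_token_logprobs))

# Toy assertiveness scorer: hedging words lower the score, boosters raise it.
HEDGES = {"might", "may", "possibly", "perhaps", "likely", "unsure"}
BOOSTERS = {"definitely", "certainly", "clearly", "undoubtedly"}

def assertiveness(text: str) -> float:
    words = {w.strip(".,").lower() for w in text.split()}
    score = 0.5 + 0.25 * len(words & BOOSTERS) - 0.25 * len(words & HEDGES)
    return min(max(score, 0.0), 1.0)

expressed = assertiveness(answer_text)
gap = expressed - internal_confidence  # positive gap = wording sounds more certain than the model is
print(f"internal={internal_confidence:.2f} expressed={expressed:.2f} gap={gap:+.2f}")
```

A positive gap flags the overconfidence pattern the paper warns about: language that asserts more certainty than the model's internal probability warrants.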


LoGra-Med: Long Context Multi-Graph Alignment for Medical Vision-Language Model

arXiv.org Artificial Intelligence

State-of-the-art medical multi-modal large language models (med-MLLMs), such as LLaVA-Med or BiomedGPT, leverage instruction-following data in pre-training. However, these models primarily focus on scaling model size and data volume to boost performance, relying mainly on autoregressive learning objectives. Surprisingly, we reveal that such learning schemes might result in weak alignment between the vision and language modalities, making these models highly reliant on extensive pre-training datasets, a significant challenge in medical domains given the expensive and time-consuming nature of curating high-quality instruction-following instances. We address this with LoGra-Med, a new multi-graph alignment algorithm that enforces triplet correlations across image modalities, conversation-based descriptions, and extended captions. This helps the model capture contextual meaning, handle linguistic variability, and build cross-modal associations between visuals and text. To scale our approach, we designed an efficient end-to-end learning scheme using black-box gradient estimation, enabling faster LLaMA 7B training. Our results show that LoGra-Med matches LLaVA-Med's performance when trained on 600K image-text pairs for medical VQA and significantly outperforms it when trained on only 10% of the data. For example, on VQA-RAD, we exceed LLaVA-Med by 20.13% and nearly match the 100% pre-training score (72.52% vs. 72.64%). We also surpass SOTA methods such as BiomedGPT on visual chatbots and RadFM on zero-shot image classification with VQA, highlighting the effectiveness of multi-graph alignment.
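The idea of enforcing triplet correlations across the three views, image, conversation-based description, and extended caption, can be pictured with a generic margin-based alignment objective. The following PyTorch sketch is an assumption-laden illustration rather than LoGra-Med's actual loss: the pairwise hinge formulation, the margin value, and the random embeddings standing in for encoder outputs are all placeholders.

```python
# Illustrative sketch (assumed formulation, not LoGra-Med's implementation):
# pull embeddings of the same case together across three views and push
# apart embeddings from different cases, for every pair of views.
import torch
import torch.nn.functional as F

def triplet_alignment_loss(img, conv, cap, margin=0.2):
    """img, conv, cap: (batch, dim) L2-normalized embeddings of matched triples."""
    losses = []
    # Apply a margin ranking term over each of the three modality pairs.
    for a, b in [(img, conv), (img, cap), (conv, cap)]:
        sim = a @ b.t()                # cosine similarities, shape (batch, batch)
        pos = sim.diag().unsqueeze(1)  # matched pairs sit on the diagonal
        # Hinge loss: mismatched pairs should score at least `margin` below matches.
        cost = (margin + sim - pos).clamp(min=0)
        cost.fill_diagonal_(0)
        losses.append(cost.mean())
    return sum(losses) / len(losses)

# Toy usage with random embeddings standing in for encoder outputs.
B, D = 8, 64
img  = F.normalize(torch.randn(B, D), dim=-1)
conv = F.normalize(torch.randn(B, D), dim=-1)
cap  = F.normalize(torch.randn(B, D), dim=-1)
print(triplet_alignment_loss(img, conv, cap).item())
```

The point of structuring the objective this way is that every case is tied to all three of its views at once, so the image encoder learns from both the short conversational description and the longer caption rather than from autoregressive next-token prediction alone.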