AITopics | Gao, Yunhe

Collaborating Authors

Gao, Yunhe

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

The Hidden Life of Tokens: Reducing Hallucination of Large Vision-Language Models via Visual Information Steering

Li, Zhuowei, Shi, Haizhou, Gao, Yunhe, Liu, Di, Wang, Zhenting, Chen, Yuxiao, Liu, Ting, Zhao, Long, Wang, Hao, Metaxas, Dimitris N.

arXiv.org Artificial IntelligenceFeb-5-2025

Large Vision-Language Models (LVLMs) can reason effectively over both textual and visual inputs, but they tend to hallucinate syntactically coherent yet visually ungrounded contents. In this paper, we investigate the internal dynamics of hallucination by examining the tokens logits rankings throughout the generation process, revealing three key patterns in how LVLMs process information: (1) gradual visual information loss -- visually grounded tokens gradually become less favored throughout generation, and (2) early excitation -- semantically meaningful tokens achieve peak activation in the layers earlier than the final layer. (3) hidden genuine information -- visually grounded tokens though not being eventually decided still retain relatively high rankings at inference. Based on these insights, we propose VISTA (Visual Information Steering with Token-logit Augmentation), a training-free inference-time intervention framework that reduces hallucination while promoting genuine information. VISTA works by combining two complementary approaches: reinforcing visual information in activation space and leveraging early layer activations to promote semantically meaningful decoding. Compared to existing methods, VISTA requires no external supervision and is applicable to various decoding strategies. Extensive experiments show that VISTA on average reduces hallucination by abount 40% on evaluated open-ended generation task, and it consistently outperforms existing methods on four benchmarks across four architectures under three decoding strategies.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2502.03628

Country:

Asia (0.28)
Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Research Report > New Finding (0.46)

Industry:

Leisure & Entertainment > Sports (1.00)
Transportation > Ground (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

RadAlign: Advancing Radiology Report Generation with Vision-Language Concept Alignment

Gu, Difei, Gao, Yunhe, Zhou, Yang, Zhou, Mu, Metaxas, Dimitris

arXiv.org Artificial IntelligenceJan-13-2025

Automated chest radiographs interpretation requires both accurate disease classification and detailed radiology report generation, presenting a significant challenge in the clinical workflow. Current approaches either focus on classification accuracy at the expense of interpretability or generate detailed but potentially unreliable reports through image captioning techniques. In this study, we present RadAlign, a novel framework that combines the predictive accuracy of vision-language models (VLMs) with the reasoning capabilities of large language models (LLMs). Inspired by the radiologist's workflow, RadAlign first employs a specialized VLM to align visual features with key medical concepts, achieving superior disease classification with an average AUC of 0.885 across multiple diseases. These recognized medical conditions, represented as text-based concepts in the aligned visual-language space, are then used to prompt LLM-based report generation. Enhanced by a retrieval-augmented generation mechanism that grounds outputs in similar historical cases, RadAlign delivers superior report quality with a GREEN score of 0.678, outperforming state-of-the-art methods' 0.634. Our framework maintains strong clinical interpretability while reducing hallucinations, advancing automated medical imaging and report analysis through integrated predictive and generative AI. Code is available at https://github.com/difeigu/RadAlign.

large language model, machine learning, radalign, (19 more...)

arXiv.org Artificial Intelligence

2501.07525

Country: Europe > Spain (0.14)

Genre: Research Report > New Finding (0.66)

Industry:

Health & Medicine > Nuclear Medicine (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.86)

Add feedback

VerSe: Integrating Multiple Queries as Prompts for Versatile Cardiac MRI Segmentation

Guo, Bangwei, Ye, Meng, Gao, Yunhe, Xin, Bingyu, Axel, Leon, Metaxas, Dimitris

arXiv.org Artificial IntelligenceDec-20-2024

Despite the advances in learning-based image segmentation approach, the accurate segmentation of cardiac structures from magnetic resonance imaging (MRI) remains a critical challenge. While existing automatic segmentation methods have shown promise, they still require extensive manual corrections of the segmentation results by human experts, particularly in complex regions such as the basal and apical parts of the heart. Recent efforts have been made on developing interactive image segmentation methods that enable human-in-the-loop learning. However, they are semi-automatic and inefficient, due to their reliance on click-based prompts, especially for 3D cardiac MRI volumes. To address these limitations, we propose VerSe, a Versatile Segmentation framework to unify automatic and interactive segmentation through mutiple queries. Our key innovation lies in the joint learning of object and click queries as prompts for a shared segmentation backbone. VerSe supports both fully automatic segmentation, through object queries, and interactive mask refinement, by providing click queries when needed. With the proposed integrated prompting scheme, VerSe demonstrates significant improvement in performance and efficiency over existing methods, on both cardiac MRI and out-of-distribution medical imaging datasets. The code is available at https://github.com/bangwayne/Verse.

artificial intelligence, machine learning, segmentation, (19 more...)

arXiv.org Artificial Intelligence

2412.16381

Country:

Europe > France (0.14)
Europe > Germany (0.14)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

Aligning Human Knowledge with Visual Concepts Towards Explainable Medical Image Classification

Gao, Yunhe, Gu, Difei, Zhou, Mu, Metaxas, Dimitris

arXiv.org Artificial IntelligenceJun-8-2024

Although explainability is essential in the clinical diagnosis, most deep learning models still function as black boxes without elucidating their decision-making process. In this study, we investigate the explainable model development that can mimic the decision-making process of human experts by fusing the domain knowledge of explicit diagnostic criteria. We introduce a simple yet effective framework, Explicd, towards Explainable language-informed criteria-based diagnosis. Explicd initiates its process by querying domain knowledge from either large language models (LLMs) or human experts to establish diagnostic criteria across various concept axes (e.g., color, shape, texture, or specific patterns of diseases). By leveraging a pretrained vision-language model, Explicd injects these criteria into the embedding space as knowledge anchors, thereby facilitating the learning of corresponding visual concepts within medical images. The final diagnostic outcome is determined based on the similarity scores between the encoded visual concepts and the textual criteria embeddings. Through extensive evaluation of five medical image classification benchmarks, Explicd has demonstrated its inherent explainability and extends to improve classification performance compared to traditional black-box models.

criteria, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2406.05596

Genre:

Research Report (1.00)
Instructional Material > Course Syllabus & Notes (0.72)
Instructional Material > Online (0.62)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Add feedback

Implicit In-context Learning

Li, Zhuowei, Xu, Zihao, Han, Ligong, Gao, Yunhe, Wen, Song, Liu, Di, Wang, Hao, Metaxas, Dimitris N.

arXiv.org Artificial IntelligenceMay-23-2024

In-context Learning (ICL) empowers large language models (LLMs) to adapt to unseen tasks during inference by prefixing a few demonstration examples prior to test queries. Despite its versatility, ICL incurs substantial computational and memory overheads compared to zero-shot learning and is susceptible to the selection and order of demonstration examples. In this work, we introduce Implicit In-context Learning (I2CL), an innovative paradigm that addresses the challenges associated with traditional ICL by absorbing demonstration examples within the activation space. I2CL first generates a condensed vector representation, namely a context vector, from the demonstration examples. It then integrates the context vector during inference by injecting a linear combination of the context vector and query activations into the model's residual streams. Empirical evaluation on nine real-world tasks across three model architectures demonstrates that I2CL achieves few-shot performance with zero-shot cost and exhibits robustness against the variation of demonstration examples. Furthermore, I2CL facilitates a novel representation of "task-ids", enhancing task similarity detection and enabling effective transfer learning. We provide a comprehensive analysis of I2CL, offering deeper insights into its mechanisms and broader implications for ICL. The source code is available at: https://github.com/LzVv123456/I2CL.

artificial intelligence, large language model, natural language, (16 more...)

arXiv.org Artificial Intelligence

2405.1466

Country:

Europe (0.68)
North America > United States > Washington > King County > Seattle (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report > New Finding (1.00)

Industry: Leisure & Entertainment (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Deep Deformable Models: Learning 3D Shape Abstractions with Part Consistency

Liu, Di, Zhao, Long, Zhangli, Qilong, Gao, Yunhe, Liu, Ting, Metaxas, Dimitris N.

arXiv.org Artificial IntelligenceSep-2-2023

The task of shape abstraction with semantic part consistency is challenging due to the complex geometries of natural objects. Recent methods learn to represent an object shape using a set of simple primitives to fit the target. \textcolor{black}{However, in these methods, the primitives used do not always correspond to real parts or lack geometric flexibility for semantic interpretation.} In this paper, we investigate salient and efficient primitive descriptors for accurate shape abstractions, and propose \textit{Deep Deformable Models (DDMs)}. DDM employs global deformations and diffeomorphic local deformations. These properties enable DDM to abstract complex object shapes with significantly fewer primitives that offer broader geometry coverage and finer details. DDM is also capable of learning part-level semantic correspondences due to the differentiable and invertible properties of our primitive deformation. Moreover, DDM learning formulation is based on dynamic and kinematic modeling, which enables joint regularization of each sub-transformation during primitive fitting. Extensive experiments on \textit{ShapeNet} demonstrate that DDM outperforms the state-of-the-art in terms of reconstruction and part consistency by a notable margin.

artificial intelligence, deep learning, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2309.01035

Genre: Research Report (0.50)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback