AITopics | Wang, Guoxin

Wang, Guoxin

Citrus: Leveraging Expert Cognitive Pathways in a Medical Language Model for Advanced Medical Decision Support

Wang, Guoxin, Gao, Minyu, Yang, Shuai, Zhang, Ya, He, Lizhi, Huang, Liang, Xiao, Hanlin, Zhang, Yexuan, Li, Wanyue, Chen, Lu, Fei, Jintao, Li, Xin

arXiv.org Artificial Intelligence, Feb-25-2025

Large language models (LLMs), particularly those with reasoning capabilities, have rapidly advanced in recent years, demonstrating significant potential across a wide range of applications. However, their deployment in healthcare, especially in disease reasoning tasks, is hindered by the challenge of acquiring expert-level cognitive data. In this paper, we introduce Citrus, a medical language model that bridges the gap between clinical expertise and AI reasoning by emulating the cognitive processes of medical experts. The model is trained on a large corpus of simulated expert disease reasoning data, synthesized using a novel approach that accurately captures the decision-making pathways of clinicians. This approach enables Citrus to better simulate the complex reasoning processes involved in diagnosing and treating medical conditions. To further address the lack of publicly available datasets for medical reasoning tasks, we release the last-stage training data, including a custom-built medical diagnostic dialogue dataset. This open-source contribution aims to support further research and development in the field. Evaluations using authoritative benchmarks such as MedQA, covering tasks in medical reasoning and language understanding, show that Citrus achieves superior performance compared to other models of similar size. These results highlight Citrus's potential to significantly enhance medical decision support systems, providing a more accurate and efficient tool for clinical decision-making.

arxiv preprint, large language model, machine learning

arXiv:2502.18274

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Health Care Technology (1.00)
Health & Medicine > Diagnostic Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.90)

Kosmos-2.5: A Multimodal Literate Model

Lv, Tengchao, Huang, Yupan, Chen, Jingye, Cui, Lei, Ma, Shuming, Chang, Yaoyao, Huang, Shaohan, Wang, Wenhui, Dong, Li, Luo, Weiyao, Wu, Shaoxiang, Wang, Guoxin, Zhang, Cha, Wei, Furu

arXiv.org Artificial Intelligence, Sep-20-2023

We present Kosmos-2.5, a multimodal literate model for machine reading of text-intensive images. Pre-trained on large-scale text-intensive images, Kosmos-2.5 excels in two distinct yet cooperative transcription tasks: (1) generating spatially-aware text blocks, where each block of text is assigned its spatial coordinates within the image, and (2) producing structured text output that captures styles and structures into the markdown format. This unified multimodal literate capability is achieved through a shared Transformer architecture, task-specific prompts, and flexible text representations. We evaluate Kosmos-2.5 on end-to-end document-level text recognition and image-to-markdown text generation. Furthermore, the model can be readily adapted for any text-intensive image understanding task with different prompts through supervised fine-tuning, making it a general-purpose tool for real-world applications involving text-rich images. This work also paves the way for the future scaling of multimodal large language models.

large language model, multimodal literate model, natural language

arXiv:2309.11419

Genre: Research Report (0.40)

Industry: Energy > Oil & Gas > Upstream (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.53)

Unifying Vision, Text, and Layout for Universal Document Processing

Tang, Zineng, Yang, Ziyi, Wang, Guoxin, Fang, Yuwei, Liu, Yang, Zhu, Chenguang, Zeng, Michael, Zhang, Cha, Bansal, Mohit

arXiv.org Artificial Intelligence, Mar-13-2023

We propose Universal Document Processing (UDOP), a foundation Document AI model which unifies text, image, and layout modalities together with varied task formats, including document understanding and generation. UDOP leverages the spatial correlation between textual content and document image to model image, text, and layout modalities with one uniform representation. With a novel Vision-Text-Layout Transformer, UDOP unifies pretraining and multi-domain downstream tasks into a prompt-based sequence generation scheme. UDOP is pretrained on both large-scale unlabeled document corpora using innovative self-supervised objectives and diverse labeled data. UDOP also learns to generate document images from text and layout modalities via masked image reconstruction. To the best of our knowledge, this is the first time in the field of document AI that one model simultaneously achieves high-quality neural document editing and content customization. Our method sets the state-of-the-art on 8 Document AI tasks, e.g., document understanding and QA, across diverse data domains like finance reports, academic papers, and websites. UDOP ranks first on the leaderboard of the Document Understanding Benchmark.

information retrieval, machine learning, natural language

arXiv:2212.02623

Country: North America > United States (0.28)

Genre: Research Report (0.50)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.94)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.46)
