AITopics | Zhong, Cheng

Collaborating Authors

Zhong, Cheng

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

CODA: Repurposing Continuous VAEs for Discrete Tokenization

Liu, Zeyu, Ni, Zanlin, Hua, Yeguo, Deng, Xin, Ma, Xiao, Zhong, Cheng, Huang, Gao

arXiv.org Artificial IntelligenceMar-22-2025

Discrete visual tokenizers transform images into a sequence of tokens, enabling token-based visual generation akin to language models. However, this process is inherently challenging, as it requires both compressing visual signals into a compact representation and discretizing them into a fixed set of codes. Traditional discrete tokenizers typically learn the two tasks jointly, often leading to unstable training, low codebook utilization, and limited reconstruction quality. In this paper, we introduce \textbf{CODA}(\textbf{CO}ntinuous-to-\textbf{D}iscrete \textbf{A}daptation), a framework that decouples compression and discretization. Instead of training discrete tokenizers from scratch, CODA adapts off-the-shelf continuous VAEs -- already optimized for perceptual compression -- into discrete tokenizers via a carefully designed discretization process. By primarily focusing on discretization, CODA ensures stable and efficient training while retaining the strong visual fidelity of continuous VAEs. Empirically, with $\mathbf{6 \times}$ less training budget than standard VQGAN, our approach achieves a remarkable codebook utilization of 100% and notable reconstruction FID (rFID) of $\mathbf{0.43}$ and $\mathbf{1.34}$ for $8 \times$ and $16 \times$ compression on ImageNet 256$\times$ 256 benchmark.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2503.1776

Country: North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Sensing and Signal Processing (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

Add feedback

Collaborative Weakly Supervised Video Correlation Learning for Procedure-Aware Instructional Video Analysis

He, Tianyao, Liu, Huabin, Li, Yuxi, Ma, Xiao, Zhong, Cheng, Zhang, Yang, Lin, Weiyao

arXiv.org Artificial IntelligenceDec-18-2023

Video Correlation Learning (VCL), which aims to analyze the relationships between videos, has been widely studied and applied in various general video tasks. However, applying VCL to instructional videos is still quite challenging due to their intrinsic procedural temporal structure. Specifically, procedural knowledge is critical for accurate correlation analyses on instructional videos. Nevertheless, current procedure-learning methods heavily rely on step-level annotations, which are costly and not scalable. To address this problem, we introduce a weakly supervised framework called Collaborative Procedure Alignment (CPA) for procedure-aware correlation learning on instructional videos. Our framework comprises two core modules: collaborative step mining and frame-to-step alignment. The collaborative step mining module enables simultaneous and consistent step segmentation for paired videos, leveraging the semantic and temporal similarity between frames. Based on the identified steps, the frame-to-step alignment module performs alignment between the frames and steps across videos. The alignment result serves as a measurement of the correlation distance between two videos. We instantiate our framework in two distinct instructional video tasks: sequence verification and action quality assessment. Extensive experiments validate the effectiveness of our approach in providing accurate and interpretable correlation analyses for instructional videos.

artificial intelligence, image understanding, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2312.11024

Country: Asia > China (0.28)

Genre: Instructional Material > Course Syllabus & Notes (1.00)

Industry:

Education > Educational Technology > Media (1.00)
Education > Educational Technology > Audio & Video (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Vision > Image Understanding (0.41)

Add feedback

Hierarchical Reinforcement Learning for Automatic Disease Diagnosis

Zhong, Cheng, Liao, Kangenbei, Chen, Wei, Liu, Qianlong, Peng, Baolin, Huang, Xuanjing, Peng, Jiajie, Wei, Zhongyu

arXiv.org Artificial IntelligenceNov-7-2023

Motivation: Disease diagnosis oriented dialogue system models the interactive consultation procedure as Markov Decision Process and reinforcement learning algorithms are used to solve the problem. Existing approaches usually employ a flat policy structure that treat all symptoms and diseases equally for action making. This strategy works well in the simple scenario when the action space is small, however, its efficiency will be challenged in the real environment. Inspired by the offline consultation process, we propose to integrate a hierarchical policy structure of two levels into the dialogue systemfor policy learning. The high-level policy consists of amastermodel that is responsible for triggering a low-levelmodel, the lowlevel policy consists of several symptom checkers and a disease classifier. The proposed policy structure is capable to deal with diagnosis problem including large number of diseases and symptoms. Results: Experimental results on three real-world datasets and a synthetic dataset demonstrate that our hierarchical framework achieves higher accuracy and symptom recall in disease diagnosis compared with existing systems. We construct a benchmark including datasets and implementation of existing algorithms to encourage follow-up researches. Availability: The code and data is available from https://github.com/FudanDISC/DISCOpen-MedBox-DialoDiagnosis Contact: 21210980124@m.fudan.edu.cn Supplementary information: Supplementary data are available at Bioinformatics online.

machine learning, natural language, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

2004.14254

Country:

Asia > China (0.14)
North America > United States (0.14)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Public Health (0.68)
Health & Medicine > Therapeutic Area > Pulmonary/Respiratory Diseases (0.46)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.46)
Health & Medicine > Health Care Technology > Medical Record (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Add feedback

A Benchmark for Automatic Medical Consultation System: Frameworks, Tasks and Datasets

Chen, Wei, Li, Zhiwei, Fang, Hongyi, Yao, Qianyuan, Zhong, Cheng, Hao, Jianye, Zhang, Qi, Huang, Xuanjing, Peng, Jiajie, Wei, Zhongyu

arXiv.org Artificial IntelligenceDec-25-2022

Motivation: In recent years, interest has arisen in using machine learning to improve the efficiency of automatic medical consultation and enhance patient experience. In this article, we propose two frameworks to support automatic medical consultation, namely doctor-patient dialogue understanding and task-oriented interaction. We create a new large medical dialogue dataset with multi-level finegrained annotations and establish five independent tasks, including named entity recognition, dialogue act classification, symptom label inference, medical report generation and diagnosis-oriented dialogue policy. Results: We report a set of benchmark results for each task, which shows the usability of the dataset and sets a baseline for future studies.

computational linguistic, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2204.08997

Country:

Europe (0.68)
Asia > China (0.47)
North America > United States > Minnesota (0.28)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Consumer Health (0.93)
Health & Medicine > Health Care Technology (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

DxFormer: A Decoupled Automatic Diagnostic System Based on Decoder-Encoder Transformer with Dense Symptom Representations

Chen, Wei, Zhong, Cheng, Peng, Jiajie, Wei, Zhongyu

arXiv.org Artificial IntelligenceDec-23-2022

Diagnosis-oriented dialogue system queries the patient's health condition and makes predictions about possible diseases through continuous interaction with the patient. A few studies use reinforcement learning (RL) to learn the optimal policy from the joint action space of symptoms and diseases. However, existing RL (or Non-RL) methods cannot achieve sufficiently good prediction accuracy, still far from its upper limit. To address the problem, we propose a decoupled automatic diagnostic framework DxFormer, which divides the diagnosis process into two steps: symptom inquiry and disease diagnosis, where the transition from symptom inquiry to disease diagnosis is explicitly determined by the stopping criteria. In DxFormer, we treat each symptom as a token, and formalize the symptom inquiry and disease diagnosis to a language generation model and a sequence classification model respectively. We use the inverted version of Transformer, i.e., the decoder-encoder structure, to learn the representation of symptoms by jointly optimizing the reinforce reward and cross entropy loss. Extensive experiments on three public real-world datasets prove that our proposed model can effectively learn doctors' clinical experience and achieve the state-of-the-art results in terms of symptom recall and diagnostic accuracy.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2205.03755

Country: Asia > China (0.29)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Diagnostic Medicine (0.88)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.68)
Health & Medicine > Therapeutic Area > Immunology (0.68)
Health & Medicine > Therapeutic Area > Pulmonary/Respiratory Diseases (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (0.74)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback