AITopics | Wang, Zhiyuan

Collaborating Authors

Wang, Zhiyuan

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Word-Sequence Entropy: Towards Uncertainty Estimation in Free-Form Medical Question Answering Applications and Beyond

Wang, Zhiyuan, Duan, Jinhao, Yuan, Chenxi, Chen, Qingyu, Chen, Tianlong, Yao, Huaxiu, Zhang, Yue, Wang, Ren, Xu, Kaidi, Shi, Xiaoshuang

arXiv.org Artificial IntelligenceFeb-21-2024

Uncertainty estimation plays a pivotal role in ensuring the reliability of safety-critical human-AI interaction systems, particularly in the medical domain. However, a general method for quantifying the uncertainty of free-form answers has yet to be established in open-ended medical question-answering (QA) tasks, where irrelevant words and sequences with limited semantic information can be the primary source of uncertainty due to the presence of generative inequality. In this paper, we propose the Word-Sequence Entropy (WSE), which calibrates the uncertainty proportion at both the word and sequence levels according to the semantic relevance, with greater emphasis placed on keywords and more relevant sequences when performing uncertainty quantification. We compare WSE with 6 baseline methods on 5 free-form medical QA datasets, utilizing 7 "off-the-shelf" large language models (LLMs), and show that WSE exhibits superior performance on accurate uncertainty measurement under two standard criteria for correctness evaluation (e.g., WSE outperforms existing state-of-the-art method by 3.23% AUROC on the MedQA dataset). Additionally, in terms of the potential for real-world medical QA applications, we achieve a significant enhancement in the performance of LLMs when employing sequences with lower uncertainty, identified by WSE, as final answers (e.g., +6.36% accuracy improvement on the COVID-QA dataset), without requiring any additional task-specific fine-tuning or architectural modifications.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2402.14259

Country:

North America > United States > Pennsylvania (0.14)
North America > United States > Illinois (0.14)
Asia > China > Sichuan Province (0.14)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.46)
Health & Medicine > Therapeutic Area > Immunology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

INCPrompt: Task-Aware incremental Prompting for Rehearsal-Free Class-incremental Learning

Wang, Zhiyuan, Qu, Xiaoyang, Xiao, Jing, Chen, Bokui, Wang, Jianzong

arXiv.org Artificial IntelligenceJan-21-2024

This paper introduces INCPrompt, an innovative continual learning solution that effectively addresses catastrophic forgetting. INCPrompt's key innovation lies in its use of adaptive key-learner and task-aware prompts that capture task-relevant information. This unique combination encapsulates general knowledge across tasks and encodes task-specific knowledge. Our comprehensive evaluation across multiple continual learning benchmarks demonstrates INCPrompt's superiority over existing algorithms, showing its effectiveness in mitigating catastrophic forgetting while maintaining high performance. These results highlight the significant impact of task-aware incremental prompting on continual learning performance.

artificial intelligence, incprompt, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2401.11667

Country: Asia > China > Guangdong Province (0.15)

Genre: Research Report (0.40)

Industry: Education > Educational Setting (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.48)

Add feedback

P2DT: Mitigating Forgetting in task-incremental Learning with progressive prompt Decision Transformer

Wang, Zhiyuan, Qu, Xiaoyang, Xiao, Jing, Chen, Bokui, Wang, Jianzong

arXiv.org Artificial IntelligenceJan-21-2024

Catastrophic forgetting poses a substantial challenge for managing intelligent agents controlled by a large model, causing performance degradation when these agents face new tasks. In our work, we propose a novel solution - the Progressive Prompt Decision Transformer (P2DT). This method enhances a transformer-based model by dynamically appending decision tokens during new task training, thus fostering task-specific policies. Our approach mitigates forgetting in continual and offline reinforcement learning scenarios. Moreover, P2DT leverages trajectories collected via traditional reinforcement learning from all tasks and generates new task-specific tokens during training, thereby retaining knowledge from previous studies. Preliminary results demonstrate that our model effectively alleviates catastrophic forgetting and scales well with increasing task environments.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

2401.11666

Country: Asia > China > Guangdong Province (0.15)

Genre: Research Report > Promising Solution (0.49)

Industry: Education (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.57)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Add feedback

Think and Retrieval: A Hypothesis Knowledge Graph Enhanced Medical Large Language Models

Jiang, Xinke, Zhang, Ruizhe, Xu, Yongxin, Qiu, Rihong, Fang, Yue, Wang, Zhiyuan, Tang, Jinyi, Ding, Hongxin, Chu, Xu, Zhao, Junfeng, Wang, Yasha

arXiv.org Artificial IntelligenceDec-25-2023

We explore how the rise of Large Language Models (LLMs) significantly impacts task performance in the field of Natural Language Processing. We focus on two strategies, Retrieval-Augmented Generation (RAG) and Fine-Tuning (FT), and propose the Hypothesis Knowledge Graph Enhanced (HyKGE) framework, leveraging a knowledge graph to enhance medical LLMs. By integrating LLMs and knowledge graphs, HyKGE demonstrates superior performance in addressing accuracy and interpretability challenges, presenting potential applications in the medical domain. Our evaluations using real-world datasets highlight HyKGE's superiority in providing accurate knowledge with precise confidence, particularly in complex and difficult scenarios. The code will be available until published.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2312.15883

Country: North America > United States (0.31)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Gastroenterology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Neurology (0.69)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

Scalable CP Decomposition for Tensor Learning using GPU Tensor Cores

Zhang, Zeliang, Liu, Zhuo, Liang, Susan, Wang, Zhiyuan, Zhu, Yifan, Ding, Chen, Xu, Chenliang

arXiv.org Artificial IntelligenceNov-22-2023

CP decomposition is a powerful tool for data science, especially gene analysis, deep learning, and quantum computation. However, the application of tensor decomposition is largely hindered by the exponential increment of the computational complexity and storage consumption with the size of tensors. While the data in our real world is usually presented as trillion- or even exascale-scale tensors, existing work can only support billion-scale scale tensors. In our work, we propose the Exascale-Tensor to mitigate the significant gap. Specifically, we propose a compression-based tensor decomposition framework, namely the exascale-tensor, to support exascale tensor decomposition. Then, we carefully analyze the inherent parallelism and propose a bag of strategies to improve computational efficiency. Last, we conduct experiments to decompose tensors ranging from million-scale to trillion-scale for evaluation. Compared to the baselines, the exascale-tensor supports 8,000x larger tensors and a speedup up to 6.95x. We also apply our method to two real-world applications, including gene analysis and tensor layer neural networks, of which the numeric results demonstrate the scalability and effectiveness of our method.

artificial intelligence, machine learning, tensor, (16 more...)

arXiv.org Artificial Intelligence

2311.13693

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.79)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Add feedback

Efficient Semi-Supervised Federated Learning for Heterogeneous Participants

Sun, Zhipeng, Xu, Yang, Xu, Hongli, Wang, Zhiyuan, Liao, Yunming

arXiv.org Artificial IntelligenceNov-10-2023

Federated Learning (FL) has emerged to allow multiple clients to collaboratively train machine learning models on their private data. However, training and deploying large-scale models on resource-constrained clients is challenging. Fortunately, Split Federated Learning (SFL) offers a feasible solution by alleviating the computation and/or communication burden on clients. However, existing SFL works often assume sufficient labeled data on clients, which is usually impractical. Besides, data non-IIDness across clients poses another challenge to ensure efficient model training. To our best knowledge, the above two issues have not been simultaneously addressed in SFL. Herein, we propose a novel Semi-SFL system, which incorporates clustering regularization to perform SFL under the more practical scenario with unlabeled and non-IID client data. Moreover, our theoretical and experimental investigations into model convergence reveal that the inconsistent training processes on labeled and unlabeled data have an influence on the effectiveness of clustering regularization. To this end, we develop a control algorithm for dynamically adjusting the global updating frequency, so as to mitigate the training inconsistency and improve training performance. Extensive experiments on benchmark models and datasets show that our system provides a 3.0x speed-up in training time and reduces the communication cost by about 70.3% while reaching the target accuracy, and achieves up to 5.1% improvement in accuracy under non-IID scenarios compared to the state-of-the-art baselines.

accuracy, artificial intelligence, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2307.1587

Country: Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Research Report > New Finding (0.93)

Industry: Information Technology > Security & Privacy (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Add feedback

A ModelOps-based Framework for Intelligent Medical Knowledge Extraction

Ding, Hongxin, Zou, Peinie, Wang, Zhiyuan, Zhao, Junfeng, Wang, Yasha, Zhou, Qiang

arXiv.org Artificial IntelligenceOct-4-2023

Extracting medical knowledge from healthcare texts enhances downstream tasks like medical knowledge graph construction and clinical decision-making. However, the construction and application of knowledge extraction models lack automation, reusability and unified management, leading to inefficiencies for researchers and high barriers for non-AI experts such as doctors, to utilize knowledge extraction. To address these issues, we propose a ModelOps-based intelligent medical knowledge extraction framework that offers a low-code system for model selection, training, evaluation and optimization. Specifically, the framework includes a dataset abstraction mechanism based on multi-layer callback functions, a reusable model training, monitoring and management mechanism. We also propose a model recommendation method based on dataset similarity, which helps users quickly find potentially suitable models for a given dataset. Our framework provides convenience for researchers to develop models and simplifies model access for non-AI experts such as doctors.

artificial intelligence, intelligent medical knowledge extraction, modelop-based framework

arXiv.org Artificial Intelligence

2310.02593

Genre: Research Report (0.40)

Industry: Health & Medicine (0.53)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Applied AI (0.60)

Add feedback

A Novel Noise Injection-based Training Scheme for Better Model Robustness

Zhang, Zeliang, Jiang, Jinyang, Chen, Minjie, Wang, Zhiyuan, Peng, Yijie, Yu, Zhaofei

arXiv.org Artificial IntelligenceMay-29-2023

Noise injection-based method has been shown to be able to improve the robustness of artificial neural networks in previous work. In this work, we propose a novel noise injection-based training scheme for better model robustness. Specifically, we first develop a likelihood ratio method to estimate the gradient with respect to both synaptic weights and noise levels for stochastic gradient descent training. Then, we design an approximation for the vanilla noise injection-based training method to reduce memory and improve computational efficiency. Next, we apply our proposed scheme to spiking neural networks and evaluate the performance of classification accuracy and robustness on MNIST and Fashion-MNIST datasets. Experiment results show that our proposed method achieves a much better performance on adversarial robustness and slightly better performance on original accuracy, compared with the conventional gradient-based training method.

adversarial attack, artificial intelligence, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2302.10802

Country: Asia > China (0.28)

Genre: Research Report > New Finding (0.34)

Industry:

Information Technology (0.96)
Energy > Oil & Gas (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Personalized State Anxiety Detection: An Empirical Study with Linguistic Biomarkers and A Machine Learning Pipeline

Wang, Zhiyuan, Tang, Mingyue, Larrazabal, Maria A., Toner, Emma R., Rucker, Mark, Wu, Congyu, Teachman, Bethany A., Boukhechba, Mehdi, Barnes, Laura E.

arXiv.org Artificial IntelligenceApr-19-2023

Individuals high in social anxiety symptoms often exhibit elevated state anxiety in social situations. Research has shown it is possible to detect state anxiety by leveraging digital biomarkers and machine learning techniques. However, most existing work trains models on an entire group of participants, failing to capture individual differences in their psychological and behavioral responses to social contexts. To address this concern, in Study 1, we collected linguistic data from N=35 high socially anxious participants in a variety of social contexts, finding that digital linguistic biomarkers significantly differ between evaluative vs. non-evaluative social contexts and between individuals having different trait psychological symptoms, suggesting the likely importance of personalized approaches to detect state anxiety. In Study 2, we used the same data and results from Study 1 to model a multilayer personalized machine learning pipeline to detect state anxiety that considers contextual and individual differences. This personalized model outperformed the baseline F1-score by 28.0%. Results suggest that state anxiety can be more accurately detected with personalized machine learning approaches, and that linguistic biomarkers hold promise for identifying periods of state anxiety in an unobtrusive way.

artificial intelligence, machine learning, state anxiety, (15 more...)

arXiv.org Artificial Intelligence

2304.09928

Country: North America > United States > Virginia (0.29)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)

Add feedback

Adaptive Control of Client Selection and Gradient Compression for Efficient Federated Learning

Jiang, Zhida, Xu, Yang, Xu, Hongli, Wang, Zhiyuan, Qian, Chen

arXiv.org Artificial IntelligenceDec-19-2022

Federated learning (FL) allows multiple clients cooperatively train models without disclosing local data. However, the existing works fail to address all these practical concerns in FL: limited communication resources, dynamic network conditions and heterogeneous client properties, which slow down the convergence of FL. To tackle the above challenges, we propose a heterogeneity-aware FL framework, called FedCG, with adaptive client selection and gradient compression. Specifically, the parameter server (PS) selects a representative client subset considering statistical heterogeneity and sends the global model to them. After local training, these selected clients upload compressed model updates matching their capabilities to the PS for aggregation, which significantly alleviates the communication load and mitigates the straggler effect. We theoretically analyze the impact of both client selection and gradient compression on convergence performance. Guided by the derived convergence rate, we develop an iteration-based algorithm to jointly optimize client selection and compression ratio decision using submodular maximization and linear programming. Extensive experiments on both real-world prototypes and simulations show that FedCG can provide up to 5.3$\times$ speedup compared to other methods.

artificial intelligence, client selection, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2212.09483

Genre: Research Report (1.00)

Industry:

Education (0.66)
Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.89)

Add feedback