Guo, Zhicheng
Rashomon Sets for Prototypical-Part Networks: Editing Interpretable Models in Real-Time
Donnelly, Jon, Guo, Zhicheng, Barnett, Alina Jade, McTavish, Hayden, Chen, Chaofan, Rudin, Cynthia
Interpretability is critical for machine learning models in high-stakes settings because it allows users to verify the model's reasoning. In computer vision, prototypical part models (ProtoPNets) have become the dominant model type to meet this need. Users can easily identify flaws in ProtoPNets, but fixing problems in a ProtoPNet requires slow, difficult retraining that is not guaranteed to resolve the issue. This problem is called the "interaction bottleneck." We solve the interaction bottleneck for ProtoPNets by simultaneously finding many equally good ProtoPNets (i.e., a draw from a "Rashomon set"). We show that our framework - called Proto-RSet - quickly produces many accurate, diverse ProtoPNets, allowing users to correct problems in real time while maintaining performance guarantees with respect to the training set. We demonstrate the utility of this method in two settings: 1) removing synthetic bias introduced to a bird identification model and 2) debugging a skin cancer identification model. This tool empowers non-machine-learning experts, such as clinicians or domain experts, to quickly refine and correct machine learning models without repeated retraining by machine learning experts.
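Below is a minimal, hypothetical Python sketch of the user-facing idea described in this abstract: keep every candidate ProtoPNet whose training accuracy is within a tolerance of the best (a Rashomon set), then let a user discard models that rely on a flagged prototype. The candidate models, accuracies, and prototype names are made up for illustration; this is not the Proto-RSet implementation.

```python
# Illustrative sketch (not the authors' code): select models from a Rashomon
# set and filter them against user feedback in real time.
from dataclasses import dataclass

@dataclass
class CandidateModel:
    name: str
    accuracy: float      # training-set accuracy of this candidate
    prototype_ids: set   # prototypes (image parts) the model relies on

def rashomon_set(candidates, epsilon):
    """Keep every model within `epsilon` of the best training accuracy."""
    best = max(m.accuracy for m in candidates)
    return [m for m in candidates if m.accuracy >= best - epsilon]

def remove_prototype(models, bad_prototype):
    """User feedback: drop models that depend on a flagged prototype."""
    return [m for m in models if bad_prototype not in m.prototype_ids]

# Hypothetical candidates; "background_text" stands for a spurious prototype.
candidates = [
    CandidateModel("A", 0.91, {"beak", "wing", "background_text"}),
    CandidateModel("B", 0.90, {"beak", "tail"}),
    CandidateModel("C", 0.84, {"wing", "tail"}),
]
good_models = rashomon_set(candidates, epsilon=0.02)          # A and B survive
debugged = remove_prototype(good_models, "background_text")   # only B remains
print([m.name for m in debugged])
```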
StepTool: A Step-grained Reinforcement Learning Framework for Tool Learning in LLMs
Yu, Yuanqing, Wang, Zhefan, Ma, Weizhi, Guo, Zhicheng, Zhan, Jingtao, Wang, Shuai, Wu, Chuhan, Guo, Zhiqiang, Zhang, Min
Despite their powerful reasoning and inference capabilities, Large Language Models (LLMs) still need external tools for real-time information retrieval or domain-specific expertise when solving complex tasks, a paradigm referred to as tool learning. Existing tool learning methods primarily rely on tuning with expert trajectories, focusing on token-sequence learning from a linguistic perspective. This raises two challenges: 1) imitating static trajectories limits the model's ability to generalize to new tasks, and 2) even expert trajectories can be suboptimal, and better solution paths may exist. In this work, we introduce StepTool, a novel step-grained reinforcement learning framework for improving tool learning in LLMs. It consists of two components: Step-grained Reward Shaping, which assigns a reward at each tool interaction based on the success of the tool invocation and its contribution to the task, and Step-grained Optimization, which uses policy gradient methods to optimize the model in a multi-step manner. Experimental results demonstrate that StepTool significantly outperforms existing methods on multi-step, tool-based tasks, providing a robust solution for complex task environments. Code is available at https://github.com/yuyq18/StepTool.
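A hedged sketch of the step-grained reward idea described above: each tool interaction receives a reward combining invocation success and task contribution, and return-to-go values would then weight that step's log-probabilities in a policy-gradient update. The reward weights, discount factor, and example trajectory are assumptions, not the paper's exact formulation.

```python
# Hedged sketch of step-grained reward shaping for a multi-step tool trajectory.
def step_reward(call_succeeded: bool, contribution: float,
                w_success: float = 0.5, w_contrib: float = 0.5) -> float:
    """Reward for one tool-interaction step (assumed weighting)."""
    return w_success * (1.0 if call_succeeded else 0.0) + w_contrib * contribution

def discounted_returns(rewards, gamma=0.95):
    """Return-to-go for each step, used to weight that step's log-probability."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return list(reversed(returns))

# Hypothetical 3-step trajectory: (tool call succeeded?, contribution to task)
trajectory = [(True, 0.2), (False, 0.0), (True, 0.9)]
rewards = [step_reward(ok, c) for ok, c in trajectory]
returns = discounted_returns(rewards)
# A policy-gradient loss would then be: -sum(returns[t] * log_prob[t]) over steps.
print(rewards, returns)
```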
StableToolBench: Towards Stable Large-Scale Benchmarking on Tool Learning of Large Language Models
Guo, Zhicheng, Cheng, Sijie, Wang, Hao, Liang, Shihao, Qin, Yujia, Li, Peng, Liu, Zhiyuan, Sun, Maosong, Liu, Yang
Large Language Models (LLMs) have witnessed remarkable advancements in recent years, prompting the exploration of tool learning, which integrates LLMs with external tools to address diverse real-world challenges. Assessing the capability of LLMs to utilise tools requires large-scale and stable benchmarks. However, previous works relied either on hand-crafted online tools of limited scale or on large-scale real online APIs that suffer from unstable API status. To address this problem, we introduce StableToolBench, a benchmark evolved from ToolBench that provides a virtual API server and a stable evaluation system. The virtual API server combines a caching system with API simulators; the two are complementary in alleviating changes in API status. Meanwhile, the stable evaluation system defines solvable pass and win rates, using GPT-4 as the automatic evaluator to eliminate randomness during evaluation. Experimental results demonstrate the stability of StableToolBench, and we further discuss the effectiveness of the API simulators, the caching system, and the evaluation system.
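The caching-plus-simulation design can be illustrated with a small sketch: serve a recorded response when one exists, otherwise fall back to a simulator and cache its output so the benchmark stays stable over time. The class and the lambda simulator below are hypothetical stand-ins; in StableToolBench the simulator is LLM-based.

```python
# Illustrative "cache first, simulate on miss" virtual API server (a sketch,
# not the StableToolBench code).
import json

class VirtualAPIServer:
    def __init__(self, simulator, cache=None):
        self.cache = cache or {}    # maps request keys to cached responses
        self.simulator = simulator  # fallback when no recorded response exists

    @staticmethod
    def _key(api_name, params):
        return api_name + "|" + json.dumps(params, sort_keys=True)

    def call(self, api_name, params):
        key = self._key(api_name, params)
        if key in self.cache:                        # 1) serve a recorded response
            return self.cache[key]
        response = self.simulator(api_name, params)  # 2) simulate the API
        self.cache[key] = response                   # keep future runs stable
        return response

# Hypothetical simulator: in practice this would be an LLM prompted with the
# API documentation; here it just returns a canned answer.
server = VirtualAPIServer(simulator=lambda name, p: {"api": name, "result": "simulated"})
print(server.call("get_weather", {"city": "Beijing"}))
print(server.call("get_weather", {"city": "Beijing"}))  # served from cache
```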
SiamQuality: A ConvNet-Based Foundation Model for Imperfect Physiological Signals
Ding, Cheng, Guo, Zhicheng, Chen, Zhaoliang, Lee, Randall J, Rudin, Cynthia, Hu, Xiao
Foundation models, especially those using transformers as backbones, have gained significant popularity, particularly in language and language-vision tasks. However, large foundation models are typically trained on high-quality data, which poses a significant challenge given the prevalence of poor-quality real-world data. This challenge is more pronounced when developing foundation models for physiological data, which are often noisy, incomplete, or inconsistent. The present work aims to provide a toolset for developing foundation models on physiological data. We leverage a large dataset of photoplethysmography (PPG) signals from hospitalized intensive care patients. For these data, we propose SiamQuality, a novel self-supervised learning task that uses convolutional neural networks (CNNs) as the backbone and enforces similar representations for good- and poor-quality signals drawn from similar physiological states. We pre-trained SiamQuality on over 36 million 30-second PPG pairs, then fine-tuned and tested it on six downstream tasks using external datasets. The results demonstrate the superiority of the proposed approach on all downstream tasks, which are extremely important for heart monitoring on wearable devices. Our method indicates that CNNs can be an effective backbone for foundation models that are robust to training data quality.
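A minimal PyTorch sketch of the pairing idea: a shared 1-D CNN encoder embeds a good-quality segment and a poor-quality segment from the same physiological state, and the loss pulls the pair together. The architecture sizes, the cosine-based loss, and the synthetic "poor-quality" segments are illustrative assumptions, not the SiamQuality design.

```python
# Sketch of quality-paired self-supervised pre-training with a 1-D CNN encoder.
import torch
import torch.nn as nn

class PPGEncoder(nn.Module):
    def __init__(self, emb_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=7, padding=3), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=7, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            nn.Linear(64, emb_dim),
        )

    def forward(self, x):  # x: (batch, 1, samples)
        return self.net(x)

def pair_similarity_loss(z_good, z_poor):
    """Encourage embeddings of paired good/poor-quality segments to agree."""
    return 1.0 - nn.functional.cosine_similarity(z_good, z_poor).mean()

encoder = PPGEncoder()
good = torch.randn(8, 1, 1200)                 # hypothetical 30-s segments
poor = good + 0.5 * torch.randn_like(good)     # noisy counterpart of the same state
loss = pair_similarity_loss(encoder(good), encoder(poor))
loss.backward()
```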
What is different between these datasets?
Babbar, Varun, Guo, Zhicheng, Rudin, Cynthia
The performance of machine learning models heavily depends on the quality of the input data, yet real-world applications often encounter various data-related challenges. One such challenge arises when curating training data or deploying a model in the real world: two comparable datasets in the same domain may have different distributions. While numerous techniques exist for detecting distribution shifts, the literature lacks comprehensive approaches for explaining dataset differences in a human-understandable manner. To address this gap, we propose a toolbox of interpretable methods for comparing two datasets. We demonstrate the versatility of our approach across diverse data modalities, including tabular data, language, images, and signals, in both low- and high-dimensional settings. Our methods not only outperform comparable and related approaches in terms of explanation quality and correctness, but also provide actionable, complementary insights for understanding and mitigating dataset differences effectively.
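One simple, hedged instance of this idea (not necessarily the paper's method): fit a shallow, interpretable classifier to distinguish samples of dataset A from dataset B, and read its splits as human-readable statements about where the two distributions differ. The two synthetic tabular datasets below are hypothetical.

```python
# Sketch: explain a dataset difference via an interpretable A-vs-B classifier.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
# Two hypothetical tabular datasets that differ mainly in the "age" feature.
A = np.column_stack([rng.normal(40, 5, 500), rng.normal(0, 1, 500)])
B = np.column_stack([rng.normal(55, 5, 500), rng.normal(0, 1, 500)])

X = np.vstack([A, B])
y = np.array([0] * len(A) + [1] * len(B))   # 0 = dataset A, 1 = dataset B

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
# The printed splits ("age <= ...") describe, in human-readable terms,
# where the two distributions disagree.
print(export_text(tree, feature_names=["age", "other_feature"]))
```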
Towards Unified Alignment Between Agents, Humans, and Environment
Yang, Zonghan, Liu, An, Liu, Zijun, Liu, Kaiming, Xiong, Fangzhou, Wang, Yile, Yang, Zeyuan, Hu, Qingyuan, Chen, Xinrui, Zhang, Zhenhe, Luo, Fuwen, Guo, Zhicheng, Li, Peng, Liu, Yang
The rapid progress of foundation models has led to the prosperity of autonomous agents, which leverage the universal capabilities of foundation models to conduct reasoning, decision-making, and environmental interaction. However, the efficacy of agents remains limited when operating in intricate, realistic environments. In this work, we introduce the principles of $\mathbf{U}$nified $\mathbf{A}$lignment for $\mathbf{A}$gents ($\mathbf{UA}^2$), which advocate for the simultaneous alignment of agents with human intentions, environmental dynamics, and self-constraints such as the limitation of monetary budgets. From the perspective of $\mathbf{UA}^2$, we review the current agent research and highlight the neglected factors in existing agent benchmarks and method candidates. We also conduct proof-of-concept studies by introducing realistic features to WebShop, including user profiles to demonstrate intentions, personalized reranking for complex environmental dynamics, and runtime cost statistics to reflect self-constraints. We then follow the principles of $\mathbf{UA}^2$ to propose an initial design of our agent, and benchmark its performance with several candidate baselines in the retrofitted WebShop. The extensive experimental results further prove the importance of the principles of $\mathbf{UA}^2$. Our research sheds light on the next steps of autonomous agent research with improved general problem-solving abilities.
Learned Kernels for Interpretable and Efficient Medical Time Series Processing
Chen, Sully F., Guo, Zhicheng, Ding, Cheng, Hu, Xiao, Rudin, Cynthia
Background: Signal processing methods are the foundation for clinical interpretation across a wide variety of medical applications. The advent of deep learning allowed for an explosion of new models that offered unprecedented performance, but at a cost: deep learning models are often compute-intensive and lack interpretability. Methods: We propose a sparse, interpretable architecture for medical time series processing. The method learns a set of lightweight, flexible kernels to construct a single-layer neural network, providing a new efficient, robust, and interpretable approach. We introduce novel parameter reduction techniques to further reduce the size of our network. We demonstrate the power of our architecture on the important task of photoplethysmography artifact detection, where our approach matches the performance of state-of-the-art deep neural networks with several orders of magnitude fewer parameters, allowing deep-neural-network-level performance to be integrated into extremely low-power wearable devices. Results: Our interpretable method achieves greater than 99\% of the performance of the state-of-the-art methods on the artifact detection task, and even outperforms the state-of-the-art on a challenging out-of-distribution test set, while using dramatically fewer parameters (2\% of the parameters of Segade, and about half of the parameters of Tiny-PPG). Conclusions: Learned kernels are competitive with deep neural networks for medical time series processing with dramatically fewer parameters. Our method is particularly suited for real-time applications and low-power devices, and it maintains interpretability.
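A hedged PyTorch sketch of the single-layer learned-kernel idea: a small set of learnable 1-D kernels is convolved with the signal, their pooled responses feed a linear read-out, and the learned kernels themselves can be plotted and inspected. The kernel count, kernel length, and pooling choices are assumptions, not the paper's exact architecture.

```python
# Sketch of a single-layer learned-kernel classifier for PPG artifact detection.
import torch
import torch.nn as nn

class LearnedKernelClassifier(nn.Module):
    def __init__(self, n_kernels=16, kernel_len=25, n_classes=2):
        super().__init__()
        # Each convolutional filter is one interpretable, learnable kernel.
        self.kernels = nn.Conv1d(1, n_kernels, kernel_size=kernel_len,
                                 padding=kernel_len // 2, bias=False)
        self.head = nn.Linear(2 * n_kernels, n_classes)  # max + mean pooled responses

    def forward(self, x):  # x: (batch, 1, samples)
        response = torch.relu(self.kernels(x))
        features = torch.cat([response.amax(dim=-1), response.mean(dim=-1)], dim=1)
        return self.head(features)

model = LearnedKernelClassifier()
segment = torch.randn(4, 1, 1250)   # hypothetical 10-second PPG at 125 Hz
logits = model(segment)             # artifact vs. clean logits per segment
print(logits.shape, sum(p.numel() for p in model.parameters()))  # tiny parameter count
```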
Reconsideration on evaluation of machine learning models in continuous monitoring using wearables
Ding, Cheng, Guo, Zhicheng, Rudin, Cynthia, Xiao, Ran, Nahab, Fadi B, Hu, Xiao
Wearable devices, especially those utilizing the photoplethysmography (PPG) signal, have demonstrated significant potential for providing real-time insights into an individual's health status. PPG, due to its non-invasive nature and ease of integration into wearable technology, has become a cornerstone of modern health monitoring systems [5]. Analyzing wearable device signals often involves ML models of varying complexity [6, 7]. In the model development phase, continuous signals are typically cut into discrete segments, and the model's performance is evaluated at the segment level using conventional metrics such as accuracy, sensitivity, specificity, and F1 score [8]. However, relying solely on these conventional segment-level metrics does not provide a holistic assessment: it hurts consumers, who cannot select the optimal solution for their needs, and innovators, whose efforts are not guided toward true progress. The complex nature of continuous health monitoring using wearable devices introduces unique challenges beyond the capabilities of conventional evaluation approaches, as illustrated in Figure 1. Recognizing these challenges is imperative for equipping continuous health monitoring applications with accurate and reliable ML models, ensuring a successful translation of these models into everyday use by millions of people and fulfilling the potential of this technology at scale. In the subsequent sections, we outline the challenges in evaluating ML models for continuous health monitoring using wearables, thoroughly review existing evaluation methods and metrics, and propose a standardized evaluation guideline.
Can Vision-Language Models Think from a First-Person Perspective?
Cheng, Sijie, Guo, Zhicheng, Wu, Jingwen, Fang, Kechen, Li, Peng, Liu, Huaping, Liu, Yang
Vision-language models (VLMs) have recently shown promising results in traditional downstream tasks. Evaluation studies have emerged to assess their abilities, with the majority focusing on the third-person perspective, and only a few addressing specific tasks from the first-person perspective. However, the capability of VLMs to "think" from a first-person perspective, a crucial attribute for advancing autonomous agents and robotics, remains largely unexplored. To bridge this research gap, we introduce EgoThink, a novel visual question-answering benchmark that encompasses six core capabilities with twelve detailed dimensions. The benchmark is constructed using selected clips from egocentric videos, with manually annotated question-answer pairs containing first-person information. To comprehensively assess VLMs, we evaluate eighteen popular VLMs on EgoThink. Moreover, given the open-ended format of the answers, we use GPT-4 as the automatic judge to compute single-answer grading. Experimental results indicate that although GPT-4V leads in numerous dimensions, all evaluated VLMs still possess considerable potential for improvement in first-person perspective tasks. Meanwhile, enlarging the number of trainable parameters has the most significant impact on model performance on EgoThink. In conclusion, EgoThink serves as a valuable addition to existing evaluation benchmarks for VLMs, providing an indispensable resource for future research in the realm of embodied artificial intelligence and robotics.
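A hedged sketch of single-answer grading with an LLM judge, as described above: assemble a prompt containing the question, a reference answer, and the model's answer, then ask the judge for a score. The prompt wording and the 1-10 scale are assumptions, not EgoThink's exact template.

```python
# Sketch of building a single-answer grading prompt for an LLM judge.
def build_grading_prompt(question, reference, candidate):
    return (
        "You are grading an answer to a first-person visual question.\n"
        f"Question: {question}\n"
        f"Reference answer: {reference}\n"
        f"Model answer: {candidate}\n"
        "Rate the model answer from 1 to 10 and briefly justify the score."
    )

prompt = build_grading_prompt(
    question="What am I holding in my right hand?",
    reference="A coffee mug.",
    candidate="A white mug of coffee.",
)
# `prompt` would then be sent to GPT-4 (or another judge model) via its API.
print(prompt)
```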
SiamAF: Learning Shared Information from ECG and PPG Signals for Robust Atrial Fibrillation Detection
Guo, Zhicheng, Ding, Cheng, Do, Duc H., Shah, Amit, Lee, Randall J., Hu, Xiao, Rudin, Cynthia
Atrial fibrillation (AF) is the most common type of cardiac arrhythmia. It is associated with an increased risk of stroke, heart failure, and other cardiovascular complications, but can be clinically silent. Passive AF monitoring with wearables may help reduce adverse clinical outcomes related to AF. Detecting AF in noisy wearable data poses a significant challenge, leading to the emergence of various deep learning techniques. Previous deep learning models learn from a single modality, either electrocardiogram (ECG) or photoplethysmography (PPG) signals. However, deep learning models often struggle to learn generalizable features and rely on features that are more susceptible to corruption from noise, leading to sub-optimal performance in certain scenarios, especially with low-quality signals. Given the increasing availability of ECG and PPG signal pairs from wearables and bedside monitors, we propose a new approach, SiamAF, which leverages a novel Siamese network architecture and a joint learning loss function to learn shared information from both ECG and PPG signals. At inference time, the proposed model is able to predict AF from either PPG or ECG and outperforms baseline methods on three external test sets. It learns medically relevant features as a result of our novel architecture design. The proposed model also achieves performance comparable to traditional learning regimes while requiring far fewer training labels, providing a potential approach to reduce future reliance on manual labeling.
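A hedged PyTorch sketch of the shared-information idea: one encoder per modality feeds a shared classifier, and an agreement term pulls the ECG and PPG embeddings of the same recording together, so either modality alone can be used at inference time. The encoder sizes and loss weights are illustrative, not the authors' implementation.

```python
# Sketch of a two-encoder Siamese setup with a shared AF classifier.
import torch
import torch.nn as nn

def make_encoder(emb_dim=64):
    return nn.Sequential(
        nn.Conv1d(1, 32, kernel_size=9, padding=4), nn.ReLU(),
        nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(32, emb_dim),
    )

class SiameseAF(nn.Module):
    def __init__(self, emb_dim=64):
        super().__init__()
        self.ecg_encoder = make_encoder(emb_dim)
        self.ppg_encoder = make_encoder(emb_dim)
        self.classifier = nn.Linear(emb_dim, 2)   # AF vs. non-AF, shared head

    def forward(self, ecg, ppg):
        z_ecg, z_ppg = self.ecg_encoder(ecg), self.ppg_encoder(ppg)
        return z_ecg, z_ppg, self.classifier(z_ecg), self.classifier(z_ppg)

model = SiameseAF()
ecg, ppg = torch.randn(8, 1, 2500), torch.randn(8, 1, 2500)   # paired segments
labels = torch.randint(0, 2, (8,))
z_ecg, z_ppg, logit_ecg, logit_ppg = model(ecg, ppg)
agreement = (1 - nn.functional.cosine_similarity(z_ecg, z_ppg)).mean()
loss = nn.functional.cross_entropy(logit_ecg, labels) \
     + nn.functional.cross_entropy(logit_ppg, labels) + 0.1 * agreement
loss.backward()
# At inference time, either encoder alone can feed the shared classifier.
```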