
Collaborating Authors: Wang, Yuntao


Unknown Word Detection for English as a Second Language (ESL) Learners Using Gaze and Pre-trained Language Models

arXiv.org Artificial Intelligence

English as a Second Language (ESL) learners often encounter unknown words that hinder their text comprehension. Automatically detecting these words as users read can enable computing systems to provide just-in-time definitions, synonyms, or contextual explanations, helping users learn vocabulary in a natural and seamless manner. This paper presents EyeLingo, a transformer-based machine learning method that predicts the probability of unknown words from text content and eye gaze trajectory in real time. A 20-participant user study showed that our method achieves an accuracy of 97.6% and an F1-score of 71.1%. We implemented a real-time reading assistance prototype to demonstrate the effectiveness of EyeLingo. The user study showed improvements in willingness to use and usefulness compared to baseline methods.
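As a minimal sketch of how text and gaze signals could be fused for this kind of per-word prediction (an illustrative assumption, not EyeLingo's published architecture), the snippet below combines a pre-trained language model's token representations with simple per-token gaze features and scores each token with a small classification head. The gaze feature set and all module names are hypothetical.

import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class UnknownWordScorer(nn.Module):
    def __init__(self, lm_name="bert-base-uncased", gaze_dim=3):
        super().__init__()
        self.lm = AutoModel.from_pretrained(lm_name)          # pre-trained text encoder
        hidden = self.lm.config.hidden_size
        self.head = nn.Sequential(                            # small fusion classifier
            nn.Linear(hidden + gaze_dim, 128), nn.ReLU(), nn.Linear(128, 1))

    def forward(self, input_ids, attention_mask, gaze_feats):
        # gaze_feats: (batch, seq_len, gaze_dim) per-token gaze statistics,
        # e.g. fixation count, total fixation duration, regression count.
        tokens = self.lm(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        fused = torch.cat([tokens, gaze_feats], dim=-1)
        return torch.sigmoid(self.head(fused)).squeeze(-1)    # per-token unknown-word probability

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
batch = tok(["The recalcitrant cat sat on the mat"], return_tensors="pt")
gaze = torch.rand(1, batch["input_ids"].shape[1], 3)          # placeholder gaze features
probs = UnknownWordScorer()(batch["input_ids"], batch["attention_mask"], gaze)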


WatchGuardian: Enabling User-Defined Personalized Just-in-Time Intervention on Smartwatch

arXiv.org Artificial Intelligence

While just-in-time interventions (JITIs) have effectively targeted common health behaviors, individuals often have unique needs to intervene in personal undesirable actions that can negatively affect physical, mental, and social well-being. We present WatchGuardian, a smartwatch-based JITI system that empowers users to define custom interventions for these personal actions with a small number of samples. To detect new actions from limited data samples, we developed a few-shot learning pipeline that fine-tuned a pre-trained inertial measurement unit (IMU) model on public hand-gesture datasets. We then designed a data augmentation and synthesis process to train additional classification layers for customization. Our offline evaluation with 26 participants showed that with three, five, and ten examples, our approach achieved an average accuracy of 76.8%, 84.7%, and 87.7%, and an F1 score of 74.8%, 84.2%, and 87.2%. We then conducted a four-hour intervention study to compare WatchGuardian against a rule-based intervention. Our results demonstrated that our system led to a significant reduction of 64.0 ± 22.6% in undesirable actions, substantially outperforming the baseline by 29.0%. Our findings underscore the effectiveness of a customizable, AI-driven JITI system for individuals who need behavioral interventions for personal undesirable actions. We envision that our work can inspire broader applications of user-defined, personalized interventions with advanced AI solutions.
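To illustrate the few-shot customization idea (a hedged sketch under assumed details, not WatchGuardian's actual pipeline), the code below augments a handful of labeled IMU windows with jitter and scaling, embeds them with a stand-in for the frozen pre-trained encoder, and fits a lightweight classification layer. The augmentation parameters and the embed placeholder are illustrative.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def augment(window, n=20, jitter=0.03, scale=0.1):
    """Synthesize n noisy variants of one IMU window (shape: time x channels)."""
    out = []
    for _ in range(n):
        s = 1.0 + rng.normal(0.0, scale, size=(1, window.shape[1]))      # per-channel scaling
        out.append(window * s + rng.normal(0.0, jitter, window.shape))   # additive jitter
    return np.stack(out)

def embed(windows):
    # Placeholder for the frozen pre-trained IMU encoder: simple statistical
    # features so the sketch runs end to end without model weights.
    return np.concatenate([windows.mean(axis=1), windows.std(axis=1)], axis=1)

# Three user-provided examples of the target action and three of "other" activity.
pos = [rng.normal(size=(100, 6)) for _ in range(3)]   # 100 samples x 6 IMU channels
neg = [rng.normal(size=(100, 6)) for _ in range(3)]
X = np.concatenate([augment(w) for w in pos + neg])
y = np.array([1] * 60 + [0] * 60)                     # 3 windows x 20 augmentations each
clf = LogisticRegression(max_iter=1000).fit(embed(X), y)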


Large Model Based Agents: State-of-the-Art, Cooperation Paradigms, Security and Privacy, and Future Trends

arXiv.org Artificial Intelligence

With the rapid advancement of large models (LMs), the development of general-purpose intelligent agents powered by LMs has become a reality. It is foreseeable that in the near future, LM-driven general AI agents will serve as essential tools in production tasks, capable of autonomous communication and collaboration without human intervention. This paper investigates scenarios involving the autonomous collaboration of future LM agents. We review the current state of LM agents, the key technologies enabling LM agent collaboration, and the security and privacy challenges they face during cooperative operations. To this end, we first explore the foundational principles of LM agents, including their general architecture, key components, enabling technologies, and modern applications. We then discuss practical collaboration paradigms from data, computation, and knowledge perspectives to achieve connected intelligence among LM agents. After that, we analyze the security vulnerabilities and privacy risks associated with LM agents, particularly in multi-agent settings, examining underlying mechanisms and reviewing current and potential countermeasures. Lastly, we propose future research directions for building robust and secure LM agent ecosystems.


G-VOILA: Gaze-Facilitated Information Querying in Daily Scenarios

arXiv.org Artificial Intelligence

Modern information querying systems are progressively incorporating multimodal inputs like vision and audio. However, the integration of gaze -- a modality deeply linked to user intent and increasingly accessible via gaze-tracking wearables -- remains underexplored. This paper introduces a novel gaze-facilitated information querying paradigm, named G-VOILA, which synergizes users' gaze, visual field, and voice-based natural language queries to facilitate a more intuitive querying process. In a user-enactment study involving 21 participants across 3 daily scenarios (p = 21, scene = 3), we revealed the ambiguity in users' query language and a gaze-voice coordination pattern in users' natural query behaviors with G-VOILA. Based on the quantitative and qualitative findings, we developed a design framework for the G-VOILA paradigm, which effectively integrates gaze data with the in-situ querying context. We then implemented a G-VOILA proof-of-concept using cutting-edge deep learning techniques. A follow-up user study (p = 16, scene = 2) demonstrated its effectiveness, achieving both higher objective and subjective scores compared to a baseline without gaze data. We further conducted interviews and provide insights for future gaze-facilitated information querying systems.
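One concrete way to combine the three inputs the paradigm relies on, sketched here under assumptions rather than taken from the paper's implementation, is to crop the region of the egocentric frame around the current fixation and package it with the spoken query for a downstream vision-language model. The function and field names below are hypothetical.

from PIL import Image

def gaze_crop(frame: Image.Image, gaze_xy, box=224):
    """Crop a box x box patch of the scene-camera frame centered on the gaze point."""
    x, y = gaze_xy
    left = max(0, min(frame.width - box, int(x - box / 2)))
    top = max(0, min(frame.height - box, int(y - box / 2)))
    return frame.crop((left, top, left + box, top + box))

frame = Image.new("RGB", (1280, 720))            # stand-in for the egocentric camera frame
patch = gaze_crop(frame, gaze_xy=(640, 300))     # region the user is currently fixating
query = {"text": "what is this made of?", "full_frame": frame, "gaze_patch": patch}
# `query` would then be formatted into the VLM prompt alongside the dialogue context.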


Camera-Based Remote Physiology Sensing for Hundreds of Subjects Across Skin Tones

arXiv.org Artificial Intelligence

Remote photoplethysmography (rPPG) emerges as a promising method for non-invasive, convenient measurement of vital signs, utilizing the widespread presence of cameras. Despite advancements, existing datasets fall short in terms of size and diversity, limiting comprehensive evaluation under diverse conditions. This paper presents an in-depth analysis of the VitalVideo dataset, the largest real-world rPPG dataset to date, encompassing 893 subjects and 6 Fitzpatrick skin tones. Our experimentation with six unsupervised methods and three supervised models demonstrates that datasets comprising a few hundred subjects (i.e., 300 for UBFC-rPPG, 500 for PURE, and 700 for MMPD-Simple) are sufficient for effective rPPG model training. Our findings highlight the importance of diversity and consistency in skin tones for precise performance evaluation across different datasets.
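For readers unfamiliar with the unsupervised methods mentioned here, the classic "GREEN" baseline can be sketched in a few lines (a generic illustration, not one of the paper's evaluated implementations): spatially average the green channel over a face crop, band-pass the trace to plausible heart-rate frequencies, and take the spectral peak.

import numpy as np
from scipy.signal import butter, filtfilt

def estimate_hr(frames, fps=30.0):
    """frames: (T, H, W, 3) uint8 face crops; returns estimated heart rate in BPM."""
    green = frames[..., 1].reshape(len(frames), -1).mean(axis=1)        # spatial mean of green channel
    green = green - green.mean()
    b, a = butter(3, [0.7 / (fps / 2), 4.0 / (fps / 2)], btype="band")  # 42-240 BPM pass band
    sig = filtfilt(b, a, green)
    spectrum = np.abs(np.fft.rfft(sig)) ** 2
    freqs = np.fft.rfftfreq(len(sig), d=1.0 / fps)
    band = (freqs >= 0.7) & (freqs <= 4.0)
    return 60.0 * freqs[band][np.argmax(spectrum[band])]                # peak frequency -> BPM

hr = estimate_hr(np.random.randint(0, 255, (300, 64, 64, 3), dtype=np.uint8))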


Time2Stop: Adaptive and Explainable Human-AI Loop for Smartphone Overuse Intervention

arXiv.org Artificial Intelligence

Despite a rich history of investigating smartphone overuse intervention techniques, AI-based just-in-time adaptive intervention (JITAI) methods for overuse reduction are lacking. We develop Time2Stop, an intelligent, adaptive, and explainable JITAI system that leverages machine learning to identify optimal intervention timings, introduces interventions with transparent AI explanations, and collects user feedback to establish a human-AI loop and adapt the intervention model over time. We conducted an 8-week field experiment (N=71) to evaluate the effectiveness of both the adaptation and explanation aspects of Time2Stop. Our results indicate that our adaptive models significantly outperform the baseline methods on intervention accuracy (a relative improvement of over 32.8%) and receptivity (over 8.0%). In addition, incorporating explanations further enhances effectiveness by 53.8% and 11.4% on accuracy and receptivity, respectively. Moreover, Time2Stop significantly reduces overuse, decreasing app visit frequency by 7.0 to 8.9%. Our subjective data echoed these quantitative measures: participants preferred the adaptive interventions and rated the system highly on intervention time accuracy, effectiveness, and level of trust. We envision that our work can inspire future research on JITAI systems with a human-AI loop that evolve with users.
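A minimal sketch of the human-AI loop concept, assuming an online linear model and made-up context features (this is not Time2Stop's actual model or feature set): the classifier scores whether the current moment is a good time to intervene, and each intervention outcome is fed back with partial_fit to adapt the model over time.

import numpy as np
from sklearn.linear_model import SGDClassifier

model = SGDClassifier(loss="log_loss")               # logistic loss -> probability outputs
classes = np.array([0, 1])                           # 0 = bad timing, 1 = good timing

def context_features(session_seconds, hour_of_day, visits_last_hour):
    # Hypothetical phone-usage context; a real system would use richer signals.
    return np.array([[session_seconds / 600.0, hour_of_day / 24.0, visits_last_hour / 10.0]])

# Bootstrap on a small seed batch, then keep adapting from per-intervention feedback.
rng = np.random.default_rng(0)
model.partial_fit(rng.random((20, 3)), rng.integers(0, 2, 20), classes=classes)

x_now = context_features(540, 23, 6)
if model.predict_proba(x_now)[0, 1] > 0.5:           # decide whether to intervene now
    user_accepted = True                             # stand-in for the user's actual response
    model.partial_fit(x_now, np.array([int(user_accepted)]))   # close the human-AI loop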


Voila-A: Aligning Vision-Language Models with User's Gaze Attention

arXiv.org Artificial Intelligence

In recent years, the integration of vision and language understanding has led to significant advancements in artificial intelligence, particularly through Vision-Language Models (VLMs). However, existing VLMs face challenges in handling real-world applications with complex scenes and multiple objects, as well as aligning their focus with the diverse attention patterns of human users. In this paper, we introduce gaze information, feasibly collected by AR or VR devices, as a proxy for human attention to guide VLMs and propose a novel approach, Voila-A, for gaze alignment to enhance the interpretability and effectiveness of these models in real-world applications. First, we collect hundreds of minutes of gaze data to demonstrate that we can mimic human gaze modalities using localized narratives. We then design an automatic data annotation pipeline utilizing GPT-4 to generate the VOILA-COCO dataset. Additionally, we introduce the Voila Perceiver modules to integrate gaze information into VLMs while preserving their pretrained knowledge. We evaluate Voila-A using a hold-out validation set and a newly collected VOILA-GAZE test set, which features real-life scenarios captured with a gaze-tracking device. Our experimental results demonstrate that Voila-A significantly outperforms several baseline models. By aligning model attention with human gaze patterns, Voila-A paves the way for more intuitive, user-centric VLMs and fosters engaging human-AI interaction across a wide range of applications.
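As a rough illustration of gaze-conditioned fusion (a simplified stand-in, not the paper's Voila Perceiver design), the module below reweights ViT patch tokens by a per-patch gaze heatmap before cross-attention, with a learned scale initialized to zero so the pretrained behavior is preserved at the start of training.

import torch
import torch.nn as nn

class GazeReweightedFusion(nn.Module):
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.gaze_scale = nn.Parameter(torch.zeros(1))    # learned strength, starts at "no effect"

    def forward(self, queries, patch_tokens, gaze_heat):
        # gaze_heat: (batch, num_patches), normalized gaze density per image patch.
        weights = 1.0 + self.gaze_scale * gaze_heat.unsqueeze(-1)   # (batch, patches, 1)
        gazed = patch_tokens * weights                    # emphasize fixated patches
        fused, _ = self.attn(queries, gazed, gazed)       # cross-attend from latent queries
        return fused

q = torch.randn(2, 16, 256)         # latent query tokens
patches = torch.randn(2, 196, 256)  # ViT patch embeddings (14x14 grid)
heat = torch.softmax(torch.randn(2, 196), dim=-1)
out = GazeReweightedFusion()(q, patches, heat)            # (2, 16, 256)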


A Comprehensive Dataset and Automated Pipeline for Nailfold Capillary Analysis

arXiv.org Artificial Intelligence

Nailfold capillaroscopy is a well-established method for assessing health conditions, but the untapped potential of automated medical image analysis using machine learning remains despite recent advancements. In this groundbreaking study, we present a pioneering effort in constructing a comprehensive dataset (321 images, 219 videos, and 68 clinic reports, with expert annotations) that serves as a crucial resource for training deep-learning models. Leveraging this dataset, we propose an end-to-end nailfold capillary analysis pipeline capable of automatically detecting and measuring diverse capillary features.

The introduction of machine learning marks a pivotal shift, presenting automated medical image analysis as a promising alternative due to its higher accuracy compared to traditional image processing algorithms [5]. Recent studies have attempted to use single deep-learning models for tasks such as nailfold capillary segmentation [4, 8], measurement of capillary size and density [5], and white cell counting [9]. Despite notable achievements, the untapped potential of automated medical image analysis persists due to the urgent need for the annotated and extensive datasets essential for effective training and fine-tuning of deep neural networks.
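To make the "detecting and measuring" step more tangible, here is a small generic sketch (not the paper's code) of one typical measurement: counting capillaries in a binary segmentation mask via connected-component labeling and converting the count into a density given the image's physical width. The area threshold and width are illustrative.

import numpy as np
from scipy import ndimage

def capillary_density(mask: np.ndarray, mm_width: float, min_area: int = 30):
    """mask: 2-D boolean capillary segmentation; returns (count, capillaries per mm)."""
    labels, n = ndimage.label(mask)                                  # connected-component labeling
    areas = ndimage.sum(mask, labels, index=np.arange(1, n + 1))     # pixel area of each blob
    count = int(np.sum(areas >= min_area))                           # drop tiny spurious blobs
    return count, count / mm_width

mask = np.zeros((256, 512), dtype=bool)
mask[40:80, 100:110] = True                                          # two toy "capillaries"
mask[40:80, 300:310] = True
count, per_mm = capillary_density(mask, mm_width=3.0)                # -> (2, about 0.67 per mm)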


ALPHA: AnomaLous Physiological Health Assessment Using Large Language Models

arXiv.org Artificial Intelligence

This study concentrates on evaluating the efficacy of Large Language Models (LLMs) in healthcare, with a specific focus on their application in personal anomalous health monitoring. Our research primarily investigates the capabilities of LLMs in interpreting and analyzing physiological data obtained from FDA-approved devices. We conducted an extensive analysis using anomalous physiological data gathered in a simulated low-air-pressure plateau environment. This allowed us to assess the precision and reliability of LLMs in understanding and evaluating users' health status with notable specificity. Our findings reveal that LLMs exhibit exceptional performance in determining medical indicators, including a Mean Absolute Error (MAE) of less than 1 beat per minute for heart rate and less than 1% for oxygen saturation (SpO2). Furthermore, the Mean Absolute Percentage Error (MAPE) for these evaluations remained below 1%, with the overall accuracy of health assessments surpassing 85%. In image analysis tasks, such as interpreting photoplethysmography (PPG) data, our specially adapted GPT models demonstrated remarkable proficiency, achieving less than 1 bpm error in cycle count and 7.28 MAE for heart rate estimation. This study highlights LLMs' dual role as health data analysis tools and pivotal elements in advanced AI health assistants, offering personalized health insights and recommendations within the future health assistant framework.
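The headline metrics are straightforward to reproduce for any set of model-reported versus reference readings; the short sketch below computes MAE and MAPE on made-up heart-rate values purely to show the scoring, not the paper's data.

import numpy as np

ref = np.array([72.0, 88.0, 95.0, 110.0])     # device-reported heart rate (BPM)
llm = np.array([71.0, 89.0, 94.0, 111.0])     # values parsed from the LLM's answer

mae = np.mean(np.abs(llm - ref))                       # mean absolute error, in BPM
mape = 100.0 * np.mean(np.abs(llm - ref) / ref)        # mean absolute percentage error, in %
print(f"MAE = {mae:.2f} BPM, MAPE = {mape:.2f}%")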


MMTSA: Multimodal Temporal Segment Attention Network for Efficient Human Activity Recognition

arXiv.org Artificial Intelligence

Multimodal sensors provide complementary information for developing accurate machine-learning methods for human activity recognition (HAR), but they introduce a significantly higher computational load, which reduces efficiency. This paper proposes an efficient multimodal neural architecture for HAR using an RGB camera and inertial measurement units (IMUs), called the Multimodal Temporal Segment Attention Network (MMTSA). MMTSA first transforms IMU sensor data into a temporal and structure-preserving gray-scale image using the Gramian Angular Field (GAF), representing the inherent properties of human activities. MMTSA then applies a multimodal sparse sampling method to reduce data redundancy. Lastly, MMTSA adopts an inter-segment attention module for efficient multimodal fusion. Using three well-established public datasets, we evaluated MMTSA's effectiveness and efficiency in HAR. Results show that our method outperforms previous state-of-the-art (SOTA) methods, improving cross-subject F1-score on the MMAct dataset by 11.13%. The ablation study and analysis demonstrate MMTSA's effectiveness in fusing multimodal data for accurate HAR. The efficiency evaluation on an edge device showed that MMTSA achieved significantly better accuracy, lower computational load, and lower inference latency than SOTA methods.
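The Gramian Angular Field encoding mentioned above is a standard transform; the sketch below applies it to a single IMU channel, noting that the exact GAF variant and preprocessing MMTSA uses may differ.

import numpy as np

def gramian_angular_field(x: np.ndarray) -> np.ndarray:
    """Encode a 1-D signal as a Gramian Angular Summation Field image of shape (len(x), len(x))."""
    x_min, x_max = x.min(), x.max()
    x_scaled = 2.0 * (x - x_min) / (x_max - x_min + 1e-8) - 1.0    # rescale to [-1, 1]
    phi = np.arccos(np.clip(x_scaled, -1.0, 1.0))                  # polar-angle encoding
    return np.cos(phi[:, None] + phi[None, :])                     # pairwise angle summation

accel_x = np.sin(np.linspace(0, 6 * np.pi, 128)) + 0.05 * np.random.randn(128)
gaf_image = gramian_angular_field(accel_x)    # one gray-scale "image" per IMU channel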