AITopics | Chen, Yanting

Collaborating Authors

Chen, Yanting

Sharingan: Extract User Action Sequence from Desktop Recordings

Chen, Yanting, Ren, Yi, Qin, Xiaoting, Zhang, Jue, Yuan, Kehong, Han, Lu, Lin, Qingwei, Zhang, Dongmei, Rajmohan, Saravan, Zhang, Qi

arXiv.org Artificial Intelligence, Nov-13-2024

Video recordings of user activities, particularly desktop recordings, offer a rich source of data for understanding user behaviors and automating processes. However, despite advancements in Vision-Language Models (VLMs) and their increasing use in video analysis, extracting user actions from desktop recordings remains an underexplored area. This paper addresses this gap by proposing two novel VLM-based methods for user action extraction: the Direct Frame-Based Approach (DF), which inputs sampled frames directly into VLMs, and the Differential Frame-Based Approach (DiffF), which incorporates explicit frame differences detected via computer vision techniques. We evaluate these methods using a basic self-curated dataset and an advanced benchmark adapted from prior work. Our results show that the DF approach achieves an accuracy of 70% to 80% in identifying user actions, with the extracted action sequences being re-playable through Robotic Process Automation. We find that while VLMs show potential, incorporating explicit UI changes can degrade performance, making the DF approach more reliable. This work represents the first application of VLMs for extracting user action sequences from desktop recordings, contributing new methods, benchmarks, and insights for future research.
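To make the "explicit frame differences" idea concrete, here is a minimal sketch of pixel-level frame differencing. It is an illustration only: the abstract does not say which computer vision technique DiffF uses, so the function name `changed_region`, the grayscale nested-list frame format, and the threshold of 30 are all assumptions.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]  # hypothetical format: rows of grayscale intensities (0-255)


def changed_region(
    prev: Frame, curr: Frame, threshold: int = 30
) -> Optional[Tuple[int, int, int, int]]:
    """Return the bounding box (top, left, bottom, right) of changed pixels.

    A pixel counts as changed when its absolute intensity difference between
    the two frames exceeds `threshold` (an assumed value). Returns None when
    nothing changed.
    """
    changed = [
        (r, c)
        for r, (prev_row, curr_row) in enumerate(zip(prev, curr))
        for c, (p, q) in enumerate(zip(prev_row, curr_row))
        if abs(p - q) > threshold
    ]
    if not changed:
        return None
    rows = [r for r, _ in changed]
    cols = [c for _, c in changed]
    return min(rows), min(cols), max(rows), max(cols)


# Two 4x4 frames: the second one lights up two pixels, e.g. a small UI change.
before = [[0] * 4 for _ in range(4)]
after = [row[:] for row in before]
after[1][2] = 255
after[2][3] = 200
print(changed_region(before, after))  # (1, 2, 2, 3)
```

In a DiffF-style pipeline, a region like this could be passed to the VLM alongside the sampled frames; in the DF approach, the frames would be passed on their own.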

large language model, machine learning, natural language, (23 more...)

arXiv:2411.08768

Country: Asia > Middle East (0.14)

Genre:

Workflow (1.00)
Research Report > New Finding (0.68)

Industry: Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)

MIntRec2.0: A Large-scale Benchmark Dataset for Multimodal Intent Recognition and Out-of-scope Detection in Conversations

Zhang, Hanlei, Wang, Xin, Xu, Hua, Zhou, Qianrui, Gao, Kai, Su, Jianhua, Zhao, Jinyue, Li, Wenrui, Chen, Yanting

arXiv.org Artificial Intelligence, Jun-27-2024

Multimodal intent recognition poses significant challenges, requiring the incorporation of non-verbal modalities from real-world contexts to enhance the comprehension of human intentions. Existing benchmark datasets are limited in scale and suffer from difficulties in handling out-of-scope samples that arise in multi-turn conversational interactions. We introduce MIntRec2.0, a large-scale benchmark dataset for multimodal intent recognition in multi-party conversations. It contains 1,245 dialogues with 15,040 samples, each annotated within a new intent taxonomy of 30 fine-grained classes. Besides 9,304 in-scope samples, it also includes 5,736 out-of-scope samples appearing in multi-turn contexts, which naturally occur in real-world scenarios. Furthermore, we provide comprehensive information on the speakers in each utterance, enriching its utility for multi-party conversational research. We establish a general framework supporting the organization of single-turn and multi-turn dialogue data, modality feature extraction, multimodal fusion, as well as in-scope classification and out-of-scope detection. Evaluation benchmarks are built using classic multimodal fusion methods, ChatGPT, and human evaluators. While existing methods incorporating nonverbal information yield improvements, effectively leveraging context information and detecting out-of-scope samples remains a substantial challenge. Notably, large language models exhibit a significant performance gap compared to humans, highlighting the limitations of machine learning methods in the cognitive intent understanding task. We believe that MIntRec2.0 will serve as a valuable resource, providing a pioneering foundation for research in human-machine conversational interactions, and significantly facilitating related applications. The full dataset and code are available at https://github.com/thuiar/MIntRec2.0.
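The pairing of in-scope classification with out-of-scope detection can be illustrated with a confidence-threshold baseline. This is a generic sketch, not the MIntRec2.0 framework: the abstract does not describe its detection method, and the function `predict_intent`, the 0.5 threshold, and the three intent labels below are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_intent(
    logits: Sequence[float], labels: Sequence[str], threshold: float = 0.5
) -> str:
    """Return the most likely in-scope label, or "out_of_scope" when the
    classifier's top probability falls below `threshold` (an assumed value)."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best] if probs[best] >= threshold else "out_of_scope"


# Hypothetical label set; the dataset itself defines 30 fine-grained classes.
labels = ["complain", "praise", "ask_for_help"]
print(predict_intent([4.0, 0.0, 0.0], labels))   # confident: "complain"
print(predict_intent([0.1, 0.0, 0.05], labels))  # near-uniform: "out_of_scope"
```

A single threshold on the top probability is only one possible design; it treats every low-confidence utterance as out of scope, which is simple but ignores conversational context.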

data mining, large language model, machine learning, (21 more...)

arXiv:2403.10943

Country: Asia > China > Jiangxi Province (0.14)

Genre: Research Report > New Finding (0.46)

Industry:

Information Technology (0.92)
Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.89)
(2 more...)

The Double-Edged Sword of Input Perturbations to Robust Accurate Fairness

Li, Xuran, Wu, Peng, Chen, Yanting, Ma, Xingjun, Zhang, Zhen, Dong, Kaixiang

arXiv.org Artificial Intelligence, Apr-1-2024

Deep neural networks (DNNs) are known to be sensitive to adversarial input perturbations, leading to a reduction in either prediction accuracy or individual fairness. To jointly characterize the susceptibility of prediction accuracy and individual fairness to adversarial perturbations, we introduce a novel robustness definition termed robust accurate fairness. Informally, robust accurate fairness requires that predictions for an instance and its similar counterparts consistently align with the ground truth when subjected to input perturbations. We propose an adversarial attack approach dubbed RAFair to expose false or biased adversarial defects in DNNs, which either deceive accuracy or compromise individual fairness. Then, we show that such adversarial instances can be effectively addressed by carefully designed benign perturbations, correcting their predictions to be accurate and fair. Our work explores the double-edged sword of input perturbations to robust accurate fairness in DNNs and the potential of using benign perturbations to correct adversarial instances.
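The informal definition of robust accurate fairness can be read as a simple check, sketched below. This is an interpretation of the abstract's wording rather than the paper's formal definition: it assumes the same perturbation is applied to the instance and to each similar counterpart, and the toy models, feature layout (`[protected_attribute, score]`), and perturbation set are hypothetical.

```python
from typing import Callable, Sequence

Vector = Sequence[float]


def is_robustly_accurate_fair(
    model: Callable[[Vector], int],
    x: Vector,
    y: int,
    counterparts: Sequence[Vector],
    perturbations: Sequence[Vector],
) -> bool:
    """True if the model predicts the ground truth `y` for `x` and for every
    similar counterpart under every perturbation in the given set."""
    for delta in perturbations:
        for v in [x, *counterparts]:
            perturbed = [a + d for a, d in zip(v, delta)]
            if model(perturbed) != y:
                return False
    return True


# Toy models over features [protected_attribute, score].
def fair_model(v: Vector) -> int:
    return int(v[1] >= 0.5)  # ignores the protected attribute


def biased_model(v: Vector) -> int:
    return int(v[1] - 0.3 * v[0] >= 0.5)  # penalizes protected_attribute = 1


x, y = [0.0, 0.6], 1
counterparts = [[1.0, 0.6]]  # same score, different protected attribute
perturbations = [[0.0, 0.0], [0.0, -0.05], [0.0, 0.05]]

print(is_robustly_accurate_fair(fair_model, x, y, counterparts, perturbations))    # True
print(is_robustly_accurate_fair(biased_model, x, y, counterparts, perturbations))  # False
```

The biased model fails because the counterpart with the other protected-attribute value is predicted 0 even without any perturbation, so the predictions do not all align with the ground truth.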

artificial intelligence, machine learning, perturbation, (17 more...)

arXiv:2404.01356

Country:

North America > United States (0.68)
Asia > China (0.46)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology > Security & Privacy (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

RobustFair: Adversarial Evaluation through Fairness Confusion Directed Gradient Search

Li, Xuran, Wu, Peng, Dong, Kaixiang, Zhang, Zhen, Chen, Yanting

arXiv.org Artificial Intelligence, Oct-8-2023

Deep neural networks (DNNs) often face challenges due to their vulnerability to various adversarial perturbations, including false perturbations that undermine prediction accuracy and biased perturbations that cause biased predictions for similar inputs. This paper introduces a novel approach, RobustFair, to evaluate the accurate fairness of DNNs when subjected to these false or biased perturbations. RobustFair employs the notion of the fairness confusion matrix induced in accurate fairness to identify the crucial input features for perturbations. This matrix categorizes predictions as true fair, true biased, false fair, and false biased, and the perturbations guided by it can produce a dual impact on instances and their similar counterparts to either undermine prediction accuracy (robustness) or cause biased predictions (individual fairness). RobustFair then infers the ground truth of these generated adversarial instances based on their loss function values approximated by the total derivative. To leverage the generated instances for trustworthiness improvement, RobustFair further proposes a data augmentation strategy to prioritize adversarial instances resembling the original training set, for data augmentation and model retraining. Notably, RobustFair excels at detecting intertwined issues of robustness and individual fairness, which are frequently overlooked in standard robustness and individual fairness evaluations. This capability empowers RobustFair to enhance both robustness and individual fairness evaluations by concurrently identifying defects in either domain. Empirical case studies and quantile regression analyses on benchmark datasets demonstrate the effectiveness of the fairness confusion matrix guided perturbation for false or biased adversarial instance generation.
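The four categories of the fairness confusion matrix can be sketched as a small tally. The abstract names the categories but does not define them, so this sketch assumes that "true"/"false" means the prediction matches or misses the ground truth, and "fair"/"biased" means the similar counterparts do or do not all receive the same prediction as the instance. The function names and record layout are hypothetical.

```python
from collections import Counter
from typing import Iterable, Sequence, Tuple

Record = Tuple[int, int, Sequence[int]]  # (prediction, ground truth, counterpart predictions)


def fairness_category(pred: int, truth: int, counterpart_preds: Sequence[int]) -> str:
    """Label one instance as true/false (accuracy) and fair/biased (consistency)."""
    accuracy = "true" if pred == truth else "false"
    fairness = "fair" if all(p == pred for p in counterpart_preds) else "biased"
    return f"{accuracy} {fairness}"


def fairness_confusion_matrix(records: Iterable[Record]) -> Counter:
    """Count how many instances fall into each of the four categories."""
    return Counter(fairness_category(*record) for record in records)


records = [
    (1, 1, [1, 1]),  # correct, counterparts agree
    (1, 1, [0, 1]),  # correct, one counterpart disagrees
    (0, 1, [0]),     # wrong, counterparts agree
    (0, 1, [1]),     # wrong, counterpart disagrees
]
matrix = fairness_confusion_matrix(records)
for category in ("true fair", "true biased", "false fair", "false biased"):
    print(category, matrix[category])  # each category is counted once here
```

Under these assumptions, a perturbation that moves an instance out of the "true fair" cell exposes either an accuracy defect, a fairness defect, or both.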

artificial intelligence, machine learning, survey article, (17 more...)

arXiv:2305.10906

Country:

North America > United States > Maryland (0.14)
Asia > Middle East > Israel (0.14)

Genre:

Research Report > Promising Solution (0.66)
Overview > Innovation (0.66)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)
