Collaborating Authors

Fu, Xinyi


Augmenting Image Annotation: A Human-LMM Collaborative Framework for Efficient Object Selection and Label Generation

arXiv.org Artificial Intelligence

Traditional image annotation relies heavily on human effort for object selection and label assignment, making the process time-consuming and increasingly inefficient as annotators fatigue over extended sessions. This paper introduces a novel framework that leverages the visual understanding capabilities of large multimodal models (LMMs), particularly GPT, to assist annotation workflows. In our proposed approach, human annotators focus on selecting objects via bounding boxes, while the LMM autonomously generates relevant labels. This human-AI collaborative framework improves annotation efficiency by reducing the cognitive and time burden on human annotators. By analyzing the system's performance across various annotation tasks, we demonstrate that it generalizes to object recognition, scene description, and fine-grained categorization. The proposed framework highlights the potential of this approach to redefine annotation workflows, offering a scalable and efficient solution for large-scale data labeling in computer vision. Finally, we discuss how integrating LMMs into the annotation pipeline can advance bidirectional human-AI alignment, as well as the challenge of alleviating the "endless annotation" burden in the face of information overload by shifting some of the work to AI.
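
To make the division of labor concrete, here is a minimal sketch of the select-then-label loop: the annotator supplies a bounding box, and a vision-capable model returns a label for the crop. It assumes the OpenAI Python SDK; the model name, prompt wording, and helper function are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of the human-select / LMM-label workflow described above.
# Assumes the OpenAI Python SDK; model name and prompt are illustrative.
import base64
import io

from openai import OpenAI
from PIL import Image

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def suggest_label(image_path: str, box: tuple[int, int, int, int]) -> str:
    """Crop the human-drawn bounding box and ask an LMM for a label."""
    crop = Image.open(image_path).crop(box)  # box = (left, top, right, bottom)
    buf = io.BytesIO()
    crop.save(buf, format="JPEG")
    b64 = base64.b64encode(buf.getvalue()).decode()

    resp = client.chat.completions.create(
        model="gpt-4o",  # any vision-capable model
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Name the single object in this image crop. "
                         "Answer with one short noun phrase."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    )
    return resp.choices[0].message.content.strip()

# The annotator only supplies boxes; labels come back from the model:
# print(suggest_label("street.jpg", (120, 40, 380, 300)))
```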


Benchmarking Zero-Shot Facial Emotion Annotation with Large Language Models: A Multi-Class and Multi-Frame Approach in DailyLife

arXiv.org Artificial Intelligence

This study investigates the feasibility and performance of using large language models (LLMs) to automatically annotate human emotions in everyday scenarios. We conducted experiments on the DailyLife subset of the publicly available FERV39k dataset, employing the GPT-4o-mini model for rapid, zero-shot labeling of key frames extracted from video segments. Under a seven-class emotion taxonomy ("Angry," "Disgust," "Fear," "Happy," "Neutral," "Sad," "Surprise"), the LLM achieved an average precision of approximately 50%. In contrast, when limited to ternary emotion classification (negative/neutral/positive), the average precision increased to approximately 64%. Additionally, we explored a strategy that integrates multiple frames within 1-2 second video clips to enhance labeling performance and reduce costs. The results indicate that this approach can slightly improve annotation accuracy. Overall, our preliminary findings highlight the potential application of zero-shot LLMs in human facial emotion annotation tasks, offering new avenues for reducing labeling costs and broadening the applicability of LLMs in complex multimodal environments.
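
A hedged sketch of how the multi-frame strategy could look in practice: several key frames from a short clip are packed into one request, and the reply is constrained to the seven-class taxonomy. The frame count, prompt text, and fallback rule are assumptions rather than the study's exact protocol.

```python
# Sketch of multi-frame zero-shot emotion labeling: pack key frames from a
# 1-2 s clip into one request and ask for a single seven-class label.
import base64

import cv2  # pip install opencv-python
from openai import OpenAI

EMOTIONS = ["Angry", "Disgust", "Fear", "Happy", "Neutral", "Sad", "Surprise"]
client = OpenAI()

def key_frames(video_path: str, n: int = 3) -> list[str]:
    """Grab n evenly spaced frames and return them as base64 JPEGs."""
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    frames = []
    for i in range(n):
        cap.set(cv2.CAP_PROP_POS_FRAMES, i * max(total - 1, 1) // max(n - 1, 1))
        ok, frame = cap.read()
        if ok:
            _, jpg = cv2.imencode(".jpg", frame)
            frames.append(base64.b64encode(jpg.tobytes()).decode())
    cap.release()
    return frames

def label_clip(video_path: str) -> str:
    content = [{"type": "text",
                "text": "These frames come from one short clip of a person. "
                        f"Reply with exactly one word from: {', '.join(EMOTIONS)}."}]
    content += [{"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}}
                for b64 in key_frames(video_path)]
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": content}],
    )
    answer = resp.choices[0].message.content.strip().strip(".")
    return answer if answer in EMOTIONS else "Neutral"  # fall back on parse failure
```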


VRMN-bD: A Multi-modal Natural Behavior Dataset of Immersive Human Fear Responses in VR Stand-up Interactive Games

arXiv.org Artificial Intelligence

Understanding and recognizing emotions are important and challenging issues in the metaverse era. Understanding, identifying, and predicting fear, one of the fundamental human emotions, in virtual reality (VR) environments play an essential role in immersive game development, scene development, and next-generation virtual human-computer interaction applications. In this article, we used VR horror games as a medium to analyze fear by collecting multi-modal data (posture, audio, and physiological signals) from 23 players. We used an LSTM-based model to predict fear with accuracies of 65.31% and 90.47% under 6-level classification (no fear and five levels of fear) and 2-level classification (no fear and fear), respectively. We constructed a multi-modal natural behavior dataset of immersive human fear responses (VRMN-bD) and compared it with existing advanced datasets. The results show that our dataset has fewer limitations in terms of collection method, data scale, and audience scope. Our dataset is unique in targeting multi-modal fear and behavior data in VR stand-up interactive environments. Moreover, we discussed the implications of this work for communities and applications. The dataset and pre-trained model are available at https://github.com/KindOPSTAR/VRMN-bD.
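
For readers who want a starting point, the following is a minimal PyTorch sketch of an LSTM classifier over fused per-timestep features; the feature dimensions and concatenation-based fusion are illustrative assumptions, not the paper's exact architecture.

```python
# Minimal sketch of an LSTM-based fear classifier over fused multi-modal
# features (posture + audio + physiological signals per timestep).
import torch
import torch.nn as nn

class FearLSTM(nn.Module):
    def __init__(self, feat_dim: int = 64, hidden: int = 128, n_classes: int = 6):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)  # 6-level or 2-level fear

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, feat_dim) -- concatenated per-timestep features
        out, _ = self.lstm(x)
        return self.head(out[:, -1])  # classify from the last hidden state

# Toy forward pass: 8 clips, 50 timesteps, 64-dim fused features.
posture = torch.randn(8, 50, 32)
audio = torch.randn(8, 50, 16)
physio = torch.randn(8, 50, 16)
x = torch.cat([posture, audio, physio], dim=-1)  # 32 + 16 + 16 = 64 dims
logits = FearLSTM()(x)                            # (8, 6) class scores
```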


Privacy-preserving design of graph neural networks with applications to vertical federated learning

arXiv.org Artificial Intelligence

The paradigm of vertical federated learning (VFL), in which institutions collaboratively train machine learning models by combining their local feature or label information, has achieved great success in applications to financial risk management (FRM). The rapid development of graph representation learning (GRL) has opened up new opportunities for FRM applications under FL by efficiently utilizing the graph-structured data generated from underlying transaction networks. Meanwhile, transaction information is often considered highly sensitive. To prevent data leakage during training, it is critical to develop FL protocols with formal privacy guarantees. In this paper, we present VESPER, an end-to-end GRL framework in the VFL setting built upon a general privatization scheme, termed perturbed message passing (PMP), that allows the privatization of many popular graph neural architectures. Based on PMP, we discuss the strengths and weaknesses of specific design choices of concrete graph neural architectures and provide solutions and improvements for both dense and sparse graphs. Extensive empirical evaluations on both public datasets and an industry dataset demonstrate that VESPER can train high-performance GNN models over both sparse and dense graphs under reasonable privacy budgets.
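
The following is an illustrative sketch of one perturbed-message-passing step under common DP conventions (clip each message's norm to bound sensitivity, aggregate, add Gaussian noise); the clip norm, noise scale, and mean aggregator are placeholder choices, not VESPER's exact construction.

```python
# Illustrative perturbed message passing: clip, aggregate, then perturb.
import torch

def perturbed_message_passing(h, edge_index, clip=1.0, sigma=0.5):
    """h: (n_nodes, d) node features; edge_index: (2, n_edges) src->dst pairs."""
    src, dst = edge_index
    msgs = h[src]
    # Per-message norm clipping bounds each node's contribution (sensitivity).
    norms = msgs.norm(dim=1, keepdim=True).clamp(min=1e-12)
    msgs = msgs * (clip / norms).clamp(max=1.0)
    # Sum messages into destination nodes, then normalize by in-degree.
    agg = torch.zeros_like(h).index_add_(0, dst, msgs)
    deg = torch.zeros(h.size(0)).index_add_(0, dst, torch.ones_like(dst, dtype=h.dtype))
    agg = agg / deg.clamp(min=1).unsqueeze(1)
    # Gaussian perturbation calibrated to the clipped sensitivity.
    return agg + sigma * clip * torch.randn_like(agg)

# Toy graph: 4 nodes, 3 directed edges.
h = torch.randn(4, 8)
edge_index = torch.tensor([[0, 1, 2], [1, 2, 3]])
h_next = perturbed_message_passing(h, edge_index)
```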


Differentially Private Learning with Per-Sample Adaptive Clipping

arXiv.org Artificial Intelligence

Privacy in AI has drawn sustained attention from researchers and the general public in recent years. As one way to implement privacy-preserving AI, differentially private learning is a framework that enables AI models to be trained with differential privacy (DP) guarantees. To achieve DP in the learning process, existing algorithms typically limit the magnitude of gradients with a constant clipping threshold, which requires careful tuning due to its significant impact on model performance. To address this issue, recent works NSGD and Auto-S propose using normalization instead of clipping to avoid hyperparameter tuning. However, normalization-based approaches like NSGD and Auto-S rely on a monotonic weight function, which places excessive weight on small-gradient samples and introduces extra deviation into the update. In this paper, we propose the Differentially Private Per-Sample Adaptive Clipping (DP-PSAC) algorithm, based on a non-monotonic adaptive weight function, which guarantees privacy without the hyperparameter tuning that constant clipping requires, while significantly reducing the deviation between the update and the true batch-averaged gradient. We provide a rigorous theoretical convergence analysis and show that, at the same order of convergence rate, the proposed algorithm achieves a lower non-vanishing bound, maintained over training iterations, than NSGD/Auto-S. In addition, through extensive experimental evaluation, we show that DP-PSAC outperforms or matches state-of-the-art methods on multiple mainstream vision and language tasks.
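
As a rough illustration of per-sample adaptive clipping, the sketch below rescales each per-sample gradient with a non-monotonic weight before noisy aggregation. The weight form C / (||g|| + r / (||g|| + r)) reflects our reading of the method; treat it and all constants as assumptions.

```python
# Sketch in the spirit of DP-PSAC: rescale each per-sample gradient with a
# non-monotonic weight, sum, and add Gaussian noise. Weight form assumed.
import torch

def dp_psac_step(per_sample_grads, C=1.0, r=0.1, sigma=1.0):
    """per_sample_grads: (batch, dim) gradients, one row per sample."""
    norms = per_sample_grads.norm(dim=1, keepdim=True)
    # Non-monotonic weight: ~C near ||g|| = 0, ~C/||g|| for large ||g||,
    # avoiding the excessive weight monotonic rules give tiny gradients.
    weights = C / (norms + r / (norms + r))
    # Each weighted gradient has norm < C, so noise of scale sigma * C
    # matches the clipped sensitivity.
    clipped_sum = (weights * per_sample_grads).sum(dim=0)
    noise = sigma * C * torch.randn_like(clipped_sum)
    return (clipped_sum + noise) / per_sample_grads.size(0)

# Toy usage: 32 per-sample gradients of a 10-parameter model.
g = torch.randn(32, 10)
update = dp_psac_step(g)  # privatized batch-averaged gradient estimate
```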