AITopics | Hu, Xiping

Collaborating Authors

Hu, Xiping

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

A Survey on Video Analytics in Cloud-Edge-Terminal Collaborative Systems

Gong, Linxiao, Yang, Hao, Fang, Gaoyun, Ju, Bobo, Guo, Juncen, Zhu, Xiaoguang, Wang, Yan, Hu, Xiping, Sun, Peng, Boukerche, Azzedine

arXiv.org Artificial IntelligenceFeb-12-2025

The explosive growth of video data has driven the development of distributed video analytics in cloud-edge-terminal collaborative (CETC) systems, enabling efficient video processing, real-time inference, and privacy-preserving analysis. Among multiple advantages, CETC systems can distribute video processing tasks and enable adaptive analytics across cloud, edge, and terminal devices, leading to breakthroughs in video surveillance, autonomous driving, and smart cities. In this survey, we first analyze fundamental architectural components, including hierarchical, distributed, and hybrid frameworks, alongside edge computing platforms and resource management mechanisms. Building upon these foundations, edge-centric approaches emphasize on-device processing, edge-assisted offloading, and edge intelligence, while cloud-centric methods leverage powerful computational capabilities for complex video understanding and model training. Our investigation also covers hybrid video analytics incorporating adaptive task offloading and resource-aware scheduling techniques that optimize performance across the entire system. Beyond conventional approaches, recent advances in large language models and multimodal integration reveal both opportunities and challenges in platform scalability, data protection, and system reliability. Future directions also encompass explainable systems, efficient processing mechanisms, and advanced video analytics, offering valuable insights for researchers and practitioners in this dynamic field.

data mining, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2502.06581

Country:

North America > United States > California (0.14)
North America > Canada > Ontario > Toronto (0.14)

Genre:

Overview (1.00)
Research Report > Promising Solution (0.46)

Industry: Information Technology > Security & Privacy (0.87)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
(4 more...)

Add feedback

AutoCBT: An Autonomous Multi-agent Framework for Cognitive Behavioral Therapy in Psychological Counseling

Xu, Ancheng, Yang, Di, Li, Renhao, Zhu, Jingwei, Tan, Minghuan, Yang, Min, Qiu, Wanxin, Ma, Mingchen, Wu, Haihong, Li, Bingyu, Sha, Feng, Li, Chengming, Hu, Xiping, Qu, Qiang, Wong, Derek F., Xu, Ruifeng

arXiv.org Artificial IntelligenceJan-16-2025

Traditional in-person psychological counseling remains primarily niche, often chosen by individuals with psychological issues, while online automated counseling offers a potential solution for those hesitant to seek help due to feelings of shame. Cognitive Behavioral Therapy (CBT) is an essential and widely used approach in psychological counseling. The advent of large language models (LLMs) and agent technology enables automatic CBT diagnosis and treatment. However, current LLM-based CBT systems use agents with a fixed structure, limiting their self-optimization capabilities, or providing hollow, unhelpful suggestions due to redundant response patterns. In this work, we utilize Quora-like and YiXinLi single-round consultation models to build a general agent framework that generates high-quality responses for single-turn psychological consultation scenarios. We use a bilingual dataset to evaluate the quality of single-response consultations generated by each framework. Then, we incorporate dynamic routing and supervisory mechanisms inspired by real psychological counseling to construct a CBT-oriented autonomous multi-agent framework, demonstrating its general applicability. Experimental results indicate that AutoCBT can provide higher-quality automated psychological counseling services.

artificial intelligence, autocbt, natural language, (15 more...)

arXiv.org Artificial Intelligence

2501.09426

Country: Asia > China (0.29)

Genre: Research Report > Promising Solution (0.34)

Industry: Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (0.86)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)

Add feedback

Facial Expression Analysis and Its Potentials in IoT Systems: A Contemporary Survey

Shanggua, Zixuan, Dong, Yanjie, Guo, Song, Leung, Victor C. M., Deen, M. Jamal, Hu, Xiping

arXiv.org Artificial IntelligenceDec-23-2024

Facial expressions convey human emotions and can be categorized into macro-expressions (MaEs) and micro-expressions (MiEs) based on duration and intensity. While MaEs are voluntary and easily recognized, MiEs are involuntary, rapid, and can reveal concealed emotions. The integration of facial expression analysis with Internet-of-Thing (IoT) systems has significant potential across diverse scenarios. IoT-enhanced MaE analysis enables real-time monitoring of patient emotions, facilitating improved mental health care in smart healthcare. Similarly, IoT-based MiE detection enhances surveillance accuracy and threat detection in smart security. This work aims at providing a comprehensive overview of research progress in facial expression analysis and explores its integration with IoT systems. We discuss the distinctions between our work and existing surveys, elaborate on advancements in MaE and MiE techniques across various learning paradigms, and examine their potential applications in IoT. We highlight challenges and future directions for the convergence of facial expression-based technologies and IoT systems, aiming to foster innovation in this domain. By presenting recent developments and practical applications, this study offers a systematic understanding of how facial expression analysis can enhance IoT systems in healthcare, security, and beyond.

artificial intelligence, machine learning, recognition, (16 more...)

arXiv.org Artificial Intelligence

2412.17616

Country:

Asia (0.93)
Europe (0.68)
North America > Canada > Ontario (0.46)

Genre:

Overview (1.00)
Research Report > Promising Solution (0.92)

Industry:

Information Technology > Smart Houses & Appliances (1.00)
Information Technology > Security & Privacy (1.00)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.65)

Technology:

Information Technology > Artificial Intelligence > Vision > Face Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Learning to Generate Gradients for Test-Time Adaptation via Test-Time Training Layers

Deng, Qi, Niu, Shuaicheng, Zhang, Ronghao, Chen, Yaofo, Zeng, Runhao, Chen, Jian, Hu, Xiping

arXiv.org Artificial IntelligenceDec-22-2024

Test-time adaptation (TTA) aims to fine-tune a trained model online using unlabeled testing data to adapt to new environments or out-of-distribution data, demonstrating broad application potential in real-world scenarios. However, in this optimization process, unsupervised learning objectives like entropy minimization frequently encounter noisy learning signals. These signals produce unreliable gradients, which hinder the model ability to converge to an optimal solution quickly and introduce significant instability into the optimization process. In this paper, we seek to resolve these issues from the perspective of optimizer design. Unlike prior TTA using manually designed optimizers like SGD, we employ a learning-to-optimize approach to automatically learn an optimizer, called Meta Gradient Generator (MGG). Specifically, we aim for MGG to effectively utilize historical gradient information during the online optimization process to optimize the current model. To this end, in MGG, we design a lightweight and efficient sequence modeling layer -- gradient memory layer. It exploits a self-supervised reconstruction loss to compress historical gradient information into network parameters, thereby enabling better memorization ability over a long-term adaptation process. We only need a small number of unlabeled samples to pre-train MGG, and then the trained MGG can be deployed to process unseen samples. Promising results on ImageNet-C, R, Sketch, and A indicate that our method surpasses current state-of-the-art methods with fewer updates, less data, and significantly shorter adaptation iterations. Compared with a previous SOTA method SAR, we achieve 7.4% accuracy improvement and 4.2 times faster adaptation speed on ImageNet-C.

artificial intelligence, gradient, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2412.16901

Country: Asia > China (0.28)

Genre: Research Report (1.00)

Industry: Education (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.47)

Add feedback

Video2Reward: Generating Reward Function from Videos for Legged Robot Behavior Learning

Zeng, Runhao, Zhou, Dingjie, Liang, Qiwei, Liu, Junlin, Li, Hui, Huang, Changxin, Li, Jianqiang, Hu, Xiping, Sun, Fuchun

arXiv.org Artificial IntelligenceDec-6-2024

Learning behavior in legged robots presents a significant challenge due to its inherent instability and complex constraints. Recent research has proposed the use of a large language model (LLM) to generate reward functions in reinforcement learning, thereby replacing the need for manually designed rewards by experts. However, this approach, which relies on textual descriptions to define learning objectives, fails to achieve controllable and precise behavior learning with clear directionality. In this paper, we introduce a new video2reward method, which directly generates reward functions from videos depicting the behaviors to be mimicked and learned. Specifically, we first process videos containing the target behaviors, converting the motion information of individuals in the videos into keypoint trajectories represented as coordinates through a video2text transforming module. These trajectories are then fed into an LLM to generate the reward function, which in turn is used to train the policy. To enhance the quality of the reward function, we develop a video-assisted iterative reward refinement scheme that visually assesses the learned behaviors and provides textual feedback to the LLM. This feedback guides the LLM to continually refine the reward function, ultimately facilitating more efficient behavior learning. Experimental results on tasks involving bipedal and quadrupedal robot motion control demonstrate that our method surpasses the performance of state-of-the-art LLM-based reward generation methods by over 37.6% in terms of human normalized score. More importantly, by switching video inputs, we find our method can rapidly learn diverse motion behaviors such as walking and running.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.3233/FAIA241014

2412.05515

Country: Asia (0.29)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Learning to Generalize Unseen Domains via Multi-Source Meta Learning for Text Classification

Hu, Yuxuan, Zhang, Chenwei, Yang, Min, Liang, Xiaodan, Li, Chengming, Hu, Xiping

arXiv.org Artificial IntelligenceSep-20-2024

With the rapid development of deep learning methods, there have been many breakthroughs in the field of text classification. Models developed for this task have been shown to achieve high accuracy. However, most of these models are trained using labeled data from seen domains. It is difficult for these models to maintain high accuracy in a new challenging unseen domain, which is directly related to the generalization of the model. In this paper, we study the multi-source Domain Generalization of text classification and propose a framework to use multiple seen domains to train a model that can achieve high accuracy in an unseen domain. Specifically, we propose a multi-source meta-learning Domain Generalization framework to simulate the process of model generalization to an unseen domain, so as to extract sufficient domain-related features. We introduced a memory mechanism to store domain-specific features, which coordinate with the meta-learning framework. Besides, we adopt the novel "jury" mechanism that enables the model to learn sufficient domain-invariant features. Experiments demonstrate that our meta-learning framework can effectively enhance the ability of the model to generalize to an unseen domain and can outperform the state-of-the-art methods on multi-source text classification datasets.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2409.13787

Country: Asia > China > Guangdong Province (0.15)

Genre: Research Report > Promising Solution (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Classification (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)

Add feedback

Resource Allocation and Workload Scheduling for Large-Scale Distributed Deep Learning: A Survey

Liang, Feng, Zhang, Zhen, Lu, Haifeng, Li, Chengming, Leung, Victor C. M., Guo, Yanyi, Hu, Xiping

arXiv.org Artificial IntelligenceJun-12-2024

With rapidly increasing distributed deep learning workloads in large-scale data centers, efficient distributed deep learning framework strategies for resource allocation and workload scheduling have become the key to high-performance deep learning. The large-scale environment with large volumes of datasets, models, and computational and communication resources raises various unique challenges for resource allocation and workload scheduling in distributed deep learning, such as scheduling complexity, resource and workload heterogeneity, and fault tolerance. To uncover these challenges and corresponding solutions, this survey reviews the literature, mainly from 2019 to 2024, on efficient resource allocation and workload scheduling strategies for large-scale distributed DL. We explore these strategies by focusing on various resource types, scheduling granularity levels, and performance goals during distributed training and inference processes. We highlight critical challenges for each topic and discuss key insights of existing technologies. To illustrate practical large-scale resource allocation and workload scheduling in real distributed deep learning scenarios, we use a case study of training large language models. This survey aims to encourage computer science, artificial intelligence, and communications researchers to understand recent advances and explore future research directions for efficient framework strategies for large-scale distributed deep learning.

artificial intelligence, machine learning, scheduling, (13 more...)

arXiv.org Artificial Intelligence

2406.08115

Country:

Asia > China (0.69)
North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Overview (1.00)

Industry:

Information Technology > Services (1.00)
Energy (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

II-Bench: An Image Implication Understanding Benchmark for Multimodal Large Language Models

Liu, Ziqiang, Fang, Feiteng, Feng, Xi, Du, Xinrun, Zhang, Chenhao, Wang, Zekun, Bai, Yuelin, Zhao, Qixuan, Fan, Liyang, Gan, Chengguang, Lin, Hongquan, Li, Jiaming, Ni, Yuansheng, Wu, Haihong, Narsupalli, Yaswanth, Zheng, Zhigang, Li, Chengming, Hu, Xiping, Xu, Ruifeng, Chen, Xiaojun, Yang, Min, Liu, Jiaheng, Liu, Ruibo, Huang, Wenhao, Zhang, Ge, Ni, Shiwen

arXiv.org Artificial IntelligenceJun-11-2024

The rapid advancements in the development of multimodal large language models (MLLMs) have consistently led to new breakthroughs on various benchmarks. In response, numerous challenging and comprehensive benchmarks have been proposed to more accurately assess the capabilities of MLLMs. However, there is a dearth of exploration of the higher-order perceptual capabilities of MLLMs. To fill this gap, we propose the Image Implication understanding Benchmark, II-Bench, which aims to evaluate the model's higher-order perception of images. Through extensive experiments on II-Bench across multiple MLLMs, we have made significant findings. Initially, a substantial gap is observed between the performance of MLLMs and humans on II-Bench. The pinnacle accuracy of MLLMs attains 74.8%, whereas human accuracy averages 90%, peaking at an impressive 98%. Subsequently, MLLMs perform worse on abstract and complex images, suggesting limitations in their ability to understand high-level semantics and capture image details. Finally, it is observed that most models exhibit enhanced accuracy when image sentiment polarity hints are incorporated into the prompts. This observation underscores a notable deficiency in their inherent understanding of image sentiment. We believe that II-Bench will inspire the community to develop the next generation of MLLMs, advancing the journey towards expert artificial general intelligence (AGI). II-Bench is publicly available at https://huggingface.co/datasets/m-a-p/II-Bench.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2406.05862

Country: Asia > China (0.28)

Genre: Research Report > New Finding (1.00)

Industry:

Leisure & Entertainment (1.00)
Law (1.00)
Information Technology > Security & Privacy (1.00)
(6 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

CPsyCoun: A Report-based Multi-turn Dialogue Reconstruction and Evaluation Framework for Chinese Psychological Counseling

Zhang, Chenhao, Li, Renhao, Tan, Minghuan, Yang, Min, Zhu, Jingwei, Yang, Di, Zhao, Jiahao, Ye, Guancheng, Li, Chengming, Hu, Xiping

arXiv.org Artificial IntelligenceJun-10-2024

Using large language models (LLMs) to assist psychological counseling is a significant but challenging task at present. Attempts have been made on improving empathetic conversations or acting as effective assistants in the treatment with LLMs. However, the existing datasets lack consulting knowledge, resulting in LLMs lacking professional consulting competence. Moreover, how to automatically evaluate multi-turn dialogues within the counseling process remains an understudied area. To bridge the gap, we propose CPsyCoun, a report-based multi-turn dialogue reconstruction and evaluation framework for Chinese psychological counseling. To fully exploit psychological counseling reports, a two-phase approach is devised to construct high-quality dialogues while a comprehensive evaluation benchmark is developed for the effective automatic evaluation of multi-turn psychological consultations. Competitive experimental results demonstrate the effectiveness of our proposed framework in psychological counseling. We open-source the datasets and model for future research at https://github.com/CAS-SIAT-XinHai/CPsyCoun

artificial intelligence, large language model, natural language, (17 more...)

arXiv.org Artificial Intelligence

2405.16433

Country:

Asia > China (0.15)
North America > Canada (0.14)
North America > United States (0.14)
(3 more...)

Genre: Research Report > New Finding (0.34)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.69)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

CPsyExam: A Chinese Benchmark for Evaluating Psychology using Examinations

Zhao, Jiahao, Zhu, Jingwei, Tan, Minghuan, Yang, Min, Yang, Di, Zhang, Chenhao, Ye, Guancheng, Li, Chengming, Hu, Xiping

arXiv.org Artificial IntelligenceMay-18-2024

In this paper, we introduce a novel psychological benchmark, CPsyExam, constructed from questions sourced from Chinese language examinations. CPsyExam is designed to prioritize psychological knowledge and case analysis separately, recognizing the significance of applying psychological knowledge to real-world scenarios. From the pool of 22k questions, we utilize 4k to create the benchmark that offers balanced coverage of subjects and incorporates a diverse range of case analysis techniques.Furthermore, we evaluate a range of existing large language models~(LLMs), spanning from open-sourced to API-based models. Our experiments and analysis demonstrate that CPsyExam serves as an effective benchmark for enhancing the understanding of psychology within LLMs and enables the comparison of LLMs across various granularities.

artificial intelligence, large language model, natural language, (16 more...)

arXiv.org Artificial Intelligence

2405.10212

Country:

Europe (0.93)
Asia (0.69)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report (0.82)

Industry:

Health & Medicine (0.94)
Education > Educational Setting (0.94)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback