AITopics | Yang, Di

Collaborating Authors

Yang, Di

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

AutoCBT: An Autonomous Multi-agent Framework for Cognitive Behavioral Therapy in Psychological Counseling

Xu, Ancheng, Yang, Di, Li, Renhao, Zhu, Jingwei, Tan, Minghuan, Yang, Min, Qiu, Wanxin, Ma, Mingchen, Wu, Haihong, Li, Bingyu, Sha, Feng, Li, Chengming, Hu, Xiping, Qu, Qiang, Wong, Derek F., Xu, Ruifeng

arXiv.org Artificial IntelligenceJan-16-2025

Traditional in-person psychological counseling remains primarily niche, often chosen by individuals with psychological issues, while online automated counseling offers a potential solution for those hesitant to seek help due to feelings of shame. Cognitive Behavioral Therapy (CBT) is an essential and widely used approach in psychological counseling. The advent of large language models (LLMs) and agent technology enables automatic CBT diagnosis and treatment. However, current LLM-based CBT systems use agents with a fixed structure, limiting their self-optimization capabilities, or providing hollow, unhelpful suggestions due to redundant response patterns. In this work, we utilize Quora-like and YiXinLi single-round consultation models to build a general agent framework that generates high-quality responses for single-turn psychological consultation scenarios. We use a bilingual dataset to evaluate the quality of single-response consultations generated by each framework. Then, we incorporate dynamic routing and supervisory mechanisms inspired by real psychological counseling to construct a CBT-oriented autonomous multi-agent framework, demonstrating its general applicability. Experimental results indicate that AutoCBT can provide higher-quality automated psychological counseling services.

artificial intelligence, autocbt, natural language, (15 more...)

arXiv.org Artificial Intelligence

2501.09426

Country: Asia > China (0.29)

Genre: Research Report > Promising Solution (0.34)

Industry: Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (0.86)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)

Add feedback

Small Language Model as Data Prospector for Large Language Model

Ni, Shiwen, Wu, Haihong, Yang, Di, Qu, Qiang, Alinejad-Rokny, Hamid, Yang, Min

arXiv.org Artificial IntelligenceDec-13-2024

The quality of instruction data directly affects the performance of fine-tuned Large Language Models (LLMs). Previously, \cite{li2023one} proposed \texttt{NUGGETS}, which identifies and selects high-quality quality data from a large dataset by identifying those individual instruction examples that can significantly improve the performance of different tasks after being learnt as one-shot instances. In this work, we propose \texttt{SuperNUGGETS}, an improved variant of \texttt{NUGGETS} optimised for efficiency and performance. Our \texttt{SuperNUGGETS} uses a small language model (SLM) instead of a large language model (LLM) to filter the data for outstanding one-shot instances and refines the predefined set of tests. The experimental results show that the performance of \texttt{SuperNUGGETS} only decreases by 1-2% compared to \texttt{NUGGETS}, but the efficiency can be increased by a factor of 58. Compared to the original \texttt{NUGGETS}, our \texttt{SuperNUGGETS} has a higher utility value due to the significantly lower resource consumption.

artificial intelligence, large language model, natural language, (14 more...)

arXiv.org Artificial Intelligence

2412.0999

Genre: Research Report > New Finding (0.34)

Industry:

Health & Medicine > Therapeutic Area > Endocrinology (0.68)
Information Technology > Security & Privacy (0.68)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

CPsyCoun: A Report-based Multi-turn Dialogue Reconstruction and Evaluation Framework for Chinese Psychological Counseling

Zhang, Chenhao, Li, Renhao, Tan, Minghuan, Yang, Min, Zhu, Jingwei, Yang, Di, Zhao, Jiahao, Ye, Guancheng, Li, Chengming, Hu, Xiping

arXiv.org Artificial IntelligenceJun-10-2024

Using large language models (LLMs) to assist psychological counseling is a significant but challenging task at present. Attempts have been made on improving empathetic conversations or acting as effective assistants in the treatment with LLMs. However, the existing datasets lack consulting knowledge, resulting in LLMs lacking professional consulting competence. Moreover, how to automatically evaluate multi-turn dialogues within the counseling process remains an understudied area. To bridge the gap, we propose CPsyCoun, a report-based multi-turn dialogue reconstruction and evaluation framework for Chinese psychological counseling. To fully exploit psychological counseling reports, a two-phase approach is devised to construct high-quality dialogues while a comprehensive evaluation benchmark is developed for the effective automatic evaluation of multi-turn psychological consultations. Competitive experimental results demonstrate the effectiveness of our proposed framework in psychological counseling. We open-source the datasets and model for future research at https://github.com/CAS-SIAT-XinHai/CPsyCoun

artificial intelligence, large language model, natural language, (17 more...)

arXiv.org Artificial Intelligence

2405.16433

Country:

Asia > China (0.15)
North America > Canada (0.14)
North America > United States (0.14)
(3 more...)

Genre: Research Report > New Finding (0.34)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.69)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

CPsyExam: A Chinese Benchmark for Evaluating Psychology using Examinations

Zhao, Jiahao, Zhu, Jingwei, Tan, Minghuan, Yang, Min, Yang, Di, Zhang, Chenhao, Ye, Guancheng, Li, Chengming, Hu, Xiping

arXiv.org Artificial IntelligenceMay-18-2024

In this paper, we introduce a novel psychological benchmark, CPsyExam, constructed from questions sourced from Chinese language examinations. CPsyExam is designed to prioritize psychological knowledge and case analysis separately, recognizing the significance of applying psychological knowledge to real-world scenarios. From the pool of 22k questions, we utilize 4k to create the benchmark that offers balanced coverage of subjects and incorporates a diverse range of case analysis techniques.Furthermore, we evaluate a range of existing large language models~(LLMs), spanning from open-sourced to API-based models. Our experiments and analysis demonstrate that CPsyExam serves as an effective benchmark for enhancing the understanding of psychology within LLMs and enables the comparison of LLMs across various granularities.

artificial intelligence, large language model, natural language, (16 more...)

arXiv.org Artificial Intelligence

2405.10212

Country:

Europe (0.93)
Asia (0.69)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report (0.82)

Industry:

Health & Medicine (0.94)
Education > Educational Setting (0.94)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

HFNeRF: Learning Human Biomechanic Features with Neural Radiance Fields

Dey, Arnab, Yang, Di, Dantcheva, Antitza, Martinet, Jean

arXiv.org Artificial IntelligenceApr-9-2024

In recent advancements in novel view synthesis, generalizable Neural Radiance Fields (NeRF) based methods applied to human subjects have shown remarkable results in generating novel views from few images. However, this generalization ability cannot capture the underlying structural features of the skeleton shared across all instances. Building upon this, we introduce HFNeRF: a novel generalizable human feature NeRF aimed at generating human biomechanic features using a pre-trained image encoder. While previous human NeRF methods have shown promising results in the generation of photorealistic virtual avatars, such methods lack underlying human structure or biomechanic features such as skeleton or joint information that are crucial for downstream applications including Augmented Reality (AR)/Virtual Reality (VR). HFNeRF leverages 2D pre-trained foundation models toward learning human features in 3D using neural rendering, and then volume rendering towards generating 2D feature maps. We evaluate HFNeRF in the skeleton estimation task by predicting heatmaps as features. The proposed method is fully differentiable, allowing to successfully learn color, geometry, and human skeleton in a simultaneous manner. This paper presents preliminary results of HFNeRF, illustrating its potential in generating realistic virtual avatars with biomechanic features using NeRF.

artificial intelligence, hfnerf, human computer interaction, (11 more...)

arXiv.org Artificial Intelligence

2404.06152

Country: Asia (0.15)

Genre: Research Report (0.65)

Industry: Health & Medicine > Health Care Technology (1.00)

Technology:

Information Technology > Human Computer Interaction > Interfaces > Virtual Reality (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)

Add feedback

GHNeRF: Learning Generalizable Human Features with Efficient Neural Radiance Fields

Dey, Arnab, Yang, Di, Agaram, Rohith, Dantcheva, Antitza, Comport, Andrew I., Sridhar, Srinath, Martinet, Jean

arXiv.org Artificial IntelligenceApr-9-2024

Recent advances in Neural Radiance Fields (NeRF) have demonstrated promising results in 3D scene representations, including 3D human representations. However, these representations often lack crucial information on the underlying human pose and structure, which is crucial for AR/VR applications and games. In this paper, we introduce a novel approach, termed GHNeRF, designed to address these limitations by learning 2D/3D joint locations of human subjects with NeRF representation. GHNeRF uses a pre-trained 2D encoder streamlined to extract essential human features from 2D images, which are then incorporated into the NeRF framework in order to encode human biomechanic features. This allows our network to simultaneously learn biomechanic features, such as joint locations, along with human geometry and texture. To assess the effectiveness of our method, we conduct a comprehensive comparison with state-of-the-art human NeRF techniques and joint estimation algorithms. Our results show that GHNeRF can achieve state-of-the-art results in near real-time.

artificial intelligence, estimation, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2404.06246

Country: Asia (0.46)

Genre: Research Report > New Finding (0.86)

Industry: Health & Medicine (0.69)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Multiscale Modelling with Physics-informed Neural Network: from Large-scale Dynamics to Small-scale Predictions in Complex Systems

Wang, Jing, Li, Zheng, Lai, Pengyu, Wang, Rui, Yang, Di, Yang, Dewu, Xu, Hui

arXiv.org Artificial IntelligenceFeb-8-2024

Multiscale phenomena manifest across various scientific domains, presenting a ubiquitous challenge in accurately and effectively predicting multiscale dynamics in complex systems. In this paper, a novel decoupling solving mode is proposed through modelling large-scale dynamics independently and treating small-scale dynamics as a slaved system. A Spectral Physics-informed Neural Network (PINN) is developed to characterize the small-scale system in an efficient and accurate way. The effectiveness of the method is demonstrated through extensive numerical experiments, including one-dimensional Kuramot-Sivashinsky equation, two- and three-dimensional Navier-Stokes equations, showcasing its versatility in addressing problems of fluid dynamics. Furthermore, we also delve into the application of the proposed approach to more complex problems, including non-uniform meshes, complex geometries, large-scale data with noise, and high-dimensional small-scale dynamics. The discussions about these scenarios contribute to a comprehensive understanding of the method's capabilities and limitations. This paper presents a valuable and promising approach to enhance the computational simulations of multiscale spatiotemporal systems, which enables the acquisition of large-scale data with minimal computational demands, followed by Spectral PINN to capture small-scale dynamics with improved efficiency and accuracy.

artificial intelligence, deep learning, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2402.05067

Country: Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Research Report > Promising Solution (0.34)

Industry: Energy > Oil & Gas > Upstream (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Self-Supervised Video Representation Learning via Latent Time Navigation

Yang, Di, Wang, Yaohui, Kong, Quan, Dantcheva, Antitza, Garattoni, Lorenzo, Francesca, Gianpiero, Bremond, Francois

arXiv.org Artificial IntelligenceMay-10-2023

Self-supervised video representation learning aimed at maximizing similarity between different temporal segments of one video, in order to enforce feature persistence over time. This leads to loss of pertinent information related to temporal relationships, rendering actions such as `enter' and `leave' to be indistinguishable. To mitigate this limitation, we propose Latent Time Navigation (LTN), a time-parameterized contrastive learning strategy that is streamlined to capture fine-grained motions. Specifically, we maximize the representation similarity between different video segments from one video, while maintaining their representations time-aware along a subspace of the latent representation code including an orthogonal basis to represent temporal changes. Our extensive experimental analysis suggests that learning video representations by LTN consistently improves performance of action classification in fine-grained and human-oriented tasks (e.g., on Toyota Smarthome dataset). In addition, we demonstrate that our proposed model, when pre-trained on Kinetics-400, generalizes well onto the unseen real world video benchmark datasets UCF101 and HMDB51, achieving state-of-the-art performance in action recognition.

artificial intelligence, machine learning, representation, (14 more...)

arXiv.org Artificial Intelligence

2305.06437

Country: Europe > France (0.46)

Genre: Research Report (1.00)

Industry: Automobiles & Trucks > Manufacturer (0.35)

Technology:

Information Technology > Artificial Intelligence > Vision (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Add feedback

Prompting Neural Machine Translation with Translation Memories

Reheman, Abudurexiti, Zhou, Tao, Luo, Yingfeng, Yang, Di, Xiao, Tong, Zhu, Jingbo

arXiv.org Artificial IntelligenceFeb-7-2023

Improving machine translation (MT) systems with translation memories (TMs) is of great interest to practitioners in the MT community. However, previous approaches require either a significant update of the model architecture and/or additional training efforts to make the models well-behaved when TMs are taken as additional input. In this paper, we present a simple but effective method to introduce TMs into neural machine translation (NMT) systems. Specifically, we treat TMs as prompts to the NMT model at test time, but leave the training process unchanged. The result is a slight update of an existing NMT system, which can be implemented in a few hours by anyone who is familiar with NMT. Experimental results on several datasets demonstrate that our system significantly outperforms strong baselines.

artificial intelligence, natural language, translation, (16 more...)

arXiv.org Artificial Intelligence

2301.0538

Country:

North America > United States (1.00)
Europe (1.00)
Asia (0.93)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Add feedback

VPN++: Rethinking Video-Pose embeddings for understanding Activities of Daily Living

Das, Srijan, Dai, Rui, Yang, Di, Bremond, Francois

arXiv.org Artificial IntelligenceMay-17-2021

Abstract--Many attempts have been made towards combining RGB and 3D poses for the recognition of Activities of Daily Living (ADL). ADL may look very similar and often necessitate to model fine-grained details to distinguish them. Because the recent 3D ConvNets are too rigid to capture the subtle visual patterns across an action, this research direction is dominated by methods combining RGB and 3D Poses. But the cost of computing 3D poses from RGB stream is high in the absence of appropriate sensors. This limits the usage of aforementioned approaches in real-world applications requiring low latency. Then, how to best take advantage of 3D Poses for recognizing ADL? To this end, we propose an extension of a pose driven attention mechanism: Video-Pose Network (VPN), exploring two distinct directions. One is to transfer the Pose knowledge into RGB through a feature-level distillation and the other towards mimicking pose driven attention through an attention-level distillation. Finally, these two approaches are integrated into a single model, we call VPN . We show that VPN is not only effective but also provides a high speed up and high resilience to noisy Poses. VPN, with or without 3D Poses, outperforms the representative baselines on 4 public datasets.

deep learning, distillation, neural network, (18 more...)

arXiv.org Artificial Intelligence

2105.08141

Country: North America > United States > Colorado (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Add feedback