Liang, Yu
ReJSHand: Efficient Real-Time Hand Pose Estimation and Mesh Reconstruction Using Refined Joint and Skeleton Features
An, Shan, Dai, Shipeng, Ansari, Mahrukh, Liang, Yu, Zeng, Ming, Tsintotas, Konstantinos A., Fu, Changhong, Zhang, Hong
Accurate hand pose estimation is vital in robotics, advancing dexterous manipulation and human-computer interaction. Toward this goal, this paper presents ReJSHand (which stands for Refined Joint and Skeleton Features), a cutting-edge network formulated for real-time hand pose estimation and mesh reconstruction. The proposed framework is designed to accurately predict 3D hand gestures under real-time constraints, which is essential for systems that demand agile and responsive hand motion tracking. The network's design prioritizes computational efficiency without compromising accuracy, a prerequisite for instantaneous robotic interactions. Specifically, ReJSHand comprises a 2D keypoint generator, a 3D keypoint generator, an expansion block, and a feature interaction block for meticulously reconstructing 3D hand poses from 2D imagery. In addition, a multi-head self-attention mechanism and a coordinate attention layer enhance feature representation, streamlining the creation of hand mesh vertices through sophisticated feature mapping and linear transformation. Regarding performance, comprehensive evaluations on the FreiHAND dataset demonstrate ReJSHand's computational prowess. It achieves a frame rate of 72 frames per second while maintaining a PA-MPJPE (Procrustes-Aligned Mean Per Joint Position Error) of 6.3 mm and a PA-MPVPE (Procrustes-Aligned Mean Per Vertex Position Error) of 6.4 mm. Moreover, our model reaches F-scores of 0.756 at 5 mm (F@5) and 0.984 at 15 mm (F@15), surpassing modern pipelines and solidifying its position at the forefront of robotic hand pose estimators. To facilitate future studies, we provide our source code at https://github.com/daishipeng/ReJSHand.
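As an illustration of how such a pipeline could be assembled, the following PyTorch-style sketch maps fused joint features to mesh vertices through multi-head self-attention, an expansion step, and a linear transformation. The module structure, feature dimensions, and the 21-joint / 778-vertex output sizes are assumptions made here for clarity; this is not the authors' implementation.

# Hypothetical sketch of a ReJSHand-like mesh head (illustrative only).
import torch
import torch.nn as nn

class MeshHead(nn.Module):
    """Maps fused joint/skeleton features to mesh vertices via
    self-attention, an expansion step, and a linear transformation."""
    def __init__(self, feat_dim=256, num_joints=21, num_vertices=778):
        super().__init__()
        self.attn = nn.MultiheadAttention(feat_dim, num_heads=4, batch_first=True)
        self.expand = nn.Linear(num_joints, num_vertices)   # "expansion" step (assumed)
        self.to_xyz = nn.Linear(feat_dim, 3)                 # vertex coordinates

    def forward(self, joint_feats):                          # (B, 21, feat_dim)
        x, _ = self.attn(joint_feats, joint_feats, joint_feats)
        x = self.expand(x.transpose(1, 2)).transpose(1, 2)   # (B, 778, feat_dim)
        return self.to_xyz(x)                                # (B, 778, 3) vertices

feats = torch.randn(2, 21, 256)        # placeholder fused 2D/3D keypoint features
vertices = MeshHead()(feats)
print(vertices.shape)                   # torch.Size([2, 778, 3])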
Spiking Vision Transformer with Saccadic Attention
Wang, Shuai, Zhang, Malu, Zhang, Dehao, Belatreche, Ammar, Xiao, Yichen, Liang, Yu, Shan, Yimeng, Sun, Qian, Zhang, Enqi, Yang, Yang
The combination of Spiking Neural Networks (SNNs) and Vision Transformers (ViTs) holds potential for achieving both energy efficiency and high performance, making it particularly suitable for edge vision applications. However, a significant performance gap still exists between SNN-based ViTs and their ANN counterparts. Here, we first analyze why SNN-based ViTs suffer from limited performance and identify a mismatch between the vanilla self-attention mechanism and spatio-temporal spike trains. This mismatch results in degraded spatial relevance and limited temporal interactions. To address these issues, we draw inspiration from biological saccadic attention mechanisms and introduce an innovative Saccadic Spike Self-Attention (SSSA) method. Specifically, in the spatial domain, SSSA employs a novel spike distribution-based method to effectively assess the relevance between Query and Key pairs in SNN-based ViTs. Temporally, SSSA employs a saccadic interaction module that dynamically focuses on selected visual areas at each timestep and significantly enhances whole-scene understanding through temporal interactions. Building on the SSSA mechanism, we develop an SNN-based Vision Transformer (SNN-ViT). Extensive experiments across various visual tasks demonstrate that SNN-ViT achieves state-of-the-art performance with linear computational complexity. The effectiveness and efficiency of the SNN-ViT highlight its potential for power-critical edge vision applications.
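The toy snippet below is not the paper's SSSA; under assumed tensor shapes, it only illustrates two ideas the abstract refers to: a linear-complexity ordering of the attention computation over binary spike trains, and a spike-distribution (firing-rate) view of Query-Key relevance.

# Illustrative toy of spike-based attention and Query/Key relevance (not SSSA itself).
import torch

T, B, N, D = 4, 2, 8, 16                      # timesteps, batch, tokens, channels (assumed)
q = (torch.rand(T, B, N, D) > 0.7).float()    # binary spike trains for Query
k = (torch.rand(T, B, N, D) > 0.7).float()    # binary spike trains for Key
v = (torch.rand(T, B, N, D) > 0.7).float()    # binary spike trains for Value

# Linear-complexity ordering: aggregate Key/Value first, then apply to Query,
# so the cost scales with the token count N rather than N^2.
kv = torch.einsum('tbnd,tbne->tbde', k, v)     # (T, B, D, D)
out = torch.einsum('tbnd,tbde->tbne', q, kv)   # (T, B, N, D)

# Spike-distribution view: per-token firing rates over time as a relevance cue.
rate_q = q.mean(dim=0)                         # (B, N, D)
rate_k = k.mean(dim=0)
relevance = rate_q @ rate_k.transpose(1, 2)    # (B, N, N) token-to-token relevance
print(out.shape, relevance.shape)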
Milmer: a Framework for Multiple Instance Learning based Multimodal Emotion Recognition
Wang, Zaitian, He, Jian, Liang, Yu, Hu, Xiyuan, Peng, Tianhao, Wang, Kaixin, Wang, Jiakai, Zhang, Chenlong, Zhang, Weili, Niu, Shuang, Xie, Xiaoyang
Emotions play a crucial role in human behavior and decision-making, making emotion recognition a key area of interest in human-computer interaction (HCI). This study addresses the challenges of emotion recognition by integrating facial expression analysis with electroencephalogram (EEG) signals, introducing a novel multimodal framework, Milmer. The proposed framework employs a transformer-based fusion approach to effectively integrate visual and physiological modalities. It consists of an EEG preprocessing module, a facial feature extraction and balancing module, and a cross-modal fusion module. To enhance visual feature extraction, we fine-tune a pre-trained Swin Transformer on emotion-related datasets. Additionally, a cross-attention mechanism is introduced to balance token representation across modalities, ensuring effective feature integration. A key innovation of this work is the adoption of a multiple instance learning (MIL) approach, which extracts meaningful information from multiple facial expression images over time, capturing critical temporal dynamics often overlooked in previous studies. Extensive experiments conducted on the DEAP dataset demonstrate the superiority of the proposed framework, achieving a classification accuracy of 96.72% in the four-class emotion recognition task. Ablation studies further validate the contributions of each module, highlighting the significance of advanced feature extraction and fusion strategies in enhancing emotion recognition performance. Our code is available at https://github.com/liangyubuaa/Milmer.
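A minimal sketch of the fusion idea, assuming placeholder feature dimensions and a generic attention-based pooling rather than the actual Milmer modules: EEG tokens attend via cross-attention over a bag of facial-frame embeddings, and the pooled result feeds a four-class emotion head.

# Hedged sketch of MIL-style cross-modal fusion (placeholder dims; not the Milmer code).
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    def __init__(self, dim=128, n_classes=4):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.classifier = nn.Linear(dim, n_classes)

    def forward(self, eeg_tokens, face_bag):
        # eeg_tokens: (B, n_eeg, dim); face_bag: (B, n_frames, dim) = MIL bag of frame features
        fused, _ = self.cross_attn(query=eeg_tokens, key=face_bag, value=face_bag)
        pooled = fused.mean(dim=1)             # simple bag/token pooling (assumed)
        return self.classifier(pooled)         # four-class emotion logits

eeg = torch.randn(8, 32, 128)     # e.g. 32 EEG-channel tokens (placeholder features)
faces = torch.randn(8, 10, 128)   # bag of 10 facial-frame embeddings (e.g. from a Swin backbone)
print(CrossModalFusion()(eeg, faces).shape)    # torch.Size([8, 4])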
GraphRARE: Reinforcement Learning Enhanced Graph Neural Network with Relative Entropy
Peng, Tianhao, Wu, Wenjun, Yuan, Haitao, Bao, Zhifeng, Zhao, Pengrui, Yu, Xin, Lin, Xuetao, Liang, Yu, Pu, Yanjun
Graph neural networks (GNNs) have shown advantages in graph-based analysis tasks. However, most existing methods rely on the homophily assumption and perform poorly on heterophilic graphs, where linked nodes have dissimilar features and different class labels, and semantically related nodes may be multiple hops away. To address this limitation, this paper presents GraphRARE, a general framework built upon node relative entropy and deep reinforcement learning, to strengthen the expressive capability of GNNs. An innovative node relative entropy, which considers node features and structural similarity, is used to measure the mutual information between node pairs. In addition, to avoid sub-optimal solutions caused by mixing the useful information of remote nodes with noise, a deep reinforcement learning-based algorithm is developed to optimize the graph topology. This algorithm selects informative nodes and discards noisy nodes based on the defined node relative entropy. Extensive experiments are conducted on seven real-world datasets. The experimental results demonstrate the superiority of GraphRARE in node classification and its capability to optimize the original graph topology.
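As a hedged illustration (not the paper's exact definition), the following sketch scores a node pair by mixing a relative-entropy term over node features with a simple structural-similarity term; the Jaccard proxy and the mixing weight alpha are assumptions made here for concreteness.

# Illustrative node-pair scoring combining feature relative entropy and structure.
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    # Relative entropy between two (normalized, non-negative) distributions.
    p = p / (p.sum() + eps)
    q = q / (q.sum() + eps)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def node_pair_score(feat_u, feat_v, neigh_u, neigh_v, alpha=0.5):
    # Feature term: relative entropy between the two feature distributions.
    feat_term = kl_divergence(np.abs(feat_u), np.abs(feat_v))
    # Structural term: Jaccard similarity of neighborhoods (assumed proxy).
    inter = len(neigh_u & neigh_v)
    union = len(neigh_u | neigh_v) or 1
    struct_term = inter / union
    # Toy mixing of the two terms; alpha is a free weight, not from the paper.
    return alpha * feat_term - (1 - alpha) * struct_term

u, v = np.random.rand(16), np.random.rand(16)
print(node_pair_score(u, v, {1, 2, 3}, {2, 3, 4}))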
CLGT: A Graph Transformer for Student Performance Prediction in Collaborative Learning
Peng, Tianhao, Liang, Yu, Wu, Wenjun, Ren, Jian, Zhao, Pengrui, Pu, Yanjun
Modeling and predicting the performance of students in collaborative learning paradigms is an important task. Most of the research on collaborative learning in the literature focuses on discussion forums and social learning networks. Only a few works investigate how students interact with each other in team projects and how such interactions affect their academic performance. To bridge this gap, we choose a software engineering course as the study subject; the students who participate in this course are required to team up and complete a software project together. In this work, we construct an interaction graph based on the activities of students grouped in various teams. Based on this student interaction graph, we present an extended graph transformer framework for collaborative learning (CLGT) for evaluating and predicting the performance of students. Moreover, the proposed CLGT contains an interpretation module that explains the prediction results and visualizes the student interaction patterns. The experimental results confirm that the proposed CLGT outperforms the baseline models in prediction on real-world datasets. Moreover, the proposed CLGT identifies students with poor performance in the collaborative learning paradigm and gives teachers early warnings, so that appropriate assistance can be provided.
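The following minimal sketch, under assumed feature sizes and a toy fully connected team graph, shows the general idea of attention restricted to a student interaction graph followed by a per-student prediction head; it is illustrative only and not the CLGT implementation.

# Hypothetical sketch: graph-masked attention over a student interaction graph.
import torch
import torch.nn as nn

class GraphAttentionLayer(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)

    def forward(self, x, adj):                  # x: (N, dim), adj: (N, N) with 0/1 entries
        scores = self.q(x) @ self.k(x).t() / x.size(-1) ** 0.5
        scores = scores.masked_fill(adj == 0, float('-inf'))   # attend only along edges
        return torch.softmax(scores, dim=-1) @ self.v(x)

N, dim = 5, 64                                   # five students in one team (toy numbers)
x = torch.randn(N, dim)                          # placeholder interaction/activity features
adj = torch.ones(N, N)                           # toy fully connected team graph
h = GraphAttentionLayer(dim)(x, adj)
grade_logits = nn.Linear(dim, 2)(h)              # e.g. a pass / at-risk prediction head
print(grade_logits.shape)                        # torch.Size([5, 2])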
Visualizing the Finer Cluster Structure of Large-Scale and High-Dimensional Data
Liang, Yu, Chaudhuri, Arin, Wang, Haoyu
Dimension reduction and visualization of high-dimensional data have become important research topics in many scientific fields because of the rapid growth of datasets with large sample sizes and/or high dimensionality. In the literature on dimension reduction and information visualization, linear methods such as principal component analysis (PCA) [7] and classical scaling [17] mainly focus on preserving the most significant structure or maximum variance in the data; nonlinear methods such as multidimensional scaling [2], Isomap [16], and curvilinear component analysis (CCA) [5] mainly focus on preserving long or short distances in the high-dimensional space. They generally perform well in preserving the global structure of the data but can fail to preserve the local structure. In recent years, manifold learning methods such as SNE [6], Laplacian eigenmaps [1], LINE [15], LargeVis [14], t-SNE [19][18], and UMAP [10] have gained popularity because of their ability to preserve both the local structure and some aspects of the global structure of the data. These methods generally assume that the data lie on a low-dimensional manifold of the high-dimensional input space, and they seek to find the manifold that preserves the intrinsic structure of the high-dimensional data. Many manifold learning methods suffer from the so-called "crowding problem" when preserving the local distances of high-dimensional data in a low-dimensional space: if small distances in the high-dimensional space are represented faithfully, points separated by moderate or large distances in the high-dimensional space are placed too far away from each other in the low-dimensional space.
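For a concrete point of reference, the short example below (which assumes scikit-learn is available and is not tied to this paper's method) contrasts a linear projection (PCA) with a manifold-learning embedding (t-SNE) on a small benchmark dataset.

# Quick illustration: variance-preserving PCA vs. local-structure-preserving t-SNE.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)          # 1797 samples, 64 dimensions

X_pca = PCA(n_components=2).fit_transform(X)                # preserves maximum variance
X_tsne = TSNE(n_components=2, perplexity=30,
              init='pca', random_state=0).fit_transform(X)  # preserves local neighborhoods

print(X_pca.shape, X_tsne.shape)             # (1797, 2) (1797, 2)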