
Collaborating Authors

 Liu, Honghai


Text-Derived Relational Graph-Enhanced Network for Skeleton-Based Action Segmentation

arXiv.org Artificial Intelligence

Skeleton-based Temporal Action Segmentation (STAS) aims to segment and recognize various actions from long, untrimmed sequences of human skeletal movements. Current STAS methods typically employ spatio-temporal modeling to establish dependencies among joints as well as frames, and utilize one-hot encoding with cross-entropy loss for frame-wise classification supervision. However, these methods overlook the intrinsic correlations among joints and actions within skeletal features, leading to a limited understanding of human movements. To address this, we propose a Text-Derived Relational Graph-Enhanced Network (TRG-Net) that leverages prior graphs generated by Large Language Models (LLMs) to enhance both modeling and supervision. For modeling, the Dynamic Spatio-Temporal Fusion Modeling (DSFM) method incorporates Text-Derived Joint Graphs (TJG) with channel- and frame-level dynamic adaptation to effectively model spatial relations, while integrating spatio-temporal core features during temporal modeling. For supervision, the Absolute-Relative Inter-Class Supervision (ARIS) method employs contrastive learning between action features and text embeddings to regularize the absolute class distributions, and utilizes Text-Derived Action Graphs (TAG) to capture the relative inter-class relationships among action features. Additionally, we propose a Spatial-Aware Enhancement Processing (SAEP) method, which incorporates random joint occlusion and axial rotation to enhance spatial generalization. Performance evaluations on four public datasets demonstrate that TRG-Net achieves state-of-the-art results.

Temporal Action Segmentation (TAS), an advanced task in video understanding, aims to segment and recognize each action within long, untrimmed video sequences of human activities [1]. Similar to how semantic segmentation predicts labels for each pixel in an image, TAS predicts action labels for each frame in a video. As a significant task in computer vision, TAS finds applications in various domains such as medical rehabilitation [2], industrial monitoring [3], and activity analysis [4]. Existing TAS methods can be broadly categorized into two types based on input modality: Video-based TAS (VTAS) and Skeleton-based TAS (STAS) [5]-[7].

Figure caption: The text embeddings and relational graphs generated by large language models can serve as priors for enhancing the modeling and supervision of action segmentation. Specifically, the text-derived joint graph effectively captures spatial correlations, while the text-derived action graph and action embeddings supervise the relationships and distributions of action classes.

Footnote: Haoyu Ji, Bowen Chen, Weihong Ren, Wenze Huang, Zhihao Yang, Zhiyong Wang, and Honghai Liu are with the State Key Laboratory of Robotics and Systems, Harbin Institute of Technology Shenzhen, Shenzhen 518055, China (e-mail: jihaoyu1224@gmail.com, …). The code is available at https://github.com/HaoyuJi/TRG-Net.
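As an illustration of the SAEP processing described above, the following is a minimal sketch of random joint occlusion and axial rotation for a skeleton sequence, assuming input of shape (3, T, V) (xyz channels, frames, joints). The function names, occlusion probability, and rotation range are illustrative assumptions, not the paper's implementation.

    import math
    import torch

    def random_joint_occlusion(x: torch.Tensor, p: float = 0.1) -> torch.Tensor:
        # Zero out a random subset of joints for the whole sequence.
        C, T, V = x.shape
        keep = (torch.rand(V) > p).float()      # 1 = keep joint, 0 = occlude
        return x * keep.view(1, 1, V)

    def random_axial_rotation(x: torch.Tensor, max_deg: float = 30.0) -> torch.Tensor:
        # Rotate all joints around the vertical (y) axis by a random angle.
        theta = math.radians((torch.rand(1).item() * 2 - 1) * max_deg)
        c, s = math.cos(theta), math.sin(theta)
        rot = torch.tensor([[c, 0.0, s],
                            [0.0, 1.0, 0.0],
                            [-s, 0.0, c]])
        return torch.einsum('ij,jtv->itv', rot, x)  # apply to the xyz channels

    # Usage: augment one 64-frame sequence over 25 joints.
    seq = torch.randn(3, 64, 25)
    seq = random_axial_rotation(random_joint_occlusion(seq))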


The Unified Balance Theory of Second-Moment Exponential Scaling Optimizers in Visual Tasks

arXiv.org Artificial Intelligence

Existing first-order optimizers fall into two main branches: classical optimizers represented by Stochastic Gradient Descent (SGD), and adaptive optimizers represented by Adam, along with their many derivatives. The debate over the merits and demerits of these two families has persisted for a decade. In practice, SGD is generally considered more suitable for tasks such as Computer Vision (CV), while adaptive optimizers are widely used in tasks with sparse gradients, such as Large Language Models (LLMs). Although adaptive optimizers almost always converge faster, they can overfit in some cases, yielding poorer generalization than SGD on certain tasks. Even in Large Language Models, Adam continues to face challenges, and its original strategy does not always retain an advantage once improvements such as gradient clipping are introduced. With such a wide variety of optimization methods available, a unified, interpretable theory is needed. Working within the framework of first-order optimizers and drawing on balance theory, this paper proposes, for the first time, a unified strategy that integrates all first-order optimization methods.
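One way to read the abstract's "second-moment exponential scaling" framing is as a single update rule in which an exponent p on the second moment interpolates between SGD with momentum (p = 0) and an Adam-style update (p = 0.5). The sketch below, with assumed names and hyperparameters and with bias correction omitted, is a speculative illustration of that reading, not the paper's actual formulation.

    import torch

    @torch.no_grad()
    def unified_step(params, grads, state, lr=1e-3, beta1=0.9, beta2=0.999,
                     p=0.5, eps=1e-8):
        # p = 0.5 gives an Adam-style update (bias correction omitted here);
        # p = 0.0 reduces to SGD with EMA momentum.
        for w, g in zip(params, grads):
            st = state.setdefault(id(w), {"m": torch.zeros_like(w),
                                          "v": torch.zeros_like(w)})
            st["m"].mul_(beta1).add_(g, alpha=1 - beta1)         # first moment
            st["v"].mul_(beta2).addcmul_(g, g, value=1 - beta2)  # second moment
            denom = st["v"].pow(p).add_(eps) if p > 0 else 1.0
            w.sub_(lr * st["m"] / denom)

Intermediate exponents 0 < p < 0.5 would then correspond to points on a continuum between the two classical branches, which is one plausible sense in which a single strategy could cover all first-order methods.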


Asymmetric Momentum: A Rethinking of Gradient Descent

arXiv.org Artificial Intelligence

Through theoretical and experimental validation, and unlike existing adaptive methods such as Adam, which penalize frequently-changing parameters and are applicable only to sparse gradients, we propose the simplest SGD-based enhancement: Loss-Controlled Asymmetric Momentum (LCAM). By averaging the loss, we divide the training process into distinct loss phases and apply a different momentum coefficient in each. The method can not only accelerate slowly-changing parameters for sparse gradients, as adaptive optimizers do, but can also choose to accelerate frequently-changing parameters for non-sparse gradients, making it adaptable to all types of datasets. We reinterpret the machine learning training process through the concepts of weight coupling and weight traction, and experimentally validate that weights have directional specificity correlated with the specificity of the dataset. Interestingly, we observe that with non-sparse gradients, frequently-changing parameters should in fact be accelerated, which is the exact opposite of the traditional adaptive perspective. Compared with traditional SGD with momentum, the algorithm separates the weights without additional computational cost. Notably, the method relies on the network's ability to extract complex features. We primarily use Wide Residual Networks (WRN) in our research, on the classic CIFAR-10 and CIFAR-100 datasets, to test the ability for feature separation, and we observe phenomena that matter more than accuracy rates alone. Finally, compared with classic SGD tuning methods on these two datasets, WRN trained with LCAM achieves equal or better test accuracy in roughly half the training epochs.
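A minimal sketch of the loss-phase switching idea behind LCAM follows: the running average of the loss splits training into phases, and each phase uses a different momentum coefficient. The class name, the two coefficients, and the above/below-average switching rule are illustrative guesses, not the authors' exact algorithm.

    import torch

    class LossControlledMomentum:
        def __init__(self, params, lr=0.1, beta_low=0.85, beta_high=0.95):
            self.params = list(params)
            self.lr, self.beta_low, self.beta_high = lr, beta_low, beta_high
            self.buf = [torch.zeros_like(w) for w in self.params]
            self.loss_sum, self.steps = 0.0, 0

        @torch.no_grad()
        def step(self, loss_value: float):
            # Average the loss to date, then pick the momentum phase.
            self.loss_sum += loss_value
            self.steps += 1
            mean_loss = self.loss_sum / self.steps
            beta = self.beta_high if loss_value > mean_loss else self.beta_low
            for w, b in zip(self.params, self.buf):
                if w.grad is None:
                    continue
                b.mul_(beta).add_(w.grad)   # heavy-ball momentum buffer
                w.sub_(self.lr * b)

In this reading, the "asymmetry" lies in applying different accelerations in different loss phases rather than one global momentum, at the same per-step cost as standard SGD with momentum.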