AITopics | Yang, Wenfei

Collaborating Authors

Yang, Wenfei

Information about AI from the News, Publications, and Conferences


If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

State Space Model Meets Transformer: A New Paradigm for 3D Object Detection

Wang, Chuxin, Yang, Wenfei, Liu, Xiang, Zhang, Tianzhu

arXiv.org Artificial Intelligence, Mar-19-2025

DETR-based methods, which use multi-layer transformer decoders to refine object queries iteratively, have shown promising performance in 3D indoor object detection. However, the scene point features in the transformer decoder remain fixed, so later decoder layers contribute little, which limits further performance gains. Recently, State Space Models (SSMs) have demonstrated efficient context modeling with linear complexity through iterative interactions between system states and inputs. Inspired by SSMs, we propose a new 3D object DEtection paradigm with an interactive STate space model (DEST). In the interactive SSM, we design a novel state-dependent SSM parameterization method that enables system states to serve effectively as queries in 3D indoor detection tasks. In addition, we introduce four key designs tailored to the characteristics of point clouds and SSMs: the serialization and bidirectional scanning strategies enable bidirectional feature interaction among scene points within the SSM; the inter-state attention mechanism models the relationships between state points; and the gated feed-forward network enhances inter-channel correlations. To the best of our knowledge, this is the first method to model queries as system states and scene points as system inputs, which can simultaneously update scene point features and query features with linear complexity. Extensive experiments on two challenging datasets demonstrate the effectiveness of our DEST-based method. On AP50, our method improves the GroupFree baseline by +5.3 on ScanNet V2 and +3.2 on SUN RGB-D. Built on the VDETR baseline, our method sets a new state of the art on both ScanNet V2 and SUN RGB-D.
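The linear-complexity claim above can be illustrated with a minimal sketch of a discrete state space recurrence, h_t = a*h_{t-1} + b*x_t and y_t = c*h_t, plus a bidirectional pass that runs the same scan forward and backward over a serialized point sequence and sums the two outputs. This is a generic scalar, time-invariant SSM, not DEST's state-dependent parameterization; the function names, the scalar parameters, and merging the two directions by summation are assumptions for illustration, and the points are assumed to have already been serialized into a 1D order.

```python
from typing import List


def ssm_scan(xs: List[float], a: float, b: float, c: float) -> List[float]:
    """Scalar linear SSM: h_t = a*h_{t-1} + b*x_t, y_t = c*h_t.

    A single pass over the sequence, so cost grows linearly with its length.
    """
    h = 0.0
    ys: List[float] = []
    for x in xs:
        h = a * h + b * x  # update the system state with the current input
        ys.append(c * h)   # read the output from the current state
    return ys


def bidirectional_scan(xs: List[float], a: float, b: float, c: float) -> List[float]:
    """Hypothetical bidirectional variant: scan forward and backward, then sum.

    Each position then receives context from points before and after it
    in the serialized order.
    """
    forward = ssm_scan(xs, a, b, c)
    backward = ssm_scan(xs[::-1], a, b, c)[::-1]  # scan reversed input, restore order
    return [f + g for f, g in zip(forward, backward)]
```

Each element is visited a constant number of times per direction, so the cost stays linear in the number of points, in contrast with the quadratic cost of full self-attention over all point pairs.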

artificial intelligence, machine learning, scene point

arXiv.org Artificial Intelligence

2503.14493

Country: Asia > China (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.66)
Information Technology > Artificial Intelligence > Representation & Reasoning > Scientific Discovery (0.40)
Information Technology > Artificial Intelligence > Cognitive Science > Creativity & Intelligence (0.40)


MotionGS: Exploring Explicit Motion Guidance for Deformable 3D Gaussian Splatting

Zhu, Ruijie, Liang, Yanzhe, Chang, Hanzhi, Deng, Jiacheng, Lu, Jiahao, Yang, Wenfei, Zhang, Tianzhu, Zhang, Yongdong

arXiv.org Artificial Intelligence, Oct-10-2024

Dynamic scene reconstruction is a long-standing challenge in 3D vision. Recently, the emergence of 3D Gaussian Splatting has provided new insights into this problem. Although subsequent work has rapidly extended static 3D Gaussians to dynamic scenes, these methods often lack explicit constraints on object motion, which leads to optimization difficulties and degraded performance. To address these issues, we propose a novel deformable 3D Gaussian Splatting framework called MotionGS, which explores explicit motion priors to guide the deformation of 3D Gaussians. Specifically, we first introduce an optical flow decoupling module that decouples optical flow into camera flow and motion flow, corresponding to camera movement and object motion, respectively. The motion flow then constrains the deformation of the 3D Gaussians, thereby modeling the motion of dynamic objects. Additionally, we propose a camera pose refinement module that alternately optimizes the 3D Gaussians and the camera poses, mitigating the impact of inaccurate camera poses. Extensive experiments on monocular dynamic scenes show that MotionGS outperforms state-of-the-art methods both qualitatively and quantitatively.

artificial intelligence, gaussian, optical flow

arXiv.org Artificial Intelligence

2410.07707

Country: Asia (0.28)

Genre: Research Report > Promising Solution (0.34)

Industry:

Media > Television (0.34)
Media > Photography (0.34)
Media > Film (0.34)

Technology: Information Technology > Artificial Intelligence > Vision (1.00)


Proxy-RLHF: Decoupling Generation and Alignment in Large Language Model with Proxy

Zhu, Yu, Sun, Chuxiong, Yang, Wenfei, Wei, Wenqiang, Tang, Bo, Zhang, Tianzhu, Li, Zhiyu, Zhang, Shifeng, Xiong, Feiyu, Hu, Jie, Yang, Mingchuan

arXiv.org Artificial Intelligence, Mar-7-2024

Reinforcement Learning from Human Feedback (RLHF) is the prevailing approach to aligning Large Language Models (LLMs) with human values. However, existing RLHF methods are computationally expensive, largely because RLHF assigns both the generation and alignment tasks to the LLM simultaneously. In this paper, we introduce Proxy-RLHF, which decouples the generation and alignment processes of LLMs, achieving alignment with human values at a much lower computational cost. We start with a novel Markov Decision Process (MDP) designed for the alignment process and employ Reinforcement Learning (RL) to train a streamlined proxy model that oversees the token generation of the LLM without altering the LLM itself. Experiments show that our method achieves a comparable level of alignment with only 1% of the training parameters of other methods.
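The decoupling described above can be pictured as a decoding loop in which the frozen LLM proposes candidate tokens and a separate proxy accepts or rejects each one. This is a hypothetical sketch under that reading of the abstract, not the paper's procedure: `propose` stands in for the unmodified LLM (returning candidates best-first), `accept` stands in for the RL-trained proxy policy, and stopping when every candidate is rejected is an assumed fallback. Training the proxy over the alignment MDP is not shown.

```python
from typing import Callable, List, Optional, Sequence


def proxy_guided_decode(
    propose: Callable[[List[str]], Sequence[str]],
    accept: Callable[[List[str], str], bool],
    max_len: int,
    eos: str = "<eos>",
) -> List[str]:
    """Hypothetical decoding loop: a frozen generator proposes, a proxy filters."""
    prefix: List[str] = []
    while len(prefix) < max_len:
        # The generator (stand-in for the unmodified LLM) ranks candidate tokens.
        candidates = propose(prefix)
        chosen: Optional[str] = None
        for tok in candidates:
            # The proxy only decides accept/reject; it never changes the generator.
            if accept(prefix, tok):
                chosen = tok
                break
        if chosen is None:
            break  # assumed fallback: stop when every candidate is rejected
        prefix.append(chosen)
        if chosen == eos:
            break
    return prefix
```

For instance, if `propose` returns `["bad", "good", "<eos>"]` for the first two steps and `["<eos>"]` afterwards, and `accept` rejects only `"bad"`, the loop yields `["good", "good", "<eos>"]`. Only the small `accept` policy would need training in this arrangement, which is the intuition behind the reduced parameter count.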

large language model, machine learning, natural language

arXiv.org Artificial Intelligence

2403.04283

Country:

Asia > China (0.29)
North America > United States (0.28)

Genre: Research Report (0.50)

Industry: Information Technology > Security & Privacy (0.56)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
