Collaborating Authors: Zhu, Xiang


Video Super-Resolution: All You Need is a Video Diffusion Model

arXiv.org Artificial Intelligence

The concept of super-resolution was first proposed in the 1980s [1, 2], primarily focusing on multi-frame image super-resolution, also known as video super-resolution (VSR). The fundamental principle involves aligning and fusing image information of the same object across multiple frames to surpass the Nyquist limit. This process represents a typical inverse problem, requiring sub-pixel spatial alignment across frames, along with resampling and deconvolution, to achieve enhanced resolution. Over the past decade, the primary focus of super-resolution has shifted towards single image super-resolution (SISR), which eliminates the need for spatial alignment or motion estimation. The recovery of high-frequency components in SISR predominantly relies on deep neural networks such as convolutional neural networks (CNNs) [3, 4, 5]. These networks map a low-resolution (LR) input image to the corresponding high-resolution (HR) output, mimicking the behavior of deconvolution. Such methods are effective when the upscaling factor is below 4x; beyond this value, however, the output images tend to appear overly smoothed. Since 2022, diffusion models (DMs) [6, 7] have become increasingly important in SISR.
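The LR-to-HR mapping described above is easy to make concrete. Below is a minimal sketch, assuming PyTorch, of an SRCNN-style network that bicubically upsamples the input and learns only the residual high-frequency detail; the layer widths and kernel sizes are illustrative assumptions, not the architectures of the cited works [3, 4, 5].

```python
# Minimal SRCNN-style single-image super-resolution sketch (PyTorch).
# Layer widths and kernel sizes are illustrative assumptions, not the
# exact architectures of the cited works [3, 4, 5].
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinySRCNN(nn.Module):
    def __init__(self, scale: int = 4):
        super().__init__()
        self.scale = scale
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=9, padding=4),   # feature extraction
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 32, kernel_size=5, padding=2),  # non-linear mapping
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 3, kernel_size=5, padding=2),   # reconstruction
        )

    def forward(self, lr: torch.Tensor) -> torch.Tensor:
        # Upsample first, then let the CNN restore high-frequency detail,
        # mimicking the deconvolution behavior described above.
        up = F.interpolate(lr, scale_factor=self.scale, mode="bicubic",
                           align_corners=False)
        return up + self.features(up)  # residual learning of lost detail

lr = torch.rand(1, 3, 32, 32)    # a toy 32x32 low-resolution input
hr = TinySRCNN(scale=4)(lr)      # 4x upscaled output: 1x3x128x128
```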


UP-VLA: A Unified Understanding and Prediction Model for Embodied Agent

arXiv.org Artificial Intelligence

Recent advancements in Vision-Language-Action (VLA) models have leveraged pre-trained Vision-Language Models (VLMs) to improve generalization capabilities. VLMs, typically pre-trained on vision-language understanding tasks, provide rich semantic knowledge and reasoning abilities. However, prior research has shown that VLMs often focus on high-level semantic content and neglect low-level features, limiting their ability to capture detailed spatial information and understand physical dynamics. These aspects, which are crucial for embodied control tasks, remain underexplored in existing pre-training paradigms. In this paper, we investigate the training paradigm for VLAs and introduce UP-VLA, a Unified VLA model trained with both multi-modal Understanding and future Prediction objectives, enhancing both high-level semantic comprehension and low-level spatial understanding. Experimental results show that UP-VLA achieves a 33% improvement on the Calvin ABC-D benchmark compared to the previous state-of-the-art method. Additionally, UP-VLA demonstrates improved success rates in real-world manipulation tasks, particularly those requiring precise spatial information.
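The abstract only names the two objectives, so the sketch below is a hypothetical illustration, assuming PyTorch, of how an understanding loss and a future-prediction loss might share one backbone and be summed into a joint training signal; every module and weight here is an invented stand-in, not the UP-VLA implementation.

```python
# Hypothetical joint objective in the spirit of the abstract: one network
# is trained both to understand (predict answer tokens) and to predict
# (reconstruct the next visual observation). All modules, shapes, and
# weights are illustrative stand-ins, not the paper's implementation.
import torch
import torch.nn as nn

class ToyUnifiedVLA(nn.Module):
    def __init__(self, dim: int = 64, vocab: int = 100):
        super().__init__()
        self.encoder = nn.Linear(3 * 32 * 32, dim)     # shared backbone
        self.text_head = nn.Linear(dim, vocab)         # understanding head
        self.pixel_head = nn.Linear(dim, 3 * 32 * 32)  # prediction head

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        return self.encoder(image.flatten(1))

model = ToyUnifiedVLA()
image = torch.rand(8, 3, 32, 32)
next_image = torch.rand(8, 3, 32, 32)
answer_tokens = torch.randint(0, 100, (8,))

feat = model(image)
# Understanding objective: classify/describe the current observation.
l_understand = nn.functional.cross_entropy(model.text_head(feat), answer_tokens)
# Prediction objective: regress the next observation from the same features.
l_predict = nn.functional.mse_loss(model.pixel_head(feat), next_image.flatten(1))
loss = 1.0 * l_understand + 1.0 * l_predict  # joint training signal
loss.backward()
```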


Image Motion Blur Removal in the Temporal Dimension with Video Diffusion Models

arXiv.org Artificial Intelligence

Most motion deblurring algorithms rely on spatial-domain convolution models, which struggle with the complex, non-linear blur arising from camera shake and object motion. In contrast, we propose a novel single-image deblurring approach that treats motion blur as a temporal averaging phenomenon. Our core innovation lies in leveraging a pre-trained video diffusion transformer model to capture diverse motion dynamics within a latent space. This sidesteps explicit kernel estimation and effectively accommodates a wide range of motion patterns. We implement the algorithm within a diffusion-based inverse problem framework. Empirical results on synthetic and real-world datasets demonstrate that our method outperforms existing techniques in deblurring complex motion blur scenarios. This work paves the way for utilizing powerful video diffusion models to address single-image deblurring challenges.
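The temporal-averaging view is simple to state as a forward model: the observed blurry image is the mean of the sharp frames the sensor integrated during the exposure. Here is a minimal NumPy sketch of that forward model on synthetic data, with the inverse-problem reading in the comments:

```python
# Forward model behind the "motion blur as temporal averaging" view:
# the observed blurry image is the mean of the T sharp frames the
# sensor integrated during the exposure. Synthetic data for illustration.
import numpy as np

rng = np.random.default_rng(0)
T, H, W = 8, 64, 64
sharp_video = rng.random((T, H, W)).astype(np.float32)  # latent sharp frames

blurred = sharp_video.mean(axis=0)  # what the camera actually records

# Deblurring is then the inverse problem: find a plausible sharp video
# whose temporal average matches the observation. The paper regularizes
# this inversion with a pre-trained video diffusion prior; here we only
# check data consistency for a trivial (still blurry) candidate.
candidate = np.repeat(blurred[None], T, axis=0)
data_fidelity = np.linalg.norm(candidate.mean(axis=0) - blurred)
print(f"data fidelity of trivial candidate: {data_fidelity:.2e}")
```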


Stylized Table Tennis Robots Skill Learning with Incomplete Human Demonstrations

arXiv.org Artificial Intelligence

In recent years, Reinforcement Learning (RL) has become a popular technique for training robot controllers. However, for complex dynamic robot control tasks, RL-based methods often produce controllers with unrealistic styles. In contrast, humans can learn well-stylized skills under supervision. For example, people learn table tennis skills by imitating the motions of coaches. Such reference motions are often incomplete, e.g., recorded without the presence of an actual ball. Inspired by this, we propose an RL-based algorithm to train a robot that can learn the playing style from such incomplete human demonstrations. We collect data through the teaching-and-dragging method. We also propose data augmentation techniques to enable our robot to adapt to balls of different velocities. We finally evaluate our policy in different simulators with varying dynamics.
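The abstract does not spell out how the style supervision enters the objective. One common recipe, sketched below purely under that assumption, is to shape the RL reward with an imitation term that tracks the ball-free reference motion; all names and weights are hypothetical, not the paper's formulation.

```python
# Hypothetical reward shaping for style imitation from incomplete
# demonstrations: a task reward (e.g., returning the ball) plus a style
# term that penalizes deviation from the reference coach motion, which
# was recorded without a ball. Names and weights are assumptions.
import numpy as np

def reward(task_success: float, joint_pos: np.ndarray,
           reference_pos: np.ndarray, w_style: float = 0.5) -> float:
    # Style term: how closely the robot tracks the demonstrated motion.
    style = np.exp(-np.sum((joint_pos - reference_pos) ** 2))
    return task_success + w_style * style

# Example: perfect tracking of the reference earns the full style bonus.
q = np.zeros(7)
print(reward(task_success=1.0, joint_pos=q, reference_pos=q))  # 1.5
```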


A Contact-Safe Reinforcement Learning Framework for Contact-Rich Robot Manipulation

arXiv.org Artificial Intelligence

Reinforcement learning shows great potential for solving complex contact-rich robot manipulation tasks. However, the safety of using RL in the real world is a crucial problem, since unexpected dangerous collisions may happen when the RL policy is imperfect during training or in unseen scenarios. In this paper, we propose a contact-safe reinforcement learning framework for contact-rich robot manipulation, which maintains safety in both the task space and the joint space. When the RL policy causes unexpected collisions between the robot arm and the environment, our framework is able to immediately detect the collision and keep the contact force small. Furthermore, the end-effector is enforced to perform contact-rich tasks compliantly while remaining robust to external disturbances. We train the RL policy in simulation and transfer it to the real robot. Real-world experiments on robot wiping tasks show that our method keeps the contact force small in both task space and joint space, even when the policy encounters unseen scenarios with unexpected collisions, while rejecting disturbances on the main task.
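As a rough illustration of the collision-handling behavior described above, the sketch below monitors the measured contact force and overrides the RL command with a compliant yield when a threshold is exceeded; the threshold, gain, and interfaces are assumptions, not the paper's controller.

```python
# Hedged sketch of a contact-safety layer: when the measured contact
# force exceeds a limit, yield along the direction the environment is
# pushing instead of executing the RL action. Threshold and gain are
# illustrative assumptions, not the paper's controller.
import numpy as np

FORCE_LIMIT = 15.0  # N; assumed threshold for an "unexpected collision"

def safe_action(rl_action: np.ndarray, contact_force: np.ndarray,
                compliance_gain: float = 0.02) -> np.ndarray:
    """Pass the RL action through unless a collision is detected."""
    if np.linalg.norm(contact_force) > FORCE_LIMIT:
        # Collision: move compliantly along the external force so the
        # contact force stays small, as the framework above requires.
        return compliance_gain * contact_force
    return rl_action

# Example: a 30 N lateral contact triggers the compliant override.
print(safe_action(np.array([0.1, 0.0, 0.0]),
                  np.array([30.0, 0.0, 0.0])))
```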