AITopics | Ren, Pengzhen

Collaborating Authors

Ren, Pengzhen

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

InfiniteWorld: A Unified Scalable Simulation Framework for General Visual-Language Robot Interaction

Ren, Pengzhen, Li, Min, Luo, Zhen, Song, Xinshuai, Chen, Ziwei, Liufu, Weijia, Yang, Yixuan, Zheng, Hao, Xu, Rongtao, Huang, Zitong, Ding, Tongsheng, Xie, Luyang, Zhang, Kaidong, Fu, Changfei, Liu, Yang, Lin, Liang, Zheng, Feng, Liang, Xiaodan

arXiv.org Artificial IntelligenceDec-7-2024

Realizing scaling laws in embodied AI has become a focus. However, previous work has been scattered across diverse simulation platforms, with assets and models lacking unified interfaces, which has led to inefficiencies in research. To address this, we introduce InfiniteWorld, a unified and scalable simulator for general vision-language robot interaction built on Nvidia Isaac Sim. InfiniteWorld encompasses a comprehensive set of physics asset construction methods and generalized free robot interaction benchmarks. Specifically, we first built a unified and scalable simulation framework for embodied learning that integrates a series of improvements in generation-driven 3D asset construction, Real2Sim, automated annotation framework, and unified 3D asset processing. This framework provides a unified and scalable platform for robot interaction and learning. In addition, to simulate realistic robot interaction, we build four new general benchmarks, including scene graph collaborative exploration and open-world social mobile manipulation. The former is often overlooked as an important task for robots to explore the environment and build scene knowledge, while the latter simulates robot interaction tasks with different levels of knowledge agents based on the former. They can more comprehensively evaluate the embodied agent's capabilities in environmental understanding, task planning and execution, and intelligent interaction. We hope that this work can provide the community with a systematic asset interface, alleviate the dilemma of the lack of high-quality assets, and provide a more comprehensive evaluation of robot interactions.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2412.05789

Genre: Research Report (0.64)

Industry: Information Technology (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Add feedback

PIVOT-R: Primitive-Driven Waypoint-Aware World Model for Robotic Manipulation

Zhang, Kaidong, Ren, Pengzhen, Lin, Bingqian, Lin, Junfan, Ma, Shikui, Xu, Hang, Liang, Xiaodan

arXiv.org Artificial IntelligenceOct-16-2024

Language-guided robotic manipulation is a challenging task that requires an embodied agent to follow abstract user instructions to accomplish various complex manipulation tasks. Previous work trivially fitting the data without revealing the relation between instruction and low-level executable actions, these models are prone to memorizing the surficial pattern of the data instead of acquiring the transferable knowledge, and thus are fragile to dynamic environment changes. To address this issue, we propose a PrIrmitive-driVen waypOinT-aware world model for Robotic manipulation (PIVOT-R) that focuses solely on the prediction of task-relevant waypoints. Specifically, PIVOT-R consists of a Waypoint-aware World Model (WAWM) and a lightweight action prediction module. The former performs primitive action parsing and primitive-driven waypoint prediction, while the latter focuses on decoding low-level actions. Additionally, we also design an asynchronous hierarchical executor (AHE), which can use different execution frequencies for different modules of the model, thereby helping the model reduce computational redundancy and improve model execution efficiency. Our PIVOT-R outperforms state-of-the-art (SoTA) open-source models on the SeaWave benchmark, achieving an average relative improvement of 19.45% across four levels of instruction tasks. Moreover, compared to the synchronously executed PIVOT-R, the execution efficiency of PIVOT-R with AHE is increased by 28-fold, with only a 2.9% drop in performance. These results provide compelling evidence that our PIVOT-R can significantly improve both the performance and efficiency of robotic manipulation.

artificial intelligence, instruction, pivot-r, (14 more...)

arXiv.org Artificial Intelligence

2410.10394

Country: Asia (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.92)

Add feedback

DamWorld: Progressive Reasoning with World Models for Robotic Manipulation

Ren, Pengzhen, Zhang, Kaidong, Zheng, Hetao, Li, Zixuan, Wen, Yuhang, Zhu, Fengda, Ma, Mas, Liang, Xiaodan

arXiv.org Artificial IntelligenceJan-8-2024

The research on embodied AI has greatly promoted the development of robot manipulation. However, it still faces significant challenges in various aspects such as benchmark construction, multi-modal perception and decision-making, and physical execution. Previous robot manipulation simulators were primarily designed to enrich manipulation types and types of objects while neglecting the balance between physical manipulation and language instruction complexity in multi-modal environments. This paper proposes a new robot manipulation simulator and builds a comprehensive and systematic robot manipulation benchmark with progressive reasoning tasks called SeaWave (i.e., a progressive reasoning benchmark). It provides a standard test platform for embedded AI agents in a multi-modal environment, which can evaluate and execute four levels of human natural language instructions at the same time. Previous world model-based robot manipulation work lacked research on the perception and decision-making of complex instructions in multi-modal environments. To this end, we propose a new world model tailored for cross-modal robot manipulation called DamWorld. Specifically, DamWorld takes the current visual scene and predicted execution actions based on natural language instructions as input, and uses the next action frame to supervise the output of the world model to force the model to learn robot manipulation consistent with world knowledge. Compared with the renowned baselines (e.g., RT-1), our DamWorld improves the manipulation success rate by 5.6% on average on four levels of progressive reasoning tasks. It is worth noting that on the most challenging level 4 manipulation task, DamWorld still improved by 9.0% compared to prior works.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2306.11335

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.69)

Add feedback

ViewCo: Discovering Text-Supervised Segmentation Masks via Multi-View Semantic Consistency

Ren, Pengzhen, Li, Changlin, Xu, Hang, Zhu, Yi, Wang, Guangrun, Liu, Jianzhuang, Chang, Xiaojun, Liang, Xiaodan

arXiv.org Artificial IntelligenceJan-30-2023

Recently, great success has been made in learning visual representations from text supervision, facilitating the emergence of text-supervised semantic segmentation. However, existing works focus on pixel grouping and cross-modal semantic alignment, while ignoring the correspondence among multiple augmented views of the same image. To overcome such limitation, we propose multi-\textbf{View} \textbf{Co}nsistent learning (ViewCo) for text-supervised semantic segmentation. Specifically, we first propose text-to-views consistency modeling to learn correspondence for multiple views of the same input image. Additionally, we propose cross-view segmentation consistency modeling to address the ambiguity issue of text supervision by contrasting the segment features of Siamese visual encoders. The text-to-views consistency benefits the dense assignment of the visual features by encouraging different crops to align with the same text, while the cross-view segmentation consistency modeling provides additional self-supervision, overcoming the limitation of ambiguous text supervision for segmentation masks. Trained with large-scale image-text data, our model can directly segment objects of arbitrary categories in a zero-shot manner. Extensive experiments show that ViewCo outperforms state-of-the-art methods on average by up to 2.9\%, 1.6\%, and 2.4\% mIoU on PASCAL VOC2012, PASCAL Context, and COCO, respectively.

artificial intelligence, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2302.10307

Genre: Research Report (0.70)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Add feedback

A Survey of Deep Active Learning

Ren, Pengzhen, Xiao, Yun, Chang, Xiaojun, Huang, Po-Yao, Li, Zhihui, Chen, Xiaojiang, Wang, Xin

arXiv.org Machine LearningAug-30-2020

Active learning (AL) attempts to maximize the performance gain of the model by marking the fewest samples. Deep learning (DL) is greedy for data and requires a large amount of data supply to optimize massive parameters, so that the model learns how to extract high-quality features. In recent years, due to the rapid development of internet technology, we are in an era of information torrents and we have massive amounts of data. In this way, DL has aroused strong interest of researchers and has been rapidly developed. Compared with DL, researchers have relatively low interest in AL. This is mainly because before the rise of DL, traditional machine learning requires relatively few labeled samples. Therefore, early AL is difficult to reflect the value it deserves. Although DL has made breakthroughs in various fields, most of this success is due to the publicity of the large number of existing annotation datasets. However, the acquisition of a large number of high-quality annotated datasets consumes a lot of manpower, which is not allowed in some fields that require high expertise, especially in the fields of speech recognition, information extraction, medical images, etc. Therefore, AL has gradually received due attention. A natural idea is whether AL can be used to reduce the cost of sample annotations, while retaining the powerful learning capabilities of DL. Therefore, deep active learning (DAL) has emerged. Although the related research has been quite abundant, it lacks a comprehensive survey of DAL. This article is to fill this gap, we provide a formal classification method for the existing work, and a comprehensive and systematic overview. In addition, we also analyzed and summarized the development of DAL from the perspective of application. Finally, we discussed the confusion and problems in DAL, and gave some possible development directions for DAL.

deep learning, learning, neural network, (20 more...)

arXiv.org Machine Learning

2009.00236

Country:

Europe (0.67)
North America > United States (0.28)

Genre:

Overview (1.00)
Research Report > New Finding (0.67)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)
Education (0.92)

Add feedback

A Comprehensive Survey of Neural Architecture Search: Challenges and Solutions

Ren, Pengzhen, Xiao, Yun, Chang, Xiaojun, Huang, Po-Yao, Li, Zhihui, Chen, Xiaojiang, Wang, Xin

arXiv.org Machine LearningJun-1-2020

Deep learning has made major breakthroughs and progress in many fields. This is due to the powerful automatic representation capabilities of deep learning. It has been proved that the design of the network architecture is crucial to the feature representation of data and the final performance. In order to obtain a good feature representation of data, the researchers designed various complex network architectures. However, the design of the network architecture relies heavily on the researchers' prior knowledge and experience. Therefore, a natural idea is to reduce human intervention as much as possible and let the algorithm automatically design the architecture of the network. Thus going further to the strong intelligence. In recent years, a large number of related algorithms for \textit{Neural Architecture Search} (NAS) have emerged. They have made various improvements to the NAS algorithm, and the related research work is complicated and rich. In order to reduce the difficulty for beginners to conduct NAS-related research, a comprehensive and systematic survey on the NAS is essential. Previously related surveys began to classify existing work mainly from the basic components of NAS: search space, search strategy and evaluation strategy. This classification method is more intuitive, but it is difficult for readers to grasp the challenges and the landmark work in the middle. Therefore, in this survey, we provide a new perspective: starting with an overview of the characteristics of the earliest NAS algorithms, summarizing the problems in these early NAS algorithms, and then giving solutions for subsequent related research work. In addition, we conducted a detailed and comprehensive analysis, comparison and summary of these works. Finally, we give possible future research directions.

deep learning, network architecture, neural network, (16 more...)

arXiv.org Machine Learning

2006.02903

Country: Asia (0.14)

Genre:

Overview (1.00)
Research Report > New Finding (0.66)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback