Yang, Yixuan
RoboReflect: Robotic Reflective Reasoning for Grasping Ambiguous-Condition Objects
Luo, Zhen, Yang, Yixuan, Cai, Chang, Zhang, Yanfu, Zheng, Feng
As robotic technology rapidly develops, robots are being employed in an increasing number of fields. However, due to the complexity of deployment environments or the prevalence of ambiguous-condition objects, the practical application of robotics still faces many challenges, leading to frequent errors. Traditional methods and some LLM-based approaches, although improved, still require substantial human intervention and struggle with autonomous error correction in complex scenarios.In this work, we propose RoboReflect, a novel framework leveraging large vision-language models (LVLMs) to enable self-reflection and autonomous error correction in robotic grasping tasks. RoboReflect allows robots to automatically adjust their strategies based on unsuccessful attempts until successful execution is achieved.The corrected strategies are saved in a memory for future task reference.We evaluate RoboReflect through extensive testing on eight common objects prone to ambiguous conditions of three categories.Our results demonstrate that RoboReflect not only outperforms existing grasp pose estimation methods like AnyGrasp and high-level action planning techniques using GPT-4V but also significantly enhances the robot's ability to adapt and correct errors independently. These findings underscore the critical importance of autonomous selfreflection in robotic systems while effectively addressing the challenges posed by ambiguous environments.
EquiBoost: An Equivariant Boosting Approach to Molecular Conformation Generation
Yang, Yixuan, Fang, Xingyu, Cheng, Zhaowen, Yan, Pengju, Li, Xiaolin
Molecular conformation generation plays key roles in computational drug design. Recently developed deep learning methods, particularly diffusion models have reached competitive performance over traditional cheminformatical approaches. However, these methods are often time-consuming or require extra support from traditional methods. We propose EquiBoost, a boosting model that stacks several equivariant graph transformers as weak learners, to iteratively refine 3D conformations of molecules. Without relying on diffusion techniques, EquiBoost balances accuracy and efficiency more effectively than diffusion-based methods. Notably, compared to the previous state-of-the-art diffusion method, EquiBoost improves generation quality and preserves diversity, achieving considerably better precision of Average Minimum RMSD (AMR) on the GEOM datasets. This work rejuvenates boosting and sheds light on its potential to be a robust alternative to diffusion models in certain scenarios.
EquiFlow: Equivariant Conditional Flow Matching with Optimal Transport for 3D Molecular Conformation Prediction
Tian, Qingwen, Xu, Yuxin, Yang, Yixuan, Wang, Zhen, Liu, Ziqi, Yan, Pengju, Li, Xiaolin
Molecular 3D conformations play a key role in determining how molecules interact with other molecules or protein surfaces. Recent deep learning advancements have improved conformation prediction, but slow training speeds and difficulties in utilizing high-degree features limit performance. We propose EquiFlow, an equivariant conditional flow matching model with optimal transport. EquiFlow uniquely applies conditional flow matching in molecular 3D conformation prediction, leveraging simulation-free training to address slow training speeds. It uses a modified Equiformer model to encode Cartesian molecular conformations along with their atomic and bond properties into higher-degree embeddings. Additionally, EquiFlow employs an ODE solver, providing faster inference speeds compared to diffusion models with SDEs. Experiments on the QM9 dataset show that EquiFlow predicts small molecule conformations more accurately than current state-of-the-art models.
InfiniteWorld: A Unified Scalable Simulation Framework for General Visual-Language Robot Interaction
Ren, Pengzhen, Li, Min, Luo, Zhen, Song, Xinshuai, Chen, Ziwei, Liufu, Weijia, Yang, Yixuan, Zheng, Hao, Xu, Rongtao, Huang, Zitong, Ding, Tongsheng, Xie, Luyang, Zhang, Kaidong, Fu, Changfei, Liu, Yang, Lin, Liang, Zheng, Feng, Liang, Xiaodan
Realizing scaling laws in embodied AI has become a focus. However, previous work has been scattered across diverse simulation platforms, with assets and models lacking unified interfaces, which has led to inefficiencies in research. To address this, we introduce InfiniteWorld, a unified and scalable simulator for general vision-language robot interaction built on Nvidia Isaac Sim. InfiniteWorld encompasses a comprehensive set of physics asset construction methods and generalized free robot interaction benchmarks. Specifically, we first built a unified and scalable simulation framework for embodied learning that integrates a series of improvements in generation-driven 3D asset construction, Real2Sim, automated annotation framework, and unified 3D asset processing. This framework provides a unified and scalable platform for robot interaction and learning. In addition, to simulate realistic robot interaction, we build four new general benchmarks, including scene graph collaborative exploration and open-world social mobile manipulation. The former is often overlooked as an important task for robots to explore the environment and build scene knowledge, while the latter simulates robot interaction tasks with different levels of knowledge agents based on the former. They can more comprehensively evaluate the embodied agent's capabilities in environmental understanding, task planning and execution, and intelligent interaction. We hope that this work can provide the community with a systematic asset interface, alleviate the dilemma of the lack of high-quality assets, and provide a more comprehensive evaluation of robot interactions.
SniffySquad: Patchiness-Aware Gas Source Localization with Multi-Robot Collaboration
Cheng, Yuhan, Chen, Xuecheng, Yang, Yixuan, Wang, Haoyang, Xu, Jingao, Hong, Chaopeng, Xu, Susu, Zhang, Xiao-Ping, Liu, Yunhao, Chen, Xinlei
Abstract--Gas source localization is pivotal for the rapid mitigation of gas leakage disasters, where mobile robots emerge as a promising solution. However, existing methods predominantly schedule robots' movements based on reactive stimuli or simplified gas plume models. These approaches typically excel in idealized, simulated environments but fall short in real-world gas environments characterized by their patchy distribution. In this work, we introduce SniffySquad, a multi-robot olfactionbased system designed to address the inherent patchiness in gas source localization. SniffySquad incorporates a patchinessaware active sensing approach that enhances the quality of data collection and estimation. Moreover, it features an innovative collaborative role adaptation strategy to boost the efficiency of source-seeking endeavors. Extensive evaluations demonstrate that our system achieves an increase in the success rate by 20%+ and an improvement in path efficiency by 30%+, outperforming state-of-the-art gas source localization solutions. With the knowledge of source locations, subsequent mitigation operations, such as Rapid and accurate responses to gas leak incidents are shutting off valves or sealing the leaks, can be conducted more essential for safeguarding human and environmental health, logically, efficiently, and safely [5].