Li, Puhao
Ag2Manip: Learning Novel Manipulation Skills with Agent-Agnostic Visual and Action Representations
Li, Puhao, Liu, Tengyu, Li, Yuyang, Han, Muzhi, Geng, Haoran, Wang, Shu, Zhu, Yixin, Zhu, Song-Chun, Huang, Siyuan
Autonomous robotic systems capable of learning novel manipulation tasks are poised to transform industries from manufacturing to service automation. However, modern methods (e.g., VIP and R3M) still face significant hurdles, notably the domain gap among robotic embodiments and the sparsity of successful task executions within specific action spaces, resulting in misaligned and ambiguous task representations. We introduce Ag2Manip (Agent-Agnostic representations for Manipulation), a framework aimed at surmounting these challenges through two key innovations: a novel agent-agnostic visual representation derived from human manipulation videos, with the specifics of embodiments obscured to enhance generalizability; and an agent-agnostic action representation abstracting a robot's kinematics to a universal agent proxy, emphasizing crucial interactions between end-effector and object. Ag2Manip's empirical validation across simulated benchmarks like FrankaKitchen, ManiSkill, and PartManip shows a 325% increase in performance, achieved without domain-specific demonstrations. Ablation studies underline the essential contributions of the visual and action representations to this success. Extending our evaluations to the real world, Ag2Manip significantly improves imitation learning success rates from 50% to 77.5%, demonstrating its effectiveness and generalizability across both simulated and physical environments.
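The abstract above describes rewarding a policy by how closely the current observation matches a goal image in an embodiment-masked embedding space. Below is a minimal sketch of that idea, assuming a placeholder encoder and a cosine-distance progress reward; the class and function names are illustrative assumptions, not Ag2Manip's released code.

# Hypothetical sketch: an agent-agnostic visual embedding used as a
# zero-shot reward signal for manipulation RL. The encoder stands in for a
# representation pretrained on human videos with the embodiment masked out.
import torch
import torch.nn as nn

class AgentAgnosticEncoder(nn.Module):
    """Placeholder CNN; in practice this would be a pretrained backbone."""
    def __init__(self, embed_dim: int = 128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, embed_dim),
        )

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (B, 3, H, W), with the robot embodiment already masked/inpainted
        return self.backbone(frames)

def embedding_reward(encoder: nn.Module,
                     prev_frame: torch.Tensor,
                     curr_frame: torch.Tensor,
                     goal_frame: torch.Tensor) -> torch.Tensor:
    """Dense reward = progress toward the goal image in embedding space,
    i.e. d(goal, prev) - d(goal, curr) with d a cosine distance."""
    with torch.no_grad():
        z_prev, z_curr, z_goal = (encoder(x) for x in (prev_frame, curr_frame, goal_frame))
    d_prev = 1.0 - torch.cosine_similarity(z_prev, z_goal, dim=-1)
    d_curr = 1.0 - torch.cosine_similarity(z_curr, z_goal, dim=-1)
    return d_prev - d_curr  # positive when the scene now looks closer to the goal

# Usage: reward = embedding_reward(enc, frame_prev, frame_curr, frame_goal)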
An Embodied Generalist Agent in 3D World
Huang, Jiangyong, Yong, Silong, Ma, Xiaojian, Linghu, Xiongkun, Li, Puhao, Wang, Yan, Li, Qing, Zhu, Song-Chun, Jia, Baoxiong, Huang, Siyuan
Leveraging massive knowledge and learning schemes from large language models (LLMs), recent machine learning models show notable successes in building generalist agents capable of general-purpose task solving in diverse domains, including natural language processing, computer vision, and robotics. However, a significant challenge remains as these models exhibit limited ability in understanding and interacting with the 3D world. We argue this limitation significantly hinders the current models from performing real-world tasks and further achieving general intelligence. To this end, we introduce an embodied multi-modal and multi-task generalist agent that excels in perceiving, grounding, reasoning, planning, and acting in the 3D world. Our proposed agent, referred to as LEO, is trained with shared LLM-based model architectures, objectives, and weights in two stages: (i) 3D vision-language alignment and (ii) 3D vision-language-action instruction tuning. To facilitate the training, we meticulously curate and generate an extensive dataset comprising object-level and scene-level multi-modal tasks of exceptional scale and complexity, necessitating a deep understanding of and interaction with the 3D world. Through rigorous experiments, we demonstrate LEO's remarkable proficiency across a wide spectrum of tasks, including 3D captioning, question answering, embodied reasoning, embodied navigation, and robotic manipulation. Our ablation results further provide valuable insights for the development of future embodied generalist agents.
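As a rough illustration of the shared LLM-based interface described above, the sketch below projects per-object 3D features into a language model's token-embedding space and prepends them to the text sequence. Module names, dimensions, and the prefix design are assumptions for illustration, not LEO's actual architecture.

# Hypothetical sketch of a prefix-style multimodal interface for an
# embodied 3D generalist agent built on an LLM backbone.
import torch
import torch.nn as nn

class Scene3DPrefix(nn.Module):
    def __init__(self, obj_feat_dim: int = 768, llm_dim: int = 4096):
        super().__init__()
        # Linear projector from a 3D object encoder (e.g., a point-cloud
        # backbone) into the LLM token-embedding space.
        self.projector = nn.Linear(obj_feat_dim, llm_dim)

    def forward(self, obj_feats: torch.Tensor, text_embeds: torch.Tensor) -> torch.Tensor:
        """obj_feats: (B, num_objects, obj_feat_dim) per-object 3D features.
        text_embeds: (B, seq_len, llm_dim) embedded instruction tokens.
        Returns one sequence the LLM can attend over."""
        scene_tokens = self.projector(obj_feats)      # (B, N, llm_dim)
        return torch.cat([scene_tokens, text_embeds], dim=1)

# Under this sketch, stage (i) would align scene tokens with language via
# captioning-style objectives, while stage (ii) would tune the same weights
# on instruction-following data that additionally predicts action tokens.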
Grasp Multiple Objects with One Hand
Li, Yuyang, Liu, Bo, Geng, Yiran, Li, Puhao, Yang, Yaodong, Zhu, Yixin, Liu, Tengyu, Huang, Siyuan
Key contributions include: (i) a dataset tailored for multi-object grasping research; (ii) the development of the first Goal-Conditioned Reinforcement Learning (GCRL) policy for concurrent grasping and lifting of multiple objects from a table; (iii) the enhancement of the execution policy for better adaptability to unseen object configurations and imprecise pre-grasp poses, achieved via specialist distillation and curriculum learning; and (iv) a comprehensive framework, MultiGrasp, that extends existing robotic systems toward robust, accurate multi-object grasping. The work aims to maintain individual object maneuverability while boosting grasp efficiency. Because robots often operate in complex physical environments where noisy sensory input makes analytical solutions challenging, RL is commonly used for decision-making and control in these cases [4, 5, 16, 40, 41]; as a specialized form, GCRL [42] focuses on skill acquisition for predefined objectives.
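To make the goal-conditioned formulation above concrete, here is a minimal sketch of a goal-conditioned actor that maps a state and a desired multi-object grasp goal to joint targets. The network shape, input layout, and the surrounding training recipe are hypothetical, not the MultiGrasp implementation.

# Hypothetical sketch of a goal-conditioned actor for multi-object grasping.
import torch
import torch.nn as nn

class GoalConditionedActor(nn.Module):
    def __init__(self, obs_dim: int, goal_dim: int, act_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + goal_dim, hidden), nn.ELU(),
            nn.Linear(hidden, hidden), nn.ELU(),
            nn.Linear(hidden, act_dim), nn.Tanh(),  # normalized joint targets
        )

    def forward(self, obs: torch.Tensor, goal: torch.Tensor) -> torch.Tensor:
        # obs: proprioception plus object states; goal: desired grasp
        # (e.g., target hand pose and object poses), concatenated as input.
        return self.net(torch.cat([obs, goal], dim=-1))

# Specialist distillation and curriculum learning, as mentioned above, would
# wrap this: specialists are trained on narrow object configurations, and a
# single student actor is regressed onto their actions.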
DexGraspNet: A Large-Scale Robotic Dexterous Grasp Dataset for General Objects Based on Simulation
Wang, Ruicheng, Zhang, Jialiang, Chen, Jiayi, Xu, Yinzhen, Li, Puhao, Liu, Tengyu, Wang, He
Robotic dexterous grasping is the first step toward human-like dexterous object manipulation and thus a crucial robotic technology. However, dexterous grasping remains much more under-explored than object grasping with parallel grippers, partially due to the lack of a large-scale dataset. In this work, we present a large-scale robotic dexterous grasp dataset, DexGraspNet, generated by our proposed highly efficient synthesis method that can be generally applied to any dexterous hand. Our method leverages a deeply accelerated differentiable force closure estimator and thus can efficiently and robustly synthesize stable and diverse grasps on a large scale. We choose ShadowHand and generate 1.32 million grasps for 5355 objects, covering more than 133 object categories and containing more than 200 diverse grasps for each object instance, with all grasps validated by the Isaac Gym simulator. Compared to the previous dataset from Liu et al. generated by GraspIt!, our dataset has not only more objects and grasps but also higher diversity and quality. Through cross-dataset experiments, we show that training several dexterous grasp synthesis algorithms on our dataset significantly outperforms training on the previous one. To access our data and code, including code for human and Allegro grasp synthesis, please visit our project page: https://pku-epic.github.io/DexGraspNet/.
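The synthesis method above rests on a differentiable force-closure estimator that can be minimized at scale. The sketch below uses a simplified surrogate, penalizing the net 6D wrench produced by unit inward contact normals, purely to illustrate the differentiable-energy idea; it is not DexGraspNet's exact estimator, and the toy usage at the end is hypothetical.

# Hedged sketch: a simplified, differentiable force-closure surrogate.
import torch

def force_closure_energy(contact_points: torch.Tensor,
                         contact_normals: torch.Tensor) -> torch.Tensor:
    """contact_points: (N, 3) contact locations on the object surface.
    contact_normals: (N, 3) inward-pointing unit normals, treated as forces.
    Returns a scalar that is small when the contacts roughly balance out."""
    forces = contact_normals                                         # (N, 3)
    torques = torch.cross(contact_points, contact_normals, dim=-1)   # (N, 3)
    wrench = torch.cat([forces.sum(0), torques.sum(0)])              # net 6D wrench
    return wrench.pow(2).sum()

# Toy usage with random contacts on a unit sphere (inward normals point to
# the center). In an optimization-based pipeline, contacts would instead be
# differentiable functions of hand joint angles via forward kinematics, and
# gradient descent on this energy (plus penetration and joint-limit
# penalties) would refine the grasp.
pts = torch.randn(4, 3)
pts = pts / pts.norm(dim=-1, keepdim=True)
normals = -pts
energy = force_closure_energy(pts, normals)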
GenDexGrasp: Generalizable Dexterous Grasping
Li, Puhao, Liu, Tengyu, Li, Yuyang, Geng, Yiran, Zhu, Yixin, Yang, Yaodong, Huang, Siyuan
Generating dexterous grasps has been a long-standing and challenging robotic task. Despite recent progress, existing methods primarily suffer from two issues. First, most prior work focuses on a specific type of robot hand and lacks the ability to generalize to unseen ones. Second, prior methods often fail to rapidly generate diverse grasps with a high success rate. To jointly tackle these challenges with a unified solution, we propose GenDexGrasp, a novel hand-agnostic algorithm for generalizable grasping. GenDexGrasp is trained on our proposed large-scale multi-hand grasping dataset, MultiDex, synthesized with force closure optimization. By leveraging the contact map as a hand-agnostic intermediate representation, GenDexGrasp efficiently generates diverse and plausible grasping poses with a high success rate and can transfer among diverse multi-fingered robotic hands. Compared with previous methods, GenDexGrasp achieves a three-way trade-off among success rate, inference speed, and diversity. Code is available at https://github.com/tengyu-liu/GenDexGrasp.
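Since the contact map is the hand-agnostic intermediate representation described above, a small sketch can show how a hand-specific stage might fit a hand to a generated map: compare the contact map induced by the current hand pose with the target map and minimize the mismatch. The distance-to-contact mapping, its constants, and the function names are assumptions for illustration, not GenDexGrasp's exact formulation.

# Hedged sketch: fitting a hand to a hand-agnostic target contact map.
import torch

def contact_map_from_distances(dists: torch.Tensor, scale: float = 30.0) -> torch.Tensor:
    """Map object-point-to-hand distances (meters) to [0, 1] contact values;
    closer points get values near 1."""
    return 1.0 - torch.sigmoid(scale * dists - 2.0)

def contact_alignment_loss(hand_surface_pts: torch.Tensor,
                           object_pts: torch.Tensor,
                           target_map: torch.Tensor) -> torch.Tensor:
    """hand_surface_pts: (H, 3) points sampled on the hand under its current pose.
    object_pts: (M, 3) object surface points.
    target_map: (M,) generated hand-agnostic contact map in [0, 1]."""
    # Distance from every object point to its nearest hand point.
    dists = torch.cdist(object_pts, hand_surface_pts).min(dim=-1).values  # (M,)
    current_map = contact_map_from_distances(dists)
    return torch.nn.functional.mse_loss(current_map, target_map)

# For any specific hand, its kinematic model provides hand_surface_pts as a
# differentiable function of joint angles, so the same target_map can be fit
# by hands with different morphologies.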