Dexterous Robotic Piano Playing at Scale

Chen, Le, Zhao, Yi, Schneider, Jan, Gao, Quankai, Guist, Simon, Qian, Cheng, Kannala, Juho, Schölkopf, Bernhard, Pajarinen, Joni, Büchler, Dieter

arXiv.org Artificial Intelligence 

This work has been submitted to the IEEE for possible publication.

Abstract: Endowing robot hands with human-level dexterity has been a long-standing goal in robotics. Bimanual robotic piano playing represents a particularly challenging task: it is high-dimensional, contact-rich, and requires fast, precise control. Our approach is built on three core components. First, we introduce an automatic fingering strategy based on Optimal Transport (OT), allowing the agent to autonomously discover efficient piano-playing strategies from scratch without demonstrations. Second, we conduct large-scale Reinforcement Learning (RL) by training more than 2,000 agents, each specialized in distinct music pieces, and aggregate their experience into a dataset named RP1M++, consisting of over one million trajectories for robotic piano playing. Extensive experiments and ablation studies highlight the effectiveness and scalability of our approach, advancing dexterous robotic piano playing at scale.

Achieving human-level dexterity remains one of the central challenges in robotics. The difficulty stems from the breadth of challenges, ranging from contact-rich manipulation to dynamic athletic tasks, each posing distinct demands. Manipulation tasks, such as grasping or reorienting objects [1], require sustained application of appropriate forces at moderate speeds across objects with diverse shapes, materials, and weight distributions. Dynamic tasks, such as juggling [2] or table tennis [3], involve frequent contact changes, demand high precision, and allow little tolerance for error due to the rarity of contact opportunities. The need for both precision and speed makes reproducing human-level dexterity particularly challenging.

Q. Gao is with the University of Southern California, CA 90007, United States (e-mail: quankaig@usc.edu).
C. Qian is with Imperial College London, SW7 2AZ, London, United Kingdom (e-mail: c.qian24@imperial.ac.uk).
J. Kannala is with the University of Oulu, 90570 Oulu, Finland.
D. Büchler is also with the University of Alberta (Canada), the Alberta Machine Intelligence Institute (Amii), and holds a Canada CIFAR AI Chair.