Design, Calibration, and Control of Compliant Force-sensing Gripping Pads for Humanoid Robots
Han, Yuanfeng, Jiang, Boren, Chirikjian, Gregory S.
This paper introduces a pair of low-cost, lightweight, and compliant force-sensing gripping pads for manipulating box-like objects with small-sized humanoid robots. The pads measure normal gripping forces and the center of pressure (CoP). A calibration method is developed to improve the CoP measurement accuracy. A hybrid force-alignment-position control framework is proposed to regulate the gripping forces and to ensure surface alignment between the grippers and the object. Limit surface theory is incorporated as a contact friction model to determine the magnitude of gripping forces needed to avoid slippage. The integrated hardware and software system is demonstrated on a NAO humanoid robot, and experiments show the effectiveness of the overall approach.
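To make the CoP measurement concrete, here is a minimal sketch, assuming the pad exposes an array of normal-force readings at known sensing-element locations; the function name and interface are illustrative rather than the paper's implementation.

```python
import numpy as np

def center_of_pressure(forces, positions, eps=1e-6):
    """Estimate the total normal force and CoP from pad sensor readings.

    forces    : (N,) normal-force readings, one per sensing element [N]
    positions : (N, 2) x-y locations of the sensing elements on the pad [m]
    """
    forces = np.asarray(forces, dtype=float)
    positions = np.asarray(positions, dtype=float)
    total = forces.sum()
    if total < eps:                      # no meaningful contact
        return 0.0, None
    # The CoP is the force-weighted average of the sensing-element locations.
    cop = (forces[:, None] * positions).sum(axis=0) / total
    return total, cop
```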
A Learning-Based Approach for Estimating Inertial Properties of Unknown Objects from Encoder Discrepancies
Lao, Zizhou, Han, Yuanfeng, Ma, Yunshan, Chirikjian, Gregory S.
Many robots use commercial force/torque sensors to identify the inertial properties of unknown objects. However, such sensors can be difficult to apply to small-sized robots due to their weight, size, and cost. In this paper, we propose a learning-based approach for estimating the mass and center of mass (COM) of unknown objects without using force/torque sensors at the end-effector or on the joints. In our method, a robot arm carries an unknown object as it moves through multiple discrete configurations, and measurements are collected whenever the robot reaches a configuration and stops. A neural network is designed to estimate joint torques from encoder discrepancies. Given multiple samples, we derive the closed-form relation between the joint torques and the object's inertial properties, from which the mass and COM of the object are identified by weighted least squares. To improve the accuracy of the inferred inertial properties, an attention model generates per-joint weights that indicate the relative importance of each joint. Our framework requires only encoder measurements, without any force/torque sensors, yet still achieves accurate estimation. The proposed approach is demonstrated on a 4-degree-of-freedom (4-DOF) robot arm.
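To illustrate the identification step, the following is a hedged sketch assuming the object's gravity torques satisfy tau = A(q) phi with phi = [m, m c_x, m c_y, m c_z], and that the regressor matrices, the network-estimated torques, and the per-joint attention weights are given; the names and shapes below are illustrative, not the paper's exact formulation.

```python
import numpy as np

def identify_mass_com(A_stack, tau_stack, joint_weights):
    """Weighted least-squares fit of phi = [m, m*cx, m*cy, m*cz].

    A_stack       : (K, J, 4) regressors, one per static configuration,
                    mapping phi to the object's gravity torques
    tau_stack     : (K, J) joint torques estimated from encoder discrepancies
    joint_weights : (J,) per-joint weights (e.g., from the attention model)
    """
    A_stack = np.asarray(A_stack, dtype=float)
    A = A_stack.reshape(-1, 4)
    tau = np.asarray(tau_stack, dtype=float).reshape(-1)
    w = np.tile(np.asarray(joint_weights, dtype=float), A_stack.shape[0])
    sw = np.sqrt(w)
    # Solve min_phi || diag(sw) (A phi - tau) ||^2
    phi, *_ = np.linalg.lstsq(sw[:, None] * A, sw * tau, rcond=None)
    mass = phi[0]
    com = phi[1:] / mass if abs(mass) > 1e-9 else np.zeros(3)
    return mass, com
```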
Model-Free and Learning-Free Proprioceptive Humanoid Movement Control
Jiang, Boren, Tao, Ximeng, Han, Yuanfeng, Li, Wanze, Chirikjian, Gregory S.
This paper presents a novel model-free method for quasi-static humanoid-robot movement control. Traditional model-based methods often require precise robot model parameters, and existing learning-based frameworks typically train their policies in simulation, thereby relying on a model indirectly. In contrast, we propose a proprioceptive framework based only on sensory outputs; it requires no prior knowledge of the robot's kinematic model or inertial parameters. Our method consists of three steps: (1) planning different pairs of center of pressure (CoP) and foot-position objectives within a single cycle; (2) searching around the current configuration by slightly moving the robot's leg joints back and forth while recording the sensor measurements of its CoP and foot positions; and (3) updating the robot motion with an optimization algorithm until all objectives are achieved. We demonstrate our approach on a NAO humanoid robot platform, and experimental results show that it successfully generates stable robot motions.
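A minimal sketch of a single search-and-update step is shown below, assuming callables that command the leg joints and read back the stacked CoP and foot-position measurements; the interface, finite-difference probing, and damped update are illustrative stand-ins for the paper's optimization.

```python
import numpy as np

def proprioceptive_step(move_joints, get_measurement, q, target,
                        delta=0.01, gain=0.5):
    """One model-free update toward a CoP / foot-position objective.

    move_joints(q)    : commands the leg joints to configuration q
    get_measurement() : returns the measured CoP and foot positions (stacked)
    q, target         : current joint configuration and desired measurement
    """
    q = np.asarray(q, dtype=float)
    move_joints(q)
    y0 = np.asarray(get_measurement(), dtype=float)
    J = np.zeros((y0.size, q.size))
    # Probe each joint back and forth to estimate local sensitivities.
    for i in range(q.size):
        dq = np.zeros_like(q)
        dq[i] = delta
        move_joints(q + dq)
        y_plus = np.asarray(get_measurement(), dtype=float)
        move_joints(q - dq)
        y_minus = np.asarray(get_measurement(), dtype=float)
        J[:, i] = (y_plus - y_minus) / (2.0 * delta)
    move_joints(q)                       # return to the nominal pose
    # Damped pseudo-inverse update toward the objective.
    step = np.linalg.pinv(J) @ (np.asarray(target, dtype=float) - y0)
    return q + gain * step
```

Iterating this step for each planned CoP and foot-position objective pair corresponds to the search-and-update loop described above.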
The Curious Robot: Learning Visual Representations via Physical Interactions
Pinto, Lerrel, Gandhi, Dhiraj, Han, Yuanfeng, Park, Yong-Lae, Gupta, Abhinav
What is the right supervisory signal for training visual representations? Current approaches in computer vision use category labels from datasets such as ImageNet to train ConvNets. However, in the case of biological agents, visual representation learning does not require millions of semantic labels. We argue that biological agents learn visual representations through physical interactions with the world, unlike current vision systems, which use only passive observations (images and videos downloaded from the web). For example, babies push objects, poke them, put them in their mouths, and throw them to learn representations. Towards this goal, we build one of the first systems on a Baxter platform that pushes, pokes, grasps, and observes objects in a tabletop environment. It uses four different types of physical interactions to collect more than 130K datapoints, with each datapoint providing supervision to a shared ConvNet architecture, allowing us to learn visual representations. We assess the quality of the learned representations by observing neuron activations and performing nearest-neighbor retrieval. Quantitatively, we evaluate the learned ConvNet on image classification tasks and show improvements over learning without external data. Finally, on the task of instance retrieval, our network outperforms the ImageNet network on recall@1 by 3%.
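As a rough sketch of the shared-architecture idea, the model below uses a common convolutional trunk with one head per interaction type (grasp, push, poke, and an identity embedding); the layer sizes and head dimensions are illustrative placeholders, not the paper's network.

```python
import torch.nn as nn

class SharedInteractionNet(nn.Module):
    """Shared convolutional trunk supervised by several physical interactions."""

    def __init__(self):
        super().__init__()
        self.trunk = nn.Sequential(               # shared visual representation
            nn.Conv2d(3, 64, 7, stride=2, padding=3), nn.ReLU(),
            nn.MaxPool2d(3, stride=2),
            nn.Conv2d(64, 128, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.grasp_head = nn.Linear(128, 18)      # e.g., discretized grasp angles
        self.push_head = nn.Linear(128, 5)        # e.g., push action parameters
        self.poke_head = nn.Linear(128, 1)        # e.g., poke force response
        self.embed_head = nn.Linear(128, 64)      # e.g., object-identity embedding

    def forward(self, x):
        z = self.trunk(x)
        return {"grasp": self.grasp_head(z),
                "push": self.push_head(z),
                "poke": self.poke_head(z),
                "embed": self.embed_head(z)}
```

Because every interaction supervises the same trunk, each collected datapoint contributes to a single visual representation.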