Single-Pixel Tactile Skin via Compressive Sampling

Slepyan, Ariel, Xing, Laura, Zhang, Rudy, Thakor, Nitish

arXiv.org Artificial Intelligence

Development of large-area, high-speed electronic skins is a grand challenge for robotics, prosthetics, and human-machine interfaces, but is fundamentally limited by wiring complexity and data bottlenecks. Here, we introduce Single-Pixel Tactile Skin (SPTS), a paradigm that uses compressive sampling to reconstruct rich tactile information from an entire sensor array via a single output channel. This is achieved through a direct circuit-level implementation where each sensing element, equipped with a miniature microcontroller, contributes a dynamically weighted analog signal to a global sum, performing distributed compressed sensing in hardware. Our flexible, daisy-chainable design simplifies wiring to a few input lines and one output, and significantly reduces measurement requirements compared to raster scanning methods. We demonstrate the system's performance by achieving object classification at an effective 3500 FPS and by capturing transient dynamics, resolving an 8 ms projectile impact into 23 frames. A key feature is the support for adaptive reconstruction, where sensing fidelity scales with measurement time. This allows for rapid contact localization using as little as 7% of total data, followed by progressive refinement to a high-fidelity image - a capability critical for responsive robotic systems. This work offers an efficient pathway towards large-scale tactile intelligence for robotics and human-machine interfaces.
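The measurement model described here — every taxel contributes a dynamically weighted analog signal to one summed output, and the full array is recovered by sparse reconstruction — can be sketched numerically. The ±1 weighting scheme, array size, and OMP-based recovery below are illustrative assumptions, not the authors' circuit-level implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

N, M, K = 64, 32, 3          # taxels, summed measurements, contact sparsity
x_true = np.zeros(N)
x_true[rng.choice(N, K, replace=False)] = rng.uniform(0.5, 1.0, K)

# Each measurement: every taxel scales its analog signal by a random +/-1
# weight and all weighted signals sum onto the single output line.
Phi = rng.choice([-1.0, 1.0], size=(M, N))
y = Phi @ x_true

def omp(Phi, y, k):
    """Orthogonal matching pursuit: greedy sparse recovery."""
    residual, support = y.copy(), []
    for _ in range(k):
        support.append(int(np.argmax(np.abs(Phi.T @ residual))))
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        residual = y - Phi[:, support] @ coef
    x_hat = np.zeros(Phi.shape[1])
    x_hat[support] = coef
    return x_hat

x_hat = omp(Phi, y, K)
```

Because contact maps are sparse, far fewer summed measurements than taxels (here 32 vs. 64) suffice for exact recovery in the noiseless case, which is what allows fidelity to scale with measurement time.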


Residual Rotation Correction using Tactile Equivariance

Zhu, Yizhe, Ye, Zhang, Hu, Boce, Zhao, Haibo, Qi, Yu, Wang, Dian, Platt, Robert

arXiv.org Artificial Intelligence

The high cost of tactile data collection makes sample efficiency the key requirement for developing visuotactile policies. We present EquiTac, a framework that exploits the inherent SO(2) symmetry of in-hand object rotation to improve sample efficiency and generalization for visuotactile policy learning. EquiTac first reconstructs surface normals from raw RGB inputs of vision-based tactile sensors, so rotations of the normal vector field correspond to in-hand object rotations. An SO(2)-equivariant network then predicts a residual rotation action that augments a base visuomotor policy at test time, enabling real-time rotation correction without additional reorientation demonstrations. On a real robot, EquiTac achieves robust zero-shot generalization to unseen in-hand orientations with very few training samples, where baselines fail even with more training data. To our knowledge, this is the first tactile learning method to explicitly encode tactile equivariance for policy learning, yielding a lightweight, symmetry-aware module that improves reliability in contact-rich tasks.
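The core symmetry claim — rotating the in-plane normal field by an angle θ should rotate the predicted residual action by the same θ — can be checked on a toy equivariant head. The mean-vector predictor below is a hypothetical stand-in for the learned SO(2)-equivariant network:

```python
import numpy as np

def rot(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def residual_angle(normals_xy):
    """Toy SO(2)-equivariant head: angle of the mean in-plane normal.
    Rotating every normal by theta shifts the output by exactly theta."""
    mean = normals_xy.mean(axis=0)
    return np.arctan2(mean[1], mean[0])

rng = np.random.default_rng(1)
field = rng.normal(size=(100, 2))      # in-plane components of surface normals
theta = 0.7
rotated = field @ rot(theta).T

delta = residual_angle(rotated) - residual_angle(field)
delta = (delta + np.pi) % (2 * np.pi) - np.pi   # wrap to (-pi, pi]
```

A learned network with this property needs no rotated training demonstrations: the equivariance supplies generalization to unseen in-hand orientations by construction.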


Phy-Tac: Toward Human-Like Grasping via Physics-Conditioned Tactile Goals

Lyu, Shipeng, Sheng, Lijie, Wang, Fangyuan, Zhang, Wenyao, Lin, Weiwei, Jia, Zhenzhong, Navarro-Alarcon, David, Guo, Guodong

arXiv.org Artificial Intelligence

Humans naturally grasp objects with the minimal force required for stability, whereas robots often rely on rigid, over-squeezing control. To narrow this gap, we propose a human-inspired physics-conditioned tactile method (Phy-Tac) for force-optimal stable grasping (FOSG) that unifies pose selection, tactile prediction, and force regulation. A physics-based pose selector first identifies feasible contact regions with optimal force distribution based on surface geometry. Then, a physics-conditioned latent diffusion model (Phy-LDM) predicts the tactile imprint under the FOSG target. Finally, a latent-space LQR controller drives the gripper toward this tactile imprint with minimal actuation, preventing unnecessary compression. Trained on a physics-conditioned tactile dataset covering diverse objects and contact conditions, the proposed Phy-LDM achieves superior tactile prediction accuracy, while Phy-Tac outperforms fixed-force and GraspNet-based baselines in grasp stability and force efficiency. Experiments on classical robotic platforms demonstrate force-efficient and adaptive manipulation that bridges the gap between robotic and human grasping.
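The latent-space LQR step can be illustrated with a standard discrete-time Riccati iteration; the 2-D latent dynamics and gains below are toy assumptions, with the target latent `z_star` standing in for the Phy-LDM-predicted tactile imprint:

```python
import numpy as np

def dlqr(A, B, Q, R, iters=500):
    """Infinite-horizon discrete LQR gain via Riccati iteration."""
    P = Q.copy()
    for _ in range(iters):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return K

# Toy 2-D latent dynamics z' = A z + B u; regulate z toward target z_star
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q, R = np.eye(2), np.array([[1.0]])
K = dlqr(A, B, Q, R)

z, z_star = np.array([1.0, 0.0]), np.zeros(2)
for _ in range(200):
    u = -K @ (z - z_star)    # minimal-actuation correction toward the goal imprint
    z = A @ z + B @ u
```

The quadratic penalty on `u` is what discourages unnecessary compression: the controller converges to the target imprint while spending as little actuation as the `R` weight dictates.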


NeuralTouch: Neural Descriptors for Precise Sim-to-Real Tactile Robot Control

Lin, Yijiong, Deng, Bowen, Lu, Chenghua, Yang, Max, Psomopoulou, Efi, Lepora, Nathan F.

arXiv.org Artificial Intelligence

Grasping accuracy is a critical prerequisite for precise object manipulation, often requiring careful alignment between the robot hand and object. Neural Descriptor Fields (NDF) offer a promising vision-based method to generate grasping poses that generalize across object categories. However, NDF alone can produce inaccurate poses due to imperfect camera calibration, incomplete point clouds, and object variability. Meanwhile, tactile sensing enables more precise contact, but existing approaches typically learn policies limited to simple, predefined contact geometries. In this work, we introduce NeuralTouch, a multi-modal framework that integrates NDF and tactile sensing to enable accurate, generalizable grasping through gentle physical interaction. Our approach leverages NDF to implicitly represent the target contact geometry, from which a deep reinforcement learning (RL) policy is trained to refine the grasp using tactile feedback. This policy is conditioned on the neural descriptors and does not require explicit specification of contact types. Results show that NeuralTouch significantly improves grasping accuracy and robustness over baseline methods, offering a general framework for precise, contact-rich robotic manipulation.


Tactile-Conditioned Diffusion Policy for Force-Aware Robotic Manipulation

Helmut, Erik, Funk, Niklas, Schneider, Tim, de Farias, Cristiana, Peters, Jan

arXiv.org Artificial Intelligence

Contact-rich manipulation depends on applying the correct grasp forces throughout the manipulation task, especially when handling fragile or deformable objects. Most existing imitation learning approaches treat visuotactile feedback only as an additional observation, leaving applied forces as an uncontrolled consequence of gripper commands. In this work, we present Force-Aware Robotic Manipulation (FARM), an imitation learning framework that integrates high-dimensional tactile data to infer tactile-conditioned force signals, which in turn define a matching force-based action space. We collect human demonstrations using a modified version of the handheld Universal Manipulation Interface (UMI) gripper that integrates a GelSight Mini visual tactile sensor. For deploying the learned policies, we developed an actuated variant of the UMI gripper with geometry matching our handheld version. During policy rollouts, the proposed FARM diffusion policy jointly predicts robot pose, grip width, and grip force. FARM outperforms several baselines across three tasks with distinct force requirements -- high-force, low-force, and dynamic force adaptation -- demonstrating the advantages of its two key components: leveraging force-grounded, high-dimensional tactile observations and a force-based control space. The codebase and design files are open-sourced and available at https://tactile-farm.github.io.


TacRefineNet: Tactile-Only Grasp Refinement Between Arbitrary In-Hand Object Poses

Wang, Shuaijun, Zhou, Haoran, Xiang, Diyun, You, Yangwei

arXiv.org Artificial Intelligence

Despite progress in both traditional dexterous grasping pipelines and recent Vision-Language-Action (VLA) approaches, the grasp execution stage remains prone to pose inaccuracies, especially in long-horizon tasks, which undermines overall performance. To address this "last-mile" challenge, we propose TacRefineNet, a tactile-only framework that achieves fine in-hand pose refinement of known objects in arbitrary target poses using multi-finger fingertip sensing. Our method iteratively adjusts the end-effector pose based on tactile feedback, aligning the object to the desired configuration. We design a multi-branch policy network that fuses tactile inputs from multiple fingers along with proprioception to predict precise control updates. To train this policy, we combine large-scale simulated data from a physics-based tactile model in MuJoCo with real-world data collected from a physical system. Comparative experiments show that pretraining on simulated data and fine-tuning with a small amount of real data significantly improves performance over simulation-only training. To our knowledge, this is the first method to enable arbitrary in-hand pose refinement via multi-finger tactile sensing alone. Project website is available at https://sites.google.com/view/tacrefinenet
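The multi-branch fusion described above (per-finger tactile features combined with proprioception to predict a control update) might look schematically like this; all dimensions and the random untrained weights are hypothetical stand-ins for the learned policy:

```python
import numpy as np

rng = np.random.default_rng(3)

def mlp(x, W1, W2):
    """Minimal one-hidden-layer ReLU network."""
    return np.maximum(x @ W1, 0.0) @ W2

# Hypothetical sizes: 3 fingertips x 16-d tactile features, 7-d proprioception
tactile = [rng.normal(size=16) for _ in range(3)]
proprio = rng.normal(size=7)

# Per-finger branches with shared weights, producing an 8-d feature each
W1_t, W2_t = rng.normal(size=(16, 32)), rng.normal(size=(32, 8))
finger_feats = np.concatenate([mlp(t, W1_t, W2_t) for t in tactile])

# Fusion head: concatenate finger features with proprioception (3*8 + 7 = 31-d)
fused = np.concatenate([finger_feats, proprio])
W1_f, W2_f = rng.normal(size=(31, 64)), rng.normal(size=(64, 6))
delta_pose = mlp(fused, W1_f, W2_f)    # 6-DoF end-effector pose update
```

In the iterative refinement loop the predicted `delta_pose` would be applied, new tactile readings captured, and the forward pass repeated until the object reaches the target configuration.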


UniTac2Pose: A Unified Approach Learned in Simulation for Category-level Visuotactile In-hand Pose Estimation

Wu, Mingdong, Yang, Long, Liu, Jin, Huang, Weiyao, Wu, Lehong, Chen, Zelin, Ma, Daolin, Dong, Hao

arXiv.org Artificial Intelligence

Accurate estimation of the in-hand pose of an object based on its CAD model is crucial in both industrial applications and everyday tasks, ranging from positioning workpieces and assembling components to seamlessly inserting devices like USB connectors. While existing methods often rely on regression, feature matching, or registration techniques, achieving high precision and generalizability to unseen CAD models remains a significant challenge. In this paper, we propose a novel three-stage framework for in-hand pose estimation. The first stage involves sampling and pre-ranking pose candidates, followed by iterative refinement of these candidates in the second stage. In the final stage, post-ranking is applied to identify the most likely pose candidates. These stages are governed by a unified energy-based diffusion model, which is trained solely on simulated data. This energy model simultaneously generates gradients to refine pose estimates and produces an energy scalar that quantifies the quality of the pose estimates. Additionally, borrowing the idea from the computer vision domain, we incorporate a render-compare architecture within the energy-based score network to significantly enhance sim-to-real performance, as demonstrated by our ablation studies. We conduct comprehensive experiments to show that our method outperforms conventional baselines based on regression, matching, and registration techniques, while also exhibiting strong intra-category generalization to previously unseen CAD models. Moreover, our approach integrates tactile object pose estimation, pose tracking, and uncertainty estimation into a unified framework, enabling robust performance across a variety of real-world conditions.
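The three-stage pipeline (sample and pre-rank candidates, gradient-refine them, post-rank by energy) can be mimicked with a toy quadratic energy in place of the learned energy-based diffusion model; `p_true`, the candidate count, and the step size are all illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
p_true = np.array([0.3, -0.5, 1.2])    # toy "in-hand pose" (stand-in for SE(3))

def energy(p):
    """Toy energy: low when a candidate matches the true pose. The real
    model would score render-compare agreement with tactile observations."""
    return np.sum((p - p_true) ** 2)

def energy_grad(p):
    return 2.0 * (p - p_true)

# Stage 1: sample candidates and pre-rank by energy, keeping a short-list
candidates = rng.normal(size=(32, 3))
candidates = candidates[np.argsort([energy(p) for p in candidates])][:8]

# Stage 2: iterative gradient refinement of the short-list
for _ in range(100):
    candidates -= 0.05 * np.array([energy_grad(p) for p in candidates])

# Stage 3: post-rank the refined candidates; lowest energy wins
best = min(candidates, key=energy)
```

The same scalar doubles as an uncertainty estimate: a high post-refinement energy signals that no candidate explains the observations well.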


Surformer v2: A Multimodal Classifier for Surface Understanding from Touch and Vision

Kansana, Manish, Penchala, Sindhuja, Rahimi, Shahram, Golilarz, Noorbakhsh Amiri

arXiv.org Artificial Intelligence

Multimodal surface material classification plays a critical role in advancing tactile perception for robotic manipulation and interaction. In this paper, we present Surformer v2, an enhanced multimodal classification architecture designed to integrate visual and tactile sensory streams through a late (decision-level) fusion mechanism. Building on our earlier Surformer v1 framework [1], which employed handcrafted feature extraction followed by a mid-level fusion architecture with multi-head cross-attention layers, Surformer v2 integrates the feature extraction process within the model itself and shifts to late fusion. The vision branch leverages a CNN-based classifier (Efficient V-Net), while the tactile branch employs an encoder-only transformer model, allowing each modality to extract modality-specific features optimized for classification. Rather than merging feature maps, the model performs decision-level fusion by combining the output logits using a learnable weighted sum, enabling adaptive emphasis on each modality depending on data context and training dynamics. We evaluate Surformer v2 on the Touch and Go dataset [2], a multimodal benchmark comprising surface images and corresponding tactile sensor readings. Our results demonstrate that Surformer v2 performs well while maintaining competitive inference speed, making it suitable for real-time robotic applications. These findings underscore the effectiveness of decision-level fusion and transformer-based tactile modeling for enhancing surface understanding in multimodal robotic perception.
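The decision-level fusion step — a learnable weighted sum of per-branch logits — is simple to sketch; the logit values and fusion weights below are made-up numbers, not trained parameters:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Per-branch logits for 4 surface classes (hypothetical values)
vision_logits = np.array([2.0, 0.1, -1.0, 0.3])
tactile_logits = np.array([0.5, 1.8, -0.2, 0.0])

# Learnable fusion weights (trained jointly with the branches; fixed here
# for illustration). A softmax keeps them positive and summing to one.
w = softmax(np.array([0.2, 0.8]))      # [w_vision, w_tactile]

fused_logits = w[0] * vision_logits + w[1] * tactile_logits
pred = int(np.argmax(fused_logits))    # tactile branch dominates this example
```

Because only logits are combined, each branch can be trained, swapped, or quantized independently, which is a practical advantage of late fusion over feature-map merging.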


Classification of Vision-Based Tactile Sensors: A Review

Li, Haoran, Lin, Yijiong, Lu, Chenghua, Yang, Max, Psomopoulou, Efi, Lepora, Nathan F

arXiv.org Artificial Intelligence

Vision-based tactile sensors (VBTS) have gained widespread application in robotic hands, grippers and prosthetics due to their high spatial resolution, low manufacturing costs, and ease of customization. While VBTSs have common design features, such as a camera module, they can differ in a rich diversity of sensing principles, material compositions, multimodal approaches, and data interpretation methods. Here, we propose a novel classification of VBTS that categorizes the technology into two primary sensing principles based on the underlying transduction of contact into a tactile image: the Marker-Based Transduction Principle and the Intensity-Based Transduction Principle. Marker-Based Transduction interprets tactile information by detecting marker displacement and changes in marker density. Depending on the design of the contact module, Marker-Based Transduction can be further divided into two subtypes: Simple Marker-Based (SMB) and Morphological Marker-Based (MMB) mechanisms. Similarly, the Intensity-Based Transduction Principle encompasses the Reflective Layer-Based (RLB) and Transparent Layer-Based (TLB) mechanisms. This paper provides a comparative study of the hardware characteristics of these four types of sensors, including various combination types, and discusses the commonly used methods for interpreting tactile information. This comparison reveals some current challenges faced by VBTS technology and directions for future research. In robotic systems, tactile sensing is fundamental for enabling robots to interact with their environment through physical contact. By delivering real-time tactile feedback, such as object stiffness, local force, slip and contact position feedback, this capability empowers robotic systems to achieve precise object manipulation while preventing damage [1]-[4]. Traditional electronic technologies such as piezoelectric and piezoresistive sensor arrays have been considered promising due to their high temporal resolution and thin profiles.
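The four-way classification proposed in the review maps naturally onto a small data structure, which can be handy when tagging sensors in a survey or database; the encoding is ours, while the category names come from the abstract:

```python
# Two primary transduction principles, each with two subtypes (per the review)
VBTS_TAXONOMY = {
    "Marker-Based Transduction": {
        "SMB": "Simple Marker-Based",
        "MMB": "Morphological Marker-Based",
    },
    "Intensity-Based Transduction": {
        "RLB": "Reflective Layer-Based",
        "TLB": "Transparent Layer-Based",
    },
}

def subtypes(principle):
    """Return the subtype abbreviations for a given transduction principle."""
    return sorted(VBTS_TAXONOMY[principle])
```

Combination-type sensors discussed in the paper would carry tags from more than one branch of this tree.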


SimShear: Sim-to-Real Shear-based Tactile Servoing

Freud, Kipp McAdam, Lin, Yijiong, Lepora, Nathan F.

arXiv.org Artificial Intelligence

We present SimShear, a sim-to-real pipeline for tactile control that enables the use of shear information without explicitly modeling shear dynamics in simulation. Shear, arising from lateral movements across contact surfaces, is critical for tasks involving dynamic object interactions but remains challenging to simulate. To address this, we introduce shPix2pix, a shear-conditioned U-Net GAN that transforms simulated tactile images lacking shear, together with a vector encoding the shear, into realistic equivalents with shear deformations. This method outperforms baseline pix2pix approaches in simulating tactile images and in pose/shear prediction. We apply SimShear to two control tasks using a pair of low-cost desktop robotic arms equipped with a vision-based tactile sensor: (i) a tactile tracking task, where a follower arm tracks a surface moved by a leader arm, and (ii) a collaborative co-lifting task, where both arms jointly hold an object while the leader follows a prescribed trajectory. Our method maintains contact errors within 1 to 2 mm across varied trajectories where shear sensing is essential, validating the feasibility of sim-to-real shear modeling with rigid-body simulators and opening new directions for simulation in tactile robotics.