


VLA-Touch: Enhancing Vision-Language-Action Models with Dual-Level Tactile Feedback

Bi, Jianxin, Ma, Kevin Yuchen, Hao, Ce, Shou, Mike Zheng, Soh, Harold

arXiv.org Artificial Intelligence

Tactile feedback is generally recognized to be crucial for effective interaction with the physical world. However, state-of-the-art Vision-Language-Action (VLA) models lack the ability to interpret and use tactile signals, limiting their effectiveness in contact-rich tasks. Incorporating tactile feedback into these systems is challenging due to the absence of large multi-modal datasets. We present VLA-Touch, an approach that enhances generalist robot policies with tactile sensing without fine-tuning the base VLA. Our method introduces two key innovations: (1) a pipeline that leverages a pretrained tactile-language model to provide semantic tactile feedback for high-level task planning, and (2) a diffusion-based controller that refines VLA-generated actions with tactile signals for contact-rich manipulation. Through real-world experiments, we demonstrate that our dual-level integration of tactile feedback improves task planning efficiency while enhancing execution precision. Code is open-sourced at https://github.com/jxbi1010/VLA-Touch.


Robotic Perception with a Large Tactile-Vision-Language Model for Physical Property Inference

Guo, Zexiang, Chen, Hengxiang, Mai, Xinheng, Qiu, Qiusang, Ma, Gan, Kappassov, Zhanat, Li, Qiang, Chen, Nutan

arXiv.org Artificial Intelligence

Inferring physical properties can significantly enhance robotic manipulation by enabling robots to handle objects safely and efficiently through adaptive grasping strategies. Previous approaches have typically relied on either tactile or visual data alone, limiting their ability to fully capture an object's physical properties. We introduce a novel cross-modal perception framework that integrates visual observations with tactile representations within a multimodal vision-language model. Our physical reasoning framework, which employs a hierarchical feature alignment mechanism and a refined prompting strategy, enables our model to make property-specific predictions that strongly correlate with ground-truth measurements. Evaluated on 35 diverse objects, our approach outperforms existing baselines and demonstrates strong zero-shot generalization.


How I Used DALL·E 2 to Generate The Logo for OctoSQL


Everybody has heard about the latest cool thing™, which is DALL·E 2 (henceforth called Dall-e). A few months ago, when the first previews started, it was basically everywhere. Then, a few weeks ago, the floodgates opened and lots of people on the waitlist got access, me included. I've spent a day playing around with it, learned some basics (like the fact that adding "artstation" to the end of your prompt automatically makes the output much better…), and generated a bunch of images (even a few nice-looking ones).