AITopics | Fang, Bin

Collaborating Authors

Fang, Bin

Tacchi 2.0: A Low Computational Cost and Comprehensive Dynamic Contact Simulator for Vision-based Tactile Sensors

Sun, Yuhao, Zhang, Shixin, Li, Wenzhuang, Zhao, Jie, Shan, Jianhua, Shen, Zirong, Chen, Zixi, Sun, Fuchun, Guo, Di, Fang, Bin

arXiv.org Artificial Intelligence | Mar-12-2025

With the development of robotics technology, some tactile sensors, such as vision-based sensors, have been applied to contact-rich robotics tasks. However, the limited durability of vision-based tactile sensors significantly increases the cost of tactile information acquisition. Utilizing simulation to generate tactile data has emerged as a reliable approach to address this issue. While data-driven methods for tactile data generation lack robustness, finite element method (FEM) based approaches incur significant computational costs. To address these issues, we integrated a pinhole camera model into Tacchi, a low-computational-cost vision-based tactile simulator that uses the Material Point Method (MPM) as its simulation method, completing the simulation of marker motion images. We upgraded Tacchi and introduced Tacchi 2.0. This simulator can simulate tactile images, marker motion images, and joint images under different motion states such as pressing, slipping, and rotating. Experimental results demonstrate the reliability of our method and its robustness across various vision-based tactile sensors.
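The abstract above mentions a pinhole camera model used to render marker motion images. As a rough illustration only, the standard pinhole projection maps a 3D point in the camera frame to pixel coordinates as sketched below. The function name `project_markers`, the intrinsics (`fx`, `fy`, `cx`, `cy`), and the sample marker positions are all assumptions made for this sketch; none of them are taken from Tacchi 2.0.

```python
def project_markers(points_cam, fx, fy, cx, cy):
    """Project 3D points given in the camera frame onto the image plane.

    Standard pinhole model: u = fx * x / z + cx, v = fy * y / z + cy.
    All names and values here are illustrative, not from Tacchi 2.0.
    """
    pixels = []
    for x, y, z in points_cam:
        if z <= 0:
            raise ValueError("point must lie in front of the camera (z > 0)")
        u = fx * x / z + cx
        v = fy * y / z + cy
        pixels.append((u, v))
    return pixels


# Hypothetical marker positions in metres, 2 cm in front of the camera.
markers = [(0.0, 0.0, 0.02), (0.001, -0.002, 0.02)]
pixels = project_markers(markers, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
```

Under this model, a marker on the optical axis lands on the principal point, and lateral marker displacement scales inversely with depth, which is why pressing and slipping change the marker pattern differently.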

artificial intelligence, machine learning, simulation, (15 more...)

2503.091

Country:

Asia > China (0.16)
Europe > Italy (0.14)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Sensing and Signal Processing (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

AnyTouch: Learning Unified Static-Dynamic Representation across Multiple Visuo-tactile Sensors

Feng, Ruoxuan, Hu, Jiangyu, Xia, Wenke, Gao, Tianci, Shen, Ao, Sun, Yuhao, Fang, Bin, Hu, Di

arXiv.org Artificial Intelligence | Feb-15-2025

Visuo-tactile sensors aim to emulate human tactile perception, enabling robots to precisely understand and manipulate objects. Over time, numerous meticulously designed visuo-tactile sensors have been integrated into robotic systems, aiding in completing various tasks. However, the distinct data characteristics of these poorly standardized visuo-tactile sensors hinder the establishment of a powerful tactile perception system. We consider that the key to addressing this issue lies in learning unified multi-sensor representations, thereby integrating the sensors and promoting tactile knowledge transfer between them. To achieve unified representation of this nature, we introduce TacQuad, an aligned multi-modal multi-sensor tactile dataset from four different visuo-tactile sensors, which enables the explicit integration of various sensors. Recognizing that humans perceive the physical environment by acquiring diverse tactile information such as texture and pressure changes, we further propose to learn unified multi-sensor representations from both static and dynamic perspectives. By integrating tactile images and videos, we present AnyTouch, a unified static-dynamic multi-sensor representation learning framework with a multi-level structure, aimed at both enhancing comprehensive perceptual abilities and enabling effective cross-sensor transfer. This multi-level architecture captures pixel-level details from tactile data via masked modeling and enhances perception and transferability by learning semantic-level sensor-agnostic features through multi-modal alignment and cross-sensor matching. We provide a comprehensive analysis of multi-sensor transferability, and validate our method on various datasets and in a real-world pouring task. Experimental results show that our method outperforms existing methods and exhibits outstanding static and dynamic perception capabilities across various sensors.

artificial intelligence, machine learning, natural language, (20 more...)

2502.12191

Country: Asia > China (0.28)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Sensing and Signal Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
(2 more...)

RoboBERT: An End-to-end Multimodal Robotic Manipulation Model

Wang, Sicheng, Shan, Jianhua, Zhang, Jianwei, Gao, Haozhang, Han, Hailiang, Chen, Yipeng, Wei, Kang, Zhang, Chengkun, Wong, Kairos, Zhao, Jie, Zhao, Lei, Fang, Bin

arXiv.org Artificial Intelligence | Feb-10-2025

Embodied intelligence integrates multiple modalities, enabling agents to understand images, language, and actions simultaneously. However, existing models typically depend on additional datasets or extensive pre-training to maximize performance, consuming substantial training time and requiring expensive hardware. To tackle this issue, we present RoboBERT, a novel end-to-end robotic manipulation model integrated with a unique training strategy. This model utilizes a CNN-based diffusion policy, and its effectiveness is enhanced and stabilized by separating the training processes for different modalities. It also underscores the importance of data augmentation, verifying various techniques that significantly boost performance. Unlike models that depend on extra data or large foundation models, RoboBERT achieves a highly competitive success rate while using only language-labeled expert demonstrations and maintaining a relatively small model size. Specifically, RoboBERT achieves an average length of 4.52 on the CALVIN benchmark for the \(ABCD \rightarrow D\) task, setting a new state-of-the-art (SOTA) record. Furthermore, when tested on a real robot, the model demonstrates superior performance, achieving a higher success rate than other methods trained with the same data. We believe that the concepts and methodologies of RoboBERT demonstrate extensive versatility and compatibility, contributing significantly to the development of lightweight multimodal robotic models. The code is available at https://github.com/PeterWangsicheng/RoboBERT

artificial intelligence, arxiv preprint arxiv, augmentation, (16 more...)

2502.07837

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Robots (1.00)

Towards Comprehensive Multimodal Perception: Introducing the Touch-Language-Vision Dataset

Cheng, Ning, Li, You, Gao, Jing, Fang, Bin, Xu, Jinan, Han, Wenjuan

arXiv.org Artificial Intelligence | Jun-17-2024

Tactility provides crucial support and enhancement for the perception and interaction capabilities of both humans and robots. Nevertheless, multimodal research related to touch primarily focuses on visual and tactile modalities, with limited exploration in the domain of language. Beyond vocabulary, sentence-level descriptions contain richer semantics. Based on this, we construct a touch-language-vision dataset named TLV (Touch-Language-Vision) by human-machine cascade collaboration, featuring sentence-level descriptions for multimodal alignment. The new dataset is used to fine-tune our proposed lightweight training framework, STLV-Align (Synergistic Touch-Language-Vision Alignment), achieving effective semantic alignment with minimal parameter adjustments (1%).

artificial intelligence, machine learning, natural language, (16 more...)

2403.09813

Country: Asia > China (0.15)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

When Vision Meets Touch: A Contemporary Review for Visuotactile Sensors from the Signal Processing Perspective

Li, Shoujie, Wang, Zihan, Wu, Changsheng, Li, Xiang, Luo, Shan, Fang, Bin, Sun, Fuchun, Zhang, Xiao-Ping, Ding, Wenbo

arXiv.org Artificial Intelligence | Jun-17-2024

Tactile sensors, which provide information about the physical properties of objects, are an essential component of robotic systems. Visuotactile sensing technology, with its high resolution and low cost, has facilitated the development of robotics from environment exploration to dexterous operation. Over the years, several reviews on visuotactile sensors for robots have been presented, but few of them have discussed the significance of signal processing methods for visuotactile sensors. Apart from ingenious hardware design, the full potential of the sensory system toward designated tasks can only be released with appropriate signal processing methods. Therefore, this paper provides a comprehensive review of visuotactile sensors from the perspective of signal processing methods and outlines possible future research directions for visuotactile sensors.

artificial intelligence, machine learning, sensor, (15 more...)

doi: 10.1109/JSTSP.2024.3416841

2406.12226

Country:

Asia > China (0.69)
North America > United States (0.67)
North America > Canada > Ontario (0.14)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Health & Medicine (1.00)
Energy > Oil & Gas > Upstream (0.92)

Technology:

Information Technology > Sensing and Signal Processing (1.00)
Information Technology > Artificial Intelligence > Robots > Manipulation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Touch100k: A Large-Scale Touch-Language-Vision Dataset for Touch-Centric Multimodal Representation

Cheng, Ning, Guan, Changhao, Gao, Jing, Wang, Weihao, Li, You, Meng, Fandong, Zhou, Jie, Fang, Bin, Xu, Jinan, Han, Wenjuan

arXiv.org Artificial Intelligence | Jun-6-2024

Touch holds a pivotal position in enhancing the perceptual and interactive capabilities of both humans and robots. Despite its significance, current tactile research mainly focuses on visual and tactile modalities, overlooking the language domain. Motivated by this, we construct Touch100k, a paired touch-language-vision dataset at the scale of 100k, featuring tactile sensation descriptions in multiple granularities (i.e., sentence-level natural expressions with rich semantics, including contextual and dynamic relationships, and phrase-level descriptions capturing the key features of tactile sensations). Based on the dataset, we propose a pre-training method, Touch-Language-Vision Representation Learning through Curriculum Linking (TLV-Link, for short), inspired by the concept of curriculum learning. TLV-Link aims to learn a tactile representation for the GelSight sensor and capture the relationship between tactile, language, and visual modalities. We evaluate our representation's performance across two task categories (namely, material property identification and robot grasping prediction), focusing on tactile representation and zero-shot touch understanding. The experimental evaluation showcases the effectiveness of our representation. By enabling TLV-Link to achieve substantial improvements and establish a new state-of-the-art in touch-centric multimodal representation learning, Touch100k demonstrates its value as a resource for research. Project page: https://cocacola-lab.github.io/Touch100k/.

artificial intelligence, machine learning, natural language, (17 more...)

2406.03813

Country: Asia > China (0.28)

Genre: Research Report (1.00)

Industry:

Education (0.47)
Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Robots > Manipulation (0.67)

Transformer in Touch: A Survey

Gao, Jing, Cheng, Ning, Fang, Bin, Han, Wenjuan

arXiv.org Artificial Intelligence | May-21-2024

The Transformer model, which first achieved significant success in natural language processing, has recently shown great potential in tactile perception. This review aims to comprehensively outline the application and development of Transformers in tactile technology. We first introduce the two fundamental concepts behind the success of the Transformer: the self-attention mechanism and large-scale pre-training. Then, we delve into the application of Transformers in various tactile tasks, including but not limited to object recognition, cross-modal generation, and object manipulation, offering a concise summary of the core methodologies, performance benchmarks, and design highlights. Finally, we suggest potential areas for further research and future work, aiming to generate more interest within the community, tackle existing challenges, and encourage the use of Transformer models in the tactile field.
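The abstract above names self-attention as one of the two concepts behind the Transformer. For readers unfamiliar with it, a minimal sketch of scaled dot-product self-attention follows. It is a simplification under stated assumptions: the query, key, and value projections are taken to be the identity (a real Transformer learns them), the toy `tokens` are made up, and the function names are hypothetical rather than drawn from the survey.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention with identity Q/K/V projections.

    Assumption: learned projection matrices are omitted, so each token
    serves directly as its own query, key, and value.
    """
    d = len(tokens[0])
    output = []
    for q in tokens:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in tokens]
        weights = softmax(scores)
        # Each output vector is a weighted average of the value vectors.
        output.append([sum(w * v[j] for w, v in zip(weights, tokens))
                       for j in range(d)])
    return output


# Toy input: three 2-dimensional "tokens" (made-up values).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = self_attention(tokens)
```

Each output row is a convex combination of the input tokens, so every token's new representation mixes in information from all others, weighted by similarity.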

large language model, machine learning, natural language, (18 more...)

2405.12779

Country:

Asia > China (0.28)
Europe > United Kingdom > England (0.14)

Genre:

Research Report > New Finding (0.68)
Research Report > Experimental Study (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(4 more...)

Soft Contact Simulation and Manipulation Learning of Deformable Objects with Vision-based Tactile Sensor

Shan, Jianhua, Sun, Yuhao, Zhang, Shixin, Sun, Fuchun, Chen, Zixi, Shen, Zirong, Stefanini, Cesare, Yang, Yiyong, Luo, Shan, Fang, Bin

arXiv.org Artificial Intelligence | May-12-2024

Deformable object manipulation is a classical and challenging research area in robotics. Compared with rigid object manipulation, this problem is more complex due to deformation properties, including elastic, plastic, and elastoplastic deformation. In this paper, we describe a new deformable object manipulation method comprising soft contact simulation, manipulation learning, and sim-to-real transfer. We propose a novel approach utilizing Vision-Based Tactile Sensors (VBTSs) as the end-effector in simulation to produce observations such as relative position, squeezed area, and object contour, which are transferable to real robots. For a more realistic contact simulation, a new simulation environment including elastic, plastic, and elastoplastic deformations is created. We utilize RL strategies to train agents in the simulation, and expert demonstrations are applied for challenging tasks. Finally, we build a real experimental platform to complete the sim-to-real transfer and achieve a 90% success rate on difficult tasks such as cylinder and sphere. To test the robustness of our method, we use plasticine of different hardness and sizes to repeat the tasks including cylinder and sphere. The experimental results show the superior performance of the proposed method in deformable object manipulation.

artificial intelligence, machine learning, simulation, (15 more...)

2405.07237

Country: Europe > Italy > Tuscany (0.14)

Genre: Research Report > New Finding (0.34)

Industry:

Leisure & Entertainment > Games > Computer Games (0.57)
Energy > Oil & Gas (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Simulation of Optical Tactile Sensors Supporting Slip and Rotation using Path Tracing and IMPM

Shen, Zirong, Sun, Yuhao, Zhang, Shixin, Chen, Zixi, Sun, Heyi, Sun, Fuchun, Fang, Bin

arXiv.org Artificial Intelligence | May-5-2024

Optical tactile sensors are extensively utilized in intelligent robot manipulation due to their ability to acquire high-resolution tactile information at a lower cost. However, achieving adequate realism and versatility in simulating optical tactile sensors is challenging. In this paper, we propose a simulation method and validate its effectiveness through experiments. We utilize path tracing for image rendering, achieving higher similarity to real data than the baseline method in simulating pressing scenarios. Additionally, we apply the Improved Material Point Method (IMPM) algorithm to simulate the relative rest between the object and the elastomer surface when the object is in motion, enabling more accurate simulation of complex manipulations such as slip and rotation.

artificial intelligence, sensor, simulation, (16 more...)

2405.02914

Country:

Asia (0.30)
North America > United States > New York (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Sensing and Signal Processing (1.00)
Information Technology > Artificial Intelligence > Robots > Manipulation (0.49)

What Foundation Models can Bring for Robot Learning in Manipulation: A Survey

Li, Dingzhe, Jin, Yixiang, A, Yong, Yu, Hongze, Shi, Jun, Hao, Xiaoshuai, Hao, Peng, Liu, Huaping, Sun, Fuchun, Fang, Bin

arXiv.org Artificial Intelligence | Apr-28-2024

The realization of universal robots is an ultimate goal of researchers. However, a key hurdle in achieving this goal lies in the robots' ability to manipulate objects in their unstructured surrounding environments according to different tasks. The learning-based approach is considered an effective way to address generalization. The impressive performance of foundation models in the fields of computer vision and natural language suggests the potential of embedding foundation models into manipulation tasks as a viable path toward achieving general manipulation capability. However, we believe achieving general manipulation capability requires an overarching framework akin to that of autonomous driving. This framework should encompass multiple functional modules, with different foundation models assuming distinct roles in facilitating general manipulation capability. This survey focuses on the contributions of foundation models to robot learning for manipulation. We propose a comprehensive framework and detail how foundation models can address challenges in each module of the framework. Furthermore, we examine current approaches, outline challenges, suggest future research directions, and identify potential risks associated with integrating foundation models into this domain.

large language model, machine learning, natural language, (17 more...)

2404.18201

Country: Asia > China (0.14)

Genre: Research Report (1.00)

Industry:

Education (1.00)
Leisure & Entertainment (0.67)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
(2 more...)
