AITopics | Vo, Thieu

Collaborating Authors

Vo, Thieu

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Robotic-CLIP: Fine-tuning CLIP on Action Data for Robotic Applications

Nguyen, Nghia, Vu, Minh Nhat, Ta, Tung D., Huang, Baoru, Vo, Thieu, Le, Ngan, Nguyen, Anh

arXiv.org Artificial IntelligenceSep-26-2024

Vision language models have played a key role in extracting meaningful features for various robotic applications. Among these, Contrastive Language-Image Pretraining (CLIP) is widely used in robotic tasks that require both vision and natural language understanding. However, CLIP was trained solely on static images paired with text prompts and has not yet been fully adapted for robotic tasks involving dynamic actions. In this paper, we introduce Robotic-CLIP to enhance robotic perception capabilities. We first gather and label large-scale action data, and then build our Robotic-CLIP by fine-tuning CLIP on 309,433 videos (~7.4 million frames) of action data using contrastive learning. By leveraging action data, Robotic-CLIP inherits CLIP's strong image performance while gaining the ability to understand actions in robotic contexts. Intensive experiments show that our Robotic-CLIP outperforms other CLIP-based models across various language-driven robotic tasks. Additionally, we demonstrate the practical effectiveness of Robotic-CLIP in real-world grasping applications.

application, artificial intelligence, robotic-clip, (16 more...)

arXiv.org Artificial Intelligence

2409.17727

Country:

Asia (0.68)
Europe > Austria > Vienna (0.14)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Robots (1.00)

Add feedback

Open-Vocabulary Affordance Detection using Knowledge Distillation and Text-Point Correlation

Van Vo, Tuan, Vu, Minh Nhat, Huang, Baoru, Nguyen, Toan, Le, Ngan, Vo, Thieu, Nguyen, Anh

arXiv.org Artificial IntelligenceSep-19-2023

Affordance detection presents intricate challenges and has a wide range of robotic applications. Previous works have faced limitations such as the complexities of 3D object shapes, the wide range of potential affordances on real-world objects, and the lack of open-vocabulary support for affordance understanding. In this paper, we introduce a new open-vocabulary affordance detection method in 3D point clouds, leveraging knowledge distillation and text-point correlation. Our approach employs pre-trained 3D models through knowledge distillation to enhance feature extraction and semantic understanding in 3D point clouds. We further introduce a new text-point correlation method to learn the semantic links between point cloud features and open-vocabulary labels. The intensive experiments show that our approach outperforms previous works and adapts to new affordance labels and unseen objects. Notably, our method achieves the improvement of 7.96% mIOU score compared to the baselines. Furthermore, it offers real-time inference which is well-suitable for robotic manipulation applications.

artificial intelligence, knowledge distillation and text-point correlation, open-vocabulary affordance detection

arXiv.org Artificial Intelligence

2309.10932

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Robots (0.73)

Add feedback

Language-Conditioned Affordance-Pose Detection in 3D Point Clouds

Nguyen, Toan, Vu, Minh Nhat, Huang, Baoru, Van Vo, Tuan, Truong, Vy, Le, Ngan, Vo, Thieu, Le, Bac, Nguyen, Anh

arXiv.org Artificial IntelligenceSep-19-2023

Affordance detection and pose estimation are of great importance in many robotic applications. Their combination helps the robot gain an enhanced manipulation capability, in which the generated pose can facilitate the corresponding affordance task. Previous methods for affodance-pose joint learning are limited to a predefined set of affordances, thus limiting the adaptability of robots in real-world environments. In this paper, we propose a new method for language-conditioned affordance-pose joint learning in 3D point clouds. Given a 3D point cloud object, our method detects the affordance region and generates appropriate 6-DoF poses for any unconstrained affordance label. Our method consists of an open-vocabulary affordance detection branch and a language-guided diffusion model that generates 6-DoF poses based on the affordance text. We also introduce a new high-quality dataset for the task of language-driven affordance-pose joint learning. Intensive experimental results demonstrate that our proposed method works effectively on a wide range of open-vocabulary affordances and outperforms other baselines by a large margin. In addition, we illustrate the usefulness of our method in real-world robotic applications. Our code and dataset are publicly available at https://3DAPNet.github.io

artificial intelligence, language-conditioned affordance-pose detection, point cloud

arXiv.org Artificial Intelligence

2309.10911

Genre: Research Report (0.69)

Technology: Information Technology > Artificial Intelligence > Robots (0.93)

Add feedback

Grasp-Anything: Large-scale Grasp Dataset from Foundation Models

Vuong, An Dinh, Vu, Minh Nhat, Le, Hieu, Huang, Baoru, Huynh, Binh, Vo, Thieu, Kugi, Andreas, Nguyen, Anh

arXiv.org Artificial IntelligenceSep-18-2023

Foundation models such as ChatGPT have made significant strides in robotic tasks due to their universal representation of real-world domains. In this paper, we leverage foundation models to tackle grasp detection, a persistent challenge in robotics with broad industrial applications. Despite numerous grasp datasets, their object diversity remains limited compared to real-world figures. Fortunately, foundation models possess an extensive repository of real-world knowledge, including objects we encounter in our daily lives. As a consequence, a promising solution to the limited representation in previous grasp datasets is to harness the universal knowledge embedded in these foundation models. We present Grasp-Anything, a new large-scale grasp dataset synthesized from foundation models to implement this solution. Grasp-Anything excels in diversity and magnitude, boasting 1M samples with text descriptions and more than 3M objects, surpassing prior datasets. Empirically, we show that Grasp-Anything successfully facilitates zero-shot grasp detection on vision-based tasks and real-world robotic experiments. Our dataset and code are available at https://grasp-anything-2023.github.io.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2309.09818

Country:

Europe > Austria > Vienna (0.14)
Europe > United Kingdom > England (0.14)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Open-Vocabulary Affordance Detection in 3D Point Clouds

Nguyen, Toan, Vu, Minh Nhat, Vuong, An, Nguyen, Dzung, Vo, Thieu, Le, Ngan, Nguyen, Anh

arXiv.org Artificial IntelligenceJul-23-2023

Affordance detection is a challenging problem with a wide variety of robotic applications. Traditional affordance detection methods are limited to a predefined set of affordance labels, hence potentially restricting the adaptability of intelligent robots in complex and dynamic environments. In this paper, we present the Open-Vocabulary Affordance Detection (OpenAD) method, which is capable of detecting an unbounded number of affordances in 3D point clouds. By simultaneously learning the affordance text and the point feature, OpenAD successfully exploits the semantic relationships between affordances. Therefore, our proposed method enables zero-shot detection and can be able to detect previously unseen affordances without a single annotation example. Intensive experimental results show that OpenAD works effectively on a wide range of affordance detection setups and outperforms other baselines by a large margin. Additionally, we demonstrate the practicality of the proposed OpenAD in real-world robotic applications with a fast inference speed (~100ms). Our project is available at https://openad2023.github.io.

affordance, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2303.02401

Country:

North America > United States (0.28)
Europe > Austria > Vienna (0.14)

Genre: Research Report > New Finding (0.48)

Industry: Energy > Oil & Gas > Upstream (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

Add feedback