HuBo-VLM: Unified Vision-Language Model designed for HUman roBOt interaction tasks

Dong, Zichao, Zhang, Weikun, Huang, Xufeng, Ji, Hang, Zhan, Xin, Chen, Junbo

Aug-23-2023–arXiv.org Artificial Intelligence

Human-robot interaction is an exciting task that aims to guide robots to follow instructions from humans. Because a large gap lies between human natural language and machine code, building end-to-end human-robot interaction models is fairly challenging. Further, the visual information received from a robot's sensors is itself a hard language for the robot to perceive. In this work, HuBo-VLM is proposed to tackle perception tasks associated with human-robot interaction, including object detection and visual grounding, with a unified transformer-based vision-language model. Extensive experiments on the Talk2Car benchmark demonstrate the effectiveness of our approach. Code will be made publicly available at https://github.com/dzcgaara/HuBo-VLM.

dataset, instruction, language model

Country:
- Asia > Singapore (0.04)
- Europe
  - Poland (0.04)
  - Switzerland > Zürich
    - Zürich (0.14)

Genre:
- Research Report (0.82)

Technology:
- Information Technology > Artificial Intelligence
  - Vision (1.00)
  - Robots > Humanoid Robots (1.00)
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
