GPT Sonography: Hand Gesture Decoding from Forearm Ultrasound Images via VLM
Keshav Bimbraw, Ye Wang, Jing Liu, Toshiaki Koike-Akino
Abstract--Large vision-language models (LVLMs), such as the Generative Pre-trained Transformer 4-omni (GPT-4o), are emerging multi-modal foundation models with great potential as powerful artificial-intelligence (AI) assistance tools across the healthcare, industrial, and academic sectors. Although such foundation models perform well on a wide range of general tasks, their capability without fine-tuning is often limited in specialized tasks. Full fine-tuning of large foundation models, however, is challenging due to enormous computation, memory, and dataset requirements. We show that GPT-4o can decode hand gestures from forearm ultrasound data even with no fine-tuning, and that its performance improves with few-shot, in-context learning.

Large language models (LLMs) [1], such as generative pre-trained transformers (GPTs) [2], have recently emerged as powerful general assistance tools and exhibit tremendous capabilities across a wide range of applications. LLMs are often configured with billions of parameters to capture linguistic patterns and semantic relationships in natural language processing, enabling text generation, summarization, translation, reasoning, question answering, and more.
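The zero-shot and few-shot setup described in the abstract can be pictured as prompting GPT-4o with labeled ultrasound frames followed by an unlabeled query frame. The sketch below is a minimal illustration of that idea using the OpenAI chat API; the gesture label set, file names, and prompt wording are assumptions for illustration, not the authors' exact protocol.

```python
# Hypothetical sketch: zero-/few-shot gesture classification of forearm
# ultrasound frames with GPT-4o via in-context examples.
import base64
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

GESTURES = ["fist", "open hand", "pinch", "point"]  # placeholder label set


def image_part(path: str) -> dict:
    """Encode an ultrasound frame as a base64 data URL for the chat API."""
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    return {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64}"}}


def classify(query_png: str, examples: list[tuple[str, str]]) -> str:
    """Few-shot in-context learning: labeled example frames precede the query frame."""
    content = [{
        "type": "text",
        "text": f"Classify the final ultrasound image as one of: {', '.join(GESTURES)}. "
                "Reply with the label only.",
    }]
    for path, label in examples:            # in-context (image, label) exemplars
        content.append(image_part(path))
        content.append({"type": "text", "text": f"Label: {label}"})
    content.append(image_part(query_png))   # unlabeled query frame
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": content}],
    )
    return resp.choices[0].message.content.strip()


# Zero-shot: classify("query.png", examples=[])
# Few-shot:  classify("query.png", examples=[("fist_ex.png", "fist"), ("pinch_ex.png", "pinch")])
```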
arXiv.org Artificial Intelligence
Jul-15-2024
- Country:
- North America > United States > Massachusetts (0.14)
- Genre:
- Research Report (0.64)
- Industry:
- Health & Medicine
- Diagnostic Medicine > Imaging (0.70)
- Nuclear Medicine (0.69)
- Technology: