IMU2CLIP: Multimodal Contrastive Learning for IMU Motion Sensors from Egocentric Videos and Text
Seungwhan Moon, Andrea Madotto, Zhaojiang Lin, Alireza Dirafzoon, Aparajita Saraf, Amy Bearman, Babak Damavandi
arXiv.org Artificial Intelligence
ABSTRACT We present IMU2CLIP, a novel pre-training approach to align Inertial Measurement Unit (IMU) motion sensor recordings with video and text by projecting them into the joint representation space of Contrastive Language-Image Pre-training (CLIP). The proposed approach allows IMU2CLIP to translate human motions (as measured by IMU sensors) into their corresponding textual descriptions and videos, while preserving the transitivity across these modalities. We explore several new IMU-based applications that IMU2CLIP enables, such as motion-based media retrieval and natural language reasoning tasks with motion data. In addition, we show that IMU2CLIP can significantly improve the downstream performance when fine-tuned for each application. Our code will be made publicly available.
[Figure 1: Illustration of IMU2CLIP (I2C): (a) the model aligns …; (b) once trained, IMU2CLIP is used as a retriever for both …]
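The abstract describes aligning an IMU encoder's outputs with CLIP's joint video/text embedding space via contrastive learning. A minimal NumPy sketch of the kind of symmetric InfoNCE objective such an alignment typically uses is shown below; the function name, shapes, and temperature value are illustrative assumptions, not the authors' released code.

```python
import numpy as np

def symmetric_info_nce(imu_emb, clip_emb, temperature=0.07):
    """Contrastive loss pulling matched IMU/CLIP embedding pairs together.

    imu_emb, clip_emb: arrays of shape (batch, dim); row i of each is a
    matched pair (same clip of egocentric video / motion recording).
    """
    # L2-normalize so the dot product is cosine similarity
    imu = imu_emb / np.linalg.norm(imu_emb, axis=1, keepdims=True)
    clip = clip_emb / np.linalg.norm(clip_emb, axis=1, keepdims=True)
    # Pairwise similarity matrix, scaled by temperature; matched pairs
    # sit on the diagonal
    logits = imu @ clip.T / temperature
    labels = np.arange(len(imu))

    def cross_entropy(l):
        # Numerically stable log-softmax over each row
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()

    # Average the IMU->CLIP and CLIP->IMU directions
    return (cross_entropy(logits) + cross_entropy(logits.T)) / 2
```

Minimizing this loss drives each IMU embedding toward its paired CLIP video/text embedding and away from the other items in the batch, which is what makes the resulting encoder usable as a cross-modal retriever.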
Oct-25-2022