AquaVLM: Improving Underwater Situation Awareness with Mobile Vision Language Models

Tian, Beitong; Zhao, Lingzhi; Chen, Bo; Zheng, Haozhen; Yang, Jingcheng; Wu, Mingyuan; Vasisht, Deepak; Nahrstedt, Klara

Oct-28-2025–arXiv.org Artificial Intelligence

Underwater activities like scuba diving enable millions annually to explore marine environments for recreation and scientific research. Maintaining situational awareness and effective communication are essential for diver safety. Traditional underwater communication systems are often bulky and expensive, limiting their accessibility to divers of all levels. While recent systems leverage lightweight smartphones and support text messaging, the messages are predefined and thus restrict context-specific communication. In this paper, we present AquaVLM, a tap-and-send underwater communication system that automatically generates context-aware messages and transmits them using ubiquitous smartphones. Our system features a mobile vision-language model (VLM) fine-tuned on an auto-generated underwater conversation dataset and employs a hierarchical message generation pipeline. We co-design the VLM and transmission, incorporating error-resilient fine-tuning to improve the system's robustness to transmission errors. We develop a VR simulator to enable users to experience AquaVLM in a realistic underwater environment and create a fully functional prototype on the iOS platform for real-world experiments. Both subjective and objective evaluations validate the effectiveness of AquaVLM and highlight its potential for personal underwater communication as well as broader mobile VLM applications.
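The abstract does not say how the error-resilient fine-tuning is carried out. One plausible reading, sketched here purely as an assumption, is to corrupt training messages with simulated transmission errors so that the model learns to interpret noisy received text. The function names, the character-substitution error model, and the alphabet below are all hypothetical and are not taken from the paper.

```python
import random
import string

# Hypothetical symbol set for the simulated channel (an assumption, not from the paper).
ALPHABET = string.ascii_lowercase + " "


def corrupt_message(message: str, error_rate: float, rng: random.Random) -> str:
    """Substitute each character with a different symbol with probability error_rate."""
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must be between 0 and 1")
    out = []
    for ch in message:
        if rng.random() < error_rate:
            # Pick a replacement that differs from the original character.
            out.append(rng.choice([c for c in ALPHABET if c != ch]))
        else:
            out.append(ch)
    return "".join(out)


def make_training_pairs(messages, error_rate, seed=0):
    """Build (noisy_received, clean_sent) pairs for robustness-oriented fine-tuning."""
    rng = random.Random(seed)
    return [(corrupt_message(m, error_rate, rng), m) for m in messages]


if __name__ == "__main__":
    for noisy, clean in make_training_pairs(["low on air", "shark ahead"], 0.2):
        print(repr(noisy), "<-", repr(clean))
```

A real acoustic channel would more likely produce bit-level or burst errors than independent character substitutions; the simple model above only illustrates the idea of pairing corrupted inputs with clean targets.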

Country:
- North America > United States (0.67)

Genre:
- Research Report (1.00)

Industry:
- Government > Military (1.00)
- Information Technology (0.93)

Technology:
- Information Technology
  - Communications > Mobile (1.00)
  - Human Computer Interaction > Interfaces
    - Virtual Reality (1.00)
  - Artificial Intelligence
    - Vision (1.00)
    - Natural Language
      - Large Language Model (1.00)
      - Chatbot (0.69)
    - Machine Learning > Neural Networks
      - Deep Learning (0.69)
