IFShip: A Large Vision-Language Model for Interpretable Fine-grained Ship Classification via Domain Knowledge-Enhanced Instruction Tuning

Guo, Mingning, Wu, Mengwei, Shen, Yuxiang, Li, Haifeng, Tao, Chao

Aug-13-2024–arXiv.org Artificial Intelligence

End-to-end interpretation is currently the prevailing paradigm for remote sensing fine-grained ship classification (RS-FGSC) task. However, its inference process is uninterpretable, leading to criticism as a black box model. To address this issue, we propose a large vision-language model (LVLM) named IFShip for interpretable fine-grained ship classification. Unlike traditional methods, IFShip excels in interpretability by accurately conveying the reasoning process of FGSC in natural language. Specifically, we first design a domain knowledge-enhanced Chain-of-Thought (COT) prompt generation mechanism. This mechanism is used to semi-automatically construct a task-specific instruction-following dataset named TITANIC-FGS, which emulates human-like logical decision-making. We then train the IFShip model using task instructions tuned with the TITANIC-FGS dataset. Building on IFShip, we develop an FGSC visual chatbot that redefines the FGSC problem as a step-by-step reasoning task and conveys the reasoning process in natural language. Experimental results reveal that the proposed method surpasses state-of-the-art FGSC algorithms in both classification interpretability and accuracy. Moreover, compared to LVLMs like LLaVA and MiniGPT-4, our approach demonstrates superior expertise in the FGSC task. It provides an accurate chain of reasoning when fine-grained ship types are recognizable to the human eye and offers interpretable explanations when they are not.

dataset, fgsc task, ifship, (15 more...)

arXiv.org Artificial Intelligence

Aug-13-2024

arXiv.org PDF

Add feedback

Country:
- Asia > China (0.04)
- Europe > Switzerland
  - Zürich > Zürich (0.14)

Genre:
- Research Report (1.00)
- Instructional Material (1.00)

Industry:
- Transportation > Marine (1.00)
- Government > Military
  - Navy (0.94)

Technology:
- Information Technology > Artificial Intelligence
  - Vision (1.00)
  - Natural Language
    - Large Language Model (1.00)
    - Chatbot (0.90)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found