Soft-Prompting with Graph-of-Thought for Multi-modal Representation Learning
Yang, Juncheng, Li, Zuchao, Xie, Shuai, Yu, Wei, Li, Shijun, Du, Bo
arXiv.org Artificial Intelligence
The chain-of-thought technique has been well received in multi-modal tasks. It is a step-by-step linear reasoning process that adjusts the length of the chain to improve the performance of generated prompts. However, human thought processes are predominantly non-linear: they encompass multiple aspects simultaneously and employ dynamic adjustment and updating mechanisms. Therefore, we propose a novel Aggregation-Graph-of-Thought (AGoT) mechanism for soft-prompt tuning in multi-modal representation learning. AGoT models the human thought process not only as a chain but also represents each step as a reasoning aggregation graph, capturing the multiple aspects of thinking that are overlooked in single-step reasoning. This turns the entire reasoning process into prompt aggregation and prompt flow operations. Experiments show that our multi-modal model enhanced with AGoT soft-prompting achieves good results on several tasks, such as text-image retrieval, visual question answering, and image recognition. In addition, we demonstrate that it has good domain generalization performance due to better reasoning.
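The abstract describes each reasoning step as an aggregation graph over several aspects, and the whole process as prompt aggregation followed by prompt flow between steps. The toy sketch below illustrates that idea in plain Python; it is not the authors' implementation. The similarity-weighted softmax aggregation, the gated flow rule, and all function names (`aggregate_step`, `flow`, `agot_chain`) are hypothetical choices made for illustration, and the short float lists merely stand in for learned soft-prompt embeddings.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def aggregate_step(nodes: List[Vector], query: Vector) -> Vector:
    """Hypothetical prompt aggregation: weight each aspect node by the
    softmax of its similarity to the query, then sum the weighted nodes."""
    weights = softmax([dot(n, query) for n in nodes])
    dim = len(nodes[0])
    return [sum(w * n[i] for w, n in zip(weights, nodes)) for i in range(dim)]


def flow(prev: Vector, agg: Vector, gate: float) -> Vector:
    """Hypothetical prompt flow: convex mix of the previous step's prompt
    and the newly aggregated prompt, controlled by a gate in [0, 1]."""
    if not 0.0 <= gate <= 1.0:
        raise ValueError("gate must lie in [0, 1]")
    return [gate * a + (1.0 - gate) * p for p, a in zip(prev, agg)]


def agot_chain(steps: List[List[Vector]], init: Vector, gate: float = 0.5) -> List[Vector]:
    """Run a chain of reasoning steps, where each step aggregates several
    aspect nodes; returns the prompt produced after every step."""
    prompts: List[Vector] = []
    current = init
    for nodes in steps:
        agg = aggregate_step(nodes, query=current)  # aggregation within a step
        current = flow(current, agg, gate)          # flow to the next step
        prompts.append(current)
    return prompts


if __name__ == "__main__":
    # Two steps, each with two 2-d aspect nodes (toy values, not real data).
    steps = [
        [[1.0, 0.0], [0.0, 1.0]],
        [[2.0, 0.0], [0.0, 2.0]],
    ]
    print(agot_chain(steps, init=[0.0, 0.0]))  # → [[0.25, 0.25], [0.625, 0.625]]
```

A plain linear chain of thought would correspond to each step having a single node; the graph view lets one step combine several aspects before passing its result along the chain. In a real model the node vectors, gate, and query would be learned parameters or derived from the image and text inputs rather than fixed lists.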
Apr-6-2024
- Country:
  - Asia > China
    - Hubei Province (0.28)
  - Europe > Switzerland
- Genre:
  - Research Report (1.00)
- Technology:
  - Information Technology
    - Artificial Intelligence
      - Cognitive Science > Problem Solving (0.70)
      - Machine Learning (1.00)
      - Natural Language (1.00)
      - Representation & Reasoning (1.00)
      - Vision (1.00)
    - Sensing and Signal Processing > Image Processing (1.00)