While vision-language integration is important for a wide range of Artificial Intelligence (AI) prototypes and applications, the notion of integration has not been established within a theoretical framework that would allow for more thorough research on the issue. In this paper, we explore the reasons that dictate this content integration by bringing together Searle's theory of intentionality, the symbol grounding problem, and arguments about the nature of images and language developed within different AI subfields. In doing so, we arrive at the Double-Grounding theory, which provides an explanatory theoretical definition of vision-language integration. By correlating the need for vision-language integration with inherent characteristics of the integrated media, and by associating this need with an agent's intentionality and intelligence, the work presented in this paper aims to provide a theoretically established, and therefore solid, common ground for the currently isolated and scattered multimedia integration research in AI subfields.
Jan-11-2006, 08:54:06 GMT