The research presented in this paper is primarily concerned with the use of complementary textual resources in video and image analysis, in support of a higher level of automatic (semantic) annotation and indexing of images and videos. In the past, various projects (like the MUMIS project, see below for more details) used Information Extraction as the main means of extracting relevant entities, relations, and events from text for indexing images and videos in a specific domain. Nowadays we can build on Semantic Web technologies and resources to detect instances of semantic classes and relations in textual documents, and use those instances to support the semantic annotation and indexing of audiovisual content.
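As a minimal sketch of this idea, the example below detects instances of semantic classes in a text by gazetteer lookup. The gazetteer entries, the class names, and the example caption are all hypothetical stand-ins for what would, in practice, be derived from a Semantic Web resource such as a domain ontology.

```python
import re

# Hypothetical gazetteer mapping surface forms to semantic classes;
# in a real system these would come from an ontology or knowledge base.
GAZETTEER = {
    "Ronaldo": "Player",
    "Real Madrid": "Team",
    "penalty kick": "Event",
}

def annotate(text):
    """Return (surface form, semantic class, character offset) triples."""
    annotations = []
    for surface, sem_class in GAZETTEER.items():
        for match in re.finditer(re.escape(surface), text):
            annotations.append((surface, sem_class, match.start()))
    # Sort by position in the text so annotations read in document order.
    return sorted(annotations, key=lambda a: a[2])

caption = "Ronaldo scores a penalty kick for Real Madrid."
for surface, sem_class, offset in annotate(caption):
    print(f"{offset:3d}  {sem_class:7s}  {surface}")
```

Such class-labelled spans, anchored to timestamps of the accompanying video, are what higher-level indexing can then build on.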
What are the critical technical challenges in multimedia information extraction (MMIE)? There are several, on several fronts. Some of these include:

- Detecting events of interest in video where there is no accompanying sound or text; examples include surveillance video. Further advances in computer vision, perhaps combining multiple 2D views, are necessary. It is interesting to note that in the UK, it is almost impossible to walk outside for five minutes without being captured by some surveillance video system.
- Extracting content from noisy media, such as telephone conversations and home videos (as seen on YouTube).
- Correlating multimedia data with other data sources, especially text sources.
Rather than providing a long list, I would like to direct our attention to a few specific issues among the many challenges in this area: 1) what type of MM information to extract, in relation to its possible use; and 2) how to present it given the available presentation resources (e.g., how to attract the user's attention and interest?). There are different cases: for example, a) the same information presented to a group; b) information segmented so that different people can work on the material collaboratively at the same time; c) information intelligently fragmented for individuals in a group, for specific strategic goals. Affective communication and persuasive communication have only recently become serious objects of study, yet MM material was mostly produced by humans with a purpose.
In the following, I focus on one particular challenge: the integration of affective computing and multimedia information extraction. Work by Picard and others has created considerable awareness of the role of affect in human-computer interaction. As key ingredients of affective computing, Picard identifies recognizing, expressing, modelling, communicating, and responding to emotional information (Picard 2003). In the context of information extraction, methods of affective computing can be applied to enhance classical information extraction tasks with emotion and sentiment detection.
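A minimal sketch of how sentiment detection can enhance a classical extraction task: a lexicon-based polarity score is attached to an already-extracted event. The lexicon entries, the event record, and the function names are illustrative assumptions; a real system would use a resource such as SentiWordNet or an emotion lexicon.

```python
# Hypothetical affect lexicon mapping words to polarity scores in [-1, 1].
AFFECT_LEXICON = {
    "brilliant": 1.0, "great": 0.8, "dull": -0.6, "terrible": -1.0,
}

def sentiment_score(sentence):
    """Average polarity of lexicon words in the sentence; 0.0 if none occur."""
    words = sentence.lower().split()
    hits = [AFFECT_LEXICON[w] for w in words if w in AFFECT_LEXICON]
    return sum(hits) / len(hits) if hits else 0.0

def tag_extraction(event, source_sentence):
    """Attach a sentiment attribute to a classically extracted event."""
    return {**event, "sentiment": sentiment_score(source_sentence)}

event = {"type": "Goal", "player": "Ronaldo"}
print(tag_extraction(event, "A brilliant goal by Ronaldo"))
```

The point of the sketch is the pipeline shape, not the scoring method: the emotional dimension is computed from the same source text the event was extracted from, and stored alongside the factual slots.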
There's an awful lot of text data available today, and enormous amounts of it are being created on a daily basis, ranging from structured to semi-structured to fully unstructured. What can we do with it? Well, quite a bit, actually; it depends on what your objectives are, but there are two intricately related yet distinct umbrellas of tasks that can be exploited to leverage all of this data: natural language processing (NLP) and text mining. NLP is a major aspect of computational linguistics, and also falls within the realms of computer science and artificial intelligence. Text mining exists in a similar realm, in that it is concerned with identifying interesting, non-trivial patterns in textual data.
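To make the text-mining side concrete, here is a small sketch of one of the simplest non-trivial patterns one can mine from unstructured text: word co-occurrence within documents. The document collection and function name are invented for illustration.

```python
from collections import Counter
from itertools import combinations

def cooccurrences(documents):
    """Count unordered word pairs that appear in the same document --
    a basic text-mining pattern over an unstructured collection."""
    counts = Counter()
    for doc in documents:
        # Deduplicate and sort so each pair is counted once per document,
        # in a canonical order.
        words = sorted(set(doc.lower().split()))
        counts.update(combinations(words, 2))
    return counts

docs = [
    "video annotation with text",
    "semantic annotation of video",
    "text mining for annotation",
]
for pair, n in cooccurrences(docs).most_common(3):
    print(pair, n)
```

Frequent pairs like these are the raw material for the "interesting, non-trivial patterns" text mining is after; real systems add weighting (e.g. pointwise mutual information) on top of such counts.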