Developing multimodal applications is an iterative, complex, and often rather heuristic process, because the number of interacting components in a multimodal system can be far greater than in a unimodal Spoken Dialogue System. From the developer's perspective, a multimodal system presents challenges and technical difficulties on many levels. In this paper we describe our approach to one specific component of multimodal systems, the Multimodal Integrator. From the designer's perspective, on the other hand, all components must be fine-tuned so that their combined performance delivers the desired experience to end users. In both cases, evaluation and analysis of the current implementation are paramount. Hence, examining the details while gaining a good understanding of the overall performance of a multimodal system is our other key topic.
While the incipient internet was largely text-based, the modern digital world is becoming increasingly multi-modal. Here, we examine multi-modal classification where one modality is discrete, e.g. text, and the other is continuous, e.g. visual representations transferred from a convolutional neural network. In particular, we focus on scenarios where we have to be able to classify large quantities of data quickly. We investigate various methods for performing multi-modal fusion and analyze their trade-offs in terms of classification accuracy and computational efficiency. Our findings indicate that the inclusion of continuous information improves performance over text-only on a range of multi-modal classification tasks, even with simple fusion methods. In addition, we experiment with discretizing the continuous features in order to speed up and simplify the fusion process even further. Our results show that fusion with discretized features outperforms text-only classification, at a fraction of the computational cost of full multi-modal fusion, with the additional benefit of improved interpretability.
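The two fusion styles mentioned above can be illustrated with a minimal sketch. It is not the implementation evaluated here: the function names, the toy vocabulary, and in particular the nearest-centroid quantizer used for discretization are assumptions made for illustration, since the text does not specify how the continuous features are discretized. The first style concatenates a bag-of-words text vector with the continuous visual vector; the second maps the visual vector to a discrete pseudo-token so that a text-only classifier can consume it unchanged.

```python
import numpy as np


def bag_of_words(tokens, vocab):
    """Return a count vector for `tokens` over a fixed vocabulary (token -> index)."""
    vec = np.zeros(len(vocab))
    for tok in tokens:
        if tok in vocab:
            vec[vocab[tok]] += 1.0
    return vec


def fuse_concat(text_vec, visual_vec):
    """Simple fusion: concatenate the discrete and continuous feature vectors."""
    return np.concatenate([text_vec, visual_vec])


def discretize(visual_vec, centroids):
    """Map a continuous vector to a pseudo-token naming its nearest centroid.

    Nearest-centroid quantization is an assumed stand-in for whatever
    discretization scheme is actually used.
    """
    dists = np.linalg.norm(centroids - visual_vec, axis=1)
    return f"__img_{int(np.argmin(dists))}__"


# Toy data (hypothetical): two words plus two image pseudo-tokens.
vocab = {"red": 0, "car": 1, "__img_0__": 2, "__img_1__": 3}
tokens = ["red", "car"]
visual = np.array([0.9, 0.1])
centroids = np.array([[0.0, 0.0], [1.0, 0.0]])

# Continuous fusion: the classifier sees text counts plus raw visual features.
fused = fuse_concat(bag_of_words(tokens, vocab), visual)

# Discretized fusion: the image becomes one extra token in the text input.
img_token = discretize(visual, centroids)
discrete_input = bag_of_words(tokens + [img_token], vocab)
```

In the discretized variant the classifier input stays a sparse count vector, which is where the lower cost and the easier inspection of individual features would come from under these assumptions.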
During the last decade, it has been widely shown how modal logics provide suitable tools for various theoretical formalizations in computer science. In fact, many modal systems can be found in the literature, and there are a number of areas where such logics are used. The most popular readings of the modal formula □a are, for example, "a is necessarily true" (standard modal logic), "a will always be true" (temporal logic), "X knows that a" or "X believes that a" (epistemic logic), or "after executing some program a, a will be true" (dynamic logic), etc. In general, only one type of modality is considered, i.e.