Wang, Shaonan (Institute of Automation, Chinese Academy of Sciences) | Zhang, Jiajun (Institute of Automation, Chinese Academy of Sciences) | Lin, Nan (Institute of Psychology, Chinese Academy of Sciences) | Zong, Chengqing (Institute of Automation, Chinese Academy of Sciences)
Multimodal models have been proven to outperform text-based approaches on learning semantic representations. However, it still remains unclear what properties are encoded in multimodal representations, in what aspects do they outperform the single-modality representations, and what happened in the process of semantic compositionality in different input modalities. Considering that multimodal models are originally motivated by human concept representations, we assume that correlating multimodal representations with brain-based semantics would interpret their inner properties to answer the above questions. To that end, we propose simple interpretation methods based on brain-based componential semantics. First we investigate the inner properties of multimodal representations by correlating them with corresponding brain-based property vectors. Then we map the distributed vector space to the interpretable brain-based componential space to explore the inner properties of semantic compositionality. Ultimately, the present paper sheds light on the fundamental questions of natural language understanding, such as how to represent the meaning of words and how to combine word meanings into larger units.
This paper describes a framework for an agent to learn verb-phrase meanings from human teachers and combine these models with environmental dynamics so the agent can enact verb commands from the human teacher. This style of human/agent interaction allows the human teacher to issue natural-language commands and demonstrate ground actions, thereby alleviating the need for advanced teaching interfaces or difficult goal encodings. The framework extends prior work in apprenticeship learning and builds off of recent advancements in learning to recognize activities and modeling domains with multiple objects. In our studies, we show how to both learn a verb model and turn it into reward and heuristic functions that can then be composed with a dynamics model. The resulting "combined model" can then be efficiently searched by a sample-based planner which determines a policy for enacting a verb command in a given environment. Our experiments with a simulated robot domain show this framework can be used to quickly teach verb commands that the agent can then enact in new environments.
Robust and flexible event representations are important to many core areas in language understanding. Scripts were proposed early on as a way of representing sequences of events for such understanding, and has recently attracted renewed attention. However, obtaining effective representations for modeling script-like event sequences is challenging. It requires representations that can capture event-level and scenario-level semantics. We propose a new tensor-based composition method for creating event representations. The method captures more subtle semantic interactions between an event and its entities and yields representations that are effective at multiple event-related tasks. With the continuous representations, we also devise a simple schema generation method which produces better schemas compared to a prior discrete representation based method. Our analysis shows that the tensors capture distinct usages of a predicate even when there are only subtle differences in their surface realizations.
In this paper, we propose a bidimensional attention based recursiveautoencoder (BattRAE) to integrate clues and sourcetargetinteractions at multiple levels of granularity into bilingualphrase representations. We employ recursive autoencodersto generate tree structures of phrases with embeddingsat different levels of granularity (e.g., words, sub-phrases andphrases). Over these embeddings on the source and targetside, we introduce a bidimensional attention network to learntheir interactions encoded in a bidimensional attention matrix,from which we extract two soft attention weight distributionssimultaneously. These weight distributions enableBattRAE to generate compositive phrase representations viaconvolution. Based on the learned phrase representations, wefurther use a bilinear neural model, trained via a max-marginmethod, to measure bilingual semantic similarity. To evaluatethe effectiveness of BattRAE, we incorporate this semanticsimilarity as an additional feature into a state-of-the-art SMTsystem. Extensive experiments on NIST Chinese-English testsets show that our model achieves a substantial improvementof up to 1.63 BLEU points on average over the baseline.
There are many peculiar English phrases whose origins and meaning can appear obscure. For instance, where does "dead as a doornail" come from? When might one say: "I'll go to the foot of our stairs?" A recent BBC News article unearthing the stories behind some phrases drew a huge response from readers, who sent in examples of their own. But how much do you know about the English language and its sayings?