However, a new algorithm from researchers at Stanford and Adobe has shown it's pretty damn good at video dialogue editing, something that requires artistry, skill and considerable time. The editor picks from a set of film-editing idioms and the algorithm assembles the cut. For instance, many scenes start with a wide "establishing" shot so that the viewer knows where they are. You can also use leisurely or fast pacing, emphasize a certain character, intensify emotions or keep shot types (like wide or closeup) consistent. In one example, the team selected "start wide" to establish the scene, "avoid jump cuts" for a cinematic (non-YouTube) style, "emphasize character" ("Stacey") and a faster-paced performance.
As a generative model for building end-to-end dialogue systems, the Hierarchical Recurrent Encoder-Decoder (HRED) consists of three layers of Gated Recurrent Units (GRUs), which from bottom to top serve as the word-level encoder, the sentence-level encoder, and the decoder. Despite performing well on dialogue corpora, HRED is computationally expensive to train because of its complexity. To improve the training efficiency of HRED, we propose a new model, Simplified HRED (SHRED), in which every layer except the top one is simpler than the layer above it. First, we propose the Scalar Gated Unit (SGU), a simplified variant of the GRU, and use it as the sentence-level encoder. Second, we use Fixed-size Ordinally-Forgetting Encoding (FOFE), which has no trainable parameters at all, as the word-level encoder. Experimental results show that, compared with HRED under the same word embedding size and the same hidden state size for each layer, SHRED reduces the number of trainable parameters by 25%-35% and the training time by more than 50%, while still achieving slightly better performance.
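To illustrate why FOFE needs no trainable parameters, here is a minimal sketch of the standard FOFE recurrence, z_t = alpha * z_{t-1} + e_t with z_0 = 0. It assumes one-hot token vectors e_t and a fixed forgetting factor alpha; the function name `fofe_encode`, the toy vocabulary, and the value alpha = 0.5 are illustrative choices, not taken from the abstract, and SHRED itself may apply the same recurrence over word embeddings rather than one-hot vectors.

```python
def fofe_encode(token_ids, vocab_size, alpha=0.5):
    """Return the FOFE code of a token-id sequence as a list of floats.

    Recurrence: z_t = alpha * z_{t-1} + e_t, with z_0 = 0 and e_t one-hot.
    alpha is a fixed constant, so nothing here is learned.
    """
    z = [0.0] * vocab_size
    for tok in token_ids:
        z = [alpha * v for v in z]  # decay the contribution of earlier tokens
        z[tok] += 1.0               # add the one-hot vector of the current token
    return z


# Toy vocabulary (hypothetical): A=0, B=1, C=2
print(fofe_encode([0, 1, 2], vocab_size=3))  # [0.25, 0.5, 1.0]
print(fofe_encode([0, 1, 0], vocab_size=3))  # [1.25, 0.5, 0.0]
```

The output has the same size for any sequence length, and earlier tokens are weighted by higher powers of alpha, so word order is reflected in the code.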
Machine learning approaches for building task-oriented dialogue systems require large labeled conversational datasets to train on. We are interested in building task-oriented dialogue systems from human-human conversations, which may be available in ample amounts in existing customer care center logs or can be collected from crowd workers. Annotating these datasets can be prohibitively expensive. Recently, multiple annotated task-oriented human-machine dialogue datasets have been released; however, their annotation schemas vary across collections, even for well-defined categories such as dialogue acts (DAs). We propose a Universal DA schema for task-oriented dialogues and align existing annotated datasets with our schema. Our aim is to train a Universal DA tagger (U-DAT) for task-oriented dialogues and use it for tagging human-human conversations. We investigate multiple datasets, propose manual and automated approaches for aligning the different schemas, and present results on a target corpus of human-human dialogues. In unsupervised learning experiments, we achieve an F1 score of 54.1% on system turns in human-human dialogues. In a semi-supervised setup, the F1 score increases to 57.7%, a level that would otherwise require at least 1.7K manually annotated turns. For new domains, we show further improvements when unlabeled or labeled target-domain data is available.
Dialogue in commercial games is largely created by teams of writers and designers who hand-author every line of dialogue and hand-specify the dialogue structure using finite state machines (FSMs) or branching trees. For dialogue-heavy games, such as role-playing games with significant NPC interactions, or emerging genres such as interactive drama, such hand specification significantly limits the player's interaction possibilities. Decades of research on the standard pipeline architecture in natural language generation have focused on how to generate text given a specification of the communicative goals; one can imagine beginning to adapt such methods for generating the lines of dialogue for characters. But little work has been done on the problem of procedurally generating dialogue structures, that is, dynamically generating dialogue FSMs or trees (more generally, discourse managers) that accomplish communicative goals. In this paper we describe a system that uses a formalization of backstory, character information, and social interactions to dynamically generate interactive dialogue structures that accomplish desired dialogue goals.
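For readers unfamiliar with hand-specified dialogue structures, the following minimal sketch shows what a hand-authored dialogue FSM might look like. The states, lines, and choice labels are entirely hypothetical and only illustrate the conventional approach that the paper contrasts with; this is not the system described above.

```python
# Hypothetical hand-authored dialogue FSM:
# state -> (NPC line, {player choice: next state})
DIALOGUE = {
    "greet": ("Welcome, traveler.", {"ask_quest": "quest", "leave": "end"}),
    "quest": ("Wolves raid the farm. Will you help?",
              {"accept": "end", "refuse": "greet"}),
    "end": ("Farewell.", {}),
}


def run_dialogue(fsm, start, choices):
    """Follow the player's choices through the FSM and return the NPC lines spoken."""
    state = start
    transcript = [fsm[state][0]]
    for choice in choices:
        transitions = fsm[state][1]
        if choice not in transitions:
            break  # the player can only say what the authors anticipated
        state = transitions[choice]
        transcript.append(fsm[state][0])
    return transcript


print(run_dialogue(DIALOGUE, "greet", ["ask_quest", "accept"]))
```

Every state, line, and transition has to be written by hand, and any player choice outside the table simply ends the exchange, which is the limitation that motivates generating such structures procedurally.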
How interactive characters should select appropriate dialogue remains an open issue in related research areas. In this paper we propose a reinforcement learning approach for learning the strategy of an interrogation dialogue conducted by one virtual agent toward another. The emotion variation of the suspect agent is modeled with a hazard function, and the detective agent must learn its interrogation strategy based on the emotional state of the suspect agent. Several reinforcement learning reward schemes are evaluated to choose a suitable reward for the dialogue. Our contribution is twofold. First, we propose a new reinforcement learning framework for modeling dialogue strategies. Second, the background knowledge and emotional states of the agents are incorporated into the dialogue strategies. The resulting dialogue strategy in our experiment is sensitive to lies from the suspect, and with it the interrogator may receive more correct answers.