Conversational interfaces were called a breakthrough technology by MIT Technology Review. Another article went on to say: "We think the next era will belong to 'the conversational layer' -- both text- and voice-driven -- that will use chat, messaging, or natural language interfaces to interact with people, brands, services, and bots." This conversational layer is powered by conversational or dialog agents. Done right, an agent can greatly improve customer experience and get more tasks done through automation. I grabbed this deck from a Stanford class; it provides a good technical overview of NLU and conversational agents.
Many businesses and consumers are extending the capabilities of voice-based services such as Amazon Alexa, Google Home, Microsoft Cortana, and Apple Siri to create custom voice experiences (also known as skills). As the number of these experiences increases, a key problem is the discovery of skills that can be used to address a user's request. In this paper, we focus on conversational skill discovery and present a conversational agent which engages in a dialog with users to help them find the skills that fulfill their needs. To this end, we start with a rule-based agent and improve it by using reinforcement learning. In this way, we enable the agent to adapt to different user attributes and conversational styles as it interacts with users. We evaluate our approach in a real production setting by deploying the agent to interact with real users, and show the effectiveness of the conversational agent in helping users find the skills that serve their request.
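The idea of starting from a rule-based agent and then adapting it from user feedback can be illustrated with a minimal sketch. This is not the agent described in the abstract: the state names, action names, reward values, and the one-step (bandit-style) update are all assumptions chosen for illustration. The rule-based policy only seeds the initial action values, which are then adjusted from observed rewards.

```python
import random

# Hypothetical dialog actions; the abstract does not list the real action set.
ACTIONS = ["suggest_skill", "ask_clarification"]


def rule_based_action(state):
    """Hand-written starting policy: clarify vague requests, otherwise suggest."""
    return "ask_clarification" if state == "vague_request" else "suggest_skill"


class AdaptiveAgent:
    """Epsilon-greedy agent whose action values are seeded by the rules."""

    def __init__(self, epsilon=0.1, learning_rate=0.5, seed=0):
        self.q = {}  # state -> {action: estimated reward}
        self.epsilon = epsilon
        self.learning_rate = learning_rate
        self.rng = random.Random(seed)

    def _values(self, state):
        # Unseen states start with a small bonus on the rule-based action,
        # so the agent initially behaves like the rules.
        if state not in self.q:
            preferred = rule_based_action(state)
            self.q[state] = {a: (0.1 if a == preferred else 0.0) for a in ACTIONS}
        return self.q[state]

    def act(self, state):
        values = self._values(state)
        if self.rng.random() < self.epsilon:
            return self.rng.choice(ACTIONS)  # explore
        return max(values, key=values.get)  # exploit

    def update(self, state, action, reward):
        # Move the estimate for (state, action) toward the observed reward.
        values = self._values(state)
        values[action] += self.learning_rate * (reward - values[action])


agent = AdaptiveAgent(epsilon=0.0)
first = agent.act("vague_request")  # follows the rule at first
agent.update("vague_request", "ask_clarification", 0.0)  # user disliked it
agent.update("vague_request", "suggest_skill", 1.0)  # user accepted a suggestion
later = agent.act("vague_request")  # now prefers the rewarded action
```

In a deployed system the reward would come from a signal such as whether the user accepted the suggested skill; here it is supplied by hand.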
Unsupervised word embeddings provide rich linguistic and conceptual information about words. However, they may provide weak information about domain-specific semantic relations for certain tasks, such as semantic parsing of natural language queries, where such information about words can be valuable. To encode prior knowledge about semantic word relations, we present a new method: we extend the neural network-based lexical word embedding objective function (Mikolov et al., 2013) by incorporating information about the relationships between entities that we extract from knowledge bases. Our model can jointly learn lexical word representations from free text enriched by relational word embeddings from relational data (e.g., Freebase) for each type of entity relation. We empirically show on the task of semantic tagging of natural language queries that our enriched embeddings can provide information about not only short-range syntactic dependencies but also long-range semantic dependencies between words. Using the enriched embeddings, we obtain an average 2% improvement in F-score over the previous baselines.
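One way to picture a joint objective of this kind is a negative-sampling loss over text co-occurrence pairs plus a weighted loss of the same form over entity pairs linked in a knowledge base. The sketch below is an assumption-laden illustration, not the paper's formulation: it uses a single shared embedding table, collapses all relation types into one term, and the `weight` parameter, the toy vectors, and the example pairs are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pair_loss(center, positive, negatives):
    """Negative-sampling loss for one (center, positive) pair of vectors."""
    loss = -math.log(sigmoid(dot(center, positive)))
    for neg in negatives:
        loss -= math.log(sigmoid(-dot(center, neg)))
    return loss


def joint_loss(emb, text_pairs, relation_pairs, negatives, weight=1.0):
    """Text co-occurrence loss plus a weighted knowledge-base relation loss."""
    neg_vecs = [emb[n] for n in negatives]
    text = sum(pair_loss(emb[w], emb[c], neg_vecs) for w, c in text_pairs)
    rel = sum(pair_loss(emb[h], emb[t], neg_vecs) for h, t in relation_pairs)
    return text + weight * rel


# Toy 2-dimensional embeddings (hypothetical values).
emb = {
    "play": [1.0, 0.0],
    "movie": [0.5, 0.5],
    "avatar": [0.0, 1.0],
    "cameron": [0.2, 0.9],
    "the": [-1.0, -1.0],
}
text_pairs = [("play", "movie")]  # words that co-occur in free text
relation_pairs = [("avatar", "cameron")]  # entities linked by a KB relation
loss = joint_loss(emb, text_pairs, relation_pairs, negatives=["the"], weight=0.5)
```

Minimizing such a loss by gradient descent would pull related entities together even when they rarely co-occur in text, which is the intuition behind capturing long-range semantic dependencies.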
We present ADVISER - an open-source, multi-domain dialog system toolkit that enables the development of multi-modal (incorporating speech, text and vision), socially-engaged (e.g. emotion recognition, engagement level prediction and backchanneling) conversational agents. The final Python-based implementation of our toolkit is flexible, easy to use, and easy to extend not only for technically experienced users, such as machine learning researchers, but also for less technically experienced users, such as linguists or cognitive scientists, thereby providing a flexible platform for collaborative research. Link to open-source code: https://github.com/DigitalPhonetics/adviser
Ramanarayanan, Vikram (Educational Testing Service) | Suendermann-Oeft, David (Educational Testing Service) | Molloy, Hillary (Educational Testing Service) | Tsuprun, Eugene (Educational Testing Service) | Lange, Patrick (Educational Testing Service) | Evanini, Keelan (Educational Testing Service)
The advent of multiple crowdsourcing vendors and software infrastructure has greatly helped this effort. Several providers also offer integrated filtering tools that allow users to customize different aspects of their data collection, including target population, geographical location, demographics and sometimes even education level and expertise. Managed crowdsourcing providers extend these options by offering further customization and end-to-end management of the entire data collection operation.
[...] study on crowdsourcing for speech applications concluded that "although the crowd sometimes approached the level of the experts, it never surpassed it" (Parent and Eskenazi 2011)). This is exacerbated during multimodal dialog data collections, where it becomes harder to quality-control for usable audio-video data, due to a variety of factors including poor visual quality caused by variable lighting, position, or occlusions, participant or administrator error, or technical issues with the system or network (McDuff, Kaliouby, and [...]