Collaborating Authors

 Bangalore, Srinivas


E2E Spoken Entity Extraction for Virtual Agents

arXiv.org Artificial Intelligence

In human-computer conversations, extracting entities such as names, street addresses, and email addresses from speech is a challenging task. In this paper, we study the impact of fine-tuning pre-trained speech encoders on extracting spoken entities in human-readable form directly from speech, without the need for text transcription. We illustrate that such a direct approach optimizes the encoder to transcribe only the entity-relevant portions of speech, ignoring superfluous portions such as carrier phrases or spelled-out name entities. In the context of dialog from an enterprise virtual agent, we demonstrate that the 1-step approach outperforms the typical 2-step approach, which first generates lexical transcriptions and then applies text-based entity extraction to identify spoken entities.


Trustera: A Live Conversation Redaction System

arXiv.org Artificial Intelligence

Trustera is the first functional system that redacts personally identifiable information (PII) in real-time spoken conversations, removing agents' need to hear sensitive information while preserving the naturalness of live customer-agent conversations. As opposed to post-call redaction, audio masking starts as soon as the customer begins speaking a PII entity. This significantly reduces the risk of PII being intercepted or stored in insecure data storage. Trustera's architecture consists of a pipeline of automatic speech recognition, natural language understanding, and a live audio redactor module. The system's goal is threefold: redact entities that are PII, mask the audio that goes to the agent, and at the same time capture the entity, so that the captured PII can be used for a payment transaction or caller identification. Trustera is currently being used by thousands of agents to secure customers' sensitive information.


Evaluation of Semantic Dependency Labeling Across Domains

AAAI Conferences

One of the key concerns in computational semantics is to construct a domain-independent semantic representation that captures the richness of natural language, yet can be quickly customized to a specific domain for practical applications. We propose to use generic semantic frames defined in FrameNet, a domain-independent semantic resource, as an intermediate semantic representation for language understanding in dialog systems. In this paper we: (a) outline a novel method for FrameNet-style semantic dependency labeling that builds on a syntactic dependency parse; and (b) compare the accuracy of domain-adapted and generic approaches to semantic parsing for dialog tasks, using a frame-annotated corpus of human-computer dialogs in an airline reservation domain.


The Workshops at the Twentieth National Conference on Artificial Intelligence

AI Magazine

The AAAI-05 workshops were held on Saturday and Sunday, July 9-10, in Pittsburgh, Pennsylvania. The thirteen workshops were Contexts and Ontologies: Theory, Practice and Applications; Educational Data Mining; Exploring Planning and Scheduling for Web Services, Grid and Autonomic Computing; Human Comprehensible Machine Learning; Inference for Textual Question Answering; Integrating Planning into Scheduling; Learning in Computer Vision; Link Analysis; Mobile Robot Workshop; Modular Construction of Humanlike Intelligence; Multiagent Learning; Question Answering in Restricted Domains; and Spoken Language Understanding.

