khudanpur
Distilling Multi-Level X-vector Knowledge for Small-footprint Speaker Verification
Xuechen Liu, Md Sahidullah, Tomi Kinnunen
Even though deep speaker models have demonstrated impressive accuracy in speaker verification tasks, this often comes at the expense of increased model size and computation time, presenting challenges for deployment in resource-constrained environments. Our research focuses on addressing this limitation through the development of small-footprint deep speaker embedding extraction using knowledge distillation. While previous work in this domain has concentrated on speaker embedding extraction at the utterance level, our approach involves amalgamating embeddings from different levels of the x-vector model (teacher network) to train a compact student network. The results highlight the significance of frame-level information, with the student models exhibiting a remarkable size reduction of 85%-91% compared to their teacher counterparts, depending on the size of the teacher embeddings. Notably, by concatenating teacher embeddings, we achieve student networks that maintain comparable performance to the teacher while enjoying a substantial 75% reduction in model size. These findings and insights extend to other x-vector variants, underscoring the broad applicability of our approach.
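The idea of distilling from concatenated multi-level teacher embeddings can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not specify the pooling, the loss, or the dimensions, so mean pooling, cosine distance, the function names, and the toy values below are hypothetical stand-ins rather than the authors' method.

```python
import math

def mean_pool(frames):
    """Average a list of frame-level vectors into one fixed-size vector."""
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]

def concat_teacher_target(frame_level, utterance_level):
    """Build a multi-level target by concatenating pooled frame-level
    features with the utterance-level embedding (assumed combination)."""
    return mean_pool(frame_level) + list(utterance_level)

def cosine_distance(a, b):
    """1 - cosine similarity; used here as an assumed distillation loss."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

# Toy teacher outputs (hypothetical values, not from the paper).
teacher_frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 frames x 2 dims
teacher_utt = [0.5, -0.5, 2.0]                         # utterance-level, 3 dims

# The student must emit an embedding of the same size as the concatenated target.
target = concat_teacher_target(teacher_frames, teacher_utt)
student_embedding = [0.6, 0.7, 0.4, -0.4, 1.9]

# Training would minimise this value over many utterances.
loss = cosine_distance(student_embedding, target)
print(len(target), loss)
```

In a real system the student would be a small neural network trained by gradient descent on this kind of loss, typically alongside a speaker classification objective; the sketch only shows how a multi-level target might be assembled and compared against the student output.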
Chatbots: A long and complicated history
In the 1960s, an unprecedented computer program called Eliza attempted to simulate the experience of speaking to a therapist. In one exchange, captured in a research paper at the time, a person revealed that her boyfriend had described her as "depressed much of the time." Eliza's response: "I am sorry to hear you are depressed." Eliza, which is widely characterized as the first chatbot, wasn't as versatile as similar services today. The program, which relied on natural language understanding, reacted to key words and then essentially punted the dialogue back to the user.
Johns Hopkins and Amazon collaborate to explore transformative power of AI
Johns Hopkins University and Amazon are teaming up to harness the power of artificial intelligence to transform the way humans interact online and with the world. The new JHU Amazon Initiative for Interactive AI, housed in the Johns Hopkins Whiting School of Engineering, will leverage the university's world-class expertise in interactive AI to advance groundbreaking technologies in machine learning, computer vision, natural language understanding, and speech processing; democratize access to the benefits of AI innovations; and broaden participation in research from diverse, interdisciplinary scholars and other innovators. Amazon's investment will span five years, comprising doctoral fellowships, sponsored research funding, gift funding, and community projects. Sanjeev Khudanpur, an associate professor of electrical and computer engineering at the Whiting School, will serve as the initiative's founding director. Khudanpur is an expert in the application of information-theoretic methods to human language technologies such as automatic speech recognition, machine translation, and natural language processing.