Mandal, Arindam
Alexa, play with robot: Introducing the First Alexa Prize SimBot Challenge on Embodied AI
Shi, Hangjie, Ball, Leslie, Thattai, Govind, Zhang, Desheng, Hu, Lucy, Gao, Qiaozi, Shakiah, Suhaila, Gao, Xiaofeng, Padmakumar, Aishwarya, Yang, Bofei, Chung, Cadence, Guthy, Dinakar, Sukhatme, Gaurav, Arumugam, Karthika, Wen, Matthew, Ipek, Osman, Lange, Patrick, Khanna, Rohan, Pansare, Shreyas, Sharma, Vasu, Zhang, Chao, Flagg, Cris, Pressel, Daniel, Vaz, Lavina, Dai, Luke, Goyal, Prasoon, Sahai, Sattvik, Liu, Shaohua, Lu, Yao, Gottardi, Anna, Hu, Shui, Liu, Yang, Hakkani-Tur, Dilek, Bland, Kate, Rocker, Heather, Jeun, James, Rao, Yadunandana, Johnston, Michael, Iyengar, Akshaya, Mandal, Arindam, Natarajan, Prem, Ghanadan, Reza
The Alexa Prize program has empowered numerous university students to explore, experiment, and showcase their talents in building conversational agents through challenges like the SocialBot Grand Challenge and the TaskBot Challenge. As conversational agents increasingly appear in multimodal and embodied contexts, it is important to explore the affordances of conversational interaction augmented with computer vision and physical embodiment. This paper provides an overview of the SimBot Challenge, a new challenge in which university teams compete to build robot assistants that complete tasks in a simulated physical environment, covering both its online and offline phases. We describe the infrastructure and support provided to the teams, including Alexa Arena, the simulated environment, and the ML toolkit supplied to accelerate the building of vision and language models. We summarize the approaches the participating teams took to overcome research challenges and extract key lessons learned. Finally, we analyze the performance of the competing SimBots during the competition.
Alexa Arena: A User-Centric Interactive Platform for Embodied AI
Gao, Qiaozi, Thattai, Govind, Shakiah, Suhaila, Gao, Xiaofeng, Pansare, Shreyas, Sharma, Vasu, Sukhatme, Gaurav, Shi, Hangjie, Yang, Bofei, Zhang, Desheng, Hu, Lucy, Arumugam, Karthika, Hu, Shui, Wen, Matthew, Guthy, Dinakar, Chung, Cadence, Khanna, Rohan, Ipek, Osman, Ball, Leslie, Bland, Kate, Rocker, Heather, Rao, Yadunandana, Johnston, Michael, Ghanadan, Reza, Mandal, Arindam, Hakkani-Tur, Dilek, Natarajan, Prem
We introduce Alexa Arena, a user-centric simulation platform for Embodied AI (EAI) research. Alexa Arena provides a variety of multi-room layouts and interactable objects for the creation of human-robot interaction (HRI) missions. With user-friendly graphics and control mechanisms, Alexa Arena supports the development of gamified robotic tasks that are readily accessible to general human users, thus opening a new avenue for high-efficiency HRI data collection and EAI system evaluation. Along with the platform, we introduce a dialog-enabled instruction-following benchmark and provide baseline results for it. We make Alexa Arena publicly available to facilitate research in building generalizable and assistive embodied agents.
Advancing the State of the Art in Open Domain Dialog Systems through the Alexa Prize
Khatri, Chandra, Hedayatnia, Behnam, Venkatesh, Anu, Nunn, Jeff, Pan, Yi, Liu, Qing, Song, Han, Gottardi, Anna, Kwatra, Sanjeev, Pancholi, Sanju, Cheng, Ming, Chen, Qinglang, Stubel, Lauren, Gopalakrishnan, Karthik, Bland, Kate, Gabriel, Raefer, Mandal, Arindam, Hakkani-Tur, Dilek, Hwang, Gene, Michel, Nate, King, Eric, Prasad, Rohit
Building open domain conversational systems that allow users to have engaging conversations on topics of their choice is a challenging task. Alexa Prize was launched in 2016 to tackle the problem of achieving natural, sustained, coherent and engaging open-domain dialogs. In the second iteration of the competition in 2018, university teams advanced the state of the art by using context in dialog models, leveraging knowledge graphs for language understanding, handling complex utterances, building statistical and hierarchical dialog managers, and leveraging model-driven signals from user responses. The 2018 competition also provided competitors with a suite of tools and models, including the CoBot (conversational bot) toolkit, topic and dialog act detection models, conversation evaluators, and a sensitive content detection model, so that the competing teams could focus on building knowledge-rich, coherent and engaging multi-turn dialog systems. This paper outlines the advances developed by the university teams as well as the Alexa Prize team to achieve the common goal of advancing the science of Conversational AI. We address several key open-ended problems such as conversational speech recognition, open domain natural language understanding, commonsense reasoning, statistical dialog management and dialog evaluation. These collaborative efforts have improved the Alexa user experience to an average rating of 3.61, a median conversation duration of 2 minutes 18 seconds, and an average of 14.6 turns, increases of 14%, 92%, and 54% respectively since the launch of the 2018 competition. For conversational speech recognition, we have improved Word Error Rate by a relative 55% and Entity Error Rate by a relative 34% since the launch of the Alexa Prize. Socialbots improved in quality significantly more rapidly in 2018, in part due to the release of the CoBot toolkit, with new entrants attaining an average rating of 3.35 just one week into the semifinals, compared to nine weeks in the 2017 competition.
Flexible and Scalable State Tracking Framework for Goal-Oriented Dialogue Systems
Goel, Rahul, Paul, Shachi, Chung, Tagyoung, Lecomte, Jeremie, Mandal, Arindam, Hakkani-Tur, Dilek
Goal-oriented dialogue systems typically rely on components specifically developed for a single task or domain. This limits such systems in two ways: if the task domain changes, the dialogue system usually needs to be updated or completely retrained, and it is harder to extend such systems to different and multiple domains. The dialogue state tracker in conventional dialogue systems is one such component: it is usually designed to fit a well-defined application domain. For example, it is common for a state variable to be a categorical distribution over a manually predefined set of entities (Henderson et al., 2013), resulting in an inflexible and hard-to-extend dialogue system. In this paper, we propose a new approach for dialogue state tracking that generalizes well over multiple domains without incorporating any domain-specific knowledge. Under this framework, discrete dialogue state variables are learned independently, and no predefined set of possible values for the state variables is required. Furthermore, the framework allows arbitrary dialogue context to be added as features and allows multiple values to be associated with a single state variable. These characteristics make it much easier to expand the dialogue state space. We evaluate our framework on the widely used dialogue state tracking challenge dataset (DSTC2) and show that it yields results competitive with the state of the art despite incorporating little domain knowledge. We also show that the framework can benefit from widely available external resources such as pre-trained word embeddings.
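To make the flexible state representation described above concrete, here is a minimal Python sketch. It is an illustrative assumption, not the paper's implementation: each slot holds an open set of scored candidate values drawn from the dialogue context rather than a categorical distribution over a fixed ontology, so multiple values can coexist in one slot and new values need no schema change. The names (`SlotState`, `DialogueState`, `score_candidate`) are hypothetical, and `score_candidate` stands in for a learned scorer conditioned on dialogue context features.

```python
from dataclasses import dataclass, field


@dataclass
class SlotState:
    # A slot may hold multiple values at once, each scored independently
    # instead of competing inside a single softmax over a predefined set.
    values: dict = field(default_factory=dict)  # candidate value -> confidence

    def update(self, candidate: str, score: float, threshold: float = 0.5) -> None:
        if score >= threshold:
            self.values[candidate] = max(score, self.values.get(candidate, 0.0))
        else:
            self.values.pop(candidate, None)


class DialogueState:
    def __init__(self) -> None:
        self.slots = {}  # slot name -> SlotState, created on demand

    def update(self, slot: str, candidate: str, score: float) -> None:
        self.slots.setdefault(slot, SlotState()).update(candidate, score)


def score_candidate(slot: str, candidate: str, context) -> float:
    # Placeholder for a trained scorer that would condition on dialogue
    # context features (previous turns, system acts, embeddings, ...).
    return 0.9


state = DialogueState()
for slot, candidate in [("food", "thai"), ("food", "vegetarian")]:
    state.update(slot, candidate, score_candidate(slot, candidate, context=[]))
print(state.slots["food"].values)  # both values retained for a single slot
```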
Parsing Coordination for Spoken Language Understanding
Agarwal, Sanchit, Goel, Rahul, Chung, Tagyoung, Sethi, Abhishek, Mandal, Arindam, Matsoukas, Spyros
Typical spoken language understanding systems provide narrow semantic parses using a domain-specific ontology. The parses contain intents and slots that are directly consumed by downstream domain applications. In this work we discuss expanding such systems to handle compound entities and intents by introducing a domain-agnostic shallow parser that handles linguistic coordination. We show that our model for parsing coordination learns domain-independent and slot-independent features and is able to segment conjunct boundaries of many different phrasal categories. We also show that using adversarial training can be effective for improving generalization across different slot types for coordination parsing.
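To illustrate the shallow-parsing formulation, the Python sketch below casts coordination parsing as chunking: a tagger emits BIO-style labels marking conjunct boundaries, and a small decoder recovers the individual conjuncts. This is an assumed reconstruction for illustration only; the paper's trained neural tagger (with adversarial training for slot independence) is not reproduced, and the tags in the example are hand-written.

```python
def decode_conjuncts(tokens, bio_tags):
    """Group tokens into conjunct spans from B/I/O chunk tags."""
    conjuncts, current = [], []
    for tok, tag in zip(tokens, bio_tags):
        if tag == "B":
            if current:
                conjuncts.append(current)
            current = [tok]
        elif tag == "I" and current:
            current.append(tok)
        else:  # "O": coordinators, punctuation, or non-conjunct tokens
            if current:
                conjuncts.append(current)
                current = []
    if current:
        conjuncts.append(current)
    return [" ".join(c) for c in conjuncts]


# A coordinated slot value is split into its conjuncts, independent of the
# slot type, so downstream intent/slot processing can handle each one.
tokens = ["play", "a", "song", "by", "john", "and", "jane"]
tags = ["O", "O", "O", "O", "B", "O", "B"]
print(decode_conjuncts(tokens, tags))  # ['john', 'jane']
```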
Data Augmentation for Robust Keyword Spotting under Playback Interference
Raju, Anirudh, Panchapagesan, Sankaran, Liu, Xing, Mandal, Arindam, Strom, Nikko
Accurate on-device keyword spotting (KWS) with low false accept and false reject rates is crucial to the customer experience for far-field voice control of conversational agents. It is particularly challenging to maintain a low false reject rate in real-world conditions where there is (a) ambient noise from external sources such as TV, household appliances, or other speech not directed at the device, and (b) imperfect cancellation of the audio playback from the device, resulting in residual echo after processing by the Acoustic Echo Cancellation (AEC) system. In this paper, we propose a data augmentation strategy to improve keyword spotting performance under these challenging conditions. The training set audio is artificially corrupted by mixing in music and TV/movie audio at different signal-to-interference ratios. Our results show a 30-45% relative reduction in false reject rates, at a range of false alarm rates, under audio playback from such devices.
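As an illustration of the augmentation strategy, the sketch below mixes an interference signal (music or TV/movie audio) into clean training audio at a target signal-to-interference ratio (SIR). This is a minimal, assumption-laden sketch rather than the paper's pipeline; the function name `mix_at_sir` and the SIR range drawn in the usage example are placeholders.

```python
import numpy as np


def mix_at_sir(speech: np.ndarray, interference: np.ndarray, sir_db: float) -> np.ndarray:
    """Scale `interference` so the speech-to-interference power ratio is sir_db, then mix."""
    # Loop/trim the interference to match the speech length.
    reps = int(np.ceil(len(speech) / len(interference)))
    interference = np.tile(interference, reps)[: len(speech)]

    p_speech = np.mean(speech ** 2) + 1e-12
    p_interf = np.mean(interference ** 2) + 1e-12
    # Interference power needed to hit the target SIR, then the matching gain.
    target_p_interf = p_speech / (10.0 ** (sir_db / 10.0))
    gain = np.sqrt(target_p_interf / p_interf)
    return speech + gain * interference


# Usage: each training utterance is corrupted at an SIR drawn from a range,
# so the model sees a spread of playback/noise conditions.
rng = np.random.default_rng(0)
speech = rng.standard_normal(16000)  # stand-in for 1 s of speech at 16 kHz
music = rng.standard_normal(8000)    # stand-in for interference audio
augmented = mix_at_sir(speech, music, sir_db=rng.uniform(0.0, 20.0))
```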
Max-Pooling Loss Training of Long Short-Term Memory Networks for Small-Footprint Keyword Spotting
Sun, Ming, Raju, Anirudh, Tucker, George, Panchapagesan, Sankaran, Fu, Gengshen, Mandal, Arindam, Matsoukas, Spyros, Strom, Nikko, Vitaladevuni, Shiv
We propose a max-pooling based loss function for training Long Short-Term Memory (LSTM) networks for small-footprint keyword spotting (KWS), with low CPU, memory, and latency requirements. The max-pooling loss training can be further guided by initializing with a cross-entropy loss trained network. A posterior smoothing based evaluation approach is employed to measure keyword spotting performance. Our experimental results show that LSTM models trained using cross-entropy loss or max-pooling loss outperform a cross-entropy loss trained baseline feed-forward Deep Neural Network (DNN). In addition, a randomly initialized LSTM trained with max-pooling loss performs better than a cross-entropy loss trained LSTM. Finally, the max-pooling loss trained LSTM initialized with a cross-entropy pre-trained network shows the best performance, yielding a 67.6% relative reduction in the Area Under the Curve (AUC) measure compared to the baseline feed-forward DNN.
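A minimal PyTorch sketch of a max-pooling style loss follows, under the assumption that positive (keyword) utterances back-propagate cross-entropy only at the frame where the keyword posterior peaks, while negative (background) utterances apply it at every frame. The paper's exact frame-selection windows, posterior smoothing, and model sizes are not reproduced, and all names here (`max_pooling_kws_loss`, the stand-in LSTM) are hypothetical.

```python
import torch
import torch.nn.functional as F


def max_pooling_kws_loss(frame_logits: torch.Tensor, is_keyword: bool,
                         keyword_class: int = 1) -> torch.Tensor:
    """frame_logits: (num_frames, num_classes) per-frame scores before softmax."""
    if is_keyword:
        # Train only on the single frame where the keyword posterior peaks,
        # so the network may fire anywhere inside the keyword region rather
        # than being forced to fire at every frame.
        with torch.no_grad():
            peak = frame_logits.softmax(dim=-1)[:, keyword_class].argmax()
        target = torch.tensor([keyword_class])
        return F.cross_entropy(frame_logits[peak].unsqueeze(0), target)
    # Background utterance: every frame is a non-keyword target.
    targets = torch.zeros(frame_logits.shape[0], dtype=torch.long)
    return F.cross_entropy(frame_logits, targets)


# Usage with a stand-in LSTM over 100 frames of 40-dim acoustic features.
lstm = torch.nn.LSTM(input_size=40, hidden_size=64, batch_first=True)
proj = torch.nn.Linear(64, 2)  # classes: 0 = background, 1 = keyword
feats = torch.randn(1, 100, 40)
hidden, _ = lstm(feats)
loss = max_pooling_kws_loss(proj(hidden.squeeze(0)), is_keyword=True)
loss.backward()
```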