Pressel, Daniel
Alexa, play with robot: Introducing the First Alexa Prize SimBot Challenge on Embodied AI
Shi, Hangjie, Ball, Leslie, Thattai, Govind, Zhang, Desheng, Hu, Lucy, Gao, Qiaozi, Shakiah, Suhaila, Gao, Xiaofeng, Padmakumar, Aishwarya, Yang, Bofei, Chung, Cadence, Guthy, Dinakar, Sukhatme, Gaurav, Arumugam, Karthika, Wen, Matthew, Ipek, Osman, Lange, Patrick, Khanna, Rohan, Pansare, Shreyas, Sharma, Vasu, Zhang, Chao, Flagg, Cris, Pressel, Daniel, Vaz, Lavina, Dai, Luke, Goyal, Prasoon, Sahai, Sattvik, Liu, Shaohua, Lu, Yao, Gottardi, Anna, Hu, Shui, Liu, Yang, Hakkani-Tur, Dilek, Bland, Kate, Rocker, Heather, Jeun, James, Rao, Yadunandana, Johnston, Michael, Iyengar, Akshaya, Mandal, Arindam, Natarajan, Prem, Ghanadan, Reza
The Alexa Prize program has empowered numerous university students to explore, experiment, and showcase their talents in building conversational agents through challenges like the SocialBot Grand Challenge and the TaskBot Challenge. As conversational agents increasingly appear in multimodal and embodied contexts, it is important to explore the affordances of conversational interaction augmented with computer vision and physical embodiment. This paper describes the SimBot Challenge, a new challenge in which university teams compete to build robot assistants that complete tasks in a simulated physical environment. This paper provides an overview of the SimBot Challenge, which included both online and offline challenge phases. We describe the infrastructure and support provided to the teams, including Alexa Arena, the simulated environment, and the ML toolkit supplied to teams to accelerate their building of vision and language models. We summarize the approaches the participating teams took to overcome research challenges and extract key lessons learned. Finally, we provide an analysis of the performance of the competing SimBots during the competition.
An Effective Label Noise Model for DNN Text Classification
Jindal, Ishan, Pressel, Daniel, Lester, Brian, Nokleby, Matthew
Because large, human-annotated datasets suffer from labeling errors, it is crucial to be able to train deep neural networks in the presence of label noise. While training image classification models with label noise has received much attention, training text classification models has not. In this paper, we propose an approach to training deep networks that is robust to label noise. This approach adds a non-linear processing layer (a noise model) that captures the statistics of the label noise to a convolutional neural network (CNN) architecture. The noise model and the CNN weights are learned jointly from noisy training data, which prevents the model from overfitting to erroneous labels. Through extensive experiments on several text classification datasets, we show that this approach enables the CNN to learn better sentence representations and is robust even to extreme label noise. We find that proper initialization and regularization of this noise model are critical. Further, in contrast to results focusing on large batch sizes to mitigate label noise in image classification, we find that altering the batch size has little effect on classification performance.
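For readers who want a concrete picture of the idea sketched in this abstract, the following is a minimal PyTorch sketch of a text CNN with a jointly trained noise layer: a learnable class-to-class transition matrix is stacked on the classifier's output, initialized near the identity, and trained against the observed (possibly noisy) labels. The class and function names, hyperparameters, and the diagonal regularizer are illustrative assumptions, not the paper's exact formulation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class NoisyLabelTextCNN(nn.Module):
        # A text CNN with a noise-model layer stacked on its output.
        def __init__(self, vocab_size, embed_dim, num_classes,
                     num_filters=100, kernel_sizes=(3, 4, 5)):
            super().__init__()
            self.embedding = nn.Embedding(vocab_size, embed_dim)
            self.convs = nn.ModuleList(
                [nn.Conv1d(embed_dim, num_filters, k) for k in kernel_sizes])
            self.classifier = nn.Linear(num_filters * len(kernel_sizes), num_classes)
            # Noise model: a learnable class-to-class transition matrix,
            # initialized near the identity ("labels are mostly clean").
            self.noise = nn.Parameter(
                torch.eye(num_classes) + 0.01 * torch.randn(num_classes, num_classes))

        def clean_log_probs(self, tokens):
            # tokens: (batch, seq_len) integer ids
            x = self.embedding(tokens).transpose(1, 2)      # (batch, embed_dim, seq_len)
            feats = [F.relu(conv(x)).max(dim=2).values for conv in self.convs]
            logits = self.classifier(torch.cat(feats, dim=1))
            return F.log_softmax(logits, dim=1)             # p(clean label | text)

        def forward(self, tokens):
            clean = self.clean_log_probs(tokens).exp()
            transition = F.softmax(self.noise, dim=1)       # rows sum to 1
            noisy = clean @ transition                      # p(observed label | text)
            return torch.log(noisy + 1e-8)

    def noisy_label_loss(model, tokens, observed_labels, reg_weight=0.1):
        # Negative log-likelihood against the observed (possibly noisy) labels,
        # plus an illustrative regularizer that discourages the transition
        # matrix from drifting too far from the identity.
        log_noisy = model(tokens)
        nll = F.nll_loss(log_noisy, observed_labels)
        transition = F.softmax(model.noise, dim=1)
        reg = -transition.diagonal().log().mean()
        return nll + reg_weight * reg

At test time one would read predictions from clean_log_probs, discarding the noise layer, since the transition matrix exists only to explain corruption in the training labels.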