Beckwith, Richard
ACE, Action and Control via Explanations: A Proposal for LLMs to Provide Human-Centered Explainability for Multimodal AI Assistants
Watkins, Elizabeth Anne, Moss, Emanuel, Manuvinakurike, Ramesh, Shi, Meng, Beckwith, Richard, Raffa, Giuseppe
In this short paper we address issues related to building multimodal AI systems for human performance support in manufacturing domains. We make two contributions: first, we identify challenges in the participatory design and training of such systems; second, to address these challenges, we propose the ACE paradigm: "Action and Control via Explanations". Specifically, we suggest that LLMs can be used to produce explanations in the form of human-interpretable "semantic frames", which in turn enable end users to provide the data the AI system needs to align its multimodal models and representations, including computer vision, automatic speech recognition, and document inputs. By using LLMs to "explain" via semantic frames, ACE helps the human and the AI system collaborate, together building a more accurate model of human activities and behaviors, and ultimately producing more accurate predictive outputs for better task support and better outcomes for human users performing manual tasks.
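A minimal sketch of what a frame-based feedback loop of this kind could look like in code; the frame slots, the apply_user_correction helper, and the correction flow are illustrative assumptions, not an interface published in the paper.

# Illustrative sketch only: the frame schema and correction flow are assumptions.
from dataclasses import dataclass, field

@dataclass
class SemanticFrame:
    """A human-interpretable 'explanation' of what the AI thinks is happening."""
    action: str                                           # e.g. "tighten"
    actor: str                                            # e.g. "technician"
    objects: list = field(default_factory=list)           # e.g. ["torque wrench", "bolt"]
    source_modalities: list = field(default_factory=list) # e.g. ["vision", "asr"]
    confidence: float = 0.0

def apply_user_correction(frame: SemanticFrame, corrections: dict) -> SemanticFrame:
    """End user edits the frame; the corrected frame becomes alignment data
    for the underlying multimodal models (the ACE feedback loop)."""
    for slot, value in corrections.items():
        setattr(frame, slot, value)
    frame.confidence = 1.0  # human-verified
    return frame

# Example: the system misidentified the tool; the technician corrects it.
frame = SemanticFrame(action="tighten", actor="technician",
                      objects=["screwdriver", "bolt"],
                      source_modalities=["vision", "asr"], confidence=0.62)
verified = apply_user_correction(frame, {"objects": ["torque wrench", "bolt"]})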
QA-TOOLBOX: Conversational Question-Answering for process task guidance in manufacturing
Manuvinakurike, Ramesh, Watkins, Elizabeth, Savur, Celal, Rhodes, Anthony, Biswas, Sovan, Mejia, Gesem Gudino, Beckwith, Richard, Sahay, Saurav, Raffa, Giuseppe, Nachman, Lama
In this work we explore using LLMs for data augmentation in a manufacturing task guidance system. The dataset consists of representative samples of interactions with technicians working in an advanced manufacturing setting. The purpose of this work is to explore the task, to perform data augmentation for the supported tasks, and to evaluate the performance of existing LLMs. We observe that the task is complex, requiring understanding of procedure specification documents as well as of actions and objects sequenced temporally. The dataset consists of 200,000+ question/answer pairs that refer to the spec document and are grounded in narrations and/or video demonstrations. We compare the performance of several popular open-source LLMs by developing a "baseline" with each LLM, rating the responses in a reference-free setting using an LLM-as-a-judge, and comparing these ratings with those of crowd-workers while validating them against expert judgments.
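A minimal sketch of reference-free LLM-as-a-judge scoring of the kind described above; the judge prompt, the 1-5 rating scale, and the judge_llm callable are assumptions for illustration, not the paper's exact protocol.

# Illustrative sketch: prompt, scale, and judge_llm() are assumptions.
JUDGE_PROMPT = """You are evaluating an assistant's answer to a technician's
question about a manufacturing procedure. Context (spec excerpt and narration):
{context}

Question: {question}
Candidate answer: {answer}

Rate the answer from 1 (unusable) to 5 (fully correct and grounded in the
context). Reply with the number only."""

def judge_answer(judge_llm, context: str, question: str, answer: str) -> int:
    """Score a candidate answer without a gold reference (reference-free)."""
    reply = judge_llm(JUDGE_PROMPT.format(context=context,
                                          question=question,
                                          answer=answer))
    return int(reply.strip()[0])

def compare_baselines(judge_llm, samples, baselines) -> dict:
    """Average judge rating per baseline LLM, to be compared against
    crowd-worker ratings and validated with experts."""
    scores = {name: [] for name in baselines}
    for s in samples:
        for name, model in baselines.items():
            answer = model(s["question"], s["context"])
            scores[name].append(judge_answer(judge_llm, s["context"],
                                             s["question"], answer))
    return {name: sum(v) / len(v) for name, v in scores.items()}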
Human in the loop approaches in multi-modal conversational task guidance system development
Manuvinakurike, Ramesh, Biswas, Sovan, Raffa, Giuseppe, Beckwith, Richard, Rhodes, Anthony, Shi, Meng, Mejia, Gesem Gudino, Sahay, Saurav, Nachman, Lama
Development of task guidance systems for aiding humans in a situated task remains a challenging problem. Search (information retrieval) and conversational systems for task guidance have immense potential to help task performers achieve their goals. However, several technical challenges need to be addressed to deliver such conversational systems, and common supervised approaches fail to deliver the expected results in terms of overall performance, user experience, and adaptation to realistic conditions. In this preliminary work we first highlight some of the challenges involved in developing such systems. We then provide an overview of existing datasets and highlight their limitations. Finally, we develop a model-in-the-loop, Wizard-of-Oz-based data collection tool and perform a pilot experiment.
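A minimal sketch of one model-in-the-loop Wizard-of-Oz turn as described above: a model drafts a reply, the human wizard accepts or edits it, and the turn is logged as data. The function and field names here are illustrative assumptions, not the tool's actual interface.

# Illustrative sketch: names and data layout are assumptions.
def wizard_of_oz_turn(user_utterance, dialogue_history, draft_model, wizard_ui, log):
    # 1. Model proposes a candidate response from the dialogue context.
    draft = draft_model(dialogue_history + [user_utterance])

    # 2. Wizard reviews the draft: accept as-is, edit, or write from scratch.
    final_response = wizard_ui.review(user_utterance, draft)

    # 3. Both draft and final response are logged; the (context, final) pair
    #    becomes supervised data, and (draft, final) deltas reveal model gaps.
    log.append({
        "context": dialogue_history + [user_utterance],
        "model_draft": draft,
        "wizard_response": final_response,
    })
    return final_response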