AITopics | Johnston, Michael

Johnston, Michael

S-EQA: Tackling Situational Queries in Embodied Question Answering

Dorbala, Vishnu Sashank, Goyal, Prasoon, Piramuthu, Robinson, Johnston, Michael, Manocha, Dinesh, Ghanadhan, Reza

arXiv.org Artificial Intelligence, May-7-2024

We present and tackle the problem of Embodied Question Answering (EQA) with Situational Queries (S-EQA) in a household environment. Unlike prior EQA work tackling simple queries that directly reference target objects and quantifiable properties pertaining to them, EQA with situational queries (such as "Is the bathroom clean and dry?") is more challenging: the agent must not only figure out which target objects pertain to the query, but also needs a consensus on their states for the query to be answerable. Towards this objective, we first introduce a novel Prompt-Generate-Evaluate (PGE) scheme that wraps around an LLM's output to create a dataset of unique situational queries, corresponding consensus object information, and predicted answers. PGE maintains uniqueness among the generated queries, using multiple forms of semantic similarity. We validate the generated dataset via a large-scale user study conducted on M-Turk, and introduce it as S-EQA, the first dataset tackling EQA with situational queries. Our user study establishes the authenticity of S-EQA, with a high 97.26% of the generated queries being deemed answerable given the consensus object data. Conversely, we observe a low correlation of 46.2% between the LLM-predicted answers and the human-evaluated ones, indicating the LLM's poor capability in directly answering situational queries, while establishing S-EQA's usability in providing a human-validated consensus for an indirect solution. We evaluate S-EQA via Visual Question Answering (VQA) on VirtualHome, which, unlike other simulators, contains several objects with modifiable states that also visually appear different upon modification -- enabling us to set a quantitative benchmark for S-EQA. To the best of our knowledge, this is the first work to introduce EQA with situational queries, and also the first to use a generative approach for query creation.

artificial intelligence, large language model, natural language, (15 more...)

arXiv.org Artificial Intelligence

2405.04732

Country: North America > United States > Maryland (0.14)

Genre:

Questionnaire & Opinion Survey (0.97)
Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

"Don't forget to put the milk back!" Dataset for Enabling Embodied Agents to Detect Anomalous Situations

Mullen, James F. Jr, Goyal, Prasoon, Piramuthu, Robinson, Johnston, Michael, Manocha, Dinesh, Ghanadan, Reza

arXiv.org Artificial Intelligence, Apr-12-2024

Home robots intend to make their users' lives easier. Our work assists in this goal by enabling robots to inform their users of dangerous or unsanitary anomalies in their home. Some examples of these anomalies include the user leaving their milk out, forgetting to turn off the stove, or leaving poison accessible to children. To move towards enabling home robots with these abilities, we have created a new dataset, which we call SafetyDetect. The SafetyDetect dataset consists of 1000 anomalous home scenes, each of which contains unsafe or unsanitary situations for an agent to detect. Our approach utilizes large language models (LLMs) alongside both a graph representation of the scene and the relationships between the objects in the scene. Our key insight is that this connected scene graph and the object relationships it encodes enable the LLM to better reason about the scene -- especially as it relates to detecting dangerous or unsanitary situations. Our most promising approach utilizes GPT-4 and pursues a categorization technique where object relations from the scene graph are classified as normal, dangerous, unsanitary, or dangerous for children. This method is able to correctly identify over 90% of anomalous scenarios in the SafetyDetect Dataset. Additionally, we conduct real-world experiments on a ClearPath TurtleBot where we generate a scene graph from visuals of the real-world scene, and run our approach with no modification. This setup resulted in little performance loss. The SafetyDetect Dataset and code will be released to the public upon this paper's publication.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2404.08827

Country: North America > United States (0.14)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

Dialogue with Robots: Proposals for Broadening Participation and Research in the SLIVAR Community

Kennington, Casey, Alikhani, Malihe, Pon-Barry, Heather, Atwell, Katherine, Bisk, Yonatan, Fried, Daniel, Gervits, Felix, Han, Zhao, Inan, Mert, Johnston, Michael, Korpan, Raj, Litman, Diane, Marge, Matthew, Matuszek, Cynthia, Mead, Ross, Mohan, Shiwali, Mooney, Raymond, Parde, Natalie, Sinapov, Jivko, Stewart, Angela, Stone, Matthew, Tellex, Stefanie, Williams, Tom

arXiv.org Artificial Intelligence, Apr-1-2024

The ability to interact with machines using natural human language is becoming not just commonplace, but expected. The next step is not just text interfaces, but speech interfaces and not just with computers, but with all machines including robots. In this paper, we chronicle the recent history of this growing field of spoken dialogue with robots and offer the community three proposals, the first focused on education, the second on benchmarks, and the third on the modeling of language when it comes to spoken interaction with robots. The three proposals should act as white papers for any researcher to take and build upon.

artificial intelligence, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2404.01158

Country:

North America > United States > New York (0.14)
North America > United States > Maryland (0.14)
North America > United States > Illinois (0.14)

Genre:

Instructional Material > Course Syllabus & Notes (0.68)
Research Report (0.64)

Industry: Education > Curriculum (0.68)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Mastering Robot Manipulation with Multimodal Prompts through Pretraining and Multi-task Fine-tuning

Li, Jiachen, Gao, Qiaozi, Johnston, Michael, Gao, Xiaofeng, He, Xuehai, Shakiah, Suhaila, Shi, Hangjie, Ghanadan, Reza, Wang, William Yang

arXiv.org Artificial Intelligence, Oct-14-2023

Prompt-based learning has been demonstrated as a compelling paradigm contributing to the tremendous success of large language models (LLMs). Inspired by their success in language tasks, existing research has leveraged LLMs in embodied instruction following and task planning. However, not much attention has been paid to embodied tasks with multimodal prompts, combining vision signals with text descriptions. This type of task poses a major challenge to robots' capability to understand the interconnection and complementarity between vision and language signals. In this work, we introduce an effective framework that learns a policy to perform robot manipulation with multimodal prompts from multi-task expert trajectories. Our method consists of a two-stage training pipeline that performs inverse dynamics pretraining and multi-task finetuning. To facilitate multimodal understanding, we design our multimodal prompt encoder by augmenting a pretrained LM with a residual connection to the visual input and model the dependencies among action dimensions. Empirically, we evaluate the efficacy of our method on the VIMA-BENCH (Jiang et al., 2023) and establish a new state-of-the-art (10% improvement in success rate). Moreover, we demonstrate that our model exhibits remarkable in-context learning ability. By leveraging LLMs' remarkable zero-shot generalizability, various research initiatives (Ahn et al., 2022; Huang et al., 2022a;b) have developed powerful action planners to parse language instructions into a sequence of sub-goals.

arxiv preprint arxiv, large language model, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2310.09676

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Alexa, play with robot: Introducing the First Alexa Prize SimBot Challenge on Embodied AI

Shi, Hangjie, Ball, Leslie, Thattai, Govind, Zhang, Desheng, Hu, Lucy, Gao, Qiaozi, Shakiah, Suhaila, Gao, Xiaofeng, Padmakumar, Aishwarya, Yang, Bofei, Chung, Cadence, Guthy, Dinakar, Sukhatme, Gaurav, Arumugam, Karthika, Wen, Matthew, Ipek, Osman, Lange, Patrick, Khanna, Rohan, Pansare, Shreyas, Sharma, Vasu, Zhang, Chao, Flagg, Cris, Pressel, Daniel, Vaz, Lavina, Dai, Luke, Goyal, Prasoon, Sahai, Sattvik, Liu, Shaohua, Lu, Yao, Gottardi, Anna, Hu, Shui, Liu, Yang, Hakkani-Tur, Dilek, Bland, Kate, Rocker, Heather, Jeun, James, Rao, Yadunandana, Johnston, Michael, Iyengar, Akshaya, Mandal, Arindam, Natarajan, Prem, Ghanadan, Reza

arXiv.org Artificial Intelligence, Aug-9-2023

The Alexa Prize program has empowered numerous university students to explore, experiment, and showcase their talents in building conversational agents through challenges like the SocialBot Grand Challenge and the TaskBot Challenge. As conversational agents increasingly appear in multimodal and embodied contexts, it is important to explore the affordances of conversational interaction augmented with computer vision and physical embodiment. This paper describes the SimBot Challenge, a new challenge in which university teams compete to build robot assistants that complete tasks in a simulated physical environment. This paper provides an overview of the SimBot Challenge, which included both online and offline challenge phases. We describe the infrastructure and support provided to the teams including Alexa Arena, the simulated environment, and the ML toolkit provided to teams to accelerate their building of vision and language models. We summarize the approaches the participating teams took to overcome research challenges and extract key lessons learned. Finally, we provide analysis of the performance of the competing SimBots during the competition.

artificial intelligence, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2308.05221

Genre: Overview (0.68)

Industry:

Leisure & Entertainment > Games > Computer Games (0.93)
Information Technology (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
(2 more...)

Alexa Arena: A User-Centric Interactive Platform for Embodied AI

Gao, Qiaozi, Thattai, Govind, Shakiah, Suhaila, Gao, Xiaofeng, Pansare, Shreyas, Sharma, Vasu, Sukhatme, Gaurav, Shi, Hangjie, Yang, Bofei, Zheng, Desheng, Hu, Lucy, Arumugam, Karthika, Hu, Shui, Wen, Matthew, Guthy, Dinakar, Chung, Cadence, Khanna, Rohan, Ipek, Osman, Ball, Leslie, Bland, Kate, Rocker, Heather, Rao, Yadunandana, Johnston, Michael, Ghanadan, Reza, Mandal, Arindam, Tur, Dilek Hakkani, Natarajan, Prem

arXiv.org Artificial Intelligence, Jun-7-2023

We introduce Alexa Arena, a user-centric simulation platform for Embodied AI (EAI) research. Alexa Arena provides a variety of multi-room layouts and interactable objects, for the creation of human-robot interaction (HRI) missions. With user-friendly graphics and control mechanisms, Alexa Arena supports the development of gamified robotic tasks readily accessible to general human users, thus opening a new venue for high-efficiency HRI data collection and EAI system evaluation. Along with the platform, we introduce a dialog-enabled instruction-following benchmark and provide baseline results for it. We make Alexa Arena publicly available to facilitate research in building generalizable and assistive embodied agents.

artificial intelligence, natural language, object-oriented architecture, (18 more...)

arXiv.org Artificial Intelligence

2303.01586

Genre: Questionnaire & Opinion Survey (1.00)

Industry: Leisure & Entertainment > Games > Computer Games (0.93)

Technology:

Information Technology > Human Computer Interaction (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.67)
(3 more...)

Improving Open-Domain Dialogue Evaluation with a Causal Inference Model

Le, Cat P., Dai, Luke, Johnston, Michael, Liu, Yang, Walker, Marilyn, Ghanadan, Reza

arXiv.org Artificial Intelligence, Jan-30-2023

Effective evaluation methods remain a significant challenge for research on open-domain conversational dialogue systems. Explicit satisfaction ratings can be elicited from users, but users often do not provide ratings when asked, and those they give can be highly subjective. Post-hoc ratings by experts are an alternative, but these can be both expensive and complex to collect. Here, we explore the creation of automated methods for predicting both expert and user ratings of open-domain dialogues. We compare four different approaches. First, we train a baseline model using an end-to-end transformer to predict ratings directly from the raw dialogue text. The other three methods are variants of a two-stage approach in which we first extract interpretable features at the turn level that capture, among other aspects, user dialogue behaviors indicating contradiction, repetition, disinterest, compliments, or criticism. We project these features to the dialogue level and train a dialogue-level MLP regression model, a dialogue-level LSTM, and a novel causal inference model called counterfactual-LSTM (CF-LSTM) to predict ratings. The proposed CF-LSTM is a sequential model over turn-level features which predicts ratings using multiple regressors depending on hypotheses derived from the turn-level features. As a causal inference model, CF-LSTM aims to learn the underlying causes of a specific event, such as a low rating. We also bin the user ratings and perform classification experiments with all four models. In evaluation experiments on conversational data from the Alexa Prize SocialBot, we show that the CF-LSTM achieves the best performance for predicting dialogue ratings and classification.

artificial intelligence, dialogue, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2301.13372

Country: North America > United States > Minnesota (0.28)

Genre: Research Report (0.82)

Industry:

Leisure & Entertainment (1.00)
Media > Music (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

GIVL: Improving Geographical Inclusivity of Vision-Language Models with Pre-Training Methods

Yin, Da, Gao, Feng, Thattai, Govind, Johnston, Michael, Chang, Kai-Wei

arXiv.org Artificial Intelligence, Jan-4-2023

A key goal for the advancement of AI is to develop technologies that serve the needs not just of one group but of all communities regardless of their geographical region. In fact, a significant proportion of knowledge is locally shared by people from certain regions but may not apply equally in other regions because of cultural differences. If a model is unaware of regional characteristics, it may lead to performance disparity across regions and result in bias against underrepresented groups. We propose GIVL, a Geographically Inclusive Vision-and-Language Pre-trained model. There are two attributes of geo-diverse visual concepts which can help to learn geo-diverse knowledge: 1) concepts under similar categories have unique knowledge and visual characteristics, 2) concepts with similar visual features may fall in completely different categories. Motivated by these attributes, we design two new pre-training objectives, Image Knowledge Matching (IKM) and Image Edit Checking (IEC), to pre-train GIVL. Compared with similar-size models pre-trained with a similar scale of data, GIVL achieves state-of-the-art (SOTA) and more balanced performance on geo-diverse V&L tasks.

artificial intelligence, geographical inclusivity, vision-language model, (2 more...)

arXiv.org Artificial Intelligence

2301.01893

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Vision (0.69)
Information Technology > Artificial Intelligence > Machine Learning (0.60)

Knowledge-Graph Driven Information State Approach to Dialog

Stoyanchev, Svetlana (Interactions Corporation) | Johnston, Michael (Interactions Corporation)

AAAI Conferences, Apr-6-2018

A modular conversational dialog system, in contrast to an end-to-end one, includes natural language understanding, dialog management, and natural language generation components. A dialog system framework simplifies development and maintenance of modular dialog systems. We propose a knowledge-graph-driven (KGD) framework based on the Information State Update (ISU) approach and adapted for practical task-oriented applications. With the proposed framework, a system is defined declaratively by describing the information structure of a domain. We demonstrate the effectiveness of the approach in enabling rich conversational dialog in the food-ordering domain.

dialog, knowledge-graph driven information state approach

AAAI Conferences

Workshops at the Thirty-Second AAAI Conference on Artificial Intelligence

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (0.60)
Information Technology > Artificial Intelligence > Natural Language > Generation (0.53)
