AITopics | Gupta, Arshit

Collaborating Authors

Gupta, Arshit

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

InsTALL: Context-aware Instructional Task Assistance with Multi-modal Large Language Models

Nguyen, Pha, Sengupta, Sailik, Malik, Girik, Gupta, Arshit, Min, Bonan

arXiv.org Artificial IntelligenceJan-21-2025

The improved competence of generative models can help building multi-modal virtual assistants that leverage modalities beyond language. By observing humans performing multi-step tasks, one can build assistants that have situational awareness of actions and tasks being performed, enabling them to cater assistance based on this understanding. In this paper, we develop a Context-aware Instructional Task Assistant with Multi-modal Large Language Models (InsTALL) that leverages an online visual stream (e.g. a user's screen share or video recording) and responds in real-time to user queries related to the task at hand. To enable useful assistance, InsTALL 1) trains a multi-modal model on task videos and paired textual data, and 2) automatically extracts task graph from video data and leverages it at training and inference time. We show InsTALL achieves state-of-the-art performance across proposed sub-tasks considered for multimodal activity understanding -- task recognition (TR), action recognition (AR), next action prediction (AP), and plan prediction (PP) -- and outperforms existing baselines on two novel sub-tasks related to automatic error identification.

large language model, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2501.12231

Country:

North America > United States > Colorado (0.14)
Asia > Middle East > UAE (0.14)

Genre: Research Report (0.64)

Industry: Education > Educational Technology (0.33)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

MAGID: An Automated Pipeline for Generating Synthetic Multi-modal Datasets

Aboutalebi, Hossein, Song, Hwanjun, Xie, Yusheng, Gupta, Arshit, Sun, Justin, Su, Hang, Shalyminov, Igor, Pappas, Nikolaos, Singh, Siffi, Mansour, Saab

arXiv.org Artificial IntelligenceMar-5-2024

Development of multimodal interactive systems is hindered by the lack of rich, multimodal (text, images) conversational data, which is needed in large quantities for LLMs. Previous approaches augment textual dialogues with retrieved images, posing privacy, diversity, and quality constraints. In this work, we introduce \textbf{M}ultimodal \textbf{A}ugmented \textbf{G}enerative \textbf{I}mages \textbf{D}ialogues (MAGID), a framework to augment text-only dialogues with diverse and high-quality images. Subsequently, a diffusion model is applied to craft corresponding images, ensuring alignment with the identified text. Finally, MAGID incorporates an innovative feedback loop between an image description generation module (textual LLM) and image quality modules (addressing aesthetics, image-text matching, and safety), that work in tandem to generate high-quality and multi-modal dialogues. We compare MAGID to other SOTA baselines on three dialogue datasets, using automated and human evaluation. Our results show that MAGID is comparable to or better than baselines, with significant improvements in human evaluation, especially against retrieval baselines where the image database is small.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2403.03194

Country: Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Research Report > New Finding (0.68)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

DeAL: Decoding-time Alignment for Large Language Models

Huang, James Y., Sengupta, Sailik, Bonadiman, Daniele, Lai, Yi-an, Gupta, Arshit, Pappas, Nikolaos, Mansour, Saab, Kirchoff, Katrin, Roth, Dan

arXiv.org Artificial IntelligenceFeb-5-2024

Large Language Models (LLMs) are nowadays expected to generate content aligned with human preferences. Current work focuses on alignment at model training time, through techniques such as Reinforcement Learning with Human Feedback (RLHF). However, it is unclear if such methods are an effective choice to teach alignment objectives to the model. First, the inability to incorporate multiple, custom rewards and reliance on a model developer's view of universal and static principles are key limitations. Second, the residual gaps in model training and the reliability of such approaches are also questionable (e.g. susceptibility to jail-breaking even after safety training). To address these, we propose DeAL, a framework that allows the user to customize reward functions and enables Decoding-time Alignment of LLMs (DeAL). At its core, we view decoding as a heuristic-guided search process and facilitate the use of a wide variety of alignment objectives. Our experiments with programmatic constraints such as keyword and length constraints (studied widely in the pre-LLM era) and abstract objectives such as harmlessness and helpfulness (proposed in the post-LLM era) show that we can DeAL with fine-grained trade-offs, improve adherence to alignment objectives, and address residual gaps in LLMs. Lastly, while DeAL can be effectively paired with RLHF and prompting techniques, its generality makes decoding slower, an optimization we leave for future work.

large language model, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2402.06147

Country:

Europe (0.92)
North America > United States > California (0.28)

Genre:

Research Report (0.64)
Workflow (0.46)

Industry:

Government (0.93)
Law (0.93)
Law Enforcement & Public Safety > Terrorism (0.68)
Health & Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

User Simulation with Large Language Models for Evaluating Task-Oriented Dialogue

Davidson, Sam, Romeo, Salvatore, Shu, Raphael, Gung, James, Gupta, Arshit, Mansour, Saab, Zhang, Yi

arXiv.org Artificial IntelligenceSep-22-2023

One of the major impediments to the development of new task-oriented dialogue (TOD) systems is the need for human evaluation at multiple stages and iterations of the development process. In an effort to move toward automated evaluation of TOD, we propose a novel user simulator built using recently developed large pretrained language models (LLMs). In order to increase the linguistic diversity of our system relative to the related previous work, we do not fine-tune the LLMs used by our system on existing TOD datasets; rather we use in-context learning to prompt the LLMs to generate robust and linguistically diverse output with the goal of simulating the behavior of human interlocutors. Unlike previous work, which sought to maximize goal success rate (GSR) as the primary metric of simulator performance, our goal is a system which achieves a GSR similar to that observed in human interactions with TOD systems. Using this approach, our current simulator is effectively able to interact with several TOD systems, especially on single-intent conversational goals, while generating lexically and syntactically diverse output relative to previous simulators that rely upon fine-tuned models. Finally, we collect a Human2Bot dataset of humans interacting with the same TOD systems with which we experimented in order to better quantify these achievements.

artificial intelligence, large language model, task-oriented dialogue, (2 more...)

arXiv.org Artificial Intelligence

2309.13233

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

NatCS: Eliciting Natural Customer Support Dialogues

Gung, James, Moeng, Emily, Rose, Wesley, Gupta, Arshit, Zhang, Yi, Mansour, Saab

arXiv.org Artificial IntelligenceMay-4-2023

Despite growing interest in applications based on natural customer support conversations, there exist remarkably few publicly available datasets that reflect the expected characteristics of conversations in these settings. Existing task-oriented dialogue datasets, which were collected to benchmark dialogue systems mainly in written human-to-bot settings, are not representative of real customer support conversations and do not provide realistic benchmarks for systems that are applied to natural data. To address this gap, we introduce NatCS, a multi-domain collection of spoken customer service conversations. We describe our process for collecting synthetic conversations between customers and agents based on natural language phenomena observed in real conversations. Compared to previous dialogue datasets, the conversations collected with our approach are more representative of real human-to-human conversations along multiple metrics. Finally, we demonstrate potential uses of NatCS, including dialogue act classification and intent induction from conversations as potential applications, showing that dialogue act annotations in NatCS provide more effective training data for modeling real conversations compared to existing synthetic written datasets. We publicly release NatCS to facilitate research in natural dialog systems

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2305.03007

Country: Europe (0.67)

Genre:

Research Report > Experimental Study (0.46)
Research Report > New Finding (0.46)

Industry: Banking & Finance > Insurance (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

Add feedback

Intent Induction from Conversations for Task-Oriented Dialogue Track at DSTC 11

Gung, James, Shu, Raphael, Moeng, Emily, Rose, Wesley, Romeo, Salvatore, Benajiba, Yassine, Gupta, Arshit, Mansour, Saab, Zhang, Yi

arXiv.org Artificial IntelligenceApr-25-2023

With increasing demand for and adoption of virtual assistants, recent work has investigated ways to accelerate bot schema design through the automatic induction of intents or the induction of slots and dialogue states. However, a lack of dedicated benchmarks and standardized evaluation has made progress difficult to track and comparisons between systems difficult to make. This challenge track, held as part of the Eleventh Dialog Systems Technology Challenge, introduces a benchmark that aims to evaluate methods for the automatic induction of customer intents in a realistic setting of customer service interactions between human agents and customers. We propose two subtasks for progressively tackling the automatic induction of intents and corresponding evaluation methodologies. We then present three datasets suitable for evaluating the tasks and propose simple baselines. Finally, we summarize the submissions and results of the challenge track, for which we received submissions from 34 teams.

data mining, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2304.12982

Country:

Europe (0.93)
North America > United States > Minnesota (0.28)

Genre: Research Report (0.50)

Industry: Banking & Finance (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Data Science > Data Mining (0.93)

Add feedback

Dialog2API: Task-Oriented Dialogue with API Description and Example Programs

Shu, Raphael, Mansimov, Elman, Alkhouli, Tamer, Pappas, Nikolaos, Romeo, Salvatore, Gupta, Arshit, Mansour, Saab, Zhang, Yi, Roth, Dan

arXiv.org Artificial IntelligenceDec-19-2022

Functionality and dialogue experience are two important factors of task-oriented dialogue systems. Conventional approaches with closed schema (e.g., conversational semantic parsing) often fail as both the functionality and dialogue experience are strongly constrained by the underlying schema. We introduce a new paradigm for task-oriented dialogue - Dialog2API - to greatly expand the functionality and provide seamless dialogue experience. The conversational model interacts with the environment by generating and executing programs triggering a set of pre-defined APIs. The model also manages the dialogue policy and interact with the user through generating appropriate natural language responses. By allowing generating free-form programs, Dialog2API supports composite goals by combining different APIs, whereas unrestricted program revision provides natural and robust dialogue experience. To facilitate Dialog2API, the core model is provided with API documents, an execution environment and optionally some example dialogues annotated with programs. We propose an approach tailored for the Dialog2API, where the dialogue states are represented by a stack of programs, with most recently mentioned program on the top of the stack. Dialog2API can work with many application scenarios such as software automation and customer service. In this paper, we construct a dataset for AWS S3 APIs and present evaluation results of in-context learning baselines.

artificial intelligence, dialogue, natural language, (16 more...)

arXiv.org Artificial Intelligence

2212.09946

Country:

Europe (0.46)
North America > United States (0.28)
Asia (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.56)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.49)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.46)

Add feedback

Goal-Embedded Dual Hierarchical Model for Task-Oriented Dialogue Generation

Lai, Yi-An, Gupta, Arshit, Zhang, Yi

arXiv.org Artificial IntelligenceSep-19-2019

Hierarchical neural networks are often used to model inherent structures within dialogues. For goal-oriented dialogues, these models miss a mechanism adhering to the goals and neglect the distinct conversational patterns between two interlocutors. In this work, we propose Goal-Embedded Dual Hierarchical Attentional Encoder-Decoder (G-DuHA) able to center around goals and capture interlocutor-level disparity while modeling goal-oriented dialogues. Experiments on dialogue generation, response generation, and human evaluations demonstrate that the proposed model successfully generates higher-quality, more diverse and goal-centric dialogues. Moreover, we apply data augmentation via goal-oriented dialogue generation for task-oriented dialog systems with better performance achieved.

deep learning, dialogue, neural network, (22 more...)

arXiv.org Artificial Intelligence

1909.0922

Genre: Research Report (0.50)

Industry: Consumer Products & Services (0.69)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Speech (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Add feedback