AITopics | Gulwani, Sumit

Collaborating Authors

Gulwani, Sumit

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

TableTalk: Scaffolding Spreadsheet Development with a Language Agent

Liang, Jenny T., Kumar, Aayush, Bajpai, Yasharth, Gulwani, Sumit, Le, Vu, Parnin, Chris, Radhakrishna, Arjun, Tiwari, Ashish, Murphy-Hill, Emerson, Soares, Guastavo

arXiv.org Artificial IntelligenceFeb-13-2025

Despite its ubiquity in the workforce, spreadsheet programming remains challenging as programmers need both spreadsheet-specific knowledge (e.g., APIs to write formulas) and problem-solving skills to create complex spreadsheets. Large language models (LLMs) can help automate aspects of this process, and recent advances in planning and reasoning have enabled language agents, which dynamically plan, use tools, and take iterative actions to complete complex tasks. These agents observe, plan, and act, making them well-suited to scaffold spreadsheet programming by following expert processes. We present TableTalk, a language agent that helps programmers build spreadsheets conversationally. Its design reifies three design principles -- scaffolding, flexibility, and incrementality -- which we derived from two studies of seven programmers and 62 Excel templates. TableTalk structures spreadsheet development by generating step-by-step plans and suggesting three next steps users can choose from. It also integrates tools that enable incremental spreadsheet construction. A user study with 20 programmers shows that TableTalk produces spreadsheets 2.3 times more likely to be preferred over a baseline agent, while reducing cognitive load and time spent reasoning about spreadsheet actions by 12.6%. TableTalk's approach has implications for human-agent collaboration. This includes providing persistent direct manipulation interfaces for stopping or undoing agent actions, while ensuring that such interfaces for accepting actions can be deactivated.

large language model, natural language, scaffolding spreadsheet development, (5 more...)

arXiv.org Artificial Intelligence

2502.09787

Genre: Research Report (0.40)

Technology:

Information Technology > Software (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.53)

Add feedback

STACKFEED: Structured Textual Actor-Critic Knowledge Base Editing with FeedBack

Gupta, Naman, Kirtania, Shashank, Gupta, Priyanshu, Kariya, Krishna, Gulwani, Sumit, Iyer, Arun, Parthasarathy, Suresh, Radhakrishna, Arjun, Rajamani, Sriram K., Soares, Gustavo

arXiv.org Artificial IntelligenceOct-14-2024

Large Language Models (LLMs) often generate incorrect or outdated information, especially in low-resource settings or when dealing with private data. To address this, Retrieval-Augmented Generation (RAG) uses external knowledge bases (KBs), but these can also suffer from inaccuracies. We introduce STACKFEED, a novel Structured Textual Actor-Critic Knowledge base editing with FEEDback approach that iteratively refines the KB based on expert feedback using a multi-actor, centralized critic reinforcement learning framework. Each document is assigned to an actor, modeled as a ReACT agent, which performs structured edits based on document-specific targeted instructions from a centralized critic. Experimental results show that STACKFEED significantly improves KB quality and RAG system performance, enhancing accuracy by up to 8% over baselines.

information, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2410.10584

Country:

North America > Canada (0.28)
North America > United States (0.28)
Asia > Middle East (0.28)

Genre: Research Report (0.85)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.85)
(4 more...)

Add feedback

An Empirical Study of Validating Synthetic Data for Formula Generation

Singh, Usneek, Cambronero, José, Gulwani, Sumit, Kanade, Aditya, Khatry, Anirudh, Le, Vu, Singh, Mukul, Verbruggen, Gust

arXiv.org Artificial IntelligenceJul-15-2024

Large language models (LLMs) can be leveraged to help with writing formulas in spreadsheets, but resources on these formulas are scarce, impacting both the base performance of pre-trained models and limiting the ability to fine-tune them. Given a corpus of formulas, we can use a(nother) model to generate synthetic natural language utterances for fine-tuning. However, it is important to validate whether the NL generated by the LLM is indeed accurate to be beneficial for fine-tuning. In this paper, we provide empirical results on the impact of validating these synthetic training examples with surrogate objectives that evaluate the accuracy of the synthetic annotations. We demonstrate that validation improves performance over raw data across four models (2 open and 2 closed weight). Interestingly, we show that although validation tends to prune more challenging examples, it increases the complexity of problems that models can solve after being fine-tuned on validated data.

formula, large language model, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2407.10657

Country: Asia > India (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.97)

Add feedback

METAREFLECTION: Learning Instructions for Language Agents using Past Reflections

Gupta, Priyanshu, Kirtania, Shashank, Singha, Ananya, Gulwani, Sumit, Radhakrishna, Arjun, Shi, Sherry, Soares, Gustavo

arXiv.org Artificial IntelligenceMay-13-2024

Despite the popularity of Large Language Models (LLMs), crafting specific prompts for LLMs to perform particular tasks remains challenging. Users often engage in multiple conversational turns with an LLM-based agent to accomplish their intended task. Recent studies have demonstrated that linguistic feedback, in the form of self-reflections generated by the model, can work as reinforcement during these conversations, thus enabling quicker convergence to the desired outcome. Motivated by these findings, we introduce METAREFLECTION, a novel technique that learns general prompt instructions for a specific domain from individual self-reflections gathered during a training phase. We evaluate our technique in two domains: Infrastructure as Code (IAC) vulnerability detection and question-answering (QA) using REACT and COT. Our results demonstrate a notable improvement, with METARELECTION outperforming GPT-4 by 16.82% (IAC), 31.33% (COT), and 15.42% (REACT), underscoring the potential of METAREFLECTION as a viable method for enhancing the efficiency of LLMs.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2405.13009

Country:

Africa (0.14)
North America > United States (0.14)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology > Security & Privacy (0.94)
Leisure & Entertainment > Sports > Soccer (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)

Add feedback

Enhancing Creativity in Large Language Models through Associative Thinking Strategies

Mehrotra, Pronita, Parab, Aishni, Gulwani, Sumit

arXiv.org Artificial IntelligenceMay-9-2024

This paper explores the enhancement of creativity in Large Language Models (LLMs) like vGPT-4 through associative thinking, a cognitive process where creative ideas emerge from linking seemingly unrelated concepts. Associative thinking strategies have been found to effectively help humans boost creativity. However, whether the same strategies can help LLMs become more creative remains under-explored. In this work, we investigate whether prompting LLMs to connect disparate concepts can augment their creative outputs. Focusing on three domains -- Product Design, Storytelling, and Marketing -- we introduce creativity tasks designed to assess vGPT-4's ability to generate original and useful content. By challenging the models to form novel associations, we evaluate the potential of associative thinking to enhance the creative capabilities of LLMs. Our findings show that leveraging associative thinking techniques can significantly improve the originality of vGPT-4's responses.

creativity, large language model, natural language, (12 more...)

arXiv.org Artificial Intelligence

2405.06715

Country: North America > United States > California (0.14)

Genre:

Research Report > New Finding (1.00)
Research Report > Promising Solution (0.89)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Semantically Aligned Question and Code Generation for Automated Insight Generation

Singha, Ananya, Chopra, Bhavya, Khatry, Anirudh, Gulwani, Sumit, Henley, Austin Z., Le, Vu, Parnin, Chris, Singh, Mukul, Verbruggen, Gust

arXiv.org Artificial IntelligenceMar-21-2024

Automated insight generation is a common tactic for helping knowledge workers, such as data scientists, to quickly understand the potential value of new and unfamiliar data. Unfortunately, automated insights produced by large-language models can generate code that does not correctly correspond (or align) to the insight. In this paper, we leverage the semantic knowledge of large language models to generate targeted and insightful questions about data and the corresponding code to answer those questions. Then through an empirical study on data from Open-WikiTable, we show that embeddings can be effectively used for filtering out semantically unaligned pairs of question and code. Additionally, we found that generating questions and code together yields more diverse questions.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2405.01556

Country:

North America > United States (0.69)
Asia > Middle East > UAE (0.28)

Genre: Questionnaire & Opinion Survey (1.00)

Industry: Leisure & Entertainment > Sports > Snooker (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)

Add feedback

Exploring Interaction Patterns for Debugging: Enhancing Conversational Capabilities of AI-assistants

Chopra, Bhavya, Bajpai, Yasharth, Biyani, Param, Soares, Gustavo, Radhakrishna, Arjun, Parnin, Chris, Gulwani, Sumit

arXiv.org Artificial IntelligenceFeb-9-2024

The widespread availability of Large Language Models (LLMs) within Integrated Development Environments (IDEs) has led to their speedy adoption. Conversational interactions with LLMs enable programmers to obtain natural language explanations for various software development tasks. However, LLMs often leap to action without sufficient context, giving rise to implicit assumptions and inaccurate responses. Conversations between developers and LLMs are primarily structured as question-answer pairs, where the developer is responsible for asking the the right questions and sustaining conversations across multiple turns. In this paper, we draw inspiration from interaction patterns and conversation analysis -- to design Robin, an enhanced conversational AI-assistant for debugging. Through a within-subjects user study with 12 industry professionals, we find that equipping the LLM to -- (1) leverage the insert expansion interaction pattern, (2) facilitate turn-taking, and (3) utilize debugging workflows -- leads to lowered conversation barriers, effective fault localization, and 5x improvement in bug resolution rates.

artificial intelligence, large language model, natural language, (15 more...)

arXiv.org Artificial Intelligence

2402.06229

Country: North America > United States (0.50)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Generative AI for Education (GAIED): Advances, Opportunities, and Challenges

Denny, Paul, Gulwani, Sumit, Heffernan, Neil T., Käser, Tanja, Moore, Steven, Rafferty, Anna N., Singla, Adish

arXiv.org Artificial IntelligenceFeb-6-2024

This survey article has grown out of the GAIED (pronounced "guide") workshop organized by the authors at the NeurIPS 2023 conference. We organized the GAIED workshop as part of a community-building effort to bring together researchers, educators, and practitioners to explore the potential of generative AI for enhancing education. This article aims to provide an overview of the workshop activities and highlight several future research directions in the area of GAIED.

generative ai, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2402.0158

Country: North America > United States > Texas (0.14)

Genre:

Overview (1.00)
Instructional Material > Course Syllabus & Notes (0.88)

Industry:

Education > Educational Setting (1.00)
Education > Curriculum > Subject-Specific Education (0.95)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.83)

Add feedback

Automating Human Tutor-Style Programming Feedback: Leveraging GPT-4 Tutor Model for Hint Generation and GPT-3.5 Student Model for Hint Validation

Phung, Tung, Pădurean, Victor-Alexandru, Singh, Anjali, Brooks, Christopher, Cambronero, José, Gulwani, Sumit, Singla, Adish, Soares, Gustavo

arXiv.org Artificial IntelligenceDec-21-2023

Generative AI and large language models hold great promise in enhancing programming education by automatically generating individualized feedback for students. We investigate the role of generative AI models in providing human tutor-style programming hints to help students resolve errors in their buggy programs. Recent works have benchmarked state-of-the-art models for various feedback generation scenarios; however, their overall quality is still inferior to human tutors and not yet ready for real-world deployment. In this paper, we seek to push the limits of generative AI models toward providing high-quality programming hints and develop a novel technique, GPT4Hints-GPT3.5Val. As a first step, our technique leverages GPT-4 as a ``tutor'' model to generate hints -- it boosts the generative quality by using symbolic information of failing test cases and fixes in prompts. As a next step, our technique leverages GPT-3.5, a weaker model, as a ``student'' model to further validate the hint quality -- it performs an automatic quality validation by simulating the potential utility of providing this feedback. We show the efficacy of our technique via extensive evaluation using three real-world datasets of Python programs covering a variety of concepts ranging from basic algorithms to regular expressions and data analysis using pandas library.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2310.0378

Genre: Research Report > New Finding (1.00)

Industry: Education > Educational Technology > Educational Software > Computer Based Training (0.83)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.76)

Add feedback

FLAME: A small language model for spreadsheet formulas

Joshi, Harshit, Ebenezer, Abishai, Cambronero, José, Gulwani, Sumit, Kanade, Aditya, Le, Vu, Radiček, Ivan, Verbruggen, Gust

arXiv.org Artificial IntelligenceDec-19-2023

Spreadsheets are a vital tool for end-user data management. Using large language models for formula authoring assistance in these environments can be difficult, as these models are expensive to train and challenging to deploy due to their size (up to billions of parameters). We present FLAME, a transformer-based model trained exclusively on Excel formulas that leverages domain insights to achieve competitive performance while being substantially smaller (60M parameters) and training on two orders of magnitude less data. We curate a training dataset using sketch deduplication, introduce an Excel-specific formula tokenizer, and use domain-specific versions of masked span prediction and noisy auto-encoding as pre-training objectives. We evaluate FLAME on formula repair, formula completion, and similarity-based formula retrieval. FLAME can outperform much larger models, such as the Davinci (175B) and Cushman (12B) variants of Codex and CodeT5 (220M), in 10 of 14 evaluation settings for the repair and completion tasks. For formula retrieval, FLAME outperforms CodeT5, CodeBERT, and GraphCodeBERT.

formula, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2301.13779

Country:

North America > United States (0.14)
Europe > Germany (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Software (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Add feedback