ImageInThat: Manipulating Images to Convey User Instructions to Robots
Mahadevan, Karthik, Lewis, Blaine, Li, Jiannan, Mutlu, Bilge, Tang, Anthony, Grossman, Tovi
Foundation models are rapidly improving the capability of robots to perform everyday tasks autonomously, such as meal preparation, yet robots will still need to be instructed by humans due to limits in model performance, the difficulty of capturing user preferences, and the need for user agency. Robots can be instructed using various methods--natural language conveys immediate instructions but can be abstract or ambiguous, whereas end-user programming supports longer-horizon tasks but its interfaces face difficulties in capturing user intent. In this work, we propose direct manipulation of images as an alternative paradigm for instructing robots, and introduce a specific instantiation called ImageInThat, which allows users to directly manipulate images in a timeline-style interface to generate robot instructions. Through a user study, we demonstrate the efficacy of ImageInThat for instructing robots in kitchen manipulation tasks, comparing it to a text-based natural language instruction method. The results show that participants were faster with ImageInThat and preferred it over the text-based method. Supplementary material, including code, can be found at: https://image-in-that.github.io/.

Advances in foundation models are rapidly improving the capabilities of autonomous robots, bringing us closer to robots entering our homes where they can complete everyday tasks. However, the need for human instructions will persist--whether due to limitations in robot policies, models trained on internet-scale data that may not capture the specifics of users' environments or preferences, or simply users' desire to maintain control over their robots' actions. For instance, a robot asked to wash dishes might follow a standard cleaning routine--e.g., placing everything in the dishwasher and then putting it away in the cupboard--but may not respect a user's preferences--e.g., the need to wash delicate glasses "by hand" or to organize cleaned dishes in a specific way--thus necessitating human intervention. We introduce a new paradigm for instructing robots through the direct manipulation of images. ImageInThat is a specific instantiation of this paradigm in which users manipulate images in a timeline-style interface to create instructions for the robot to execute. Existing methods for instructing robots range from those that command the robot for immediate execution (e.g., uttering a language instruction to wash glasses by hand [1]) to methods that program the robot, such as learning from demonstration [2] or end-user robot programming [3]. However, prior methods, whether used for commanding or programming, have notable drawbacks.
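As a rough illustration of the paradigm, the sketch below shows one way a timeline of manipulated image keyframes could be turned into robot instructions by diffing consecutive frames. All names and data structures here are hypothetical assumptions for illustration, not the ImageInThat implementation.

```python
# Hypothetical sketch (not the authors' implementation): deriving pick-and-place
# instructions from a timeline of image keyframes, where each keyframe records the
# object placements the user produced by direct manipulation.
from dataclasses import dataclass

@dataclass(frozen=True)
class ObjectState:
    name: str          # e.g., "mug"
    location: str      # e.g., "counter", "sink", "cupboard"

@dataclass
class Keyframe:
    objects: dict[str, ObjectState]  # object name -> state in this image

def instructions_from_timeline(timeline: list[Keyframe]) -> list[str]:
    """Diff consecutive keyframes and emit one instruction per changed object."""
    steps = []
    for before, after in zip(timeline, timeline[1:]):
        for name, state in after.objects.items():
            prev = before.objects.get(name)
            if prev is not None and prev.location != state.location:
                steps.append(f"move {name} from {prev.location} to {state.location}")
    return steps

# Example: the user drags the mug from the counter into the sink in the second image.
frame0 = Keyframe({"mug": ObjectState("mug", "counter")})
frame1 = Keyframe({"mug": ObjectState("mug", "sink")})
print(instructions_from_timeline([frame0, frame1]))  # ['move mug from counter to sink']
```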
Exploring the Design Space of Cognitive Engagement Techniques with AI-Generated Code for Enhanced Learning
Kazemitabaar, Majeed, Huang, Oliver, Suh, Sangho, Henley, Austin Z., Grossman, Tovi
Novice programmers are increasingly relying on Large Language Models (LLMs) to generate code for learning programming concepts. However, this interaction can lead to superficial engagement, giving learners an illusion of learning and hindering skill development. To address this issue, we conducted a systematic design exploration to develop seven cognitive engagement techniques aimed at promoting deeper engagement with AI-generated code. In this paper, we describe our design process, the initial seven techniques, and results from a between-subjects study (N=82). We then iteratively refined the top techniques and further evaluated them through a within-subjects study (N=42). We evaluated the friction each technique introduces, its effectiveness in helping learners apply concepts to isomorphic tasks without AI assistance, and its success in aligning learners' perceived and actual coding abilities. Ultimately, our results highlight the most effective technique: guiding learners through the step-by-step problem-solving process, where they engage in an interactive dialog with the AI, stating what needs to be done at each stage before the corresponding code is revealed.
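The step-by-step technique highlighted above lends itself to a simple interaction loop. The sketch below is purely illustrative (the ask_llm helper, the prompt wording, and the use of input() for the learner's reply are all assumptions, not the study's system): the learner must state what each step should accomplish before the AI reveals the corresponding code.

```python
# Illustrative sketch of a step-by-step cognitive engagement loop (not the study's system).
def ask_llm(prompt: str) -> str:
    """Placeholder for an LLM call; a real system would query a model here."""
    return "[model confirms or corrects the plan, then shows only this step's code]"

def guided_step_through(task: str, num_steps: int) -> list[str]:
    """Walk the learner through a task one step at a time, plan before code."""
    revealed = []
    for step in range(1, num_steps + 1):
        # input() stands in for the learner's reply in a real chat interface.
        plan = input(f"Step {step}: what should this step accomplish? ")
        feedback = ask_llm(
            f"Task: {task}\nLearner's plan for step {step}: {plan}\n"
            "Briefly confirm or correct the plan, then give only the code for this step."
        )
        print(feedback)        # learner sees the correction and the step's code
        revealed.append(feedback)
    return revealed
```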
IdeaSynth: Iterative Research Idea Development Through Evolving and Composing Idea Facets with Literature-Grounded Feedback
Pu, Kevin, Feng, K. J. Kevin, Grossman, Tovi, Hope, Tom, Mishra, Bhavana Dalvi, Latzke, Matt, Bragg, Jonathan, Chang, Joseph Chee, Siangliulue, Pao
Research ideation involves broadly exploring and deeply refining ideas, both of which require deep engagement with the literature. Existing tools focus primarily on broad idea generation, yet offer little support for the iterative specification, refinement, and evaluation needed to further develop initial ideas. To bridge this gap, we introduce IdeaSynth, a research idea development system that uses LLMs to provide literature-grounded feedback for articulating research problems, solutions, evaluations, and contributions. IdeaSynth represents these idea facets as nodes on a canvas and allows researchers to iteratively refine them by creating and exploring variations and composing them. Our lab study (N=20) showed that participants, while using IdeaSynth, explored more alternative ideas and expanded initial ideas with more details compared to a strong LLM-based baseline. Our deployment study (N=7) demonstrated that participants effectively used IdeaSynth for real-world research projects at various ideation stages, from developing initial ideas to revising the framings of mature manuscripts, highlighting the possibilities of adopting IdeaSynth in researchers' workflows.
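The facet-node canvas described above can be pictured with a small data model like the sketch below; the class and field names are assumptions for illustration, not IdeaSynth's code.

```python
# Minimal sketch (hypothetical names): idea facets as canvas nodes, each holding
# alternative variations that can be composed into a single idea summary.
from dataclasses import dataclass, field

FACET_TYPES = ("problem", "solution", "evaluation", "contribution")

@dataclass
class FacetNode:
    facet_type: str                                   # one of FACET_TYPES
    variations: list[str] = field(default_factory=list)
    selected: int = 0                                 # currently active variation

    def add_variation(self, text: str) -> None:
        self.variations.append(text)

    def active(self) -> str:
        return self.variations[self.selected] if self.variations else ""

def compose_idea(nodes: list[FacetNode]) -> str:
    """Compose the selected variation of each facet into one idea description."""
    return "\n".join(f"{n.facet_type.title()}: {n.active()}" for n in nodes)
```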
Improving Steering and Verification in AI-Assisted Data Analysis with Interactive Task Decomposition
Kazemitabaar, Majeed, Williams, Jack, Drosos, Ian, Grossman, Tovi, Henley, Austin, Negreanu, Carina, Sarkar, Advait
LLM-powered tools like ChatGPT Data Analysis have the potential to help users tackle the challenging task of data analysis programming, which requires expertise in data processing, programming, and statistics. However, our formative study (n=15) uncovered serious challenges in verifying AI-generated results and steering the AI (i.e., guiding the AI system to produce the desired output). We developed two contrasting approaches to address these challenges. The first (Stepwise) decomposes the problem into step-by-step subgoals with pairs of editable assumptions and code until task completion, while the second (Phasewise) decomposes the entire problem into three editable, logical phases: structured input/output assumptions, execution plan, and code. A controlled, within-subjects experiment (n=18) compared these systems against a conversational baseline. Users reported significantly greater control with the Stepwise and Phasewise systems and found intervention, correction, and verification easier compared to the baseline. The results suggest design guidelines and trade-offs for AI-assisted data analysis tools.
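The two decompositions can be pictured as the editable structures they expose to the user, as in the hedged sketch below (the field names are assumptions, not the systems' implementation): each unit pairs natural-language assumptions with the code generated from them, so users can verify or correct either side.

```python
# Hedged sketch of the editable units in the Stepwise and Phasewise decompositions.
from dataclasses import dataclass

@dataclass
class StepwiseSubgoal:
    assumptions: str   # editable, e.g., "dates in sales.csv are ISO-formatted"
    code: str          # editable code generated for this subgoal

@dataclass
class PhasewiseAnalysis:
    io_assumptions: str   # structured input/output assumptions
    execution_plan: str   # editable high-level plan
    code: str             # code generated from the plan

# Example of a Stepwise decomposition the user could inspect and edit step by step.
stepwise = [
    StepwiseSubgoal("Load sales.csv; dates are ISO-formatted",
                    "df = pd.read_csv('sales.csv', parse_dates=['date'])"),
    StepwiseSubgoal("Revenue is price * quantity",
                    "df['revenue'] = df['price'] * df['quantity']"),
]
```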
CodeAid: Evaluating a Classroom Deployment of an LLM-based Programming Assistant that Balances Student and Educator Needs
Kazemitabaar, Majeed, Ye, Runlong, Wang, Xiaoning, Henley, Austin Z., Denny, Paul, Craig, Michelle, Grossman, Tovi
Timely, personalized feedback is essential for students learning programming, especially as class sizes expand. LLM-based tools like ChatGPT offer instant support, but reveal direct answers with code, which may hinder deep conceptual engagement. We developed CodeAid, an LLM-based programming assistant that delivers helpful, technically correct responses without revealing code solutions. For example, CodeAid can answer conceptual questions, generate pseudo-code with line-by-line explanations, and annotate students' incorrect code with fix suggestions. We deployed CodeAid in a programming class of 700 students for a 12-week semester. We performed a thematic analysis of 8,000 usages of CodeAid, further enriched by weekly surveys and 22 student interviews. We then interviewed eight programming educators to gain further insights on CodeAid. Findings revealed that students primarily used CodeAid for conceptual understanding and debugging, although a minority tried to obtain direct code. Educators appreciated CodeAid's educational approach but expressed concerns about occasional incorrect feedback and students defaulting to ChatGPT.
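As a rough illustration of the "helpful without revealing solutions" behavior described above, the sketch below constrains an LLM through a system prompt. The prompt wording and the call_llm helper are assumptions for illustration, not CodeAid's actual prompt or API.

```python
# Illustrative only: one way to steer an LLM toward explanations, pseudo-code, and
# annotations rather than runnable solution code.
SYSTEM_PROMPT = (
    "You are a programming tutor. Never provide complete, runnable solution code. "
    "You may answer conceptual questions, produce pseudo-code with line-by-line "
    "explanations, and annotate the student's own code with suggested fixes."
)

def call_llm(system: str, user: str) -> str:
    """Placeholder for the underlying LLM call."""
    return "[tutor-style response without solution code]"

def help_student(question: str, student_code: str | None = None) -> str:
    """Send the student's question (and optionally their code) under the tutoring constraints."""
    user_msg = question if student_code is None else f"{question}\n\nMy code:\n{student_code}"
    return call_llm(SYSTEM_PROMPT, user_msg)
```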
SynthScribe: Deep Multimodal Tools for Synthesizer Sound Retrieval and Exploration
Brade, Stephen, Wang, Bryan, Sousa, Mauricio, Newsome, Gregory Lee, Oore, Sageev, Grossman, Tovi
Synthesizers are powerful tools that allow musicians to create dynamic and original sounds. Existing commercial interfaces for synthesizers typically require musicians to interact with complex low-level parameters or to manage large libraries of premade sounds. To address these challenges, we implement SynthScribe -- a full-stack system that uses multimodal deep learning to let users express their intentions at a much higher level. We implement features that address three main difficulties: 1) searching through existing sounds, 2) creating completely new sounds, and 3) making meaningful modifications to a given sound. This is achieved with three main features: a multimodal search engine for a large library of synthesizer sounds; a user-centered genetic algorithm by which completely new sounds can be created and selected given the user's preferences; and a sound editing support feature that highlights and gives examples for key control parameters with respect to a text- or audio-based query. The results of our user studies show that SynthScribe can reliably retrieve and modify sounds while also affording the ability to create completely new sounds that expand a musician's creative horizons.
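One way to read the user-centered genetic algorithm above is as breeding new synthesizer patches from the patches a user marks as favorites. The sketch below is a minimal, hypothetical version assuming a patch is encoded as a vector of normalized parameters; it is not SynthScribe's actual algorithm.

```python
# Minimal sketch of a user-in-the-loop genetic algorithm over synthesizer patches,
# assuming each patch is a list of parameters normalized to [0, 1].
import random

def crossover(a: list[float], b: list[float]) -> list[float]:
    """Pick each parameter from one of the two parent patches."""
    return [random.choice(pair) for pair in zip(a, b)]

def mutate(patch: list[float], rate: float = 0.1, scale: float = 0.05) -> list[float]:
    """Occasionally nudge a parameter, clamped back into [0, 1]."""
    return [
        min(1.0, max(0.0, p + random.gauss(0.0, scale))) if random.random() < rate else p
        for p in patch
    ]

def next_generation(favorites: list[list[float]], size: int = 8) -> list[list[float]]:
    """Breed a new population of candidate patches from at least two user favorites."""
    children = []
    while len(children) < size:
        a, b = random.sample(favorites, 2)
        children.append(mutate(crossover(a, b)))
    return children
```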
ABScribe: Rapid Exploration of Multiple Writing Variations in Human-AI Co-Writing Tasks using Large Language Models
Reza, Mohi, Laundry, Nathan, Musabirov, Ilya, Dushniku, Peter, Yu, Zhi Yuan "Michael", Mittal, Kashish, Grossman, Tovi, Liut, Michael, Kuzminykh, Anastasia, Williams, Joseph Jay
Exploring alternative ideas by rewriting text is integral to the writing process. State-of-the-art large language models (LLMs) can simplify writing variation generation. However, current interfaces pose challenges for simultaneous consideration of multiple variations: creating new versions without overwriting text can be difficult, and pasting them sequentially can clutter documents, increasing workload and disrupting writers' flow. To tackle this, we present ABScribe, an interface that supports rapid, yet visually structured, exploration of writing variations in human-AI co-writing tasks. With ABScribe, users can swiftly produce multiple variations using LLM prompts, which are auto-converted into reusable buttons. Variations are stored adjacently within text segments for rapid in-place comparisons using mouse-over interactions on a context toolbar. Our user study with 12 writers shows that ABScribe significantly reduces task workload (d = 1.20, p < 0.001), enhances user perceptions of the revision process (d = 2.41, p < 0.001) compared to a popular baseline workflow, and provides insights into how writers explore variations using LLMs.
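The in-place variation storage described above suggests a document model roughly like the sketch below; the classes and fields are hypothetical, not ABScribe's implementation. Each text segment keeps all of its variations adjacent to one another, and LLM prompts are saved as reusable, named buttons.

```python
# Hypothetical data model for in-place writing variations and reusable prompt buttons.
from dataclasses import dataclass, field

@dataclass
class Segment:
    variations: list[str]      # all alternatives for this stretch of text
    active: int = 0            # which variation is currently shown in the document

    def add_variation(self, text: str) -> None:
        self.variations.append(text)

    def render(self) -> str:
        return self.variations[self.active]

@dataclass
class Document:
    segments: list[Segment] = field(default_factory=list)
    prompt_buttons: dict[str, str] = field(default_factory=dict)  # button label -> LLM prompt

    def render(self) -> str:
        """Render the document using each segment's active variation."""
        return " ".join(seg.render() for seg in self.segments)
```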
DiLogics: Creating Web Automation Programs With Diverse Logics
Pu, Kevin, Yang, Jim, Yuan, Angel, Ma, Minyi, Dong, Rui, Wang, Xinyu, Chen, Yan, Grossman, Tovi
Knowledge workers frequently encounter repetitive web data entry tasks, like updating records or placing orders. Web automation increases productivity, but translating tasks into web actions accurately and extending them to new specifications is challenging. Existing tools can automate tasks that perform the same logical trace of UI actions (e.g., input text in each field in order), but do not support tasks requiring different executions based on varied input conditions. We present DiLogics, a programming-by-demonstration system that utilizes NLP to assist users in creating web automation programs that handle diverse specifications. DiLogics first semantically segments input data into structured task steps. By recording user demonstrations for each step, DiLogics generalizes the web macros to novel but semantically similar task requirements. Our evaluation showed that non-experts can effectively use DiLogics to create automation programs that fulfill diverse input instructions. DiLogics provides an efficient, intuitive, and expressive method for developing web automation programs satisfying diverse specifications.
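The generalization step described above can be approximated by routing each new task step to the most semantically similar recorded demonstration, as in the sketch below. The toy embed() is a stand-in for a real sentence-embedding model, and nothing here is DiLogics' actual code.

```python
# Sketch: map each new task step to the recorded demonstration whose step description
# is most semantically similar. The bag-of-letters embed() is a toy placeholder.
import math
from collections import Counter

def embed(text: str) -> list[float]:
    """Toy stand-in for a sentence-embedding model: character-frequency vector."""
    counts = Counter(text.lower())
    return [float(counts.get(ch, 0)) for ch in "abcdefghijklmnopqrstuvwxyz"]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def pick_demonstration(step: str, demos: dict[str, list[str]]) -> list[str]:
    """demos maps a demonstrated step description to its recorded UI actions."""
    step_vec = embed(step)
    best = max(demos, key=lambda desc: cosine(step_vec, embed(desc)))
    return demos[best]

demos = {
    "enter the customer's shipping address": ["click #address", "type address text"],
    "add the item to the cart": ["click #search", "click .add-to-cart"],
}
print(pick_demonstration("add two notebooks to the shopping cart", demos))
```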
Promptify: Text-to-Image Generation through Interactive Prompt Exploration with Large Language Models
Brade, Stephen, Wang, Bryan, Sousa, Mauricio, Oore, Sageev, Grossman, Tovi
Text-to-image generative models have demonstrated remarkable capabilities in generating high-quality images based on textual prompts. However, crafting prompts that accurately capture the user's creative intent remains challenging. It often involves laborious trial-and-error procedures to ensure that the model interprets the prompts in alignment with the user's intention. To address these challenges, we present Promptify, an interactive system that supports prompt exploration and refinement for text-to-image generative models. Promptify utilizes a suggestion engine powered by large language models to help users quickly explore and craft diverse prompts. Our interface allows users to organize the generated images flexibly, and based on their preferences, Promptify suggests potential changes to the original prompt. This feedback loop enables users to iteratively refine their prompts and enhance desired features while avoiding unwanted ones. Our user study shows that Promptify effectively facilitates the text-to-image workflow and outperforms an existing baseline tool widely used for text-to-image generation.
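The feedback loop described above can be sketched as below, with the text-to-image model, the user's preference judgments, and the LLM suggestion engine passed in as callables. This is an illustrative outline under those assumptions, not Promptify's implementation.

```python
# Illustrative prompt-refinement loop: generate images, collect the user's liked and
# disliked images, and ask an LLM-backed suggestion engine for a revised prompt.
from typing import Callable

def refine_prompt(
    prompt: str,
    generate_images: Callable[[str], list[str]],
    get_feedback: Callable[[list[str]], tuple[list[str], list[str]]],
    suggest_refinement: Callable[[str, list[str], list[str]], str],
    rounds: int = 3,
) -> str:
    """Iteratively refine a text-to-image prompt from the user's image preferences."""
    for _ in range(rounds):
        images = generate_images(prompt)                       # run the generative model
        liked, disliked = get_feedback(images)                 # user sorts the results
        prompt = suggest_refinement(prompt, liked, disliked)   # LLM proposes a revision
    return prompt
```

A real deployment would supply the actual generation model, the interface's feedback signals, and the LLM call for these three callables.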
Screen2Words: Automatic Mobile UI Summarization with Multimodal Learning
Wang, Bryan, Li, Gang, Zhou, Xin, Chen, Zhourong, Grossman, Tovi, Li, Yang
Mobile User Interface Summarization generates succinct language descriptions of mobile screens to convey the important contents and functionalities of the screen, which can be useful for many language-based application scenarios. We present Screen2Words, a novel screen summarization approach that automatically encapsulates essential information of a UI screen into a coherent language phrase. Summarizing mobile screens requires a holistic understanding of the multi-modal data of mobile UIs, including text, images, and structure, as well as UI semantics, motivating our multi-modal learning approach. We collected and analyzed a large-scale screen summarization dataset annotated by human workers. Our dataset contains more than 112k language summaries across ~22k unique UI screens. We then experimented with a set of deep models with different configurations. Our evaluation of these models with both automatic accuracy metrics and human ratings shows that our approach can generate high-quality summaries for mobile screens. We demonstrate potential use cases of Screen2Words and open-source our dataset and model to lay the foundations for further bridging language and user interfaces.
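The multi-modal fusion described above can be pictured roughly as in the sketch below: per-element text, image, and structure embeddings are concatenated and passed through a Transformer encoder before a language decoder produces the summary. The dimensions and module choices are illustrative assumptions, not the Screen2Words architecture.

```python
# Simplified sketch of fusing text, image, and view-hierarchy embeddings of a UI screen.
import torch
import torch.nn as nn

class ScreenEncoder(nn.Module):
    def __init__(self, text_dim=128, image_dim=128, struct_dim=128, hidden=256):
        super().__init__()
        self.fuse = nn.Linear(text_dim + image_dim + struct_dim, hidden)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=hidden, nhead=4, batch_first=True),
            num_layers=2,
        )

    def forward(self, text_emb, image_emb, struct_emb):
        # Each input: (batch, num_ui_elements, dim). Concatenate per UI element, then
        # let self-attention aggregate context across the whole screen; the output
        # would feed a language decoder that emits the screen summary.
        fused = torch.relu(self.fuse(torch.cat([text_emb, image_emb, struct_emb], dim=-1)))
        return self.encoder(fused)
```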