Collaborating Authors

 Tambwekar, Pradyumna


Generating CAD Code with Vision-Language Models for 3D Designs

arXiv.org Artificial Intelligence

Generative AI has transformed the fields of Design and Manufacturing by providing efficient, automated methods for generating and modifying 3D objects. One approach uses Large Language Models (LLMs) to generate Computer-Aided Design (CAD) scripting code, which is then executed to render a 3D object; however, the resulting object may not meet the specified requirements. Testing the correctness of generated CAD code is challenging because the properties of 3D objects (e.g., shapes, surfaces, and dimensions) are not feasible to verify in code alone. In this paper, we introduce CADCodeVerify, a novel approach to iteratively verify and improve 3D objects generated from CAD code. Our approach produces ameliorative feedback by prompting a Vision-Language Model (VLM) to generate and answer a set of validation questions about the rendered object, and then prompting the VLM to correct any deviations. To evaluate CADCodeVerify, we introduce CADPrompt, the first benchmark for CAD code generation, consisting of 200 natural language prompts paired with expert-annotated scripting code for 3D objects. Our findings show that CADCodeVerify improves VLM performance by providing visual feedback, enhancing the structure of the 3D objects, and increasing the success rate of the compiled program. When applied to GPT-4, CADCodeVerify achieved a 7.30% reduction in Point Cloud distance and a 5.0% improvement in success rate compared to prior work.
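The "Point Cloud distance" metric compares points sampled from the generated object against points sampled from the ground-truth geometry. The abstract does not name the exact formulation; a common choice for comparing point clouds is the symmetric Chamfer distance, sketched below in plain Python as an illustration (brute-force, squared distances — an assumption, not necessarily the paper's exact metric):

```python
def chamfer_distance(a, b):
    """Symmetric Chamfer distance between two 3D point clouds.

    For each point in one cloud, take the squared distance to its
    nearest neighbour in the other cloud, average per direction,
    and sum both directions. Brute-force O(len(a) * len(b)) search,
    which is fine for small sampled clouds.
    """
    def one_way(src, dst):
        total = 0.0
        for p in src:
            total += min(
                (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 + (p[2] - q[2]) ** 2
                for q in dst
            )
        return total / len(src)

    return one_way(a, b) + one_way(b, a)
```

Identical clouds score 0; the score grows as the generated geometry drifts from the reference, which is what makes a percentage reduction in this distance a meaningful improvement signal.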


A Computational Interface to Translate Strategic Intent from Unstructured Language in a Low-Data Setting

arXiv.org Artificial Intelligence

Many real-world tasks involve a mixed-initiative setup, wherein humans and AI systems collaboratively perform a task. While significant work has been conducted towards enabling humans to specify, through language, exactly how an agent should complete a task (i.e., low-level specification), prior work has largely overlooked interpreting the high-level strategic intent of human commanders. Parsing strategic intent from language would allow autonomous systems to operate independently according to the user's plan without frequent guidance or instruction. In this paper, we build a computational interface capable of translating unstructured language strategies into actionable intent in the form of goals and constraints. Leveraging a game environment, we collect a dataset of over 1,000 examples mapping language strategies to the corresponding goals and constraints, and show that our model, trained on this dataset, significantly outperforms human interpreters in inferring strategic intent (i.e., goals and constraints) from language (p < 0.05). Furthermore, we show that our model (125M parameters) significantly outperforms ChatGPT on this task (p < 0.05) in a low-data setting.
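The key idea — mapping free-form strategy text to a structured goals-and-constraints target — can be sketched with a toy schema. Everything here is illustrative: the field names, labels, and the keyword-matching stand-in for the learned 125M-parameter model are assumptions, not the paper's actual representation or pipeline:

```python
from dataclasses import dataclass, field

@dataclass
class StrategicIntent:
    """Hypothetical structured output of a strategy-translation model:
    a free-form language strategy parsed into explicit goals and
    constraints an autonomous agent can act on."""
    goals: list = field(default_factory=list)        # e.g. ["capture_flag"]
    constraints: list = field(default_factory=list)  # e.g. ["avoid_named_region"]

def parse_strategy(text: str) -> StrategicIntent:
    """Toy keyword-based stand-in for the learned model; a real system
    would use a trained language model, not keyword matching."""
    intent = StrategicIntent()
    if "capture" in text:
        intent.goals.append("capture_flag")
    if "avoid" in text:
        intent.constraints.append("avoid_named_region")
    return intent
```

The value of such a schema is that downstream planners consume a fixed, machine-readable intent object regardless of how the commander phrased the strategy.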


Controllable Neural Story Plot Generation via Reward Shaping

arXiv.org Artificial Intelligence

Language-modeling-based approaches to story plot generation attempt to construct a plot by sampling from a language model (LM) to predict the next character, word, or sentence to add to the story. LM techniques lack the ability to receive guidance from the user to achieve a specific goal, resulting in stories that don't have a clear sense of progression and lack coherence. We present a reward-shaping technique to guide plot generation toward a given goal.

By themselves, large neural language models have been shown to work well with a variety of short-term tasks, such as understanding short children's stories [Radford et al., 2019]. However, while recurrent neural networks (RNNs) using LSTM or GRU cells can theoretically maintain long-term context in their hidden layers, in practice RNNs only use a relatively small part of the history of tokens [Khandelwal et al., 2018]. Consequently, stories or plots generated by RNNs tend to lose coherence as the generation continues.
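The reward-shaping idea — biasing a language model's next-event choice toward a goal rather than sampling from the LM alone — can be sketched minimally. The additive combination rule, the `alpha` weight, and the callable interfaces below are illustrative assumptions, not the paper's actual method:

```python
def shaped_choice(candidates, lm_logprob, reward, alpha=1.0):
    """Pick the next story event by combining an LM's log-probability
    with a shaped reward measuring progress toward a goal event.

    `lm_logprob` and `reward` are assumed callables mapping a candidate
    event to a score; `alpha` trades off fluency against goal progress.
    """
    scores = [lm_logprob(c) + alpha * reward(c) for c in candidates]
    # Greedy argmax over the combined score (sampling would also work).
    best = max(range(len(scores)), key=scores.__getitem__)
    return candidates[best]
```

With `alpha=0` this reduces to plain LM decoding; raising `alpha` steers generation toward events the reward function marks as progress, which is the mechanism that gives shaped plots their sense of direction.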


Towards the design of user-centric strategy recommendation systems for collaborative Human-AI tasks

arXiv.org Artificial Intelligence

Artificial Intelligence is being employed by humans to collaboratively solve complicated tasks in domains such as search and rescue and manufacturing. Efficient teamwork can be achieved by understanding user preferences and recommending strategies suited to the task at hand. Prior work has focused on personalizing recommendation systems for relatively well-understood tasks in the context of e-commerce or social networks. In this paper, we seek to understand the important factors to consider when designing user-centric strategy recommendation systems for decision-making. We conducted a human-subjects experiment (n = 60) measuring the preferences of users with different personality types towards different strategy recommendation systems. Our experiment spanned four types of strategy recommendation modalities established in prior work: (1) single strategy recommendation, (2) multiple similar recommendations, (3) multiple diverse recommendations, and (4) all possible strategy recommendations. While these recommendation schemes have been explored independently in prior work, our study is novel in that we employ all of them simultaneously, and in the context of strategy recommendations, providing an in-depth view of how users perceive different strategy recommendation systems. We found that certain personality traits, such as conscientiousness, notably impact the preference towards a particular type of system (p < 0.01). Finally, we report an interesting relationship between usability, alignment, and perceived intelligence, wherein greater perceived alignment of recommendations with one's own preferences leads to higher perceived intelligence (p < 0.01) and higher usability (p < 0.01).


Towards Reconciling Usability and Usefulness of Explainable AI Methodologies

arXiv.org Artificial Intelligence

Interactive Artificial Intelligence (AI) agents are becoming increasingly prevalent in society. However, applying such systems without understanding them can be problematic. Black-box AI systems can lead to liability and accountability issues when they produce an incorrect decision. Explainable AI (XAI) seeks to bridge the knowledge gap between developers and end-users by offering insights into how an AI algorithm functions. Many modern algorithms focus on making the AI model "transparent", i.e., unveiling the inherent functionality of the agent in a simpler format. However, these approaches do not cater to the end-users of these systems, as users may not possess the requisite knowledge to understand such explanations in a reasonable amount of time. Therefore, to develop suitable XAI methods, we need to understand the factors which influence subjective perception and objective usability. In this paper, we present a novel user study of differing XAI modalities commonly employed in prior work for explaining AI behavior, including decision trees, text, and programs. We study these XAI modalities in the context of explaining the actions of a self-driving car on a highway, as driving is an easily understandable real-world task and self-driving cars are a keen area of interest within the AI community. Our findings highlight internal consistency issues: participants perceived language explanations to be significantly more usable, yet participants were better able to objectively understand the car's decision-making process through a decision-tree explanation. Our work also provides further evidence of the importance of integrating user-specific and situational criteria into the design of XAI systems. Our findings show that factors such as computer science experience, and watching the car succeed or fail, can impact the perception and usefulness of the explanation.


Interpretable Policy Specification and Synthesis through Natural Language and RL

arXiv.org Artificial Intelligence

Policy specification is a process by which a human can initialize a robot's behaviour and, in turn, warm-start policy optimization via Reinforcement Learning (RL). While policy specification/design is inherently a collaborative process, modern methods based on Learning from Demonstration or Deep RL lack the model interpretability and accessibility to be classified as such. Current state-of-the-art methods for policy specification rely on black-box models, which are an insufficient means of collaboration for non-expert users: these models provide no means of inspecting the policies learnt by the agent and are not focused on creating a usable modality for teaching robot behaviour. In this paper, we propose a novel machine learning framework that enables humans to 1) specify, through natural language, interpretable policies in the form of easy-to-understand decision trees, 2) leverage these policies to warm-start reinforcement learning, and 3) outperform baselines that lack our natural language initialization mechanism. We train our approach by collecting a first-of-its-kind corpus mapping free-form natural language policy descriptions to decision tree-based policies. We show that our novel framework translates natural language to decision trees with 96% and 97% accuracy on a held-out corpus across two domains, respectively. Finally, we validate that policies initialized with natural language commands significantly outperform relevant baselines (p < 0.001) that do not benefit from our natural language-based warm-start technique.
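The interpretable decision-tree policies used for warm-starting can be illustrated with a minimal evaluator. The nested-dict schema and the driving-style example rule are hypothetical, intended only to show how a tree specified from language can act directly as an initial policy:

```python
def tree_policy(node, obs):
    """Evaluate an interpretable decision-tree policy: internal nodes
    test a named observation feature against a threshold; leaves hold
    actions. A tree like this, built from a natural language description,
    can serve as the agent's initial behaviour before RL fine-tuning.
    (Illustrative schema, not the paper's exact representation.)
    """
    while isinstance(node, dict):
        branch = "left" if obs[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node  # a leaf: the action to take

# Hypothetical rule: "brake if the gap ahead is under 10 m, else cruise".
tree = {"feature": "gap_ahead", "threshold": 10.0,
        "left": "brake", "right": "cruise"}
```

Because every node is a readable feature-threshold test, a non-expert can inspect exactly what behaviour the agent starts with — the accessibility property the black-box baselines lack.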


Automated Rationale Generation: A Technique for Explainable AI and its Effects on Human Perceptions

arXiv.org Artificial Intelligence

Automated rationale generation is an approach for real-time explanation generation whereby a computational model learns to translate an autonomous agent's internal state and action data representations into natural language. Training on human explanation data can enable agents to learn to generate human-like explanations for their behavior. In this paper, using the context of an agent that plays Frogger, we describe (a) how to collect a corpus of explanations, (b) how to train a neural rationale generator to produce different styles of rationales, and (c) how people perceive these rationales. We conducted two user studies. The first study establishes the plausibility of each type of generated rationale and situates user perceptions along the dimensions of confidence, human-likeness, adequate justification, and understandability. The second study further explores user preferences between the generated rationales with regard to confidence in the autonomous agent, communicating failure, and unexpected behavior. Overall, we find alignment between the intended differences in features of the generated rationales and the differences perceived by users. Moreover, context permitting, participants preferred detailed rationales that helped them form a stable mental model of the agent's behavior.