Banburski-Fahey, Andrzej
Social Conjuring: Multi-User Runtime Collaboration with AI in Building Virtual 3D Worlds
Kobenova, Amina, DeVeaux, Cyan, Parajuli, Samyak, Banburski-Fahey, Andrzej, Fernandez, Judith Amores, Lanier, Jaron
Generative artificial intelligence has shown promise in prompting virtual worlds into existence, yet little attention has been given to understanding how this process unfolds as social interaction. We present Social Conjurer, a framework for AI-augmented dynamic 3D scene co-creation, where multiple users collaboratively build and modify virtual worlds in real time. Through an expanded set of interactions, including social and tool-based engagements as well as spatial reasoning, our framework facilitates the creation of rich, diverse virtual environments. Findings from a preliminary user study (N=12) provide insight into the user experience of this approach, how social contexts shape the prompting of spatial environments, and perspectives on the social applications of prompt-based 3D co-creation. In addition to highlighting the potential of AI-supported multi-user world creation and offering new pathways for AI-augmented creative processes in VR, this article presents a set of implications for designing human-centered interfaces that incorporate AI models into 3D content generation.
DreamGarden: A Designer Assistant for Growing Games from a Single Prompt
Earle, Sam, Parajuli, Samyak, Banburski-Fahey, Andrzej
Coding assistants are increasingly leveraged in game design, both generating code and making high-level plans. To what degree can these tools align with developer workflows, and what new modes of human-computer interaction can emerge from their use? We present DreamGarden, an AI system capable of assisting with the development of diverse game environments in Unreal Engine. At the core of our method is an LLM-driven planner, capable of breaking down a single, high-level prompt -- a dream, memory, or imagined scenario provided by a human user -- into a hierarchical action plan, which is then distributed across specialized submodules facilitating concrete implementation. This system is presented to the user as a garden of plans and actions, both growing independently and responding to user intervention via seed prompts, pruning, and feedback. Through a user study, we explore design implications of this system, charting courses for future work in semi-autonomous assistants and open-ended simulation design.
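To make the planner idea concrete, here is a minimal, hypothetical sketch of how a single high-level prompt could be expanded into a prunable tree of sub-plans. The data structures, the llm_expand stub, and the depth limit are illustrative assumptions, not DreamGarden's actual implementation.

```python
# Hypothetical sketch of a hierarchical planner in the spirit of DreamGarden:
# a single high-level prompt is expanded into a tree of sub-tasks, which the
# user can prune or re-seed. The `llm_expand` stub stands in for an LLM call.
from dataclasses import dataclass, field
from typing import List


@dataclass
class PlanNode:
    """One node in the garden: a task description plus its child sub-tasks."""
    task: str
    children: List["PlanNode"] = field(default_factory=list)
    pruned: bool = False


def llm_expand(task: str) -> List[str]:
    """Stand-in for an LLM call that decomposes a task into sub-tasks."""
    # In a real system this would prompt a language model; here we fake it.
    return [f"{task}: step {i}" for i in range(1, 3)]


def grow(node: PlanNode, depth: int) -> None:
    """Recursively expand un-pruned nodes up to a fixed depth."""
    if depth == 0 or node.pruned:
        return
    node.children = [PlanNode(t) for t in llm_expand(node.task)]
    for child in node.children:
        grow(child, depth - 1)


def show(node: PlanNode, indent: int = 0) -> None:
    print("  " * indent + ("[pruned] " if node.pruned else "") + node.task)
    for child in node.children:
        show(child, indent + 1)


if __name__ == "__main__":
    root = PlanNode("Build a forest glade level in Unreal Engine")
    grow(root, depth=2)
    # Simulate user intervention: mark one branch as pruned, so any
    # future growth passes would skip it.
    root.children[0].pruned = True
    show(root)
```

In the system the abstract describes, leaf tasks would be handed to specialized submodules for concrete implementation rather than simply printed as they are here.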
LLMR: Real-time Prompting of Interactive Worlds using Large Language Models
De La Torre, Fernanda, Fang, Cathy Mengying, Huang, Han, Banburski-Fahey, Andrzej, Fernandez, Judith Amores, Lanier, Jaron
We present Large Language Model for Mixed Reality (LLMR), a framework for the real-time creation and modification of interactive Mixed Reality experiences using LLMs. LLMR leverages novel strategies to tackle difficult cases where ideal training data is scarce, or where the design goal requires the synthesis of internal dynamics, intuitive analysis, or advanced interactivity. Our framework relies on text interaction and the Unity game engine. By incorporating techniques for scene understanding, task planning, self-debugging, and memory management, LLMR outperforms the standard GPT-4 by 4x in average error rate. We demonstrate LLMR's cross-platform interoperability with several example worlds, and evaluate it on a variety of creation and modification tasks to show that it can produce and edit diverse objects, tools, and scenes. Finally, we conducted a usability study (N=11) with a diverse set of participants, which revealed that they had positive experiences with the system and would use it again.
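As a rough illustration of the kind of loop the abstract describes (scene understanding, task planning, code generation, self-debugging, and memory), here is a heavily simplified Python sketch. The function names and the call_llm stub are assumptions made for the example; the actual LLMR modules and prompts are not shown here.

```python
# Hypothetical orchestration loop: summarize the scene, ask a model for code,
# retry with the error message if the code fails, and record the request in
# a naive memory. All names are illustrative stand-ins, not LLMR's interfaces.
from typing import List, Tuple


def call_llm(prompt: str) -> str:
    """Stand-in for a GPT-4 call; a real system would query the model here."""
    return "// generated Unity C# snippet for: " + prompt.splitlines()[-2]


def summarize_scene(scene_objects: List[str]) -> str:
    """'Scene understanding': compress the current scene into a text summary."""
    return "Scene contains: " + ", ".join(scene_objects)


def try_compile(code: str) -> Tuple[bool, str]:
    """Stand-in for compiling/executing the generated code inside Unity."""
    return True, ""  # pretend compilation succeeded


def handle_request(request: str, scene_objects: List[str], memory: List[str],
                   max_retries: int = 3) -> str:
    """Plan, generate, and self-debug code for one user request."""
    context = summarize_scene(scene_objects) + "\n" + "\n".join(memory)
    code = call_llm(f"{context}\nTask: {request}\nWrite Unity C# code.")
    for _ in range(max_retries):
        ok, error = try_compile(code)
        if ok:
            break
        # Self-debugging: feed the compiler error back to the model.
        code = call_llm(f"Fix this error:\n{error}\nIn code:\n{code}\n")
    memory.append(f"Handled request: {request}")  # very naive memory management
    return code


if __name__ == "__main__":
    mem: List[str] = []
    print(handle_request("add a bouncing red ball", ["Floor", "Camera"], mem))
```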
Real-time Animation Generation and Control on Rigged Models via Large Language Models
Huang, Han, De La Torre, Fernanda, Fang, Cathy Mengying, Banburski-Fahey, Andrzej, Amores, Judith, Lanier, Jaron
We introduce a novel method for real-time animation control and generation on rigged models using natural language input. First, we embed a large language model (LLM) in Unity to output structured texts that can be parsed into diverse and realistic animations. Second, we illustrate the LLM's potential to enable flexible state transitions between existing animations. We showcase the robustness of our approach through qualitative results on various rigged models and motions.
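The following is a hypothetical sketch of what "structured texts parsed into animations" could look like in practice: an assumed JSON keyframe schema emitted by the LLM, plus a small parser and interpolator. The schema and function names are illustrative; the paper's actual output format is not specified in the abstract.

```python
# Illustrative only: parse an assumed JSON animation description into keyframes
# and sample a joint rotation at an arbitrary time by linear interpolation.
import json

# Example of the kind of structured output an LLM might be prompted to return.
llm_output = """
{
  "clip": "wave",
  "duration": 1.0,
  "keyframes": [
    {"time": 0.0, "joint": "right_arm", "rotation": [0, 0, 0]},
    {"time": 0.5, "joint": "right_arm", "rotation": [0, 0, 70]},
    {"time": 1.0, "joint": "right_arm", "rotation": [0, 0, 0]}
  ]
}
"""


def parse_clip(text: str) -> dict:
    """Parse the structured text into a clip dict, validating required fields."""
    clip = json.loads(text)
    assert "keyframes" in clip and clip["keyframes"], "clip has no keyframes"
    for kf in clip["keyframes"]:
        assert 0.0 <= kf["time"] <= clip["duration"], "keyframe out of range"
    return clip


def sample_rotation(clip: dict, joint: str, t: float) -> list:
    """Linearly interpolate the joint rotation at time t between keyframes."""
    frames = sorted((kf for kf in clip["keyframes"] if kf["joint"] == joint),
                    key=lambda kf: kf["time"])
    prev = frames[0]
    for kf in frames:
        if kf["time"] >= t:
            span = kf["time"] - prev["time"] or 1.0
            w = (t - prev["time"]) / span
            return [p + w * (c - p)
                    for p, c in zip(prev["rotation"], kf["rotation"])]
        prev = kf
    return frames[-1]["rotation"]


if __name__ == "__main__":
    clip = parse_clip(llm_output)
    print(sample_rotation(clip, "right_arm", 0.25))  # roughly [0, 0, 35]
```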
Reprompting: Automated Chain-of-Thought Prompt Inference Through Gibbs Sampling
Xu, Weijia, Banburski-Fahey, Andrzej, Jojic, Nebojsa
We introduce Reprompting, an iterative sampling algorithm that searches for the Chain-of-Thought (CoT) recipes for a given task without human intervention. Through Gibbs sampling, we infer CoT recipes that work consistently well for a set of training samples. Our method iteratively samples new recipes using previously sampled solutions as parent prompts to solve other training problems. On five Big-Bench Hard tasks that require multi-step reasoning, Reprompting achieves consistently better performance than the zero-shot, few-shot, and human-written CoT baselines. Reprompting can also facilitate transfer of knowledge from a stronger model to a weaker model leading to substantially improved performance of the weaker model. Overall, Reprompting brings up to +17 point improvements over the previous state-of-the-art method that uses human-written CoT prompts.
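As a toy illustration of the sampling loop described above, here is a sketch that re-draws the CoT recipe for one training problem at a time, conditioned on the recipes currently held for the other problems, and keeps a new recipe only if it yields the correct answer. The sample_cot stub and the arithmetic "tasks" are placeholders, not the paper's setup or scoring.

```python
# Toy Gibbs-style resampling of chain-of-thought recipes: recipes sampled for
# other training problems serve as parent prompts when re-drawing the recipe
# for the chosen problem. The LLM call is faked for the sake of a runnable demo.
import random
from typing import Dict, List, Tuple


def sample_cot(parent_recipes: List[str], question: str) -> Tuple[str, str]:
    """Stand-in LLM call: given parent CoT recipes as few-shot examples,
    return (new_recipe, predicted_answer) for the question."""
    recipe = (f"Think step by step about: {question} "
              f"(seeded by {len(parent_recipes)} parent recipes)")
    answer = str(eval(question))  # toy 'model' that just computes the arithmetic
    return recipe, answer


def reprompt(train: List[Tuple[str, str]], iters: int = 20) -> Dict[str, str]:
    """Iteratively re-draw the recipe for one problem, conditioned on the
    current recipes of the other problems, keeping only correct ones."""
    recipes: Dict[str, str] = {q: "" for q, _ in train}
    for _ in range(iters):
        q, gold = random.choice(train)
        parents = [r for other, r in recipes.items() if other != q and r]
        candidate, pred = sample_cot(parents, q)
        if pred == gold:  # accept only recipes that solve the training problem
            recipes[q] = candidate
    return recipes


if __name__ == "__main__":
    toy_train = [("2+3", "5"), ("7*6", "42"), ("10-4", "6")]
    for q, r in reprompt(toy_train).items():
        print(q, "->", r)
```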
Steps towards prompt-based creation of virtual worlds
Roberts, Jasmine, Banburski-Fahey, Andrzej, Lanier, Jaron
Large language models trained for code generation can be applied to speaking virtual worlds into existence (creating virtual worlds). In this work we show that prompt-based methods can both accelerate in-VR level editing and become part of gameplay rather than just part of game development. As an example, we present Codex VR Pong, which shows non-deterministic game mechanics using generative processes to not only create static content but also non-trivial interactions between 3D objects. This demonstration naturally leads to an integral discussion on how one would evaluate and benchmark experiences created by generative models - as there are no qualitative or quantitative metrics that apply in these scenarios. We conclude by discussing impending challenges of AI-assisted co-creation in VR.