Du, Ruofei
Arm Robot: AR-Enhanced Embodied Control and Visualization for Intuitive Robot Arm Manipulation
Pei, Siyou, Chen, Alexander, Kaoshik, Ronak, Du, Ruofei, Zhang, Yang
Embodied interaction has been introduced to human-robot interaction (HRI) as a type of teleoperation, in which users control robot arms with bodily actions via handheld controllers or haptic gloves. Embodied teleoperation has made robot control intuitive to non-technical users, but differences between human and robot capabilities, e.g., ranges of motion and response time, remain challenging. In response, we present Arm Robot, an embodied robot arm teleoperation system that helps users tackle human-robot discrepancies. Specifically, Arm Robot (1) includes AR visualization as real-time feedback on temporal and spatial discrepancies, and (2) allows users to change observing perspectives and expand their action space. We conducted a user study (N=18) to investigate the usability of Arm Robot and to learn how users perceive the embodiment. Our results show that users could use Arm Robot's features to effectively control the robot arm, providing insights for continued work in embodied HRI.
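A minimal sketch of the kind of spatial-discrepancy feedback described above: clamp a commanded end-effector position to an assumed reachable workspace and report the residual offset that an AR overlay could visualize. The workspace limits and function names are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch: clamp a commanded end-effector position to an assumed
# reachable workspace and report the spatial discrepancy an AR overlay could show.
import numpy as np

# Assumed workspace limits (meters) -- illustrative only, not from the paper.
WORKSPACE_MIN = np.array([-0.4, -0.4, 0.05])
WORKSPACE_MAX = np.array([0.4, 0.4, 0.6])

def clamp_to_workspace(target: np.ndarray) -> tuple[np.ndarray, float]:
    """Return the nearest reachable position and its distance to the request."""
    reachable = np.clip(target, WORKSPACE_MIN, WORKSPACE_MAX)
    discrepancy = float(np.linalg.norm(target - reachable))
    return reachable, discrepancy

if __name__ == "__main__":
    user_target = np.array([0.55, 0.10, 0.70])  # outside the assumed workspace
    reachable, gap = clamp_to_workspace(user_target)
    print(f"commanded {user_target}, executing {reachable}, offset {gap:.3f} m")
```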
Thing2Reality: Transforming 2D Content into Conditioned Multiviews and 3D Gaussian Objects for XR Communication
Hu, Erzhen, Li, Mingyi, Hong, Jungtaek, Qian, Xun, Olwal, Alex, Kim, David, Heo, Seongkook, Du, Ruofei
During remote communication, participants often share both digital and physical content, such as product designs, digital assets, and environments, to enhance mutual understanding. Recent advances in augmented communication have enabled users to swiftly create and share digital 2D copies of physical objects from video feeds into a shared space. However, conventional 2D representations of digital objects restrict users' ability to spatially reference items in a shared immersive environment. To address this, we propose Thing2Reality, an Extended Reality (XR) communication platform that enhances spontaneous discussion of both digital and physical items during remote sessions. With Thing2Reality, users can quickly materialize ideas or physical objects in immersive environments and share them as conditioned multiview renderings or 3D Gaussians. Thing2Reality enables users to interact with remote objects and discuss concepts collaboratively. Our user study revealed that the ability to interact with and manipulate 3D representations of objects significantly enhances the efficiency of discussions, with the potential to augment the discussion of 2D artifacts.
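As a rough illustration of the two shared-object representations named above, the sketch below models an object that carries either conditioned multiview renderings or a set of 3D Gaussians in the standard splat-style parameterization. All class and field names are assumptions for illustration, not Thing2Reality's data model.

```python
# Hypothetical data model for a shared XR object that can be represented either
# as conditioned multiview renderings or as 3D Gaussians (splat-style parameters).
# Field names and shapes are illustrative assumptions, not Thing2Reality's API.
from dataclasses import dataclass
import numpy as np

@dataclass
class MultiviewAsset:
    images: list[np.ndarray]          # one HxWx3 rendering per conditioned view
    camera_poses: list[np.ndarray]    # 4x4 camera-to-world matrices

@dataclass
class GaussianAsset:
    means: np.ndarray      # (N, 3) centers
    scales: np.ndarray     # (N, 3) per-axis extents
    rotations: np.ndarray  # (N, 4) unit quaternions
    colors: np.ndarray     # (N, 3) RGB
    opacities: np.ndarray  # (N,)

@dataclass
class SharedObject:
    name: str
    multiview: MultiviewAsset | None = None
    gaussians: GaussianAsset | None = None

    def preferred_representation(self) -> str:
        # Prefer the full 3D representation when it is available.
        return "gaussians" if self.gaussians is not None else "multiview"

if __name__ == "__main__":
    gaussians = GaussianAsset(
        means=np.zeros((10, 3)), scales=np.full((10, 3), 0.01),
        rotations=np.tile([1.0, 0.0, 0.0, 0.0], (10, 1)),
        colors=np.ones((10, 3)), opacities=np.ones(10),
    )
    obj = SharedObject(name="prototype mug", gaussians=gaussians)
    print(obj.preferred_representation())  # -> "gaussians"
```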
Augmented Object Intelligence: Making the Analog World Interactable with XR-Objects
Dogan, Mustafa Doga, Gonzalez, Eric J., Colaco, Andrea, Ahuja, Karan, Du, Ruofei, Lee, Johnny, Gonzalez-Franco, Mar, Kim, David
Seamless integration of physical objects as interactive digital entities remains a challenge for spatial computing. This paper introduces Augmented Object Intelligence (AOI), a novel XR interaction paradigm that blurs the line between the digital and the physical by equipping real-world objects with the ability to interact as if they were digital, so that every object can serve as a portal to rich digital functionality. Our approach combines object segmentation and classification with the power of Multimodal Large Language Models (MLLMs) to facilitate these interactions. We implement the AOI concept as XR-Objects, an open-source prototype system that provides a platform for users to engage with their physical environment in rich and contextually relevant ways. This system enables analog objects not only to convey information but also to initiate digital actions, such as querying for details or executing tasks. Our contributions are threefold: (1) we define the AOI concept and detail its advantages over traditional AI assistants, (2) we detail the open-source design and implementation of the XR-Objects system, and (3) we show its versatility through a variety of use cases and a user study.
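A minimal sketch of an AOI-style interaction loop under stated assumptions: detect and classify an object in the camera view, then answer a user's query about it with a multimodal model. The detector and MLLM calls are stubbed placeholders; XR-Objects' actual components are not reproduced here.

```python
# Hypothetical sketch of an AOI-style interaction loop: detect an object in a
# camera frame, then answer a user's query about it with a multimodal model.
# The detector and MLLM are stubs; XR-Objects' real components are not shown.
from dataclasses import dataclass

@dataclass
class DetectedObject:
    label: str
    box: tuple[int, int, int, int]  # x, y, width, height in pixels

def detect_objects(frame_id: str) -> list[DetectedObject]:
    """Stand-in for an on-device segmentation/classification model."""
    return [DetectedObject("coffee bag", (120, 80, 200, 260))]

def query_mllm(prompt: str, obj: DetectedObject) -> str:
    """Stand-in for a Multimodal LLM call conditioned on the object crop."""
    return f"(model answer about the {obj.label} for: {prompt!r})"

def handle_user_query(frame_id: str, prompt: str) -> str:
    objects = detect_objects(frame_id)
    if not objects:
        return "No interactable object found in view."
    # Anchor the digital action to the selected physical object.
    target = objects[0]
    return query_mllm(prompt, target)

if __name__ == "__main__":
    print(handle_user_query("frame_0042", "How should I store this?"))
```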
Next Steps for Human-Centered Generative AI: A Technical Perspective
Chen, Xiang 'Anthony', Burke, Jeff, Du, Ruofei, Hong, Matthew K., Jacobs, Jennifer, Laban, Philippe, Li, Dingzeyu, Peng, Nanyun, Willis, Karl D. D., Wu, Chien-Sheng, Zhou, Bolei
Through iterative, cross-disciplinary discussions, we define and propose next steps for Human-centered Generative AI (HGAI). We contribute a comprehensive research agenda that lays out future directions of Generative AI spanning three levels: aligning with human values, assimilating human intents, and augmenting human abilities. By identifying these next steps, we intend to draw interdisciplinary research teams to pursue a coherent set of emergent ideas in HGAI, focusing on the topics that interest them while maintaining a shared big picture of the future work landscape.
InstructPipe: Building Visual Programming Pipelines with Human Instructions
Zhou, Zhongyi, Jin, Jing, Phadnis, Vrushank, Yuan, Xiuxiu, Jiang, Jun, Qian, Xun, Zhou, Jingtao, Huang, Yiyi, Xu, Zheng, Zhang, Yinda, Wright, Kristen, Mayes, Jason, Sherwood, Mark, Lee, Johnny, Olwal, Alex, Kim, David, Iyengar, Ram, Li, Na, Du, Ruofei
Visual programming provides beginner-level programmers with a coding-free experience for building their own customized pipelines. Existing systems require users to build a pipeline entirely from scratch, meaning that novice users must set up and link appropriate nodes all by themselves, starting from a blank workspace. We present InstructPipe, an AI assistant that enables users to start prototyping machine learning (ML) pipelines from text instructions. Our solution consists of two LLM modules and a code interpreter: the LLM modules generate pseudocode for a target pipeline, and the interpreter renders the pipeline in the node-graph editor for further human-AI collaboration. Technical evaluations reveal that InstructPipe reduces user interactions by 81.1% compared to traditional methods. Our user study (N=16) showed that InstructPipe empowers novice users to streamline their workflow in creating desired ML pipelines, reduce their learning curve, and spark innovative ideas with open-ended commands.
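To illustrate the pseudocode-to-node-graph step described above, the toy sketch below parses a simple line-based pseudocode listing into nodes and directed edges that a node-graph editor could render. The pseudocode grammar and node names are illustrative assumptions, not InstructPipe's actual format.

```python
# Toy sketch of the pseudocode-to-node-graph step: parse a simple line-based
# pseudocode listing into nodes and directed edges. The grammar and node names
# are illustrative assumptions, not InstructPipe's actual format.
import re

PSEUDOCODE = """
image = input_image()
caption = image_captioner(image)
speech = text_to_speech(caption)
output(speech)
"""

def parse_pipeline(pseudocode: str):
    nodes, edges = [], []
    produced_by = {}  # variable name -> node that produced it
    for line in pseudocode.strip().splitlines():
        match = re.match(r"(?:(\w+)\s*=\s*)?(\w+)\((.*)\)", line.strip())
        if not match:
            continue
        out_var, node_name, args = match.groups()
        nodes.append(node_name)
        for arg in filter(None, (a.strip() for a in args.split(","))):
            if arg in produced_by:
                edges.append((produced_by[arg], node_name))
        if out_var:
            produced_by[out_var] = node_name
    return nodes, edges

if __name__ == "__main__":
    nodes, edges = parse_pipeline(PSEUDOCODE)
    print("nodes:", nodes)  # ['input_image', 'image_captioner', 'text_to_speech', 'output']
    print("edges:", edges)  # [('input_image', 'image_captioner'), ...]
```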
SketchyScene: Richly-Annotated Scene Sketches
Zou, Changqing, Yu, Qian, Du, Ruofei, Mo, Haoran, Song, Yi-Zhe, Xiang, Tao, Gao, Chengying, Chen, Baoquan, Zhang, Hao
We contribute the first large-scale dataset of scene sketches, SketchyScene, with the goal of advancing research on sketch understanding at both the object and the scene level. The dataset is created through a novel, carefully designed crowdsourcing pipeline, enabling users to efficiently generate large quantities of realistic and diverse scene sketches. SketchyScene contains more than 29,000 scene-level sketches, 7,000+ pairs of scene templates and photos, and 11,000+ object sketches. All objects in the scene sketches have ground-truth semantic and instance masks. The dataset is also highly scalable and extensible, making it easy to augment and/or change scene composition. We demonstrate the potential impact of SketchyScene by training new computational models for semantic segmentation of scene sketches and by showing how the new dataset enables several applications, including image retrieval, sketch colorization, editing, and captioning. The dataset and code can be found at https://github.com/SketchyScene/SketchyScene.
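As a small example of the kind of evaluation the semantic segmentation experiments imply, the sketch below computes per-class intersection-over-union between a predicted and a ground-truth label mask. The masks here are synthetic arrays; SketchyScene's actual file layout and loading code are not assumed.

```python
# Minimal per-class IoU computation of the kind used to evaluate semantic
# segmentation on scene sketches. Masks here are synthetic arrays; loading
# SketchyScene's real files is not shown and its layout is not assumed.
import numpy as np

def per_class_iou(pred: np.ndarray, gt: np.ndarray, num_classes: int) -> dict[int, float]:
    """Intersection-over-union for each class present in the ground truth."""
    ious = {}
    for cls in range(num_classes):
        pred_c, gt_c = pred == cls, gt == cls
        union = np.logical_or(pred_c, gt_c).sum()
        if gt_c.any():
            ious[cls] = np.logical_and(pred_c, gt_c).sum() / union
    return ious

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gt = rng.integers(0, 4, size=(64, 64))       # synthetic ground-truth labels
    pred = gt.copy()
    pred[:8] = rng.integers(0, 4, size=(8, 64))  # corrupt a band to simulate errors
    print({c: round(float(v), 3) for c, v in per_class_iou(pred, gt, 4).items()})
```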