Robot Photographer


PhotoBot: Reference-Guided Interactive Photography via Natural Language

Limoyo, Oliver, Li, Jimmy, Rivkin, Dmitriy, Kelly, Jonathan, Dudek, Gregory

arXiv.org Artificial Intelligence

We introduce PhotoBot, a framework for automated photo acquisition based on an interplay between high-level human language guidance and a robot photographer. We propose to communicate photography suggestions to the user via a reference picture that is retrieved from a curated gallery. We exploit a visual language model (VLM) and an object detector to characterize reference pictures via textual descriptions, and use a large language model (LLM) to retrieve relevant reference pictures based on a user's language query through text-based reasoning. To establish correspondences between the reference picture and the observed scene, we exploit pre-trained features from a vision transformer capable of capturing semantic similarity across significantly varying images. Using these features, we compute pose adjustments for an RGB-D camera by solving a Perspective-n-Point (PnP) problem. We demonstrate our approach on a real-world manipulator equipped with a wrist camera. Our user studies show that photos taken by PhotoBot are often more aesthetically pleasing than those taken by users themselves, as measured by human feedback.
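The pose-adjustment step solves a Perspective-n-Point problem from 2D-3D matches. The abstract does not say which solver is used; as an illustrative, self-contained sketch (NumPy only, all data synthetic and hypothetical), a basic Direct Linear Transform estimate of the camera projection matrix from matched points looks like:

```python
import numpy as np

def dlt_projection(points_3d, points_2d):
    """Estimate a 3x4 projection matrix from >= 6 non-coplanar
    2D-3D correspondences via the Direct Linear Transform (DLT)."""
    A = []
    for (X, Y, Z), (u, v) in zip(points_3d, points_2d):
        A.append([0, 0, 0, 0, -X, -Y, -Z, -1, v * X, v * Y, v * Z, v])
        A.append([X, Y, Z, 1, 0, 0, 0, 0, -u * X, -u * Y, -u * Z, -u])
    # The solution is the right singular vector of A with the smallest
    # singular value, reshaped into a 3x4 matrix (up to scale).
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    return Vt[-1].reshape(3, 4)

def reproject(P, points_3d):
    """Project 3D points with P and dehomogenize to pixel coordinates."""
    X = np.hstack([points_3d, np.ones((len(points_3d), 1))])
    x = X @ P.T
    return x[:, :2] / x[:, 2:3]
```

In practice one would use a calibrated, robust solver (e.g., OpenCV's `cv2.solvePnP`, typically with RANSAC over the feature matches), since plain DLT ignores known camera intrinsics and is sensitive to noise.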


ANSEL Photobot: A Robot Event Photographer with Semantic Intelligence

Rivkin, Dmitriy, Dudek, Gregory, Kakodkar, Nikhil, Meger, David, Limoyo, Oliver, Liu, Xue, Hogan, Francois

arXiv.org Artificial Intelligence

Our work examines the way in which large language models can be used for robotic planning and sampling, specifically in the context of automated photographic documentation. In particular, we illustrate how to produce a photo-taking robot with an exceptional level of semantic awareness by leveraging recent advances in general-purpose language models (LMs) and vision-language models (VLMs). Given a high-level description of an event, we use an LM to generate a natural-language list of photo descriptions that one would expect a photographer to capture at the event. We then use a VLM to identify the best matches to these descriptions in the robot's video stream. The photo portfolios generated by our method are consistently rated as more appropriate to the event by human evaluators than those generated by existing methods.
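The abstract does not name the embedding models, but assuming a CLIP-style joint embedding space for text and images (the vectors below are synthetic stand-ins, not ANSEL's actual features), the "best match per description" step reduces to a cosine-similarity argmax over frames:

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def assign_frames(description_embs, frame_embs):
    """For each LM-generated photo description, pick the index of the
    video frame whose embedding is most similar to it."""
    picks = []
    for d in description_embs:
        scores = [cosine_sim(d, f) for f in frame_embs]
        picks.append(int(np.argmax(scores)))
    return picks
```

A real system would also need to deduplicate frames and threshold low-scoring matches so that descriptions with no good counterpart in the stream are skipped rather than force-matched.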


Shutter, the Robot Photographer: Leveraging Behavior Trees for Public, In-the-Wild Human-Robot Interactions

Lew, Alexander, Thompson, Sydney, Tsoi, Nathan, Vázquez, Marynel

arXiv.org Artificial Intelligence

Deploying interactive systems in-the-wild requires adaptability to situations not encountered in lab environments. Our work details our experience about the impact of architecture choice on behavior reusability and reactivity while deploying a public interactive system. In particular, we introduce Shutter, a robot photographer and a platform for public interaction. In designing Shutter's architecture, we focused on adaptability for in-the-wild deployment, while developing a reusable platform to facilitate future research in public human-robot interaction. We find that behavior trees allow reactivity, especially in group settings, and encourage designing reusable behaviors.
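The abstract's claim about behavior trees can be made concrete with a minimal sketch (node names and structure are illustrative, not Shutter's actual implementation): because the whole tree is re-ticked every control cycle, a higher-priority branch can preempt a lower one, which is the reactivity the authors describe, and subtrees are reusable behaviors.

```python
from enum import Enum

class Status(Enum):
    SUCCESS = 1
    FAILURE = 2
    RUNNING = 3

class Action:
    """Leaf node wrapping a callable that returns a Status."""
    def __init__(self, fn):
        self.fn = fn
    def tick(self):
        return self.fn()

class Sequence:
    """Ticks children in order; stops at the first child
    that is not SUCCESS and propagates its status."""
    def __init__(self, *children):
        self.children = children
    def tick(self):
        for child in self.children:
            status = child.tick()
            if status is not Status.SUCCESS:
                return status
        return Status.SUCCESS

class Fallback:
    """Ticks children in order; stops at the first child that is not
    FAILURE (a 'try alternatives' node)."""
    def __init__(self, *children):
        self.children = children
    def tick(self):
        for child in self.children:
            status = child.tick()
            if status is not Status.FAILURE:
                return status
        return Status.FAILURE
```

A hypothetical photographer tree might be `Fallback(Sequence(person_detected, take_photo), idle_behavior)`: each cycle the robot tries to photograph a detected person and otherwise falls back to idling.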


Pixy drone hands-on: A flying robot photographer for Snapchat users

Engadget

Drones are everywhere these days, filming dramatic reveals and awe-inspiring scenery for social media platforms. The problem is, they're not exactly approachable for beginners who have only ever used a smartphone. Last month, Snap debuted the $230 Pixy drone exactly for those people. It requires very little skill and acts like a personal robot photographer to help you produce nifty aerial shots. You don't need to pilot the Pixy.




Say Cheese! Experiences with a Robot Photographer

Byers, Zachary, Dixon, Michael, Smart, William D., Grimm, Cindy M.

AI Magazine

We have developed an autonomous robot system that takes well-composed photographs of people at social events, such as weddings and conference receptions. In this article, we outline the overall architecture of the system and describe how the various components interrelate. We also describe our experiences deploying the robot photographer at a number of real-world events.


Say Cheese! Experiences with a Robot Photographer

Byers, Zachary, Dixon, Michael, Smart, William D., Grimm, Cindy M.

AI Magazine

We introduced a sensor abstraction layer to separate the task layer from concerns about physical sensing devices. We process the sensor information (from the laser rangefinder in this application) into distance measurements from the center of the robot, thus allowing consideration of sensor error models and performance. This model makes system debugging significantly easier, because we know exactly what each sensor reading is at every point in the computation; something that would not be the case if we were reading from the sensors every time a reading was used in a calculation. This model also allows us to inject modified sensor readings into the system, as described in the next section.
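A minimal sketch of such a sensor abstraction layer (class and method names are illustrative, not from the paper): readings are snapshotted once per control cycle so every computation in that cycle sees identical values, and synthetic readings can be injected in place of the physical sensor.

```python
class SensorLayer:
    """Snapshots raw sensor readings once per control cycle and
    supports injecting modified readings for testing."""

    def __init__(self, read_fn):
        self._read_fn = read_fn   # callable returning raw readings
        self._snapshot = None
        self._injected = None

    def refresh(self):
        """Call once at the start of each control cycle; all later
        queries in the cycle see this single snapshot."""
        raw = self._injected if self._injected is not None else self._read_fn()
        self._snapshot = list(raw)

    def inject(self, readings):
        """Override the physical sensor with synthetic readings."""
        self._injected = list(readings)

    def readings(self):
        """Processed readings (e.g., distances from the robot center)."""
        return self._snapshot
```

The per-cycle snapshot is what makes debugging tractable: a logged cycle can be replayed exactly by injecting its recorded readings.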