The technological standards of photography have risen dramatically over the last few years. Cell phones once had no photo capture capabilities at all; today, modern smartphones are expected to take pictures of a quality close to that of a dedicated camera. The computer vision community has recently turned its attention to assessing image aesthetics for image cropping and to learning systems for capturing cinematographic videos. In other words, researchers are asking whether robots can understand and capture aesthetically pleasing imagery. However, these systems often focus on composing specific objects of interest into the photo using composition heuristics, such as the rule of thirds, and do not translate directly into a system that can capture well-composed photographs in general.
Can a Robot Do Photography?
To address this limitation, a research team from Cornell University (Hadi AlZayer, Hubert Lin, and Kavita Bala) used Jackal UGV to develop an autonomous system that can capture well-composed photographs in a variety of environments. Possible applications range from quickly highlighting the aesthetics of a property in real estate to capturing scenic outdoor views from a drone during an expedition. The primary challenge in this project was to create an algorithm that could drive a robot to take photos. The team opted for a reinforcement learning approach, which teaches a robot to take photographs by exploring different scenarios in a simulated environment. Once the robot had learned to take photos, the next challenge was to deploy it in a range of real-world environments.
Ultimately, the goal of the project is to achieve automatically what a skilled photographer achieves: a good shot composition, reached by adjusting the camera. The team approached this problem by breaking the process of photo capture down into a series of actions taken by the photographer: the photographer looks at the view from the camera, tries to improve the composition by moving and rotating the camera, repeats these adjustments until the composition is right, and then captures the photo. The team explicitly modeled this process in simulation and trained a deep neural network to maximize the aesthetics of the photos taken. The aesthetic value of a photo was estimated using existing work from computational photography. Jackal UGV was then used to apply this model in real life, with the robot navigating indoor scenes to capture photos.
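In rough notation (ours, not necessarily the paper's), if A denotes the learned aesthetic estimator, π the camera-control policy, and I_T the view through the camera at the terminating capture step, the training objective can be sketched as:

```latex
\max_{\pi} \; \mathbb{E}_{\pi}\left[ A(I_T) \right]
```

That is, the policy is rewarded for ending an episode on a view that the aesthetic model scores highly.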
Creating a Model to Capture Aesthetic Photos
The process of photography was modeled as a Partially Observable Markov Decision Process (POMDP) that the team could directly optimize. They used an aesthetic estimation model to define the aesthetic value of a photo, and set the objective as maximizing the aesthetic value of the captured photo. The state of the process is the view that the robot sees at the current time step, and the actions are movements and rotations that can improve the composition of the view seen through the camera. Optimizing a model for such a process requires a significant number of runs on the robot, so it was necessary to perform the optimization in simulation.
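As a minimal sketch of this setup (not the team's actual code), the decision process can be expressed as a gym-style environment in which the observation is the current camera view, the actions are discrete motions plus a terminating capture action, and the reward is the aesthetic score of the captured image. The simulator and aesthetic_model objects below are hypothetical placeholders:

```python
from enum import IntEnum

class Action(IntEnum):
    FORWARD = 0
    BACKWARD = 1
    ROTATE_LEFT = 2
    ROTATE_RIGHT = 3
    CAPTURE = 4  # terminating action: take the photo

class PhotoEnv:
    """Toy POMDP: the observation is the current camera view; the episode
    ends when the agent chooses CAPTURE, and the reward is the aesthetic
    score of the captured image."""

    def __init__(self, simulator, aesthetic_model):
        self.sim = simulator                  # renders views for a virtual camera
        self.aesthetic_model = aesthetic_model

    def reset(self):
        self.sim.randomize_camera_pose()
        return self.sim.render()              # observation = current view

    def step(self, action):
        if action == Action.CAPTURE:
            image = self.sim.render()
            reward = self.aesthetic_model.score(image)  # quantity to maximize
            return image, reward, True, {}
        self.sim.apply_motion(action)         # translate or rotate the camera
        return self.sim.render(), 0.0, False, {}
```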
Once the model was optimized, the team deployed it on a robot to automatically capture photos in real life and validate its ability to work in real-world environments. They used Jackal UGV as the platform for this real-life deployment, connecting a camera and implementing the motion actions that the model uses in the photo capture process (moving forward and backward, and rotating by different amounts to the right and left).
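To give a sense of what implementing those motion actions involves, here is a hedged sketch of how discrete actions could be sent to a Jackal through ROS using the standard /cmd_vel velocity topic. The specific speeds, durations, and rotation increments are illustrative assumptions, not the values used in the project:

```python
import rospy
from geometry_msgs.msg import Twist

rospy.init_node("auto_photographer")
cmd_pub = rospy.Publisher("/cmd_vel", Twist, queue_size=1)

def drive(linear=0.0, angular=0.0, duration=1.0, rate_hz=10):
    """Publish a constant velocity command for a fixed duration, then stop."""
    msg = Twist()
    msg.linear.x = linear    # m/s, forward (+) or backward (-)
    msg.angular.z = angular  # rad/s, left (+) or right (-)
    rate = rospy.Rate(rate_hz)
    end = rospy.Time.now() + rospy.Duration(duration)
    while rospy.Time.now() < end and not rospy.is_shutdown():
        cmd_pub.publish(msg)
        rate.sleep()
    cmd_pub.publish(Twist())  # zero velocity = stop

# Example discrete actions (values are illustrative):
move_forward     = lambda: drive(linear=0.2)
move_backward    = lambda: drive(linear=-0.2)
turn_left_small  = lambda: drive(angular=0.3)
turn_right_large = lambda: drive(angular=-0.6)
```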
“The Clearpath Jackal UGV provided an excellent platform for our autonomous photographer due to the ability for us to programmatically control its movements and rotations. The photo-taking algorithm was able to directly control the Clearpath Jackal UGV through the ROS API.”
– Hadi AlZayer, Master of Science Student
During deployment, the camera views were fed to the model, and an action was selected based on the current view. The action could be a motion (translation or rotation) intended to improve the composition of the photograph, or a terminating photo-capture action. For this project, the team relied exclusively on a webcam for sensing: the goal was to capture a photograph based only on the views through the camera, without the use of additional information.
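Conceptually, the deployment loop reads a frame from the webcam, asks the trained model for an action, and either moves the robot or stops and saves the photo. The sketch below illustrates this loop; policy, execute_motion, and the output file name are hypothetical stand-ins for the project's actual components:

```python
import cv2

def run_episode(policy, execute_motion, max_steps=50, camera_index=0):
    """Run one photo-capture episode using only the webcam for sensing."""
    cap = cv2.VideoCapture(camera_index)
    try:
        for _ in range(max_steps):
            ok, frame = cap.read()
            if not ok:
                break
            action = policy.select_action(frame)  # trained model picks a move
            if action == "CAPTURE":               # terminating action
                cv2.imwrite("captured_photo.jpg", frame)
                return frame
            execute_motion(action)                # translate or rotate the robot
    finally:
        cap.release()
    return None
```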
Getting this project off the ground without a ready-to-go robotic platform would have required building a controller and robot from scratch, a significant effort with no assurance that the robot would function well in various weather conditions. Jackal UGV provided a clean solution, as it is weather-proof and easy to control, and, as Hadi AlZayer notes above, the photo-taking algorithm was able to drive it directly through the ROS API.
In the future, the team is interested in using alternatives to simulation, such as existing videos from the internet, to build a system that automatically captures photos. This would increase the diversity of domains in which they can train their model, since real-life videos are significantly more abundant than realistic simulations. For the time being, the work has already found success: it was published at IROS 2021 and nominated for Best Paper in Entertainment. You can read the full paper here.
The project team was composed of Hadi AlZayer (M.S. Student), Hubert Lin (Ph.D. Student), and Kavita Bala (Dean of the Cornell Bowers College of Computing and Information Science, and Professor of Computer Science).
To learn more about the project, you can visit the project page here.
To learn more about Jackal UGV, visit our website here.