Embedded Image-to-Image Translation for Efficient Sim-to-Real Transfer in Learning-based Robot-Assisted Soft Manipulation
Colan, Jacinto, Sugita, Keisuke, Davila, Ana, Yamada, Yutaro, Hasegawa, Yasuhisa
Recent advances in robotic learning in simulation have shown impressive results in accelerating the learning of complex manipulation skills. However, the sim-to-real gap, caused by discrepancies between simulation and reality, poses significant challenges for the effective deployment of autonomous surgical systems. We propose a novel approach that uses image translation models to mitigate domain mismatches and facilitate efficient robot skill learning in a simulated environment. Our method applies contrastive unpaired image-to-image translation and extracts embedded representations from the translated images. These embeddings are then used to improve the efficiency of training surgical manipulation models. We conducted experiments to evaluate the performance of our approach, demonstrating that it significantly enhances task success rates and reduces the steps required for task completion compared to traditional methods. The results indicate that our proposed system effectively bridges the sim-to-real gap, providing a robust framework for advancing the autonomy of surgical robots in minimally invasive procedures.
- Leisure & Entertainment > Games > Computer Games (0.55)
- Health & Medicine > Surgery (0.49)
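The pipeline the abstract describes (translate simulated images toward the real domain, then feed compact embeddings of the translated images to the manipulation policy) can be sketched as follows. This is a minimal illustration only: the translator and encoder here are hypothetical linear stand-ins for the paper's trained contrastive unpaired translation model and embedding network, with toy shapes.

```python
import numpy as np

rng = np.random.default_rng(0)

def translate_sim_to_real(sim_img, W_t):
    # Stand-in for a trained unpaired image-to-image translator
    # (the paper uses a contrastive, CUT-style model); here a linear
    # map with a tanh squashing nonlinearity.
    return np.tanh(sim_img @ W_t)

def embed(img, W_e):
    # Stand-in encoder that compresses the translated image into the
    # compact embedding the manipulation policy consumes as input.
    return np.maximum(img @ W_e, 0.0).mean(axis=0)  # ReLU + mean pooling

# Toy shapes: a 32x32 single-channel image mapped to a 16-dim embedding.
sim_img = rng.standard_normal((32, 32))
W_t = rng.standard_normal((32, 32)) * 0.1  # hypothetical translator weights
W_e = rng.standard_normal((32, 16)) * 0.1  # hypothetical encoder weights

real_like = translate_sim_to_real(sim_img, W_t)
z = embed(real_like, W_e)  # z.shape == (16,): the policy observation
```

The key design point is that the policy never sees raw simulator pixels: it is trained on embeddings of real-looking images, so the same encoder can be reused at deployment on real camera frames.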
A Study on Quantifying Sim2Real Image Gap in Autonomous Driving Simulations Using Lane Segmentation Attention Map Similarity
Park, Seongjeong, Pahk, Jinu, Jahn, Lennart Lorenz Freimuth, Lim, Yongseob, An, Jinung, Choi, Gyeungho
Autonomous driving simulations require highly realistic images. Our preliminary study found that when CARLA Simulator images were made more realistic using DCLGAN, the performance of the lane recognition model improved to levels comparable to real-world driving. It was also confirmed that the vehicle's ability to return to the center of the lane after deviating from it improved significantly. However, there is currently no agreed-upon metric for quantitatively evaluating the realism of simulation images. To address this issue, and building on the idea that FID (Fréchet Inception Distance) measures the distance between feature vector distributions using a pre-trained model, this paper proposes a metric that measures the similarity of simulation road images using the attention maps from the self-attention distillation process of ENet-SAD. Finally, we verified the suitability of the proposed metric by applying it to images from a CARLA map that reproduces a real-world autonomous driving test road.
- Automobiles & Trucks (0.95)
- Transportation > Ground > Road (0.85)
- Information Technology > Robotics & Automation (0.85)
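The FID-style computation the abstract builds on can be sketched as below. Assumptions: ENet-SAD attention maps are taken as already extracted and pooled into flat feature vectors (the random arrays here are placeholders for them); only the Fréchet distance between the two fitted Gaussians is shown.

```python
import numpy as np

def frechet_distance(feats_a, feats_b):
    """Fréchet distance between Gaussians fitted to two feature sets.

    Computes ||mu_a - mu_b||^2 + Tr(Ca + Cb - 2 (Ca Cb)^{1/2}),
    the same form FID uses on Inception features; here the features
    would be pooled attention maps instead.
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    ca = np.cov(feats_a, rowvar=False)
    cb = np.cov(feats_b, rowvar=False)
    # For PSD covariances, the eigenvalues of Ca @ Cb are real and
    # non-negative, and Tr((Ca Cb)^{1/2}) is the sum of their roots.
    eig = np.linalg.eigvals(ca @ cb)
    covmean_trace = np.sqrt(np.clip(eig.real, 0.0, None)).sum()
    return float(((mu_a - mu_b) ** 2).sum()
                 + np.trace(ca) + np.trace(cb) - 2.0 * covmean_trace)

rng = np.random.default_rng(0)
sim_maps = rng.normal(0.0, 1.0, size=(200, 8))   # placeholder "simulation" features
real_maps = rng.normal(0.5, 1.0, size=(200, 8))  # placeholder "real" features

gap = frechet_distance(sim_maps, real_maps)  # larger => less realistic
```

A lower value indicates the simulated images' attention-map statistics sit closer to the real-road distribution, which is what a sim-to-real realism metric needs to capture.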
How Much Can Autonomous Cars Learn from Virtual Worlds?
To be able to drive safely and reliably, autonomous cars need a comprehensive understanding of what's going on around them. They need to recognize other cars, trucks, motorcycles, bikes, humans, traffic lights, street signs, and everything else that may end up on or near a road. They also have to do this in all kinds of weather and lighting conditions, which is why most (if not all) companies developing autonomous cars are spending a ludicrous (but necessary) amount of time and resources collecting data in an attempt to gain experience with every possible situation. In most cases, this approach depends on humans annotating enormous datasets to train machine learning algorithms: hundreds or thousands of people looking at snapshots or videos taken by cars driving down streets, drawing boxes around vehicles and road signs and labeling them, over and over. Researchers from the University of Michigan think there's a better way: doing the whole thing in simulation instead, and they've shown that it can actually be more effective than using real data annotated by humans.
- North America > United States > Michigan (0.27)
- Europe > Germany (0.06)
- Asia > Singapore (0.06)
- North America > United States > California (0.05)
- Transportation > Passenger (1.00)
- Transportation > Ground > Road (1.00)