Lee, Kyoobin
3rd Workshop on Maritime Computer Vision (MaCVi) 2025: Challenge Results
Kiefer, Benjamin, Žust, Lojze, Muhovič, Jon, Kristan, Matej, Perš, Janez, Teršek, Matija, Mudenagudi, Uma, Desai, Chaitra, Wiliem, Arnold, Kreis, Marten, Akalwadi, Nikhil, Quan, Yitong, Zhong, Zhiqiang, Zhang, Zhe, Liu, Sujie, Chen, Xuran, Yang, Yang, Fabijanić, Matej, Ferreira, Fausto, Lee, Seongju, Lee, Junseok, Lee, Kyoobin, Yao, Shanliang, Guan, Runwei, Huang, Xiaoyu, Ni, Yi, Kumar, Himanshu, Feng, Yuan, Cheng, Yi-Ching, Lin, Tzu-Yu, Lee, Chia-Ming, Hsu, Chih-Chung, Sheikh, Jannik, Michel, Andreas, Gross, Wolfgang, Weinmann, Martin, Šarić, Josip, Lin, Yipeng, Yang, Xiang, Jiang, Nan, Lu, Yutang, Feng, Fei, Awad, Ali, Lucas, Evan, Saleem, Ashraf, Cheng, Ching-Heng, Lin, Yu-Fan, Lin, Tzu-Yu, Hsu, Chih-Chung
The 3rd Workshop on Maritime Computer Vision (MaCVi) 2025 addresses maritime computer vision for Unmanned Surface Vehicles (USVs) and underwater settings. This report offers a comprehensive overview of the findings from the challenges. We provide both statistical and qualitative analyses, evaluating trends from over 700 submissions. All datasets, evaluation code, and the leaderboard are publicly available at https://macvi.org/workshop/macvi25.
PolyFit: A Peg-in-hole Assembly Framework for Unseen Polygon Shapes via Sim-to-real Adaptation
Lee, Geonhyup, Lee, Joosoon, Noh, Sangjun, Ko, Minhwan, Kim, Kangmin, Lee, Kyoobin
The study addresses the foundational and challenging task of peg-in-hole assembly in robotics, where misalignments caused by sensor inaccuracies and mechanical errors often result in insertion failures or jamming. This research introduces PolyFit, which shifts the task from a reinforcement learning formulation to a supervised learning one. PolyFit is a Force/Torque (F/T)-based supervised learning framework for 5-DoF peg-in-hole assembly: it uses F/T data for accurate extrinsic pose estimation and adjusts the peg pose to rectify misalignments. The model is trained extensively in simulation on a dataset covering a diverse range of peg-hole shapes, extrinsic poses, and the corresponding contact F/T readings. Because identical F/T readings can arise from different poses, a multi-point contact strategy is integrated into the model input to improve extrinsic pose estimation. For real-world application, the study proposes a sim-to-real adaptation method that uses a sim-real paired dataset, enabling generalization to complex and unseen polygon shapes. PolyFit achieves peg-in-hole success rates of 97.3% and 96.3% for seen and unseen shapes in simulation, respectively. Real-world evaluations further demonstrate success rates of 86.7% and 85.0%, highlighting the robustness and adaptability of the proposed method.
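As a rough illustration of the kind of F/T-based supervised model the abstract describes, the sketch below (not the authors' code; the layer sizes, number of contact points, and 5-DoF output parameterization are assumptions) maps stacked multi-point contact wrench readings to an extrinsic pose estimate in PyTorch:

```python
# Minimal sketch, not PolyFit itself: an MLP that regresses a 5-DoF
# extrinsic pose from K stacked contact F/T readings. K, hidden sizes,
# and the output parameterization are illustrative assumptions.
import torch
import torch.nn as nn

class FTPoseEstimator(nn.Module):
    def __init__(self, num_contacts: int = 3, hidden: int = 256):
        super().__init__()
        # Each contact yields a 6-D wrench: force (fx, fy, fz) + torque (tx, ty, tz).
        self.net = nn.Sequential(
            nn.Linear(num_contacts * 6, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 5),  # assumed 5-DoF pose error: (dx, dy, dz, rx, ry)
        )

    def forward(self, wrenches: torch.Tensor) -> torch.Tensor:
        # wrenches: (batch, num_contacts, 6) -> flatten contacts into one vector
        return self.net(wrenches.flatten(1))

model = FTPoseEstimator()
ft_readings = torch.randn(8, 3, 6)   # a batch of simulated multi-point contacts
pose_error = model(ft_readings)      # predicted misalignment to correct
```

Multiple contacts are stacked into a single input vector here because, as the abstract notes, a single F/T reading can be consistent with several different poses.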
Learning to Place Unseen Objects Stably using a Large-scale Simulation
Noh, Sangjun, Kang, Raeyoung, Kim, Taewon, Back, Seunghyeok, Bak, Seongho, Lee, Kyoobin
Object placement is a fundamental task for robots, yet it remains challenging for partially observed objects. Existing methods for object placement have limitations, such as requiring a complete 3D model of the object or being unable to handle complex shapes and novel objects, which restricts the applicability of robots in the real world. Herein, we focus on the Unseen Object Placement (UOP) problem and tackle it with two contributions: (1) UOP-Sim, a large-scale dataset that covers various shapes and novel objects, and (2) UOP-Net, a point cloud segmentation-based approach that directly detects the most stable plane from partial point clouds. Our UOP approach enables robots to place objects stably even when an object's shape and properties are not fully known, thus providing a promising solution for object placement in various environments. We verify our approach through simulation and real-world robot experiments, demonstrating state-of-the-art performance for placing single-view, partial objects. Robot demos, code, and the dataset are available at https://gistailab.github.io/uop/
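A minimal sketch of the general idea, detecting a stable plane from a partial point cloud, might look as follows; this is not UOP-Net itself, and the PointNet-style scorer, binary plane mask, and least-squares plane fit are illustrative assumptions:

```python
# Sketch only: score each point of a partial cloud for membership in the
# most stable plane, then fit a plane to the selected points.
import torch
import torch.nn as nn

class StablePlaneSegmenter(nn.Module):
    def __init__(self):
        super().__init__()
        self.point_mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 128))
        self.head = nn.Linear(256, 1)  # per-point "stable plane" logit

    def forward(self, pts):                          # pts: (B, N, 3)
        feat = self.point_mlp(pts)                   # (B, N, 128) local features
        glob = feat.max(dim=1, keepdim=True).values.expand_as(feat)  # global context
        return self.head(torch.cat([feat, glob], dim=-1)).squeeze(-1)  # (B, N)

def fit_plane(points):
    # Least-squares plane through the selected points via SVD.
    centroid = points.mean(0)
    _, _, vh = torch.linalg.svd(points - centroid)
    return centroid, vh[-1]  # a point on the plane and its normal

net = StablePlaneSegmenter()
cloud = torch.randn(1, 1024, 3)                 # a single-view partial point cloud
mask = net(cloud)[0].sigmoid() > 0.5            # points predicted as the stable plane
if mask.any():
    center, normal = fit_plane(cloud[0][mask])  # placement plane estimate
```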
INSTA-BEEER: Explicit Error Estimation and Refinement for Fast and Accurate Unseen Object Instance Segmentation
Back, Seunghyeok, Lee, Sangbeom, Kim, Kangmin, Lee, Joosoon, Shin, Sungho, Maeng, Jaemo, Lee, Kyoobin
Efficient and accurate segmentation of unseen objects is crucial for robotic manipulation. However, it remains challenging due to over- or under-segmentation. Although existing refinement methods can enhance the segmentation quality, they fix only minor boundary errors or are not sufficiently fast. In this work, we propose INSTAnce Boundary Explicit Error Estimation and Refinement (INSTA-BEEER), a novel refinement model that allows for adding and deleting instances and sharpening boundaries. Leveraging an error-estimation-then-refinement scheme, the model first estimates the pixel-wise boundary explicit errors: true positive, true negative, false positive, and false negative pixels of the instance boundary in the initial segmentation. It then refines the initial segmentation using these error estimates as guidance. Experiments show that the proposed model significantly enhances segmentation, achieving state-of-the-art performance. Furthermore, with a fast runtime (less than 0.1 s), the model consistently improves performance across various initial segmentation methods, making it highly suitable for practical robotic applications.
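The error-estimation-then-refinement scheme can be sketched as two stages run in sequence; the tiny convolutional nets below are placeholders (the actual INSTA-BEEER architecture differs), and only the data flow, estimating four boundary error maps and then refining with them as guidance, follows the abstract:

```python
# Sketch of an error-estimation-then-refinement pipeline; layer choices
# are assumptions, not the paper's architecture.
import torch
import torch.nn as nn

class ErrorEstimator(nn.Module):
    """Predicts 4 per-pixel boundary error maps: TP, TN, FP, FN."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3 + 1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 4, 3, padding=1),
        )
    def forward(self, image, init_seg):
        return self.net(torch.cat([image, init_seg], dim=1))

class Refiner(nn.Module):
    """Refines the initial segmentation, guided by the error maps."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3 + 1 + 4, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1),
        )
    def forward(self, image, init_seg, errors):
        return self.net(torch.cat([image, init_seg, errors], dim=1))

image = torch.randn(1, 3, 128, 128)
init_seg = torch.rand(1, 1, 128, 128)         # from any off-the-shelf method
errors = ErrorEstimator()(image, init_seg)    # estimate boundary errors first
refined = Refiner()(image, init_seg, errors)  # then refine with them as guidance
```

Because the refiner only consumes an image, an initial segmentation, and the error maps, it is agnostic to which method produced the initial segmentation, matching the abstract's claim of consistent gains across initial methods.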
Automatic Detection of Injection and Press Mold Parts on 2D Drawing Using Deep Neural Network
Lee, Junseok, Kim, Jongwon, Park, Jumi, Back, Seunghyeok, Bak, Seongho, Lee, Kyoobin
This paper proposes a method to automatically detect the key feature parts in 2D CAD drawings of commercial TVs and monitors using a deep neural network. We developed a deep learning pipeline that detects injection parts such as hooks, bosses, and undercuts, and press parts such as DPS, Embo-Screwless, Embo-Burring, and EMBO in 2D CAD drawing images. We first crop the drawing into tiles of a specific size for the training efficiency of the deep neural network. Then, we use Cascade R-CNN to localize the injection and press parts and ResNet-50 to predict the orientation of each part. Finally, we convert the part positions found in the cropped images back to positions in the original image. As a result, we obtained 84.1% AP (Average Precision) and 91.2% AR (Average Recall) for injection parts, 72.0% AP and 87.0% AR for press parts, and orientation accuracies of 94.4% and 92.0% for injection and press parts, respectively, which can facilitate faster industrial product design.
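The tile-then-detect bookkeeping described above can be sketched as follows; the tile size, stride, and four-way orientation head are assumptions, and the detector call is left as a stub where the paper uses Cascade R-CNN:

```python
# Sketch of the crop/detect/map-back pipeline; tile geometry and the
# number of orientation classes are assumptions.
import torch
from torchvision.models import resnet50

def tile_image(img, tile=512, stride=448):
    """Yield (x0, y0, crop) tiles; overlap keeps parts from being cut."""
    _, h, w = img.shape
    for y0 in range(0, max(h - tile, 0) + 1, stride):
        for x0 in range(0, max(w - tile, 0) + 1, stride):
            yield x0, y0, img[:, y0:y0 + tile, x0:x0 + tile]

def to_global(box, x0, y0):
    """Map a tile-local box (x1, y1, x2, y2) back to drawing coordinates."""
    x1, y1, x2, y2 = box
    return (x1 + x0, y1 + y0, x2 + x0, y2 + y0)

# Orientation classifier: a ResNet-50 with its head swapped for an
# assumed four-orientation output.
orientation_net = resnet50(weights=None)
orientation_net.fc = torch.nn.Linear(orientation_net.fc.in_features, 4)

drawing = torch.rand(3, 2048, 2048)  # one 2D CAD drawing image
detections = []
for x0, y0, crop in tile_image(drawing):
    boxes = []  # boxes = cascade_rcnn(crop)  # the paper's detector; stubbed here
    for box in boxes:
        detections.append(to_global(box, x0, y0))  # position in the original drawing
        # orientation = orientation_net(part_crop_resized[None]).argmax(1)
```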
Multiple Classification with Split Learning
Kim, Jongwon, Shin, Sungho, Yu, Yeonguk, Lee, Junseok, Lee, Kyoobin
Privacy concerns arise when training deep learning models in medical, mobility, and other fields. To address this, we present a privacy-preserving distributed deep learning method that allows clients to learn from a variety of data without exposing it directly. We divide a single deep learning architecture into a common extractor, a cloud model, and a local classifier for distributed learning. First, the common extractor, run by the local clients, extracts secure features from the input data; these features also allow the cloud model to serve various tasks and diverse types of data, since they retain the information most relevant to those tasks. Second, the cloud model, which contains most of the overall network, receives the embedded features from the many local clients and performs most of the deep learning operations, which carry a severe computing cost. After the cloud-side operations finish, the cloud model's outputs are sent back to the local clients. Finally, the local classifier produces the classification results and delivers them to the clients. When clients train models this way, sensitive information is never directly exposed to the external network. In our tests, the average performance improvement was 2.63% over the existing local training model. However, in a distributed environment, an inversion attack is possible because the features are exposed. For this reason, we experimented with the common extractor to prevent data restoration, testing the quality of reconstruction of the original image while adjusting the depth of the common extractor. We found that the deeper the common extractor, the lower the restoration quality, with the restoration score decreasing to 89.74.
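A minimal sketch of this three-way split, with assumed layer sizes, shows how raw data stays on the client while only extracted features travel to the cloud:

```python
# Sketch of the split described above: client-side extractor, heavy
# cloud model, client-side classifier. All layer sizes are assumptions.
import torch
import torch.nn as nn

common_extractor = nn.Sequential(   # on the client: raw data never leaves
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(8),
)
cloud_model = nn.Sequential(        # on the server: the bulk of the compute
    nn.Flatten(), nn.Linear(32 * 8 * 8, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
)
local_classifier = nn.Linear(512, 10)  # back on the client: final decision

x = torch.randn(4, 3, 32, 32)       # client-side private images
features = common_extractor(x)      # only these features are transmitted
hidden = cloud_model(features)      # cloud computes on features, not raw data
logits = local_classifier(hidden)   # client produces the classification
```

Deepening `common_extractor` pushes more computation onto the client but, as the abstract reports, makes it harder to invert the transmitted features back into the original input.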
Intra- and Inter-epoch Temporal Context Network (IITNet) for Automatic Sleep Stage Scoring
Back, Seunghyeok, Lee, Seongju, Seo, Hogeon, Park, Deokhwan, Kim, Tae, Lee, Kyoobin
This study proposes a novel deep learning model, called IITNet, that learns intra- and inter-epoch temporal contexts from a raw single-channel electroencephalogram (EEG) for automatic sleep stage scoring. When sleep experts identify the sleep stage of 30 seconds of PSG data, called an epoch, they investigate sleep-related events such as sleep spindles, K-complexes, and frequency components in local segments of the epoch (sub-epochs) and consider the relations between sleep-related events of successive epochs to follow the transition rules. Inspired by this, IITNet learns to encode each sub-epoch into a representative feature via a deep residual network and then captures contextual information in the sequence of representative features via a BiLSTM. Thus, IITNet can extract features at the sub-epoch level and consider temporal context both within an epoch and between epochs. IITNet is an end-to-end architecture and does not need any preprocessing, handcrafted feature design, balanced sampling, pre-training, or fine-tuning. Our model was trained and evaluated on the Sleep-EDF and MASS datasets and outperformed other state-of-the-art results on both, with an overall accuracy (ACC) of 84.0% and 86.6%, macro F1-score (MF1) of 77.7 and 80.8, and Cohen's kappa of 0.78 and 0.80 on Sleep-EDF and MASS, respectively.
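A simplified sketch of the intra-/inter-epoch design, a per-sub-epoch encoder followed by a BiLSTM over the feature sequence, could look like this in PyTorch (the convolutional encoder here is a stand-in for IITNet's deep residual network, and all sizes are assumptions):

```python
# Sketch only: encode each sub-epoch of raw EEG, then model the sequence
# of sub-epoch features with a BiLSTM, as the abstract describes.
import torch
import torch.nn as nn

class SleepStager(nn.Module):
    def __init__(self, n_stages: int = 5, feat: int = 64):
        super().__init__()
        # Encodes one sub-epoch of raw single-channel EEG into a feature.
        self.encoder = nn.Sequential(
            nn.Conv1d(1, 32, 7, stride=2, padding=3), nn.ReLU(),
            nn.Conv1d(32, feat, 7, stride=2, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
        )
        # BiLSTM reads the sub-epoch features within and across epochs.
        self.lstm = nn.LSTM(feat, feat, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * feat, n_stages)

    def forward(self, x):                     # x: (batch, n_subepochs, samples)
        b, s, t = x.shape
        f = self.encoder(x.reshape(b * s, 1, t)).reshape(b, s, -1)
        out, _ = self.lstm(f)                 # temporal context over the sequence
        return self.head(out[:, -1])          # score the target epoch

eeg = torch.randn(2, 10, 3000)  # 2 sequences of 10 sub-epochs (e.g., 100 Hz x 30 s)
stage_logits = SleepStager()(eeg)
```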