OOD Aware Supervised Contrastive Learning
Seifi, Soroush, Reino, Daniel Olmeda, Chumerin, Nikolay, Aljundi, Rahaf
Out-of-Distribution (OOD) detection is a crucial problem for the safe deployment of machine learning models: identifying samples that fall outside of the training distribution, i.e., the in-distribution (ID) data. Most OOD detection works focus on classification models trained with Cross Entropy (CE) and attempt to fix its inherent issues. In this work, we leverage the powerful representations learned with Supervised Contrastive (SupCon) training and propose a holistic approach to learning a classifier robust to OOD data. We extend the SupCon loss with two additional contrast terms. The first term pushes auxiliary OOD representations away from ID representations without imposing any constraints on similarities among the auxiliary data. The second term pushes OOD features far from the existing class prototypes, while pulling ID representations closer to their corresponding class prototype. When auxiliary OOD data is not available, we propose feature-mixing techniques to efficiently generate pseudo-OOD features. Our solution is simple and efficient, and acts as a natural extension of closed-set supervised contrastive representation learning. We compare against different OOD detection methods on common benchmarks and show state-of-the-art results.
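To make the two extra contrast terms concrete, here is a minimal PyTorch-style sketch. The function names, the temperature, the prototype handling, and the exact form of each term are assumptions for illustration, not the paper's published formulation:

```python
import torch
import torch.nn.functional as F

def ood_aware_supcon_loss(feats_id, labels, feats_ood, prototypes, tau=0.1):
    # feats_id:   (N, D) L2-normalized ID features; labels: (N,)
    # feats_ood:  (M, D) L2-normalized auxiliary or pseudo-OOD features
    # prototypes: (C, D) L2-normalized class prototypes
    n = feats_id.size(0)
    sim = feats_id @ feats_id.t() / tau                     # (N, N) ID similarities
    self_mask = torch.eye(n, dtype=torch.bool, device=sim.device)
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    # Standard SupCon term over ID samples (self-pairs excluded).
    log_prob = sim - torch.logsumexp(
        sim.masked_fill(self_mask, float('-inf')), dim=1, keepdim=True)
    supcon = -(log_prob * pos_mask).sum(1) / pos_mask.sum(1).clamp(min=1)

    # Term 1: repel OOD features from all ID features, with no
    # constraint on OOD-OOD similarities.
    term1 = torch.logsumexp(feats_ood @ feats_id.t() / tau, dim=1).mean()

    # Term 2: pull each ID feature toward its class prototype,
    # push every OOD feature away from all prototypes.
    pull_id = (feats_id * prototypes[labels]).sum(1).mean() / tau
    push_ood = torch.logsumexp(feats_ood @ prototypes.t() / tau, dim=1).mean()
    term2 = push_ood - pull_id

    return supcon.mean() + term1 + term2

def mix_pseudo_ood(feats_id, labels, alpha=0.5):
    # Hypothetical pseudo-OOD generation when no auxiliary data exists:
    # mix features from *different* classes so the mixture is unlikely
    # to lie on the ID manifold.
    perm = torch.randperm(feats_id.size(0))
    differ = labels != labels[perm]
    mixed = alpha * feats_id[differ] + (1 - alpha) * feats_id[perm][differ]
    return F.normalize(mixed, dim=1)
```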
How to improve CNN-based 6-DoF camera pose estimation
Seifi, Soroush, Tuytelaars, Tinne
Convolutional neural networks (CNNs) and transfer learning have recently been used for 6 degrees of freedom (6-DoF) camera pose estimation. While they do not reach the same accuracy as visual SLAM-based approaches and are restricted to a specific environment, they excel in robustness and can be applied even to a single image. In this paper, we study PoseNet [1] and investigate modifications based on the datasets' characteristics to improve the accuracy of the pose estimates. In particular, we emphasize the importance of field of view over image resolution; we present a data augmentation scheme to reduce overfitting; and we study the effect of Long Short-Term Memory (LSTM) cells. Lastly, we combine these modifications and improve PoseNet's performance for monocular CNN-based camera pose regression.
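For readers unfamiliar with this family of regressors, the sketch below shows a PoseNet-like setup: a CNN backbone with a 7-dimensional head (3-D translation plus a unit quaternion), trained with the weighted loss used in the original PoseNet. The ResNet-18 backbone and the exact beta value are stand-in assumptions, not the configuration studied in the paper:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models

class PoseNetLike(nn.Module):
    """Hypothetical PoseNet-style regressor: CNN backbone plus a
    translation head (3-D) and a rotation head (4-D quaternion)."""
    def __init__(self):
        super().__init__()
        backbone = models.resnet18(weights=None)  # stand-in backbone
        backbone.fc = nn.Identity()               # expose 512-D features
        self.backbone = backbone
        self.fc_xyz = nn.Linear(512, 3)           # translation
        self.fc_quat = nn.Linear(512, 4)          # rotation (quaternion)

    def forward(self, img):
        feat = self.backbone(img)
        xyz = self.fc_xyz(feat)
        quat = F.normalize(self.fc_quat(feat), dim=1)  # unit quaternion
        return xyz, quat

def pose_loss(xyz, quat, xyz_gt, quat_gt, beta=500.0):
    # PoseNet-style weighted loss: beta balances the smaller-scale
    # rotation error against the translation error.
    return (xyz - xyz_gt).norm(dim=1).mean() + \
           beta * (quat - quat_gt).norm(dim=1).mean()
```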
Where to Look Next: Unsupervised Active Visual Exploration on 360° Input
Seifi, Soroush, Tuytelaars, Tinne
We address the problem of active visual exploration of large 360° inputs. In our setting, an active agent with a limited camera bandwidth explores its 360° environment by changing its viewing direction over a limited number of discrete time steps. As such, it observes the world as a sequence of narrow field-of-view 'glimpses', deciding for itself where to look next. Our proposed method exceeds previous works' performance by a significant margin without the need for deep reinforcement learning or training separate networks as sidekicks. A key component of our system is the set of spatial memory maps that make the system aware of the glimpses' orientations (locations in the 360° image). Further, we stress the advantages of retina-like glimpses when the agent's sensor bandwidth and time steps are limited. Finally, we use our trained model to classify the whole scene using only the information observed in the glimpses.
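One plausible reading of the spatial memory map is a feature grid covering the full panorama, into which each glimpse's features are written at the cells matching its viewing direction; unseen cells can then guide where to look next. The sketch below follows that reading; the grid sizes, the write scheme, and the greedy location picker are illustrative assumptions, not the paper's architecture:

```python
import torch

class SpatialMemory:
    """Hypothetical spatial memory map: a feature grid over the 360
    panorama recording what each glimpse observed, and where."""
    def __init__(self, grid_h=8, grid_w=16, feat_dim=256):
        self.map = torch.zeros(feat_dim, grid_h, grid_w)
        self.seen = torch.zeros(grid_h, grid_w, dtype=torch.bool)

    def write(self, glimpse_feat, row, col):
        # glimpse_feat: (feat_dim, gh, gw) features of one narrow-FoV
        # glimpse, written at the cells for its viewing direction.
        _, gh, gw = glimpse_feat.shape
        self.map[:, row:row + gh, col:col + gw] = glimpse_feat
        self.seen[row:row + gh, col:col + gw] = True

    def next_location(self, scores):
        # scores: (grid_h, grid_w) policy scores; greedily pick the
        # highest-scoring cell that has not been observed yet.
        scores = scores.masked_fill(self.seen, float('-inf'))
        idx = scores.flatten().argmax()
        return divmod(idx.item(), scores.shape[1])
```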