Collaborating Authors

Optical Flow augmented Semantic Segmentation networks for Automated Driving Machine Learning

Motion is a dominant cue in automated driving systems. Optical flow is typically computed to detect moving objects and to estimate depth using triangulation. In this paper, our motivation is to leverage the existing dense optical flow to improve the performance of semantic segmentation. To provide a systematic study, we construct four different architectures which use RGB only, flow only, RGBF concatenated and two-stream RGB + flow. We evaluate these networks on two automotive datasets namely Virtual KITTI and Cityscapes using the state-of-the-art flow estimator FlowNet v2. We also make use of the ground truth optical flow in Virtual KITTI to serve as an ideal estimator and a standard Farneback optical flow algorithm to study the effect of noise. Using the flow ground truth in Virtual KITTI, two-stream architecture achieves the best results with an improvement of 4% IoU. As expected, there is a large improvement for moving objects like trucks, vans and cars with 38%, 28% and 6% increase in IoU. FlowNet produces an improvement of 2.4% in average IoU with larger improvement in the moving objects corresponding to 26%, 11% and 5% in trucks, vans and cars. In Cityscapes, flow augmentation provided an improvement for moving objects like motorcycle and train with an increase of 17% and 7% in IoU.

QANet - Quality Assurance Network for Microscopy Cell Segmentation Artificial Intelligence

Tools and methods for automatic image segmentation are rapidly developing, each with its own strengths and weaknesses. While these methods are designed to be as general as possible, there are no guarantees for their performance on new data. The choice between methods is usually based on benchmark performance whereas the data in the benchmark can be significantly different than that of the user. We introduce a novel Deep Learning method which, given an image and a proposed corresponding segmentation, estimates the Intersection over Union measure (IoU) with respect to the unknown ground truth. We refer to this method as a Quality Assurance Network - QANet. The QANet is designed to give the user an estimate of the segmentation quality on the users own, private, data without the need for human inspection or labelling. It is based on the RibCage Network architecture, originally proposed as a discriminator in an adversarial network framework. Promising IoU prediction results are demonstrated based on the Cell Segmentation Benchmark. The code is freely available at: ANONYMOUS.

Semantic Image Segmentation for Autonomous Driving


Have you ever pondered about how seamlessly we can drive a car, and why is it so difficult to get a computer to drive a car? It's because our minds are highly evolved and complex and embedding this complexity into a computer is challenging. And today we will cover a tiny fraction of our journey towards achieving self-driving cars. The task I will be discussing in this post is called semantic segmentation. Segmentation as the name suggests is the act of dividing something into separate parts.

Semantic Segmentation of 150 Classes of Objects With 5 Lines of Code


It is now possible to perform segmentation on 150 classes of objects using ade20k model with PixelLib. Ade20k model is a deeplabv3 model trained on ade20k dataset, a dataset with 150 classes of objects. It is now possible to perform segmentation on 150 classes of objects using ade20k model with PixelLib. Ade20k model is a deeplabv3 model trained on ade20k dataset, a dataset with 150 classes of objects. Thanks to tensorflow deeplab's model zoo, I extracted ade20k model from its tensorflow model checkpoint.

Improving automated segmentation of radio shows with audio embeddings Machine Learning

Audio features have been proven useful for increasing the performance of automated topic segmentation systems. This study explores the novel task of using audio embeddings for automated, topically coherent segmentation of radio shows. We created three different audio embedding generators using multi-class classification tasks on three datasets from different domains. We evaluate topic segmentation performance of the audio embeddings and compare it against a text-only baseline. We find that a set-up including audio embeddings generated through a non-speech sound event classification task significantly outperforms our text-only baseline by 32.3% in F1-measure. In addition, we find that different classification tasks yield audio embeddings that vary in segmentation performance.