A Supplementary Material A.1 AMWC Heuristic

Neural Information Processing Systems

Whenever a merge operation is performed, the corresponding edge is contracted and new edges can potentially be created (Lines 7-17). Afterwards, the clusters belonging to a non-partitionable class (i.e., stuff) are merged. Table 2 contains the hyperparameters used for fully differentiable training. We can see that optimizing the PQ surrogate gives better performance and that using separate losses decreases performance, especially on 'thing' classes. Table 5 shows that all trials improve over the baseline through fully differentiable training.
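The edge contraction mentioned in this excerpt can be illustrated with a minimal sketch. This is not the AMWC heuristic itself, whose details are not given here. It shows only a generic contraction step on an undirected weighted graph. The adjacency-dictionary representation, the function name `contract_edge`, and the choice to sum the weights of parallel edges are all assumptions made for illustration.

```python
def contract_edge(graph, u, v):
    """Merge node v into node u in an undirected weighted graph.

    graph: dict mapping node -> dict of neighbor -> weight (hypothetical layout).
    Parallel edges produced by the merge are combined by summing their
    weights, which is an assumption of this sketch.
    """
    for w, weight in graph.pop(v).items():
        del graph[w][v]              # drop the back-reference to v
        if w == u:
            continue                 # the contracted edge itself disappears
        # Either update an existing edge (u, w) or create a new one.
        graph[u][w] = graph[u].get(w, 0.0) + weight
        graph[w][u] = graph[u][w]
    return graph


graph = {
    "a": {"b": 1.0, "c": 2.0},
    "b": {"a": 1.0, "c": -3.0, "d": 4.0},
    "c": {"a": 2.0, "b": -3.0},
    "d": {"b": 4.0},
}
contract_edge(graph, "a", "b")
# Node "a" now has a combined edge to "c" and a newly created edge to "d".
```

In this toy graph, contracting the edge between `a` and `b` creates an edge between `a` and `d` that did not exist before, which is the sense in which a merge can introduce new edges.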







Self-trained Panoptic Segmentation

Verma, Shourya

arXiv.org Artificial Intelligence

Panoptic segmentation is an important computer vision task which combines semantic and instance segmentation. It plays a crucial role in domains such as medical image analysis, self-driving vehicles, and robotics by providing a comprehensive understanding of visual environments. Traditionally, deep learning panoptic segmentation models have relied on dense and accurately annotated training data, which is expensive and time-consuming to obtain. Recent advancements in self-supervised learning approaches have shown great potential in leveraging synthetic and unlabelled data to generate pseudo-labels using self-training to improve the performance of instance and semantic segmentation models. The three available methods for self-supervised panoptic segmentation use proposal-based transformer architectures, which are computationally expensive, complicated, and engineered for specific tasks. The aim of this work is to develop a framework to perform embedding-based self-supervised panoptic segmentation using self-training in a synthetic-to-real domain adaptation problem setting.


Towards holistic scene understanding: Semantic segmentation and beyond

Meletis, Panagiotis

arXiv.org Artificial Intelligence

This dissertation addresses visual scene understanding and enhances segmentation performance and generalization, training efficiency of networks, and holistic understanding. First, we investigate semantic segmentation in the context of street scenes and train semantic segmentation networks on combinations of various datasets. In Chapter 2 we design a framework of hierarchical classifiers over a single convolutional backbone, and train it end-to-end on a combination of pixel-labeled datasets, improving generalizability and the number of recognizable semantic concepts. Chapter 3 focuses on enriching semantic segmentation with weak supervision and proposes a weakly-supervised algorithm for training with bounding box-level and image-level supervision instead of only with per-pixel supervision. The memory and computational load challenges that arise from simultaneous training on multiple datasets are addressed in Chapter 4. We propose two methodologies for selecting informative and diverse samples from datasets with weak supervision to reduce our networks' ecological footprint without sacrificing performance. Motivated by memory and computation efficiency requirements, in Chapter 5, we rethink simultaneous training on heterogeneous datasets and propose a universal semantic segmentation framework. This framework achieves consistent increases in performance metrics and semantic knowledgeability by exploiting various scene understanding datasets. Chapter 6 introduces the novel task of part-aware panoptic segmentation, which extends our reasoning towards holistic scene understanding. This task combines scene and parts-level semantics with instance-level object detection. In conclusion, our contributions span over convolutional network architectures, weakly-supervised learning, part and panoptic segmentation, paving the way towards a holistic, rich, and sustainable visual scene understanding.


Split GCN: Effective Interactive Annotation for Segmentation of Disconnected Instance

Kim, Namgil, Kang, Barom, Cho, Yeonok

arXiv.org Artificial Intelligence

Manual annotation of object boundaries is costly. Recently, polygon-based annotation methods with human interaction have shown successful performance. However, given the connected vertex topology, these methods have difficulty predicting the disconnected components of an object. This paper introduces Split-GCN, a novel architecture based on the polygon approach and a self-attention mechanism. By providing direction information, Split-GCN enables the polygon vertices to move more precisely to the object boundary. Our model successfully predicts disconnected components of an object by transforming the initial topology using context exchange about the dependencies between vertices. Split-GCN demonstrates competitive performance with the state-of-the-art models on Cityscapes and outperforms the baseline models. On four cross-domain datasets, we confirm our model's generalization ability.


Prediction Error Meta Classification in Semantic Segmentation: Detection via Aggregated Dispersion Measures of Softmax Probabilities

Rottmann, Matthias, Colling, Pascal, Hack, Thomas-Paul, Hüger, Fabian, Schlicht, Peter, Gottschalk, Hanno

arXiv.org Machine Learning

We present a method that "meta" classifies whether segments (objects) predicted by a semantic segmentation neural network intersect with the ground truth. To this end, we employ measures of dispersion for predicted pixel-wise class probability distributions, like classification entropy, that yield heat maps of the input scene's size. We aggregate these dispersion measures segment-wise and derive metrics that are well-correlated with the segment-wise $\mathit{IoU}$ of prediction and ground truth. In our tests, we use two publicly available DeepLabv3+ networks (pre-trained on the Cityscapes data set) and analyze the predictive power of different metrics and different sets of metrics. To this end, we compute logistic LASSO regression fits for the task of classifying $\mathit{IoU}=0$ vs. $\mathit{IoU} > 0$ per segment and obtain classification rates of up to $81.91\%$ and AUROC values of up to $87.71\%$ without the incorporation of advanced techniques like Monte-Carlo dropout. We complement these tests with linear regression fits to predict the segment-wise $\mathit{IoU}$ and obtain prediction standard deviations of down to $0.130$ as well as $R^2$ values of up to $81.48\%$. We show that these results clearly outperform single-metric baseline approaches.
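The first two steps described in this abstract, a pixel-wise dispersion heat map followed by segment-wise aggregation, can be sketched as below. This is a minimal illustration and not the authors' implementation. The function names, the normalization of the entropy by $\log C$, the use of the mean as the aggregate, and the assumption that a segment-ID map is already available are all choices made for this sketch.

```python
import numpy as np


def entropy_heatmap(probs, eps=1e-12):
    """Pixel-wise classification entropy of a softmax output.

    probs: array of shape (H, W, C) whose last axis sums to 1.
    Returns an (H, W) heat map, divided by log(C) so values lie in [0, 1]
    (this normalization is an assumption of the sketch).
    """
    num_classes = probs.shape[-1]
    entropy = -np.sum(probs * np.log(probs + eps), axis=-1)
    return entropy / np.log(num_classes)


def segment_mean(heatmap, segment_ids):
    """Aggregate a heat map per segment by averaging over its pixels.

    segment_ids: (H, W) integer array assigning each pixel to a segment
    (assumed to be given, e.g. from connected components of the prediction).
    """
    return {
        int(s): float(heatmap[segment_ids == s].mean())
        for s in np.unique(segment_ids)
    }


# Toy 1x2 image with two classes: one confident pixel, one maximally uncertain.
probs = np.array([[[1.0, 0.0], [0.5, 0.5]]])
segments = np.array([[0, 1]])
per_segment = segment_mean(entropy_heatmap(probs), segments)
```

The resulting per-segment values could then serve as input features to a classifier or regressor of the kind the abstract describes, such as a logistic LASSO fit.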