semantic segmentation
GTPBD: AFine-Grained Global Terraced Parcel and Boundary Dataset
Agricultural parcels serve as basic units for conducting agricultural practices and applications, which is vital for land ownership registration, food security assessment, soil erosion monitoring, etc. However, existing agriculture parcel extraction studies only focus on mid-resolution mapping or regular plain farmlands while lacking representation of complex terraced terrains due to the demands of precision agriculture. In this paper, we introduce a more fine-grained terraced parcel dataset named GTPBD (Global Terraced Parcel and Boundary Dataset), which is the first fine-grained dataset covering major worldwide terraced regions with more than 200,000 complex terraced parcels with manually annotation. GTPBD comprises 47,537 high-resolution images with three-level labels, including pixel-level boundary labels, mask labels, and parcel labels. It covers seven major geographic zones in China and transcontinental climatic regions around the world. Compared to the existing datasets, the GTPBD dataset brings considerable challenges due to the: (1) terrain diversity; (2) complex and irregular parcel objects; and (3) multiple domain styles. Our proposed GTPBD dataset is suitable for four different tasks, including semantic segmentation, edge detection, terraced parcel extraction and unsupervised domain adaptation (UDA) tasks.
Leaving No OODInstance Behind: Instance-Level OODFine-Tuning for Anomaly Segmentation
Out-of-distribution (OOD) fine-tuning has emerged as a promising approach for anomaly segmentation. Current OOD fine-tuning strategies typically employ global-level objectives, aiming to guide segmentation models to accurately predict a large number of anomaly pixels. However, these strategies often perform poorly on small anomalies. To address this issue, we propose an instance-level OOD fine-tuning framework, dubbed LNOIB (Leaving No OODInstance Behind). We start by theoretically analyzing why global-level objectives fail to segment small anomalies. Building on this analysis, we introduce a simple yet effective instancelevel objective. Moreover, we propose a feature separation objective to explicitly constrain the representations of anomalies, which are prone to be smoothed by their in-distribution (ID) surroundings. LNOIB integrates these objectives to enhance the segmentation of small anomalies and serves as a paradigm adaptable to existing OOD fine-tuning strategies, without introducing additional inference cost. Experimental results show that integrating LNOIB into various OOD fine-tuning strategies yields significant improvements, particularly in component-level results, highlighting its strength in comprehensive anomaly segmentation.
Reasoning Beyond Points: AVisual Introspective Approach for Few-Shot 3DSegmentation
Point Cloud Few-Shot Semantic Segmentation (PC-FSS) aims to segment unknown categories in query samples using only a small number of annotated support samples. However, scene complexity and insufficient representation of local geometric structures pose significant challenges to PC-FSS. To address these issues, we propose a novel pre-training-free Visual Introspective Prototype Segmentation network (VIP-Seg). Specifically, we design a Visual Introspective Prototype (VIP) module that employs a multi-step reasoning approach to tackle intra-class diversity and domain gaps between support and query sets. The VIP module consists of a Prototype Enhancement Module (PEM) and a Prototype Difference Module (PDM), which work alternately to progressively refine prototypes. The PEM enhances prototype discriminability and reduces intra-class diversity, while the PDM learns common representations from the differences between query and support features, effectively eliminating semantic inconsistencies caused by domain gaps. To further reduce intra-class diversity and enhance point discriminative ability, we propose a Dynamic Power Convolution (DyPowerConv) that leverages learnable power functions to effectively capture local geometric structures and detailed features of point clouds. Extensive experiments on S3DIS and ScanNet demonstrate that our proposed VIP-Seg significantly outperforms current state-of-the-art methods, proving its effectiveness in PC-FSS tasks.
Towards Robust Pseudo-Label Learning in Semantic Segmentation: An Encoding Perspective
Pseudo-label learning is widely used in semantic segmentation, particularly in label-scarce scenarios such as unsupervised domain adaptation (UDA) and semisupervised learning (SSL). Despite its success, this paradigm can generate erroneous pseudo-labels, which are further amplified during training due to utilization of one-hot encoding. To address this issue, we propose ECOCSeg, a novel perspective for segmentation models that utilizes error-correcting output codes (ECOC) to create a fine-grained encoding for each class.
Novel Class Discovery for Point Cloud Segmentation via Joint Learning of Causal Representation and Reasoning
In this paper, we focus on Novel Class Discovery for Point Cloud Segmentation (3D-NCD), aiming to learn a model that can segment unlabeled (novel) 3D classes using only the supervision from labeled (base) 3D classes. The key to this task is to setup the exact correlations between the point representations and their base class labels, as well as the representation correlations between the points from base and novel classes. A coarse or statistical correlation learning may lead to the confusion in novel class inference.
BeyondMix: Leveraging Structural Priors and Long-Range Dependencies for Domain-Invariant LiDARSegmentation
Domain adaptation for LiDAR semantic segmentation remains challenging due to the complex structural properties of point cloud data. While mix-based paradigms have shown promise, they often fail to fully leverage the rich structural priors inherent in 3DLiDAR point clouds. In this paper, we identify three critical yet underexploited structural priors: permutation invariance, local consistency, and geometric consistency. We introduce BeyondMix, a novel framework that harnesses the capabilities of State Space Models (specifically Mamba) to construct and exploit these structural priors while modeling long-range dependencies that transcend the limited receptive fields of conventional voxel-based approaches. By employing space-filling curves to impose sequential ordering on point cloud data and implementing strategic spatial partitioning schemes, BeyondMix effectively captures domain-invariant representations. Extensive experiments on challenging LiDAR semantic segmentation benchmarks demonstrate that our approach consistently outperforms existing state-of-the-art methods, establishing a new paradigm for unsupervised domain adaptation in 3D point cloud understanding.
Differentiable Hierarchical Visual Tokenization
Vision Transformers rely on fixed patch tokens that ignore the spatial and semantic structure of images. In this work, we introduce an end-to-end differentiable tokenizer that adapts to image content with pixel-level granularity while remaining backward-compatible with existing architectures for retrofitting pretrained models. Our method uses hierarchical model selection with information criteria to provide competitive performance in both image-level classification and dense-prediction tasks, and even supports out-of-the-box raster-to-vector conversion.
Continual Gaussian Mixture Distribution Modeling for Class Incremental Semantic Segmentation
Class incremental semantic segmentation (CISS) enables a model to continually segment new classes from non-stationary data while preserving previously learned knowledge. Recent top-performing approaches are prototype-based methods that assign a prototype to each learned class to reproduce previous knowledge. However, modeling each class distribution relying on only a single prototype, which remains fixed throughout the incremental process, presents two key limitations: (i) a single prototype is insufficient to accurately represent the complete class distribution when incoming data stream for a class is naturally multimodal; (ii) the features of old classes may exhibit anisotropy during the incremental process, preventing fixed prototypes from faithfully reproducing the matched distribution. To address the aforementioned limitations, we propose a Continual Gaussian Mixture Distribution (CoGaMiD) modeling method. Specifically, the means and covariance matrices of the Gaussian Mixture Models (GMMs) are estimated to model the complete feature distributions of learned classes.
SPIRAL: Semantic-Aware Progressive LiDAR Scene Generation and Understanding
Leveraging recent diffusion models, LiDAR-based large-scale 3D scene generation has achieved great success. While recent voxel-based approaches can generate both geometric structures and semantic labels, existing range-view methods are limited to producing unlabeled LiDAR scenes. Relying on pretrained segmentation models to predict the semantic maps often results in suboptimal cross-modal consistency. To address this limitation while preserving the advantages of range-view representations, such as computational efficiency and simplified network design, we propose SPIRAL, a novel range-view LiDAR diffusion model that simultaneously generates depth, reflectance images, and semantic maps. Furthermore, we introduce novel semantic-aware metrics to evaluate the quality of the generated labeled range-view data. Experiments on the SemanticKITTI and nuScenes datasets demonstrate that SPIRAL achieves state-of-the-art performance with the smallest parameter size, outperforming two-step methods that combine the generative and segmentation models. Additionally, we validate that range images generated by SPIRAL can be effectively used for synthetic data augmentation in the downstream segmentation training, significantly reducing the labeling effort on LiDAR data.