AITopics | segmentation model

Collaborating Authors

segmentation model

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Leaving No OODInstance Behind: Instance-Level OODFine-Tuning for Anomaly Segmentation

Neural Information Processing SystemsJun-21-2026, 19:57:16 GMT

Out-of-distribution (OOD) fine-tuning has emerged as a promising approach for anomaly segmentation. Current OOD fine-tuning strategies typically employ global-level objectives, aiming to guide segmentation models to accurately predict a large number of anomaly pixels. However, these strategies often perform poorly on small anomalies. To address this issue, we propose an instance-level OOD fine-tuning framework, dubbed LNOIB (Leaving No OODInstance Behind). We start by theoretically analyzing why global-level objectives fail to segment small anomalies. Building on this analysis, we introduce a simple yet effective instancelevel objective. Moreover, we propose a feature separation objective to explicitly constrain the representations of anomalies, which are prone to be smoothed by their in-distribution (ID) surroundings. LNOIB integrates these objectives to enhance the segmentation of small anomalies and serves as a paradigm adaptable to existing OOD fine-tuning strategies, without introducing additional inference cost. Experimental results show that integrating LNOIB into various OOD fine-tuning strategies yields significant improvements, particularly in component-level results, highlighting its strength in comprehensive anomaly segmentation.

artificial intelligence, machine learning, natural language, (16 more...)

Neural Information Processing Systems

Country: Asia > China (0.28)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.94)
(3 more...)

Add feedback

Sample Efficient Multi Round Generative Data Augmentation for Long Tail Instance Segmentation

Neural Information Processing SystemsJun-18-2026, 02:49:17 GMT

Data synthesis has become increasingly crucial for long-tail instance segmentation tasks to mitigate class imbalance and high annotation costs. Previous methods have primarily prioritized the selection of data from a pre-generated image object pool, which frequently leads to the inefficient utilization of generated data. To address this inefficiency, we propose a collaborative approach that incorporates feedback from an instance segmentation model to guide the augmentation process. Specifically, the diffusion model uses feedback to generate objects that exhibit high uncertainty. The number and size of synthesized objects for each class are dynamically adjusted based on the model state to improve learning in underrepresented classes. This augmentation process is further strengthened by running multiple rounds, allowing feedback to be refined throughout training. In summary, multi-round collaborative augmentation (MRCA) enhances sample efficiency by providing optimal synthetic data at the right moment. Our framework requires only 6% of the data generation needed by state-of-the-art methods while outperforming them.

artificial intelligence, augmentation, machine learning, (16 more...)

Neural Information Processing Systems

Genre: Research Report > Experimental Study (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

SPIRAL: Semantic-Aware Progressive LiDAR Scene Generation and Understanding

Neural Information Processing SystemsJun-17-2026, 08:01:52 GMT

Leveraging recent diffusion models, LiDAR-based large-scale 3D scene generation has achieved great success. While recent voxel-based approaches can generate both geometric structures and semantic labels, existing range-view methods are limited to producing unlabeled LiDAR scenes. Relying on pretrained segmentation models to predict the semantic maps often results in suboptimal cross-modal consistency. To address this limitation while preserving the advantages of range-view representations, such as computational efficiency and simplified network design, we propose SPIRAL, a novel range-view LiDAR diffusion model that simultaneously generates depth, reflectance images, and semantic maps. Furthermore, we introduce novel semantic-aware metrics to evaluate the quality of the generated labeled range-view data. Experiments on the SemanticKITTI and nuScenes datasets demonstrate that SPIRAL achieves state-of-the-art performance with the smallest parameter size, outperforming two-step methods that combine the generative and segmentation models. Additionally, we validate that range images generated by SPIRAL can be effectively used for synthetic data augmentation in the downstream segmentation training, significantly reducing the labeling effort on LiDAR data.

computer vision, machine learning, natural language, (16 more...)

Neural Information Processing Systems

Genre: Research Report > Experimental Study (1.00)

Industry: Information Technology (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

Add feedback

Spiral: Semantic-Aware Progressive LiDAR Scene Generation and Understanding

Neural Information Processing SystemsJun-12-2026, 05:19:04 GMT

Leveraging diffusion models, 3D LiDAR scene generation has achieved great success in both range-view and voxel-based representations. While recent voxel-based approaches can generate both geometric structures and semantic labels, existing range-view methods are limited to producing unlabeled LiDAR scenes. Relying on pretrained segmentation models to predict the semantic maps often results in suboptimal cross-modal consistency. To address this limitation while preserving the advantages of range-view representations, such as computational efficiency and simplified network design, we propose Spiral, a novel range-view LiDAR diffusion model that simultaneously generates depth, reflectance images, and semantic maps. Furthermore, we introduce novel semantic-aware metrics to evaluate the quality of the generated labeled range-view data. Experiments on SemanticKITTI and nuScenes datasets demonstrate that Spiral achieves state-of-the-art performance with the smallest parameter size, outperforming two-step methods that combine the best available generative and segmentation models. Additionally, we validate that Spiral's generated range images can be effectively used for synthetic data augmentation in the downstream segmentation training, significantly reducing the labeling effort on LiDAR data.

artificial intelligence, machine learning, proceedings, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.63)

Add feedback

Zero-Shot Semantic Segmentation

Maxime Bucher, Tuan-Hung VU, Matthieu Cord, Patrick Pérez

Neural Information Processing SystemsApr-30-2026, 20:11:40 GMT

Neural Information Processing Systems http://nips.cc/

machine learning, natural language, segmentation, (19 more...)

Neural Information Processing Systems

Country: North America (0.28)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
(3 more...)

Add feedback

Appendix

Neural Information Processing SystemsApr-29-2026, 21:50:12 GMT

Overall dataset-specific architecture: The overall architecture equipped with all our datasetspecific modules is shown in Figure 1. We design the dataset-specific modules to be light-weight, which allows us to save on memory costs.

artificial intelligence, dataset, machine learning, (18 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

ce3cf998b7f59271e80ce03fb74a7115-Paper-Conference.pdf

Neural Information Processing SystemsApr-29-2026, 19:50:31 GMT

artificial intelligence, machine learning, segmentation, (12 more...)

Neural Information Processing Systems

Country:

North America > United States > Arkansas (0.28)
Asia > Middle East > UAE (0.28)

Genre:

Workflow (0.47)
Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Vision (0.70)

Add feedback

License of the assets

Neural Information Processing SystemsApr-24-2026, 12:31:04 GMT

Licence for the codes We use the code for MS-TCN [13], ASRF [24], LAS [9], all of which are under MITLicense according to https://opensource.org/licenses/MIT. For the Jigsaws [18] dataset, we follow the data use agreeement according to https://cs.jhu. Action classification: Action classification is the task of identifying a single action, as opposed to a sequence of actions. Several methods use 2DCNNs to extract frame-wise features from an input video, which are then combined to predict a coarse action taking place in the video [56, 39, 59]. There also exist several works that perform action classification from kinematic data [2, 12]. Action segmentation: Action segmentation is the problem of segmenting an input stream of data, labeling each frame according to the action that is being carried out. Earlier methods for action segmentation employed hidden Markov models [33, 22]. More recently, convolutional neural networks [58, 26] and recurrent neural networks [50] have been applied to this problem Inspired by the success of temporal convolutional networks (TCNs) in speech synthesis, [37] adapted these models to action segmentation. MS-TCN [13], which uses a multi-stage TCN architecture, has become one of the most widely used architecture for action segmentation. Although these methods achieve high frame-wise accuracy, they still produce a significant number of over-segmentation errors. In order to address this, several boundary-aware methods have been developed which perform temporal smoothing of the frame-wise predictions [57, 24]. These methods use ground-truth boundary information to train a binary classification network to perform boundary detection. The boundary estimates are then used to aggregate the frame-wise predictions either in a soft manner (boundary-aware pooling) or by setting a hard threshold. However, for elemental actions with a short duration, such as the functional primitives in the StrokeRehab dataset, the duration of each action is very short. As a result, the boundaries between actions can be hard to detect or even hard to define (see Figure 4). Sequence-to-sequence models: Our proposed method is based on sequence-to-sequence (seq2seq) models. These models allow us to learn a mapping of a variable-length input sequence to a variablelength output sequence [53].

artificial intelligence, deep learning, machine learning, (17 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Genre:

Research Report > New Finding (0.68)
Research Report > Experimental Study (0.46)

Industry:

Government (1.00)
Information Technology (0.93)
Law > Intellectual Property & Technology Law (0.46)
Health & Medicine > Therapeutic Area (0.32)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.88)

Add feedback

REBORN: Reinforcement-Learned Boundary Segmentation with Iterative Training for Unsupervised ASR

Neural Information Processing SystemsMar-22-2026, 19:09:42 GMT

Unsupervised automatic speech recognition (ASR) aims to learn the mapping between the speech signal and its corresponding textual transcription without the supervision of paired speech-text data. A word/phoneme in the speech signal is represented by a segment of speech signal with variable length and unknown boundary, and this segmental structure makes learning the mapping between speech and text challenging, especially without paired data. In this paper, we propose REBORN, Reinforcement-Learned Boundary Segmentation with Iterative Training for Unsupervised ASR. REBORN alternates between (1) training a segmentation model that predicts the boundaries of the segmental structures in speech signals and (2) training the phoneme prediction model, whose input is a segmental structure segmented by the segmentation model, to predict a phoneme transcription. Since supervised data for training the segmentation model is not available, we use reinforcement learning to train the segmentation model to favor segmentations that yield phoneme sequence predictions with a lower perplexity. We conduct extensive experiments and find that under the same setting, REBORN outperforms all prior unsupervised ASR models on LibriSpeech, TIMIT, and five non-English languages in Multilingual LibriSpeech. We comprehensively analyze why the boundaries learned by REBORN improve the unsupervised ASR performance.

artificial intelligence, proceedings, speech recognition, (12 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.96)

Add feedback

Filters

Collaborating Authors

segmentation model

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

Leaving No OODInstance Behind: Instance-Level OODFine-Tuning for Anomaly Segmentation

Sample Efficient Multi Round Generative Data Augmentation for Long Tail Instance Segmentation

SPIRAL: Semantic-Aware Progressive LiDAR Scene Generation and Understanding

Spiral: Semantic-Aware Progressive LiDAR Scene Generation and Understanding

Zero-Shot Semantic Segmentation

Appendix

d4eed238cf5807c6b75face996302892-Paper-Conference.pdf

ce3cf998b7f59271e80ce03fb74a7115-Paper-Conference.pdf

License of the assets

REBORN: Reinforcement-Learned Boundary Segmentation with Iterative Training for Unsupervised ASR