

 Object-Oriented Architecture



UMB: Understanding Model Behavior for Open-World Object Detection

Neural Information Processing Systems

Open-World Object Detection (OWOD) is a challenging task that requires the detector to identify unlabeled objects and to continuously learn new knowledge on top of what it already knows. Existing methods primarily focus on recalling unknown objects and neglect to explore the reasons behind these predictions. This paper aims to understand the model's behavior in predicting the unknown category.




803c6ab3d62346e004ef70211d2d15b8-Paper-Datasets_and_Benchmarks.pdf

Neural Information Processing Systems

An important step to understanding and improving artificial vision systems is to measure image similarity purely based on intrinsic object properties that define object identity. This problem has been studied in the computer vision literature as re-identification, though mostly restricted to specific object categories such as people and cars. We propose to extend it to general object categories, exploring an image similarity metric based on object intrinsics.


Supplementary Material 1 Additional Implementation Details

Neural Information Processing Systems

We printed a checkerboard with a 9x10 grid of blocks, each measuring 87 mm x 87 mm.

Table 3: Parameters for Panoptic Segmentation model

Parameter               Value
Model Architecture      Panoptic-PolarNet
Test Batch Size         2
Val Batch Size          2
Test Batch Size         1
post proc threshold     0.1
post proc nms kernel    5
post proc top k         100
center loss             MSE
offset loss             L1
center loss weight      100
offset loss weight      10
enable SAP              True
SAP start epoch         30
SAP rate                0.01

Table 4: Parameters for the 4D Panoptic Segmentation model

Parameter               Value(s)
Model Architecture      4D-StOP
Learning Rate           0.0005
Momentum                0.98
Stride                  1
Max in points           5000
Sampling                importance
Decay Sampling          None
Input Threads           16
Checkpoint Gap          100

The results reveal a significant variance in performance across different categories. Notably, 'Structure' and 'Ground' both achieved high mIoU at …

Result: The results are shown in Table 8. … presents the mean intersection-over-union (mIoU) percent… Notably, 'Structure' achieved the highest mIoU at … 'General Objects' category have the lowest mIoU, … The dataset is divided into 17 and 6 categories, respectively. … 'Ground' and 'Roads', as opposed to grouping anything related to ground as a single category.

Overall, the performance across these tasks underscores the challenges posed by our dataset's … With our dataset, future work can focus on improving the model's capacity to handle such diverse … The raw data, processed data, and framework code can be found on our website.
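The excerpt reports per-category mean intersection-over-union (mIoU) without spelling out the metric. As a minimal sketch, assuming integer point-wise class labels, the standard definition is IoU_c = TP_c / (TP_c + FP_c + FN_c) for each class c, averaged over classes and reported as a percentage. The function names below are hypothetical and this is not the authors' evaluation code; for example, it does not handle any "ignore" label their benchmark might use.

```python
from typing import List, Sequence


def confusion_matrix(preds: Sequence[int], labels: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """Count label pairs: rows are ground truth, columns are predictions."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(preds, labels):
        cm[t][p] += 1
    return cm


def per_class_iou(cm: List[List[int]]) -> List[float]:
    """IoU_c = TP / (TP + FP + FN); NaN for classes absent from both GT and predictions."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # ground truth c, predicted as another class
        fp = sum(cm[r][c] for r in range(n)) - tp   # predicted c, ground truth is another class
        denom = tp + fp + fn
        ious.append(tp / denom if denom > 0 else float("nan"))
    return ious


def mean_iou_percent(ious: List[float]) -> float:
    """Average the defined (non-NaN) per-class IoUs and express the result in percent."""
    valid = [x for x in ious if x == x]  # NaN != NaN, so this drops undefined classes
    return 100.0 * sum(valid) / len(valid) if valid else float("nan")


if __name__ == "__main__":
    labels = [0, 0, 1, 1, 2, 2]
    preds = [0, 1, 1, 1, 2, 0]
    ious = per_class_iou(confusion_matrix(preds, labels, num_classes=3))
    print([round(x, 3) for x in ious], mean_iou_percent(ious))
```

Skipping classes with an undefined IoU (rather than counting them as 0) is one common convention; benchmarks differ on this, so the averaging rule should match whatever the evaluation protocol specifies.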




DesCo: Learning Object Recognition with Rich Language Descriptions

Neural Information Processing Systems

Recent developments in vision-language approaches have instigated a paradigm shift in learning visual recognition models from language supervision. These approaches align objects with language queries (e.g., "a photo of a cat") and thus improve the models' adaptability to novel objects and domains.