Review for NeurIPS paper: Causal Intervention for Weakly-Supervised Semantic Segmentation
–Neural Information Processing Systems
Weaknesses:
1. Questions about the structural causal model
1) I feel that the confounder set C can be interpreted as "object shapes and where to place them", but I still do not have an intuitive way to interpret the image-specific context representation M.
2) Why is the edge X → M rather than M → X? From my understanding, we first sample object shapes and their locations to get M, and then sample object appearance (e.g., texture, lighting) to get X.
2. Implementation
1) Since the images in both VOC and COCO have different sizes and aspect ratios, I wonder how the authors construct the confounder set C.
2) Is the segmentation mask X_m (L195) given as logits or as probabilities?
3) I am a bit confused about Eqn. It seems that W_1 and W_2 are used as projection matrices, reducing the dimension from the original spatial size (hw) to the number of classes (n). I wonder whether this is reasonable.
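To make Implementation questions 2) and 3) concrete, here is a minimal sketch of the reading described above. It is not the paper's implementation: the sizes h, w, n, the random values, and the helper names `softmax` and `matvec` are all assumptions for illustration, and the paper's equation may combine W_1 and W_2 differently.

```python
import math
import random

def softmax(logits):
    """Convert a list of logits into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]

random.seed(0)
h, w, n = 4, 4, 3  # hypothetical spatial size and number of classes
hw = h * w

# Question 2): logits are unbounded reals; probabilities lie in [0, 1]
# and sum to 1 over the classes at a given pixel.
pixel_logits = [random.gauss(0.0, 1.0) for _ in range(n)]
pixel_probs = softmax(pixel_logits)

# Question 3): a flattened mask of length hw, projected by matrices of
# shape (n, hw), yields vectors of length n. The spatial axis is gone
# after the projection, which is what the question is about.
x_m = [random.gauss(0.0, 1.0) for _ in range(hw)]  # hypothetical flattened X_m
W_1 = [[random.gauss(0.0, 1.0) for _ in range(hw)] for _ in range(n)]
W_2 = [[random.gauss(0.0, 1.0) for _ in range(hw)] for _ in range(n)]

proj_1 = matvec(W_1, x_m)  # length n
proj_2 = matvec(W_2, x_m)  # length n
print(len(x_m), "->", len(proj_1), "and", len(proj_2))
```

Under this reading, each output entry is a weighted sum over all hw positions, so whether X_m holds logits or probabilities changes the scale and sign of what is being aggregated.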
Jan-21-2025, 07:15:23 GMT