07211688a0869d995947a8fb11b215d6-AuthorFeedback.pdf
Neural Information Processing Systems
In our work, the context prior is defined as a confounder set C = {c_1, c_2, ..., c_n}, where n is the number of classes in the dataset. By (Q1) in Table 1 of the main paper, we directly concatenate the predicted mask (i.e., Seg. Mask in Table 1) into the backbone network. Its identifiability assumes that the confounder set is fully observed, e.g., a ground-truth vocabulary of contexts in our visual world. Unfortunately, this is impossible in practice, and thus CONTA requires an iterative "guess" of the hidden confounder. Therefore, at each iteration, we need what you suggested: "one example of horse (person) without person (horse)", or more generally, "one example of class A without B", to disentangle A and B. Fortunately, this is feasible in the PASCAL and COCO datasets.
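The "one example of class A without B" condition can be checked directly from image-level labels. The following is a minimal sketch under stated assumptions, not part of CONTA itself: the function name `missing_disentangling_pairs` and the toy labels are hypothetical, and each image is assumed to be represented by the set of class names it contains.

```python
from itertools import permutations


def missing_disentangling_pairs(image_labels, classes):
    """Return ordered pairs (a, b) for which no image contains a without b.

    image_labels: iterable of sets, one set of class names per image.
    classes: list of class names to check.
    """
    image_labels = list(image_labels)
    missing = []
    for a, b in permutations(classes, 2):
        # The pair is covered if at least one image shows a but not b.
        covered = any(a in labels and b not in labels for labels in image_labels)
        if not covered:
            missing.append((a, b))
    return missing


# Toy, made-up image-level labels (not taken from PASCAL or COCO).
toy_labels = [
    {"horse", "person"},
    {"horse"},
    {"person"},
    {"boat", "person"},
]
toy_classes = ["boat", "horse", "person"]

# In this toy set, "boat" never appears without "person".
print(missing_disentangling_pairs(toy_labels, toy_classes))  # [('boat', 'person')]
```

An empty result means that every ordered pair of classes has at least one image showing the first class without the second, which is the condition stated above.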
Feb-7-2026, 09:15:55 GMT