Understanding and Improving Adversarial Robustness of Neural Probabilistic Circuits
–Neural Information Processing Systems
Neural Probabilistic Circuits (NPCs), a new class of concept bottleneck models, comprise an attribute recognition model and a probabilistic circuit for reasoning. By integrating the outputs from these two modules, NPCs produce compositional and interpretable predictions. While offering enhanced interpretability and high performance on downstream tasks, the neural-network-based attribute recognition model remains a black box. This vulnerability allows adversarial attacks to manipulate attribute predictions by introducing carefully crafted, subtle perturbations to input images, potentially compromising the final predictions. In this paper, we theoretically analyze the adversarial robustness of NPC and demonstrate that it only depends on the robustness of the attribute recognition model and is independent of the robustness of the probabilistic circuit. Moreover, we propose RNPC, the first robust neural probabilistic circuit against adversarial attacks on the recognition module.
Neural Information Processing Systems
Jun-18-2026, 12:16:22 GMT
- Country:
- North America (0.46)
- Genre:
- Research Report > Experimental Study (1.00)
- Industry:
- Information Technology > Security & Privacy (1.00)
- Transportation (0.88)
- Technology: