Verification of Recurrent Neural Networks Through Rule Extraction
Qinglong Wang, Kaixuan Zhang, Xue Liu, C. Lee Giles
The verification problem for neural networks is to determine whether a network is vulnerable to adversarial samples, or to approximate the maximal allowed scale of adversarial perturbation it can endure. While most prior work addresses the verification of feed-forward networks, little has been explored for verifying recurrent networks. This is due to the more rigorous constraints on the perturbation space for sequential data, and the lack of a proper metric for measuring perturbations. In this work, we address these challenges by proposing a metric that measures the distance between strings, and by using deterministic finite automata (DFA) as a rigorous oracle that examines whether generated adversarial samples violate certain constraints on a perturbation. More specifically, we empirically show that certain recurrent networks allow relatively stable DFA extraction. As such, DFAs extracted from these recurrent networks can serve as a surrogate oracle when the ground-truth DFA is unknown. We apply our verification mechanism to several widely used recurrent networks on a set of the Tomita grammars. The results demonstrate that only a few models remain robust against adversarial samples. In addition, we show that grammars with different levels of complexity also differ in how difficult they are to learn robustly.
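The listing does not include the authors' code, so the sketch below is only an illustrative reading of the mechanism described in the abstract: it assumes the string metric is an edit (Levenshtein) distance and uses Tomita grammar 1 (binary strings containing no 0) as a stand-in oracle DFA. The names `edit_distance`, `DFA`, and `is_valid_adversarial` are hypothetical, not the authors' implementation; an extracted DFA would play the oracle role the same way when the ground-truth DFA is unknown.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]


class DFA:
    """Minimal DFA used as an oracle for membership in a regular language."""
    def __init__(self, transitions, start, accepting):
        self.transitions = transitions  # {(state, symbol): next_state}
        self.start = start
        self.accepting = accepting

    def accepts(self, string: str) -> bool:
        state = self.start
        for symbol in string:
            state = self.transitions.get((state, symbol))
            if state is None:  # missing transition acts as a dead (rejecting) state
                return False
        return state in self.accepting


# Tomita grammar 1 over {0, 1}: accept exactly the strings made only of '1's.
tomita1 = DFA(transitions={("q0", "1"): "q0"}, start="q0", accepting={"q0"})


def is_valid_adversarial(original: str, perturbed: str, oracle: DFA, budget: int) -> bool:
    """A perturbed string is a valid adversarial candidate if it stays within the
    edit-distance budget and the oracle assigns it the same label as the original;
    a label flip by the RNN under test then indicates a robustness violation."""
    return (edit_distance(original, perturbed) <= budget
            and oracle.accepts(perturbed) == oracle.accepts(original))


if __name__ == "__main__":
    print(is_valid_adversarial("1111", "111", tomita1, budget=1))   # True: same oracle label
    print(is_valid_adversarial("1111", "1101", tomita1, budget=1))  # False: oracle label changes
```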
Nov-14-2018