Reverse engineering recurrent networks for sentiment classification reveals line attractor dynamics
Niru Maheswaranathan, Alex Williams, Matthew Golub, Surya Ganguli, David Sussillo
Neural Information Processing Systems
Recurrent neural networks (RNNs) are a widely used tool for modeling sequential data, yet they are often treated as inscrutable black boxes. Given a trained recurrent network, we would like to reverse engineer it, that is, to obtain a quantitative, interpretable description of how it solves a particular task. Even for simple tasks, a detailed understanding of how recurrent networks work, or a prescription for how to develop such an understanding, remains elusive. In this work, we use tools from dynamical systems analysis to reverse engineer recurrent networks trained to perform sentiment classification, a foundational natural language processing task. Given a trained network, we find fixed points of the recurrent dynamics and linearize the nonlinear system around these fixed points.
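The two analysis steps named above, finding fixed points and linearizing around them, can be illustrated on a toy system. The following is a minimal sketch and not the networks or optimizer from the paper. It assumes a hypothetical 2-unit vanilla tanh RNN with the input held at zero, hand-picked weights `W` and `b`, and Newton's method as the fixed-point solver. All function names are illustrative.

```python
import cmath
import math

# Hypothetical 2-unit vanilla RNN with the input held at zero (an assumption,
# not the architecture from the paper):  h_next = tanh(W h + b)
W = [[0.6, 0.3],
     [0.2, 0.5]]
b = [0.1, -0.2]


def step(h):
    """One step of the autonomous (zero-input) recurrent update."""
    return [math.tanh(W[i][0] * h[0] + W[i][1] * h[1] + b[i]) for i in range(2)]


def jacobian(h):
    """Jacobian of step() at h: diag(1 - tanh^2(W h + b)) @ W."""
    f = step(h)
    return [[(1.0 - f[i] ** 2) * W[i][j] for j in range(2)] for i in range(2)]


def find_fixed_point(h0, tol=1e-12, max_iter=50):
    """Newton's method on g(h) = step(h) - h, whose roots are fixed points."""
    h = list(h0)
    for _ in range(max_iter):
        f = step(h)
        g = [f[0] - h[0], f[1] - h[1]]
        if max(abs(g[0]), abs(g[1])) < tol:
            break
        J = jacobian(h)
        # The Jacobian of g is A = J - I; solve A * delta = -g in closed form.
        a, b01, c, d = J[0][0] - 1.0, J[0][1], J[1][0], J[1][1] - 1.0
        det = a * d - b01 * c
        dx = (-d * g[0] + b01 * g[1]) / det
        dy = (c * g[0] - a * g[1]) / det
        h = [h[0] + dx, h[1] + dy]
    return h


def eigenvalues_2x2(M):
    """Eigenvalues of a 2x2 matrix from its trace and determinant."""
    tr = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    disc = cmath.sqrt(tr * tr - 4.0 * det)
    return [(tr + disc) / 2.0, (tr - disc) / 2.0]


h_star = find_fixed_point([0.0, 0.0])
# Linearized dynamics near h_star: (h_next - h_star) ~ J(h_star) (h - h_star)
eigs = eigenvalues_2x2(jacobian(h_star))
print("fixed point:", h_star)
print("eigenvalues of the linearization:", eigs)
```

In this sketch, an eigenvalue with magnitude below 1 corresponds to a direction along which perturbations decay back to the fixed point, and a magnitude close to 1 corresponds to a slowly decaying direction.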
Mar-19-2020, 03:04:22 GMT