Appendix A Data Acquisition

Neural Information Processing Systems 

The full procedure of data collection and data preprocessing is described in detail in Ref. [ In addition, a top-down camera recorded the mouse in the arena at 60 Hz . Kernel size 7 performed best. We repeated the grid search for CNNs with different number of convolutional layers. We initially hypothesized that an autoencoder could provide regularization benefits over a "vanilla" CNN, because the reconstruction loss might encourage the model to learn visual features that are useful for decoding. After hyperparameter search, we settled on size 256 for the latent space vector, and the weight of the reconstruction loss relative to the Poisson loss was fixed at 0.5.