Interpretable Representation Learning from Videos using Nonlinear Priors