I. SUPPLEMENTARY CODE

Neural Information Processing Systems 

The time series is then downsampled by a factor of 8. We fit the model using ...

The datasets are then downsampled by a factor of 10.

This transformation linearly registers the embedded attractor to the original attractor via translation, rotation, and reflection, but not shear. For example, after this transformation, mirror images of a spiral would become congruent, whereas a sphere and an ellipsoid would not.

All hyperparameters are held constant, and the only difference across replicates is the random weight initialization.

Lorenz dataset, generated by models with different random weight initializations but identical hyperparameters.
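The registration described above (translation, rotation, and reflection, but not shear) can be illustrated with a minimal sketch. The text here does not name the transformation, so the sketch assumes an orthogonal Procrustes-style alignment computed with an SVD. The function name `rigid_align` and the two-dimensional test shapes are hypothetical, and a circle and an ellipse stand in for the sphere and ellipsoid.

```python
import numpy as np

def rigid_align(embedded, reference):
    """Register `embedded` onto `reference` (both (n_points, dim), rows matched).

    Assumed form of the alignment: a translation plus an orthogonal map,
    which may be a rotation or a reflection. No scaling or shear is applied.
    """
    mu_e = embedded.mean(axis=0)
    mu_r = reference.mean(axis=0)
    # Cross-covariance between the two centered point clouds
    m = (embedded - mu_e).T @ (reference - mu_r)
    u, _, vt = np.linalg.svd(m)
    # Orthogonal matrix minimizing ||(embedded - mu_e) @ q - (reference - mu_r)||_F
    q = u @ vt
    return (embedded - mu_e) @ q + mu_r

# A spiral and its shifted mirror image become congruent after alignment.
t = np.linspace(0.0, 4.0 * np.pi, 200)
spiral = np.column_stack([t * np.cos(t), t * np.sin(t)])
mirrored = spiral * np.array([-1.0, 1.0]) + np.array([3.0, -2.0])
mirror_error = np.abs(rigid_align(mirrored, spiral) - spiral).max()

# A circle and an ellipse differ by a stretch, which the alignment cannot undo.
theta = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
circle = np.column_stack([np.cos(theta), np.sin(theta)])
ellipse = circle * np.array([2.0, 1.0])
stretch_error = np.abs(rigid_align(ellipse, circle) - circle).max()

print(f"mirror image residual: {mirror_error:.2e}")
print(f"ellipse residual:      {stretch_error:.2e}")
```

In this sketch the mirror-image residual is at the level of floating-point round-off, while the ellipse residual stays large, which matches the distinction drawn in the text.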