Continuous attractors offer a unique class of solutions for storing continuous-valued variables in recurrent system states for indefinitely long time intervals.
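As a minimal sketch of the idea, and not a model taken from the text, consider a single recurrent unit whose self-excitation exactly cancels its leak. The dynamics `dx/dt = -x + w*x`, the feedback weight `w`, the step size `dt`, and the function name `simulate` are all illustrative assumptions. With `w = 1` every state is a fixed point, so the unit holds whatever continuous value it starts with.

```python
def simulate(x0, w=1.0, dt=0.1, steps=1000):
    """Euler-integrate the hypothetical dynamics dx/dt = -x + w*x (no input).

    Returns the state after `steps` updates starting from x0.
    """
    x = x0
    for _ in range(steps):
        x += dt * (-x + w * x)
    return x


# With w = 1 the leak and the recurrent feedback cancel, so every value of x
# is a fixed point: a continuum of stable memories (a line attractor).
initial_values = [-2.0, 0.3, 1.7]
stored = [simulate(x0) for x0 in initial_values]

# For contrast, a slightly weaker feedback (w = 0.9) lets the state leak
# back toward zero, so the stored value is lost.
leaky = simulate(1.7, w=0.9)
```

Here `stored` matches `initial_values` after 1000 steps, while `leaky` has decayed to nearly zero.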
We implement this method by distilling and optimizing generalizable interpolated representations and by assigning a distinct weight to each modality to account for their divergent generalization capabilities.
Many training data attribution (TDA) methods aim to estimate how a model's behavior would change if one or more data points were removed from the training set.
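The quantity such methods try to estimate can be computed exactly for a tiny model by brute-force retraining. The sketch below is illustrative only: the least-squares line model, the toy data, and the helper names `fit_line`, `predict`, and `leave_one_out_effects` are assumptions and do not correspond to any particular TDA method. It retrains with each training point removed and records how the prediction at a query input changes.

```python
def fit_line(points):
    """Ordinary least squares for y = a*x + b; returns (a, b)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def predict(params, x):
    """Evaluate the fitted line at x."""
    a, b = params
    return a * x + b


def leave_one_out_effects(points, x_query):
    """Change in the prediction at x_query when each point is removed in turn."""
    full_prediction = predict(fit_line(points), x_query)
    effects = []
    for i in range(len(points)):
        reduced = points[:i] + points[i + 1:]  # training set without point i
        effects.append(predict(fit_line(reduced), x_query) - full_prediction)
    return effects


# Toy training set (hypothetical): three points on y = x plus one outlier.
train = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 10.0)]
effects = leave_one_out_effects(train, x_query=4.0)

# Index of the training point whose removal changes the prediction the most.
most_influential = max(range(len(train)), key=lambda i: abs(effects[i]))
```

Exact retraining like this costs one full fit per removed point, which is feasible here only because the model is tiny.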