Neural Differential Equations for Learning to Program Neural Nets Through Continuous Learning Rules

Neural Information Processing Systems 

This yields continuous-time counterparts of Fast Weight Programmers and linear Transformers.
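One way to picture such a continuous-time counterpart is sketched below, under stated assumptions. The fast weight matrix W(t) follows an ODE whose vector field is a learning rule driven by key and value streams k(t) and v(t). The ODE is integrated with forward Euler, and the inputs are treated as piecewise constant over steps of length dt. The function names are hypothetical, and so is the use of two rules (a Hebbian outer-product rule and a delta-rule variant). This is an illustrative sketch, not the paper's exact formulation. Keys and values are passed in directly here rather than produced by a slow network.

```python
import numpy as np

def hebbian_vector_field(W, k, v):
    """Hypothetical Hebbian-style learning rule: dW/dt = v k^T (does not depend on W)."""
    return np.outer(v, k)

def delta_vector_field(W, k, v):
    """Hypothetical delta-rule-style field: dW/dt = (v - W k) k^T."""
    return np.outer(v - W @ k, k)

def integrate_fast_weights(keys, values, field, dt=1.0, W0=None):
    """Forward-Euler integration of dW/dt = field(W, k(t), v(t)).

    keys: array of shape (T, d_k); values: array of shape (T, d_v).
    Each (k, v) pair is assumed constant over one step of length dt.
    Returns a list with the weight matrix after every step.
    """
    d_k = keys.shape[1]
    d_v = values.shape[1]
    W = np.zeros((d_v, d_k)) if W0 is None else W0.copy()
    history = []
    for k, v in zip(keys, values):
        W = W + dt * field(W, k, v)  # one Euler step of the learning-rule ODE
        history.append(W.copy())
    return history

def read_out(W, q):
    """Query the fast weights: y = W q, as in an unnormalized linear Transformer."""
    return W @ q

# Usage: with dt = 1 and the Hebbian field, the final state is sum_t v_t k_t^T.
rng = np.random.default_rng(0)
K = rng.standard_normal((5, 3))
V = rng.standard_normal((5, 2))
W_T = integrate_fast_weights(K, V, hebbian_vector_field, dt=1.0)[-1]
y = read_out(W_T, rng.standard_normal(3))
```

With dt = 1 and the Hebbian field, the Euler steps reproduce the outer-product memory W_T = sum_t v_t k_t^T of an unnormalized linear Transformer. Shrinking dt turns the same update into a finer discretization of the underlying ODE, which is the sense in which the discrete model appears here as a special case.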