A distributional simplicity bias in the learning dynamics of transformers

Neural Information Processing Systems

The remarkable capability of over-parameterised neural networks to generalise effectively has been explained by invoking a "simplicity bias": neural networks prevent overfitting by initially learning simple classifiers before progressing to



ASPEN: Breaking Operator Barriers for Efficient Parallel Execution of Deep Neural Networks

Neural Information Processing Systems

ASPEN also achieves high resource utilization and memory reuse by letting each resource asynchronously traverse depthwise in the DNN graph to its full computing potential.