The curious case of developmental BERTology: On sparsity, transfer learning, generalization and the brain

Wang, Xin

arXiv.org Machine Learning 

In this essay, we explore a point of intersection between deep learning and neuroscience, through the lens of large language models, transfer learning, and network compression. Just as perceptual and cognitive neurophysiology has inspired effective deep neural network architectures, which in turn serve as useful models for understanding the brain, here we explore how biological neural development might inspire efficient and robust optimization procedures, which in turn serve as a useful model for the maturation and aging of the brain. Hopefully it will inspire the reader in one way or another, or at the very least, kill some boredom during a global pandemic.

We are going to touch on the following topics through the lens of large language models:

- How do overparameterized deep neural nets generalize?
- How does transfer learning help generalization?

Before we start, it is prudent to say a few words about the brain metaphor, to clarify this author's position on an issue that is often central to debates. The confluence of deep learning and neuroscience arguably took place as early as the conception of artificial neural nets, because artificial neurons abstract characteristic behaviors of biological ones (McCulloch and Pitts, 1943).
