The Description Length of Deep Learning Models

Neural Information Processing Systems

Solomonoff's general theory of inference (Solomonoff, 1964) and the Minimum Description Length principle (Grünwald, 2007; Rissanen, 2007) formalize Occam's razor, and hold that a good model of data is a model that is good at losslessly compressing the data, including the cost of describing the model itself. Deep neural networks might seem to go against this principle given the large number of parameters to be encoded. We demonstrate experimentally the ability of deep neural networks to compress the training data even when accounting for parameter encoding. The compression viewpoint originally motivated the use of variational methods in neural networks (Hinton and Van Camp, 1993; Schmidhuber, 1997). Unexpectedly, we found that these variational methods provide poor compression bounds, despite being explicitly built to minimize such bounds. This might explain the relatively poor practical performance of variational methods in deep learning. On the other hand, simple incremental encoding methods yield excellent compression values on deep networks, vindicating Solomonoff's approach.
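
As a rough illustration of what "incremental encoding" means, the sketch below computes a prequential code length for a binary sequence: each symbol is encoded with a predictor fit only to the symbols before it, so no parameters ever need to be transmitted. This is a toy under stated assumptions, not the paper's experimental setup: the Laplace (add-one) estimator stands in for a deep network, and the function name `prequential_codelength` and the example data are hypothetical.

```python
import math


def prequential_codelength(bits):
    """Bits needed to encode a 0/1 sequence one symbol at a time.

    Each symbol is predicted by a Laplace (add-one) estimator that has
    seen only the preceding symbols (an assumption made for this toy;
    the abstract's setting uses deep networks as the predictor).
    """
    ones = 0
    total_bits = 0.0
    for t, x in enumerate(bits):
        # Predictive probability of a 1 given the first t symbols.
        p_one = (ones + 1) / (t + 2)
        p = p_one if x == 1 else 1.0 - p_one
        # Ideal code length for this symbol under the current predictor.
        total_bits += -math.log2(p)
        ones += x
    return total_bits


# Hypothetical data: a heavily biased sequence of 100 bits.
data = [1] * 90 + [0] * 10
uniform_bits = float(len(data))          # 1 bit per symbol, no model
preq_bits = prequential_codelength(data)  # incremental (prequential) code
print(f"uniform: {uniform_bits:.1f} bits, prequential: {preq_bits:.1f} bits")
```

Because the decoder can rebuild the same predictor from the symbols it has already decoded, the total above is a complete description length for the data, with the model cost paid implicitly through the poorer predictions made early in the sequence.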

Interactive Structure Learning with Structural Query-by-Committee

Neural Information Processing Systems

We present a generalization of the query-by-committee active learning algorithm for interactive structure learning, and we study its consistency and rate of convergence, both theoretically and empirically, with and without noise.
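
For readers unfamiliar with the starting point, here is a minimal sketch of classic query-by-committee, not the structural generalization the abstract describes. It assumes a noiseless oracle and a finite set of one-dimensional threshold classifiers; the names `query_by_committee`, `thresholds`, and `true_t` are hypothetical and chosen only for this illustration.

```python
import random


def query_by_committee(pool, oracle, thresholds, rng):
    """Classic QBC for 1-D threshold classifiers h_t(x) = (x >= t).

    A label is requested only when two hypotheses drawn at random from
    the current version space disagree on the point. Assumes a noiseless
    oracle whose target threshold is in `thresholds`.
    """
    version_space = list(thresholds)
    queries = 0
    for x in pool:
        # Draw a two-member committee from the surviving hypotheses.
        h1 = rng.choice(version_space)
        h2 = rng.choice(version_space)
        if (x >= h1) != (x >= h2):
            y = oracle(x)
            queries += 1
            # Keep only hypotheses consistent with the new label.
            version_space = [t for t in version_space if (x >= t) == y]
    return version_space, queries


rng = random.Random(0)
true_t = 0.37                                  # hypothetical target threshold
thresholds = [i / 100 for i in range(101)]     # finite hypothesis set
pool = [rng.random() for _ in range(200)]      # unlabeled points in [0, 1)
survivors, n_queries = query_by_committee(
    pool, lambda x: x >= true_t, thresholds, rng
)
print(f"{n_queries} labels requested, {len(survivors)} hypotheses remain")
```

The design choice worth noting is that points on which the committee agrees are skipped without a label request, which is where the label savings over passive learning come from.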


What enables human language? A biocultural framework

Science

Case study 1 considers vocal production learning, an organism's capacity to enlarge and modify its repertoire of vocalizations based on auditory experience. This ability is crucial for learning spoken language and limited in nonhuman primates but has emerged in other branches of the evolutionary tree, including subsets of birds, bats, elephants, cetaceans, and pinnipeds. Bringing together data from molecular investigations of speech and language disorders, genetic manipulations in animal models, and studies of ancient DNA, this case study demonstrates how ancient genetic and neural infrastructures may have been modified and recombined to enable distinctive human capacities. Case study 2 examines the emergence of linguistic structure, a defining property of human language, using data from real-world cases of emergence (e.g., homesign and emerging sign languages); experiments recreating cultural evolution in the lab; and comparative studies of nonhuman animals, including songbirds and primates. This case study highlights the importance of transmission and interaction, suggesting that emergence of structure involves a combination of biological, cognitive, and cultural conditions: Although some (or all) traits are shared with other species, their combination may be specific to humans.