[R] 2 Hr. Talk "Information Theory of Deep Learning" (Naftali Tishby) • r/MachineLearning

Oct-12-2017, 06:20:44 GMT–@machinelearnbot

This two-hour talk covers in detail many of the points people were interested in from his original talk. Questions, in order: What is the difference between the noise in SGD and typical Langevin dynamics? How does the theory deal with saturated gradients? Is it a reasonable strategy to perform early stopping specifically before the compression phase? Have you tried the framework on ResNets? What is the message for practitioners? How does the performance of dropout and other regularizers relate to the theory? Have you considered the connection to a Fermi gas at equilibrium?

artificial intelligence, information theory, machine learning

News Web Page

Industry:
- Media > News (0.40)

Technology:
- Information Technology
  - Communications > Social Media (1.00)
  - Artificial Intelligence > Machine Learning
    - Neural Networks > Deep Learning (0.40)
