r/MachineLearning - [1905.11786] Putting An End to End-to-End: Gradient-Isolated Learning of Representations


In general I agree, but in machine learning mutual information seems to be a case where approximation can sometimes help rather than hurt. In another discussion this week about the Tishby information bottleneck, cameldrv correctly pointed out that the mutual information between a signal and its encrypted version should be high, but that in practice no algorithm will discover this. Turn that around, though: when used in a complex DNN, a learning algorithm that seeks to maximize mutual information (such as today's Putting An End to End-to-End) could in theory produce something like weak encryption: the desired information is extracted, but in such a complex form that _another_ DNN classifier would be needed to extract it! So the fact that mutual information can only be approximated can be a good thing, because optimizing objectives that cannot "see" complex relationships prevents this failure mode. A radical example is the HSIC bottleneck paper, where an approximation that is only monotonically related to mutual information spontaneously produced one-hot classifications without any guidance.
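As a rough illustration of the kind of dependence measure being discussed, here is a minimal sketch of the standard empirical HSIC estimator with Gaussian (RBF) kernels. This is a generic textbook form, not code from either paper; the function names and the fixed bandwidth `sigma` are my own choices for the sketch:

```python
import numpy as np

def rbf_kernel(X, sigma=1.0):
    # Gaussian kernel matrix from pairwise squared distances.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-d2 / (2 * sigma ** 2))

def hsic(X, Y, sigma=1.0):
    # Biased empirical HSIC: tr(K H L H) / (n-1)^2,
    # where H centers the kernel matrices.
    n = X.shape[0]
    K = rbf_kernel(X, sigma)
    L = rbf_kernel(Y, sigma)
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2
```

The point of the comment applies here: HSIC is not mutual information, only a dependence score that grows with statistical dependence under these kernels, yet optimizing it (rather than MI itself) is exactly what produced the surprising behavior in the HSIC bottleneck paper.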