Learning to communicate to learn (Communication, Part 2) ARAYA Inc.
In the previous post, I talked about networks which can take datasets rather than data points as input, using pooling over the data to assemble a representation of the task which is then used to classify the test set. This satisfies some of our goals of making networks which learn how to learn continually – as the size of the training dataset is increased, the variance of the network's representation of the task will tend to decrease and the network's performance on the task will improve. However, there still seems to be a gap between this kind of behavior and the sort of generality possessed by algorithms like backprop (which is broad enough to include things like learning how to learn). The issue seems to be that once the networks have learned to parameterize the ensemble of tasks they're given, what's left is just a fixed-dimension parametric inference problem. The ultimate complexity of what the network can do is pretty strongly constrained by the complexity of the representation layer and the task ensemble during training via gradient descent.
May-25-2017, 19:45:09 GMT