A Appendix

Neural Information Processing Systems 

A.1 Case study of a learned continual learner

We notice that when learning a new task, the lower layers of the structure, starting from the first (i.e., the lowest) layer, tend to use the "mask" action to control their output values, because each task has its own specific and distinct characteristics. The higher layers, in contrast, are able to combine low-dimensional features, and therefore use the "fuse" action more often to combine the abilities learned on previous tasks. "With/without mask" means that the "mask" action is (or is not) used. The results are summarized in Table 5: overall, BNS achieves better forward transfer than the baseline models, which we attribute to its use of reinforcement learning.

[Table caption: Each number is the sum of all task model parameters in each setting in the final network after all tasks have been learned.]

[Table 9 caption: Training time (minutes) used by our BNS model and all baselines in each experiment.]
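As a rough illustration (not the paper's implementation), the two actions described above can be sketched as simple tensor operations. The element-wise gating for "mask" and the averaging for "fuse" are assumptions made for this sketch; BNS's actual operators may differ:

```python
import numpy as np

def mask(layer_out, m):
    """'Mask' action: gate a layer's output element-wise with a binary mask,
    keeping only the task-specific components (gating choice is an assumption)."""
    return layer_out * m

def fuse(outputs):
    """'Fuse' action: combine outputs from branches learned on previous tasks
    (averaging is an assumption; the paper's combination rule may differ)."""
    return np.mean(np.stack(outputs), axis=0)

# Lower layer: task-specific features, controlled by a mask.
low = mask(np.array([1.0, 2.0, 3.0, 4.0]), np.array([1, 0, 1, 0]))
# Higher layer: fuse low-dimensional features from two previous-task branches.
high = fuse([low, np.array([0.5, 0.5, 0.5, 0.5])])
print(low)   # [1. 0. 3. 0.]
print(high)  # [0.75 0.25 1.75 0.25]
```

The sketch mirrors the layer-wise pattern in the case study: masking dominates in lower, task-specific layers, while fusing dominates in higher layers that recombine features across tasks.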
