Supplementary Materials for Continual Learning with Node-Importance based Adaptive Group Sparse Regularization

Sangwon Jung

Neural Information Processing Systems 

As mentioned in Section 3.3 (manuscript), our PGD update plays a critical role in achieving high accuracy. Here, we compare with a method without PGD. 'w/o PGD' in Figure 2 indicates

First, the average accuracy (Figure 2(a)) of 'w/o PGD' is much

Second, we observe the sparsity (Figure 2(b)) of 'w/o PGD' decreases

The details on hyperparameters are in Table 1.

Table 4 shows the detailed results used to generate Figure 4 (manuscript).

Figure 5 shows the details of our model.
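To make the role of the PGD update in the ablation above more concrete, the following is a minimal sketch of a generic proximal gradient step for a group-sparse penalty. It assumes that PGD stands for proximal gradient descent and that each group holds the incoming weights of one node; the function names, the learning rate `lr`, and the single regularization strength `mu` are hypothetical choices for illustration. It is not the authors' implementation, which uses adaptive, node-importance based strengths as described in the manuscript.

```python
import math


def group_soft_threshold(weights, threshold):
    """Proximal map of threshold * ||w||_2 for a single group of weights.

    Assumption: one group corresponds to the incoming weights of one node.
    """
    norm = math.sqrt(sum(w * w for w in weights))
    if norm <= threshold:
        # The whole group is set exactly to zero; this is where sparsity comes from.
        return [0.0] * len(weights)
    scale = 1.0 - threshold / norm
    return [scale * w for w in weights]


def proximal_gradient_step(groups, grads, lr, mu):
    """Gradient step on the task loss, followed by the proximal map per group.

    groups, grads: lists of equally shaped lists of floats (hypothetical layout).
    lr: learning rate; mu: group-sparsity strength (both illustrative).
    """
    updated = []
    for weights, grad in zip(groups, grads):
        stepped = [w - lr * g for w, g in zip(weights, grad)]
        updated.append(group_soft_threshold(stepped, lr * mu))
    return updated
```

In this sketch, a group whose norm after the gradient step falls below `lr * mu` becomes exactly zero. A plain gradient step on the penalized loss, the generic counterpart of the 'w/o PGD' setting, only shrinks such weights toward zero without generally reaching it.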