
Appendix for Task-Free Continual Learning Via Online Discrepancy Distance Learning

Neural Information Processing Systems

Theorem 1. Let $P_i$ represent the distribution of all seen training samples (including all previous ones), let $G$ be a single model consisting of a classifier $h \in \mathcal{H}$ and a VAE model $v$, and let $\mathcal{M}$ be a memory buffer updated at the training step $T_i$.

Since $P_i$ would involve several underlying data distributions as the number of training steps $i$ increases, the diversity of the memory plays an important role in ensuring a tight generalization bound (GB) in Eq. (15). A good trade-off between the model's complexity and its generalization performance, observed from Eq. (12), is to allow each component to learn the underlying data distribution of a unique target set. By satisfying the ideal selection process (Eq. (22) of the paper), and considering that each component $G_t$ finished its training on $\mathcal{M}_{k_t}$ at $T_{k_t}$, we assume that the dynamic expansion model $G$ can be seen as a single model $h$ trained on all previously learnt memories.

Figure 1: The learning process of the proposed ODDL-S, which consists of three phases. (Figure omitted.)

Maximal Interfered Retrieval (MIR) [1] is one of the most popular memory-based approaches; it uses a memory buffer with a sample selection criterion.
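As a concrete illustration of the MIR criterion, the following minimal PyTorch sketch retrieves the buffered samples whose loss increases most after a virtual gradient step on the incoming batch. This is our own reading of [1], not code from the paper; the function name `select_interfered` and the plain-SGD virtual update are illustrative assumptions.

```python
import copy

import torch
import torch.nn.functional as F


def select_interfered(model, lr, mem_x, mem_y, batch_x, batch_y, k):
    """Pick the k memory samples whose loss rises most after a virtual
    gradient step on the incoming batch (the MIR retrieval criterion [1])."""
    # Loss on the memory samples before the virtual update.
    with torch.no_grad():
        pre_loss = F.cross_entropy(model(mem_x), mem_y, reduction="none")

    # One virtual SGD step on a copy of the model, using the incoming batch.
    virtual = copy.deepcopy(model)
    params = [p for p in virtual.parameters() if p.requires_grad]
    step_loss = F.cross_entropy(virtual(batch_x), batch_y)
    grads = torch.autograd.grad(step_loss, params)
    with torch.no_grad():
        for p, g in zip(params, grads):
            p -= lr * g
        post_loss = F.cross_entropy(virtual(mem_x), mem_y, reduction="none")

    # "Maximally interfered" samples: largest increase in loss.
    idx = (post_loss - pre_loss).topk(min(k, len(mem_x))).indices
    return mem_x[idx], mem_y[idx]
```

In the setting above, a selection criterion of this kind helps keep the memory diverse, which is what tightens the bound in Eq. (15).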

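The dynamic-expansion argument in the remark above can be sketched similarly. The schematic below shows, under our own simplifying assumptions, how a mixture of frozen (classifier, VAE) components can behave as a single model $h$: each input is routed to exactly one component. We use a VAE loss as a stand-in selection criterion; the paper's actual ideal selection process is its discrepancy-based Eq. (22), and all names here (e.g. `reconstruction_loss`) are hypothetical.

```python
import torch


class DynamicExpansionModel:
    """Schematic mixture of components, each a (classifier, vae) pair frozen
    once trained on its memory. Routing makes the mixture act like a single
    model h over all previously learnt memories. Illustrative only."""

    def __init__(self):
        self.components = []  # list of (classifier, vae) pairs

    def add_component(self, classifier, vae):
        self.components.append((classifier, vae))

    def route(self, x):
        # Stand-in for the ideal selection process (Eq. (22)): choose the
        # component whose VAE models x best (hypothetical method name).
        losses = torch.stack([vae.reconstruction_loss(x)
                              for _, vae in self.components])
        return int(torch.argmin(losses))

    def predict(self, x):
        classifier, _ = self.components[self.route(x)]
        with torch.no_grad():
            return classifier(x)
```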

Author Feedback (19eca5979ccbb752778e6c5f090dc9b6-AuthorFeedback.pdf)


Reviewer 1 Q: Novelty against [3]? A: The differences: (1) they do reinforcement learning, while we do imitation learning, which is much harder.

Q: For example, why assume it is only necessary to model interactions within one type of agent (lines 151-152)? A: Thank you for the suggestion; we will include this in the discussion.

A: It shows that the agent learns to put different attention on different time steps: in the morning (8 am), it puts more attention on 1-2 types of vehicles.