Since the policy used to collect transitions is changing throughout learning, the replay memory contains data coming from a mixture of policies (that differ from the agent's current policy), and
The framework also uses uncertainty measures on the Gaussian representations of the previously learned classes to find the most informative samples to be labeled in an increment. We evaluate our approach on the CORe-50 dataset and on a real humanoid robot for the object classification task.
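
To make the sample-selection idea concrete, the following is a minimal sketch, not the framework's actual implementation. It assumes each previously learned class is summarized by a diagonal Gaussian (a mean and a per-dimension variance), and it assumes a margin-based uncertainty measure (the gap between the two highest class posteriors under a uniform prior). The text above does not specify the uncertainty measure, and all function names here are hypothetical.

```python
import math

def log_likelihood(x, mean, var):
    """Log-density of x under a diagonal Gaussian (assumed class model)."""
    return -0.5 * sum(
        (xi - m) ** 2 / v + math.log(2 * math.pi * v)
        for xi, m, v in zip(x, mean, var)
    )

def posterior(x, classes):
    """Class posteriors under a uniform prior; classes is a list of (mean, var)."""
    lls = [log_likelihood(x, m, v) for m, v in classes]
    top = max(lls)  # subtract the max for numerical stability
    weights = [math.exp(ll - top) for ll in lls]
    total = sum(weights)
    return [w / total for w in weights]

def margin(x, classes):
    """Gap between the two most probable classes (needs at least two classes).

    A small margin means the sample is ambiguous, i.e. more informative to label.
    This particular measure is an assumption made for illustration.
    """
    p = sorted(posterior(x, classes), reverse=True)
    return p[0] - p[1]

def select_most_uncertain(samples, classes, k):
    """Return the indices of the k samples with the smallest margin."""
    ranked = sorted(range(len(samples)), key=lambda i: margin(samples[i], classes))
    return ranked[:k]
```

Calling `select_most_uncertain(samples, classes, k)` on a pool of unlabeled feature vectors returns the indices to send for labeling in the next increment. Other uncertainty measures, such as posterior entropy, could be swapped in for `margin` without changing the rest of the sketch.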