compositionality
Compositional Plan Vectors
Coline Devin, Daniel Geng, Pieter Abbeel, Trevor Darrell, Sergey Levine
Autonomous agents situated in real-world environments must be able to master large repertoires of skills. While a single short skill can be learned quickly, it would be impractical to learn every task independently. Instead, the agent should share knowledge across behaviors such that each task can be learned efficiently, and such that the resulting model can generalize to new tasks, especially ones that are compositions or subsets of tasks seen previously. A policy conditioned on a goal or demonstration has the potential to share knowledge between tasks if it sees enough diversity of inputs. However, such policies may not generalize to a more complex task at test time. We introduce compositional plan vectors (CPVs) to enable a policy to perform compositions of tasks without additional supervision. CPVs represent trajectories as the sum of the subtasks within them. We show that CPVs can be learned within a one-shot imitation learning framework without any additional supervision or information about task hierarchy, and enable a demonstration-conditioned policy to generalize to tasks that sequence twice as many skills as the tasks seen during training. Analogously to embeddings such as word2vec in NLP, CPVs can also support simple arithmetic operations: for example, we can add the CPVs for two different tasks to command an agent to compose both tasks, without any additional training.
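The additive structure described in this abstract can be illustrated with a small sketch. It is a toy illustration under stated assumptions, not the authors' implementation: the skill names, the 8-dimensional random vectors, and the functions `plan_vector` and `compose` are all hypothetical, and in the actual method the embedding is learned from demonstrations rather than looked up in a table.

```python
import numpy as np

# Hypothetical setup: one fixed vector per skill. In the paper the
# embedding is learned; a random lookup table is used here only to
# show the arithmetic.
rng = np.random.default_rng(0)
DIM = 8
SKILLS = ["chop_tree", "build_house", "make_bread"]
SKILL_VECS = {name: rng.normal(size=DIM) for name in SKILLS}


def plan_vector(subtasks):
    """Toy CPV: the sum of the vectors of the subtasks in a trajectory."""
    return sum((SKILL_VECS[s] for s in subtasks), np.zeros(DIM))


def compose(cpv_a, cpv_b):
    """Compose two tasks by adding their plan vectors."""
    return cpv_a + cpv_b


task_a = plan_vector(["chop_tree"])
task_b = plan_vector(["build_house", "make_bread"])
combined = compose(task_a, task_b)

# The composed vector matches the vector of a single trajectory that
# contains all three subtasks.
full = plan_vector(["chop_tree", "build_house", "make_bread"])
print(np.allclose(combined, full))
```

Because the toy representation is a plain sum, it does not depend on the order in which the subtasks appear; whether that property is desirable depends on the task family.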
6a42b45af2b72e6e5b5e3a6fe695809f-Supplemental-Datasets_and_Benchmarks.pdf
The model can easily distinguish A and B according to the background (i.e., the so-called geometric skews [26]), but not according to the features of the class instance itself. However, suppose there is another class C that also has a black background. In this three-way classification task (distinguishing A, B, and C), an ideal model should focus on the features of the instance itself rather than on the background. This is one of the difficulties: distribution bias in the samples, where some features (e.g., background) may help classification without helping the model understand the class in a compositional way. Another difficulty is the entanglement of the labels. We provide the labels in a relative way: the label of A is '0' and that of B is '1', rather than their true textual meanings (e.g., white paper and green leaves). The concept information is entangled and embedded in the label; thus, it is hard for the model to tell which visual features capture the corresponding concepts (e.g., that white refers to the color feature and paper refers to the texture feature). We hope our understanding of this issue can inspire researchers to focus more on compositionality and to design excellent continual learners.
Emergent Communication: Generalization and Overfitting in Lewis Games
Lewis signaling games are a class of simple communication games for simulating the emergence of language. In these games, two agents must agree on a communication protocol in order to solve a cooperative task. Previous work has shown that agents trained to play this game with reinforcement learning tend to develop languages that display undesirable properties from a linguistic point of view (lack of generalization, lack of compositionality, etc.). In this paper, we aim to provide a better understanding of this phenomenon by analytically studying the learning problem in Lewis games. As a core contribution, we demonstrate that the standard objective in Lewis games can be decomposed into two components: a co-adaptation loss and an information loss. This decomposition enables us to surface two potential sources of overfitting, which we show may undermine the emergence of a structured communication protocol. In particular, when we control for overfitting on the co-adaptation loss, we recover desired properties in the emergent languages: they are more compositional and generalize better.
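A two-term split of this kind can be checked numerically with a standard identity: an expected cross-entropy equals a conditional entropy plus an expected KL divergence. The sketch below is not taken from the paper. It assumes the objective is the listener's reconstruction cross-entropy, and it assumes the conditional-entropy term plays the role of the information loss and the KL term the role of the co-adaptation loss. The variable names and the randomly drawn speaker and listener are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_inputs, n_messages = 5, 4

# Hypothetical agents: a random speaker pi(m|x) and listener rho(x|m).
p_x = np.full(n_inputs, 1.0 / n_inputs)                       # uniform inputs
speaker = rng.dirichlet(np.ones(n_messages), size=n_inputs)   # shape (X, M)
listener = rng.dirichlet(np.ones(n_inputs), size=n_messages)  # shape (M, X)

# Joint p(x, m), message marginal p(m), and the speaker's posterior p(x|m).
joint = p_x[:, None] * speaker          # shape (X, M)
p_m = joint.sum(axis=0)                 # shape (M,)
posterior = (joint / p_m).T             # shape (M, X)

# Assumed objective: expected negative log-likelihood of the listener.
cross_entropy = -(joint * np.log(listener.T)).sum()

# Conditional entropy H(X|M): depends only on the speaker.
information = -(joint * np.log(posterior.T)).sum()

# Expected KL(p(x|m) || rho(x|m)): mismatch between listener and posterior.
kl_per_message = (posterior * np.log(posterior / listener)).sum(axis=1)
co_adaptation = (p_m * kl_per_message).sum()

print(np.isclose(cross_entropy, information + co_adaptation))
```

Under these assumptions, the first term can only be reduced by changing the speaker, while the second vanishes exactly when the listener matches the speaker's posterior.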