Transfer Learning in Visual and Relational Reasoning
Similar developments have emerged in the Natural Language Processing (NLP) community. The success of transfer learning raises several research questions, such as which characteristics make a dataset favorable for pretraining (notably ImageNet [huh2016makes]), or how the performance of models with different architectures correlates between the source and target domains [kornblith2019better]. One of the most systematic works in this area is the computational taxonomic map for task transfer learning [zamir2018taskonomy], which aims to discover the dependencies between twenty-six 2D, 2.5D, 3D, and semantic computer vision tasks. In this work, we focus on transfer learning in multi-modal tasks that combine vision and language [mogadala2019trends]. More precisely, we narrow the scope to transfer learning between visual reasoning tasks that have a "nice" logical structure, e.g., [johnson2017clevr, yang2018dataset, song2018explore].
Nov-29-2019, 09:14:17 GMT