LOCL: Learning Object-Attribute Composition using Localization

Kumar, Satish, Iftekhar, ASM, Prashnani, Ekta, Manjunath, B. S.

arXiv.org Artificial Intelligence 

Human visual reasoning allows us to leverage prior visual experience to recognize previously unseen Object-Attribute (O-A) relationships. Predicting such complex relationships for novel O-A compositions, referred to as Compositional Zero-Shot Learning (CZSL) [17, 19, 21, 22, 25, 28, 33, 36], is an active area of research. CZSL methods have progressed significantly in recent years; however, as our experiments demonstrate, their performance degrades in natural cluttered scenes, as illustrated in Figure 1. The main cause in these cases is interference from other potentially confusing elements in the scene. For example, in Figure 1(B.1), the SOTA methods fail to detect the object of interest given its size relative to the image; in Figure 1(B.2), the bird is the object of interest, but the surrounding context, dominated by green leaves, leads to the color attribute being incorrectly associated with the object. The poor performance of the SOTA methods can thus be attributed to dominant confounding elements that impede the correct O-A composition prediction. This in turn stems from a bias towards the O-A compositions seen during training. Generalization to more realistic cases, such as those in Figure 1(B), is crucial for the widespread use of CZSL.
