Reviews: Drill-down: Interactive Retrieval of Complex Scenes using Natural Language Queries
–Neural Information Processing Systems
The main problem for me is that the paper promises a very realistic scenario (Figure 1) in which a user refines a search through a sequence of increasingly specific queries. However, the majority of the model design and evaluation (except Section 4.2) is performed with dense region captions that have almost no sequential nature. While this is partially a strength, as no additional labels are required, the method seems especially suited to such disconnected queries: there is space for M disconnected queries, and only then are updates required. This would provide a deeper understanding of when the proposed method works better. In Figure 1, the user queries seem very natural, but the simulated queries in Figure 1 are not.
Jan-23-2025, 09:55:46 GMT