Reviews: Learning by Abstraction: The Neural State Machine
–Neural Information Processing Systems
As far as I can tell, the model is relatively simple and is mostly operating over and recomputing probability distributions of discrete elements in the image and tokens in the sentence. It's not a surprising next step in this area, but this approach is a good step in that direction. One concern is assumptions placed on the image content space by using a dataset like Visual Genome/GQA. Visual Genome uses a fixed ontology of properties and possible property values and (as the paper states in L129) ignores fine-grained statistics of the image (e.g., information about the background, like what color the sky is). Requiring this fixed ontology may work for a dataset like GQA, which is generated from such an ontology, but may be harder to extend to other, more realistic datasets where topics don't have to be limited to objects included in the gold scene graph.
Neural Information Processing Systems
Jan-26-2025, 21:35:15 GMT
- Technology: