Out of the Box: Reasoning with Graph Convolution Nets for Factual Visual Question Answering

Medhini Narasimhan, Svetlana Lazebnik, Alexander Schwing

Neural Information Processing Systems

Accurately answering a question about a given image requires combining observations with general knowledge. While this is effortless for humans, reasoning with general knowledge remains an algorithmic challenge. To advance research in this direction, a novel "fact-based" visual question answering (FVQA) task has been introduced recently, along with a large set of curated facts which link two entities, i.e., two possible answers, via a relation.
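A toy sketch of the fact structure the abstract describes: each curated fact links two entities (the two candidate answers) via a relation. The specific facts and the helper function below are illustrative assumptions, not the paper's actual knowledge base or reasoning model.

```python
# Hypothetical facts in the (entity, relation, entity) form the
# abstract describes; the actual FVQA fact set is much larger.
facts = [
    ("cat", "IsA", "pet"),
    ("umbrella", "UsedFor", "blocking rain"),
    ("dog", "IsA", "pet"),
]

def entities_related_by(relation, facts):
    """Return the (head, tail) entity pairs a relation connects --
    the two possible answers each matching fact provides."""
    return [(head, tail) for head, rel, tail in facts if rel == relation]

print(entities_related_by("UsedFor", facts))  # [('umbrella', 'blocking rain')]
```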

Why mathematicians want to destroy infinity – and may succeed

New Scientist

How many atoms are there in the observable universe? Current estimates point to a number we would write as 1 followed by 80 zeroes, or 10^80. If you peered inside each of these atoms and counted their subatomic particles, you could count a bit higher. But what happens beyond that? Take 10^90 – even if you counted every atom and subatomic particle in the known universe, you wouldn't reach this number. In some sense, 10^90 has no relation to physical reality.


MALTS: Matching After Learning to Stretch

Parikh, Harsh, Rudin, Cynthia, Volfovsky, Alexander

arXiv.org Artificial Intelligence

We introduce a flexible framework that produces high-quality almost-exact matches for causal inference. Most prior work in matching uses ad-hoc distance metrics, often leading to poor quality matches, particularly when there are irrelevant covariates. In this work, we learn an interpretable distance metric for matching, which leads to substantially higher quality matches. The learned distance metric stretches the covariate space according to each covariate's contribution to outcome prediction: this stretching means that mismatches on important covariates carry a larger penalty than mismatches on irrelevant covariates. Our ability to learn flexible distance metrics leads to matches that are interpretable and useful for the estimation of conditional average treatment effects.
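The "stretching" idea can be illustrated with a minimal sketch: a per-covariate weight vector scales the space before distances are computed, so a mismatch on a heavily weighted covariate costs more than the same mismatch on an irrelevant one. The weights here are hand-picked for illustration; in MALTS they are learned from outcome prediction.

```python
import numpy as np

def stretched_distance(x, y, weights):
    """Weighted Euclidean distance: each covariate's difference is
    scaled by its weight, so mismatches on important covariates
    carry a larger penalty than mismatches on irrelevant ones."""
    diff = (np.asarray(x) - np.asarray(y)) * np.asarray(weights)
    return float(np.sqrt(np.sum(diff ** 2)))

# Covariate 0 is important (weight 3.0), covariate 1 is nearly
# irrelevant (weight 0.1) -- assumed weights, not learned ones.
weights = [3.0, 0.1]
unit = [1.0, 5.0]
match_bad_irrelevant = [1.0, 9.0]   # differs only on the irrelevant covariate
match_bad_important = [2.0, 5.0]    # differs only on the important covariate

print(stretched_distance(unit, match_bad_irrelevant, weights))  # small distance
print(stretched_distance(unit, match_bad_important, weights))   # large distance
```

Under this metric the unit that mismatches only on the irrelevant covariate is the closer (higher-quality) match, which is the behavior the abstract describes.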


Discovering the Unknown Knowns: Turning Implicit Knowledge in the Dataset into Explicit Training Examples for Visual Question Answering

Kil, Jihyung, Zhang, Cheng, Xuan, Dong, Chao, Wei-Lun

arXiv.org Artificial Intelligence

Visual question answering (VQA) is challenging not only because the model has to handle multi-modal information, but also because it is just so hard to collect sufficient training examples -- there are too many questions one can ask about an image. As a result, a VQA model trained solely on human-annotated examples could easily over-fit specific question styles or image contents that are being asked, leaving the model largely ignorant about the sheer diversity of questions. Existing methods address this issue primarily by introducing an auxiliary task such as visual grounding, cycle consistency, or debiasing. In this paper, we take a drastically different approach. We found that many of the "unknowns" to the learned VQA model are indeed "known" in the dataset implicitly. For instance, questions asking about the same object in different images are likely paraphrases; the number of detected or annotated objects in an image already provides the answer to the "how many" question, even if the question has not been annotated for that image. Building upon these insights, we present a simple data augmentation pipeline SimpleAug to turn this "known" knowledge into training examples for VQA. We show that these augmented examples can notably improve the learned VQA models' performance, not only on the VQA-CP dataset with language prior shifts but also on the VQA v2 dataset without such shifts. Our method further opens up the door to leverage weakly-labeled or unlabeled images in a principled way to enhance VQA models. Our code and data are publicly available at https://github.com/heendung/simpleAUG.
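One of the implicit-knowledge sources the abstract mentions — object annotations already answering "how many" questions — can be sketched as a simple augmentation step. The data format and question template below are assumptions for illustration, not the authors' SimpleAug pipeline.

```python
from collections import Counter

def make_counting_examples(annotations):
    """Turn per-image object annotations into synthetic 'how many'
    question-answer training pairs.

    annotations: dict mapping image_id -> list of object labels
    (an assumed format; real datasets store richer annotations)."""
    examples = []
    for image_id, objects in annotations.items():
        for label, count in Counter(objects).items():
            examples.append({
                "image_id": image_id,
                "question": f"How many {label}s are in the image?",
                "answer": str(count),
            })
    return examples

anns = {"img1": ["dog", "dog", "cat"]}
for ex in make_counting_examples(anns):
    print(ex["question"], "->", ex["answer"])
```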