Out of the Box: Reasoning with Graph Convolution Nets for Factual Visual Question Answering
Medhini Narasimhan, Svetlana Lazebnik, Alexander Schwing
Accurately answering a question about a given image requires combining observations with general knowledge. While this is effortless for humans, reasoning with general knowledge remains an algorithmic challenge. To advance research in this direction, a novel 'fact-based' visual question answering (FVQA) task has been introduced recently along with a large set of curated facts which link two entities, i.e., two possible answers, via a relation.
- North America > United States > Illinois (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- Asia > Middle East > Jordan (0.04)
- North America > United States > Texas > Travis County > Austin (0.04)
- North America > Canada (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- North America > Canada > Alberta (0.04)
- Asia > China > Beijing > Beijing (0.04)
- Asia > China > Guangdong Province > Shenzhen (0.04)
- Leisure & Entertainment (0.67)
- Education (0.46)
Why mathematicians want to destroy infinity – and may succeed
How many atoms are there in the observable universe? Current estimates point to a number we would write as 1 followed by 80 zeroes, or 10^80. If you peered inside each of these atoms and counted their subatomic particles, you could count a bit higher. But what happens beyond that? Take 10^90: even if you counted every atom and subatomic particle in the known universe, you wouldn't reach this number. In some sense, 10^90 has no relation to physical reality.
- Asia > Russia (0.68)
- North America > United States > New York (0.05)
- North America > United States > New Jersey (0.04)
- Government > Regional Government > Europe Government > Russia Government (0.47)
- Government > Regional Government > Asia Government > Russia Government (0.47)
MALTS: Matching After Learning to Stretch
Harsh Parikh, Cynthia Rudin, Alexander Volfovsky
We introduce a flexible framework that produces high-quality almost-exact matches for causal inference. Most prior work in matching uses ad-hoc distance metrics, often leading to poor quality matches, particularly when there are irrelevant covariates. In this work, we learn an interpretable distance metric for matching, which leads to substantially higher quality matches. The learned distance metric stretches the covariate space according to each covariate's contribution to outcome prediction: this stretching means that mismatches on important covariates carry a larger penalty than mismatches on irrelevant covariates. Our ability to learn flexible distance metrics leads to matches that are interpretable and useful for the estimation of conditional average treatment effects.
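The stretching idea in the abstract above can be illustrated with a small sketch. This is not the authors' method: the weights here are set by hand rather than learned, and the names `stretched_distance`, `nearest_match`, and the toy data are all hypothetical. It only shows how scaling each covariate by a weight changes which control unit is chosen as the match.

```python
import math

def stretched_distance(x, y, weights):
    """Weighted Euclidean distance: each covariate difference is scaled by its weight."""
    return math.sqrt(sum((w * (a - b)) ** 2 for w, a, b in zip(weights, x, y)))

def nearest_match(unit, candidates, weights):
    """Return the index of the candidate closest to `unit` under the stretched distance."""
    return min(
        range(len(candidates)),
        key=lambda i: stretched_distance(unit, candidates[i], weights),
    )

# Hypothetical data: covariate 0 is assumed to matter for the outcome,
# covariate 1 is assumed to be irrelevant.
treated = [1.0, 0.0]
controls = [
    [1.1, 5.0],  # close on the important covariate, far on the irrelevant one
    [3.0, 0.1],  # far on the important covariate, close on the irrelevant one
]

# Equal weights (an ad-hoc metric) versus hand-picked weights that
# ignore the irrelevant covariate. In MALTS the weights would be learned.
unweighted = nearest_match(treated, controls, [1.0, 1.0])
stretched = nearest_match(treated, controls, [1.0, 0.0])
```

With equal weights the irrelevant covariate dominates and the second control is picked; once that covariate is given zero weight, the first control, which agrees on the important covariate, becomes the match.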
- North America > United States > North Carolina > Durham County > Durham (0.04)
- South America > Chile (0.04)
- North America > United States > Florida > Broward County > Fort Lauderdale (0.04)
- Health & Medicine (1.00)
- Education (0.67)
Discovering the Unknown Knowns: Turning Implicit Knowledge in the Dataset into Explicit Training Examples for Visual Question Answering
Jihyung Kil, Cheng Zhang, Dong Xuan, Wei-Lun Chao
Visual question answering (VQA) is challenging not only because the model has to handle multi-modal information, but also because it is just so hard to collect sufficient training examples -- there are too many questions one can ask about an image. As a result, a VQA model trained solely on human-annotated examples could easily over-fit specific question styles or image contents that are being asked, leaving the model largely ignorant about the sheer diversity of questions. Existing methods address this issue primarily by introducing an auxiliary task such as visual grounding, cycle consistency, or debiasing. In this paper, we take a drastically different approach. We found that many of the "unknowns" to the learned VQA model are indeed "known" in the dataset implicitly. For instance, questions asking about the same object in different images are likely paraphrases; the number of detected or annotated objects in an image already provides the answer to the "how many" question, even if the question has not been annotated for that image. Building upon these insights, we present a simple data augmentation pipeline SimpleAug to turn this "known" knowledge into training examples for VQA. We show that these augmented examples can notably improve the learned VQA models' performance, not only on the VQA-CP dataset with language prior shifts but also on the VQA v2 dataset without such shifts. Our method further opens up the door to leverage weakly-labeled or unlabeled images in a principled way to enhance VQA models. Our code and data are publicly available at https://github.com/heendung/simpleAUG.
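One of the insights in the abstract above, that object counts already answer "how many" questions, can be sketched in a few lines. This is an illustration under stated assumptions, not the SimpleAug pipeline itself: the function name `how_many_examples`, the question template, and the label list are hypothetical.

```python
from collections import Counter

def how_many_examples(object_labels):
    """Turn a list of detected or annotated object labels for one image
    into (question, answer) pairs for "how many" questions."""
    counts = Counter(object_labels)
    return [
        (f"How many {label} objects are in the image?", str(n))
        for label, n in sorted(counts.items())
    ]

# Hypothetical annotations for a single image.
labels = ["dog", "dog", "cat"]
examples = how_many_examples(labels)
```

Each pair could then be added to the training set even though no annotator asked that question about the image.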
- North America > United States > Ohio > Franklin County > Columbus (0.04)
- Asia > South Korea > Jeollanam-do > Muan (0.04)