Question Answering
Supplementary Contents
A.1 Motivation For what purpose was the dataset created? As an affiliated dataset, we created MIMIC-CXR-VQA to provide a benchmark for medical visual question answering systems. Who created the dataset (e.g., which team, research group) and on behalf of which Who funded the creation of the dataset? This work was (partially) supported by Microsoft Research Asia, Institute of Information & Communications Technology Planning & Evaluation (IITP) grant (No.2019-0-00075, RS-2022-00155958), National Research Foundation of Korea (NRF) grant (NRF-2020H1D3A2A03100945), and the Korea Health Industry Development Institute (KHIDI) What do the instances that comprise the dataset represent (e.g., documents, photos, EHRXQA contains natural questions and corresponding SQL/NeuralSQL queries (text). How many instances are there in total (of each type, if appropriate)? In EHRXQA, there are about 46.2K instances (16,366 image-related samples, 16,529 table-related samples, and 13,257 image+table-related samples).
CausalChaos! Dataset for Comprehensive Causal Action Question Answering Over Longer Causal Chains Grounded in Dynamic Visual Scenes
Causal video question answering (QA) has garnered increasing interest, yet existing datasets often lack depth in causal reasoning. To address this gap, we capitalize on the unique properties of cartoons and construct CausalChaos!, a novel, challenging causal Why-QA dataset built upon the iconic Tom and Jerry cartoon series. Cartoons use the principles of animation that allow animators to create expressive, unambiguous causal relationships between events to form a coherent storyline. Utilizing these properties, along with thought-provoking questions and multi-level answers (answer and detailed causal explanation), our questions involve causal chains that interconnect multiple dynamic interactions between characters and visual scenes. These factors demand models to solve more challenging, yet well-defined causal relationships. We also introduce hard incorrect answer mining, including a causally confusing version that is even more challenging. While models perform well, there is much room for improvement, especially, on open-ended answers. We identify more advanced/explicit causal relationship modeling \& joint modeling of vision and language as the immediate areas for future efforts to focus upon. Along with the other complementary datasets, our new challenging dataset will pave the way for these developments in the field.
Unified Language Model Pre-training for Natural Language Understanding and Generation
This paper presents a new Unified pre-trained Language Model (UniLM) that can be fine-tuned for both natural language understanding and generation tasks. The model is pre-trained using three types of language modeling tasks: unidirectional, bidirectional, and sequence-to-sequence prediction. The unified modeling is achieved by employing a shared Transformer network and utilizing specific self-attention masks to control what context the prediction conditions on. UniLM compares favorably with BERT on the GLUE benchmark, and the SQuAD 2.0 and CoQA question answering tasks.
Relaxed Marginal Consistency for Differentially Private Query Answering
Many differentially private algorithms for answering database queries involve astep that reconstructs a discrete data distribution from noisy measurements. Thisprovides consistent query answers and reduces error, but often requires space thatgrows exponentially with dimension. PRIVATE-PGM is a recent approach that usesgraphical models to represent the data distribution, with complexity proportional tothat of exact marginal inference in a graphical model with structure determined bythe co-occurrence of variables in the noisy measurements. PRIVATE-PGM is highlyscalable for sparse measurements, but may fail to run in high dimensions with densemeasurements. We overcome the main scalability limitation of PRIVATE-PGMthrough a principled approach that relaxes consistency constraints in the estimationobjective. Our new approach works with many existing private query answeringalgorithms and improves scalability or accuracy with no privacy cost.
An Uncertainty Principle is a Price of Privacy-Preserving Microdata
Privacy-protected microdata are often the desired output of a differentially private algorithm since microdata is familiar and convenient for downstream users. However, there is a statistical price for this kind of convenience. We show that an uncertainty principle governs the trade-off between accuracy for a population of interest (``sum query'') vs. accuracy for its component sub-populations (``point queries''). Compared to differentially private query answering systems that are not required to produce microdata, accuracy can degrade by a logarithmic factor. For example, in the case of pure differential privacy, without the microdata requirement, one can provide noisy answers to the sum query and all point queries while guaranteeing that each answer has squared error $O(1/\epsilon^2)$. With the microdata requirement, one must choose between allowing an additional $\log^2(d)$ factor ($d$ is the number of point queries) for some point queries or allowing an extra $O(d^2)$ factor for the sum query. We present lower bounds for pure, approximate, and concentrated differential privacy. We propose mitigation strategies and create a collection of benchmark datasets that can be used for public study of this problem.
EAGER: Asking and Answering Questions for Automatic Reward Shaping in Language-guided RL
Reinforcement learning (RL) in long horizon and sparse reward tasks is notoriously difficult and requires a lot of training steps. A standard solution to speed up the process is to leverage additional reward signals, shaping it to better guide the learning process.In the context of language-conditioned RL, the abstraction and generalisation properties of the language input provide opportunities for more efficient ways of shaping the reward.In this paper, we leverage this idea and propose an automated reward shaping method where the agent extracts auxiliary objectives from the general language goal. These auxiliary objectives use a question generation (QG) and a question answering (QA) system: they consist of questions leading the agent to try to reconstruct partial information about the global goal using its own trajectory.When it succeeds, it receives an intrinsic reward proportional to its confidence in its answer. This incentivizes the agent to generate trajectories which unambiguously explain various aspects of the general language goal.Our experimental study using various BabyAI environments shows that this approach, which does not require engineer intervention to design the auxiliary objectives, improves sample efficiency by effectively directing the exploration.