Supplementary Contents
A Datasheet for Datasets
B Preliminary
  B.1 Uni-modal data resources
  B.2 Uni-modal EHR QA datasets
    B.2.1 Table-based EHR QA

Neural Information Processing Systems 

A.1 Motivation

For what purpose was the dataset created?
We created EHRXQA to provide a resource for advancing machine learning applications in multi-modal question answering over structured electronic health records (EHRs) and chest X-ray (CXR) images. As an affiliated dataset, we created MIMIC-CXR-VQA to provide a benchmark for medical visual question answering systems.

Who created the dataset (e.g., which team, research group) and on behalf of which entity (e.g., company, institution, organization)?

Who funded the creation of the dataset? If there is an associated grant, please provide the name of the grantor and the grant name and number.

A.2 Composition

What do the instances that comprise the dataset represent (e.g., documents, photos, people, countries)?
EHRXQA contains natural questions and their corresponding SQL/NeuralSQL queries (text). MIMIC-CXR-VQA contains image IDs from the MIMIC-CXR dataset and their associated natural questions.

How many instances are there in total (of each type, if appropriate)?
MIMIC-CXR-VQA contains about 377.4K instances.

Does the dataset contain all possible instances or is it a sample (not necessarily random) of instances from a larger set?
EHRXQA contains a (Question, SQL/NeuralSQL, Answer) triple for each instance. MIMIC-CXR-VQA contains a (Question, CXR image ID, Answer) triple for each instance.

Is there a label or target associated with each instance?
The answer (label) is provided for each question.

Is any information missing from individual instances? If so, please provide a description, explaining why this information is missing (e.g., because it was unavailable). This does not include intentionally removed information, but might include, e.g., redacted text.
No.

Are relationships between individual instances made explicit (e.g., users' movie ratings, social network links)?
No.

Are there recommended data splits (e.g., training, development/validation, testing)?
See Appendix B.2.2 and Appendix C.3.3.
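For illustration, the two instance formats described above can be written as plain Python record types. This is a minimal sketch, not the released schema: the class names and field names are hypothetical, and the answer is assumed to be stored as a list of strings.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class EHRXQAInstance:
    """Hypothetical EHRXQA record: (Question, SQL/NeuralSQL, Answer)."""
    question: str      # natural question
    query: str         # SQL or NeuralSQL query text
    answer: List[str]  # assumed representation of the answer (label)


@dataclass(frozen=True)
class MIMICCXRVQAInstance:
    """Hypothetical MIMIC-CXR-VQA record: (Question, CXR image ID, Answer)."""
    question: str      # natural question about one chest X-ray
    image_id: str      # reference to a MIMIC-CXR image by its ID
    answer: List[str]  # assumed representation of the answer (label)
```

Marking the records as frozen reflects that each instance is a fixed triple. Storing only the image ID, rather than pixel data, mirrors the description above: MIMIC-CXR-VQA references images from the MIMIC-CXR dataset.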
Questions are created by filling the slots in templates with pre-defined values and records from the database. As a result, some questions may contain grammatical errors (e.g., in verb tense), but these errors are not critical.
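The template-filling procedure described above can be sketched as follows. This is a hedged illustration, not the authors' generator: the template text, slot names, pre-defined values, and the toy `DB_RECORDS` rows are all hypothetical, and `string.Template` from the Python standard library stands in for whatever templating the actual pipeline uses.

```python
import itertools
from string import Template
from typing import Dict, List

# Hypothetical question template with three slots (not an actual EHRXQA template).
TEMPLATE = Template("Did patient $subject_id have $finding in the $view image?")

# Hypothetical pre-defined slot values.
PREDEFINED: Dict[str, List[str]] = {"finding": ["pleural effusion", "atelectasis"]}

# Toy stand-in for records sampled from the database (hypothetical values).
DB_RECORDS: List[Dict[str, str]] = [{"subject_id": "10001", "view": "AP"}]


def fill_templates(template: Template,
                   predefined: Dict[str, List[str]],
                   records: List[Dict[str, str]]) -> List[str]:
    """Fill every combination of pre-defined values for each database record."""
    questions = []
    keys = list(predefined)
    for record in records:
        # Enumerate the Cartesian product of all pre-defined slot values.
        for combo in itertools.product(*(predefined[k] for k in keys)):
            slots = dict(record)           # slots filled from the database record
            slots.update(zip(keys, combo))  # slots filled from pre-defined values
            questions.append(template.substitute(slots))
    return questions


for q in fill_templates(TEMPLATE, PREDEFINED, DB_RECORDS):
    print(q)
```

Because slots are substituted mechanically, nothing adjusts the surrounding words to the inserted value, which is how minor grammatical issues such as verb-tense mismatches can arise.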
