Cohen, Raphael
A framework for optimizing COVID-19 testing policy using a Multi Armed Bandit approach
Grushka-Cohen, Hagit, Cohen, Raphael, Shapira, Bracha, Moran-Gilad, Jacob, Rokach, Lior
Testing is an important part of tackling the COVID-19 pandemic. Testing availability is a bottleneck because resources are constrained, so effective prioritization of individuals is necessary. Here, we discuss the impact of different prioritization policies on COVID-19 patient discovery and on the ability of governments and health organizations to use the results for effective decision making. We suggest a testing framework that balances the maximal discovery of positive individuals with the need for population-based surveillance aimed at understanding disease spread and characteristics. This framework draws on similar approaches to prioritization in the cyber-security domain: individuals are ranked by a risk score, and a portion of the capacity is reserved for random sampling. This approach is an application of Multi-Armed Bandits, trading off exploration against exploitation of the underlying distribution. We find that individuals can be ranked for effective testing using a few simple features, and that ranking with such models captures 65% (CI: 64.7%-68.3%) of the positive individuals using less than 20% of the testing capacity, or 92.1% (CI: 91.1%-93.2%) of the positive individuals using 70% of the capacity, which allows a significant portion of the tests to be reserved for population studies. Our approach allows experts and decision-makers to tailor the resulting policies as needed, providing transparency into the ranking policy and the ability to understand the disease spread in the population and to react quickly and in an informed manner.
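The split between risk-ranked testing and random sampling described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `allocate_tests`, the `explore_fraction` parameter, and the choice to draw the random sample from individuals not already selected by rank are all assumptions made for illustration.

```python
import random


def allocate_tests(risk_scores, capacity, explore_fraction=0.3, seed=0):
    """Choose who to test: highest risk first, the remainder at random.

    risk_scores: dict mapping a person id to a risk score (higher = riskier).
    capacity: total number of tests available.
    explore_fraction: hypothetical share of capacity reserved for random sampling.
    Returns (exploit, explore), two lists of person ids.
    """
    rng = random.Random(seed)
    capacity = min(capacity, len(risk_scores))
    n_explore = int(capacity * explore_fraction)
    n_exploit = capacity - n_explore

    # Exploitation: test the individuals with the highest risk scores.
    ranked = sorted(risk_scores, key=risk_scores.get, reverse=True)
    exploit = ranked[:n_exploit]

    # Exploration: sample uniformly from everyone not already selected.
    remaining = ranked[n_exploit:]
    explore = rng.sample(remaining, n_explore)
    return exploit, explore
```

Sampling only from the unselected pool is one simple design choice; drawing the random sample from the whole population before ranking would be another, and it may be preferable when the sample is meant to estimate prevalence.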
Multi-Label Classification of Patient Notes: Case Study on ICD Code Assignment
Baumel, Tal (Ben-Gurion University) | Nassour-Kassis, Jumana (Ben-Gurion University) | Cohen, Raphael (Chorus.ai) | Elhadad, Michael (Ben-Gurion University) | Elhadad, Noémie (Columbia University)
The automatic coding of clinical documentation according to diagnosis codes is a useful task in the Electronic Health Record, but a challenging one due to the large number of codes and the length of patient notes. We investigate four models for assigning multiple ICD codes to discharge summaries, and experiment with data from the MIMIC II and III clinical datasets. We present Hierarchical Attention-bidirectional Gated Recurrent Unit (HA-GRU), a hierarchical approach to tagging a document by identifying the sentences relevant for each label. HA-GRU achieves state-of-the-art results. Furthermore, the learned sentence-level attention layer highlights the model's decision process, allows for easier error analysis, and suggests future directions for improvement.
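The idea of a per-label attention layer over sentences can be sketched as follows. This is only an illustration of attention pooling, not the HA-GRU architecture: the GRU encoders are omitted, sentence vectors are taken as given, and the dot-product scoring against a hypothetical `label_query` vector is an assumption, since the abstract does not specify the scoring function.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(sentence_vecs, label_query):
    """Pool sentence vectors into one document vector for a single label.

    sentence_vecs: list of equal-length float lists, one per sentence.
    label_query: float list of the same length (hypothetical learned vector).
    Returns (weights, doc_vec); the weights show which sentences mattered.
    """
    # Score each sentence by its dot product with the label query.
    scores = [sum(a * b for a, b in zip(vec, label_query)) for vec in sentence_vecs]
    weights = softmax(scores)

    # The document vector is the attention-weighted sum of sentence vectors.
    dim = len(sentence_vecs[0])
    doc_vec = [sum(w * vec[i] for w, vec in zip(weights, sentence_vecs))
               for i in range(dim)]
    return weights, doc_vec
```

Inspecting `weights` for a given label is what makes this kind of layer useful for error analysis: the sentences with the largest weights are the ones the label prediction relied on.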
Topic Concentration in Query Focused Summarization Datasets
Baumel, Tal (Ben-Gurion University) | Cohen, Raphael (Ben-Gurion University) | Elhadad, Michael (Ben-Gurion University)
Query-Focused Summarization (QFS) summarizes a document cluster in response to a specific input query. QFS algorithms must combine query relevance assessment, central content identification, and redundancy avoidance. Frustratingly, state-of-the-art algorithms designed for QFS do not significantly improve upon generic summarization methods, which ignore query relevance, when evaluated on traditional QFS datasets. We hypothesize that this lack of success stems from the nature of the datasets. We define a task-based method to quantify topic concentration in datasets, i.e., the ratio of sentences within the dataset that are relevant to the query, and observe that the DUC 2005, 2006 and 2007 datasets suffer from very high topic concentration. We introduce TD-QFS, a new QFS dataset with controlled levels of topic concentration. We compare competitive baseline algorithms on TD-QFS and report strong improvement in ROUGE performance for algorithms that properly model query relevance as opposed to generic summarizers. We further present three new and simple QFS algorithms, RelSum, ThresholdSum, and TFIDF-KLSum, that outperform state-of-the-art QFS algorithms on the TD-QFS dataset by a large margin.
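Topic concentration, as defined in this abstract, is the fraction of sentences in a dataset that are relevant to the query. The sketch below shows only that ratio. The paper's task-based measurement procedure is not described here, so the keyword-overlap relevance test (`keyword_relevance`) is a stand-in assumption, and both function names are hypothetical.

```python
def topic_concentration(sentences, is_relevant):
    """Return the fraction of sentences judged relevant to the query."""
    if not sentences:
        return 0.0
    relevant = sum(1 for sentence in sentences if is_relevant(sentence))
    return relevant / len(sentences)


def keyword_relevance(query):
    """Build a naive relevance test: any shared lowercase token counts.

    This is a placeholder for a real relevance judgment.
    """
    terms = set(query.lower().split())

    def is_relevant(sentence):
        return bool(terms & set(sentence.lower().split()))

    return is_relevant


sentences = [
    "Asthma symptoms include wheezing",
    "The weather is nice",
    "Treating asthma in children",
    "Stock prices fell",
]
concentration = topic_concentration(sentences, keyword_relevance("asthma"))
```

With a high ratio, almost any sentence a generic summarizer picks is already on topic, which is consistent with the abstract's observation that query-aware methods gain little on such datasets.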