Cohen, David
K-QA: A Real-World Medical Q&A Benchmark
Manes, Itay, Ronn, Naama, Cohen, David, Ber, Ran Ilan, Horowitz-Kugler, Zehavi, Stanovsky, Gabriel
Ensuring the accuracy of responses provided by large language models (LLMs) is crucial, particularly in clinical settings where incorrect information may directly impact patient health. To address this challenge, we construct K-QA, a dataset containing 1,212 patient questions originating from real-world conversations held on K Health (an AI-driven clinical platform). We employ a panel of in-house physicians to answer and manually decompose a subset of K-QA into self-contained statements. Additionally, we formulate two NLI-based evaluation metrics approximating recall and precision: (1) comprehensiveness, measuring the percentage of essential clinical information in the generated answer, and (2) hallucination rate, measuring the number of statements from the physician-curated response contradicted by the LLM answer. Finally, we use K-QA along with these metrics to evaluate several state-of-the-art models, as well as the effect of in-context learning and medically-oriented augmented retrieval schemes developed by the authors. Our findings indicate that in-context learning improves the comprehensiveness of the models, and augmented retrieval is effective in reducing hallucinations. We make K-QA available to the community to spur research into medically accurate NLP applications.
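The two metrics described in the abstract can be illustrated with a minimal sketch. It assumes that an NLI model has already labelled each physician-written statement against the LLM answer, and that each statement carries a flag marking it as essential. The names `Statement`, `must_have`, `comprehensiveness`, and `hallucination_count` are hypothetical and are not taken from the K-QA release.

```python
from dataclasses import dataclass
from typing import List, Literal

# Assumed NLI label set; the abstract only says the metrics are NLI-based.
Label = Literal["entailment", "neutral", "contradiction"]


@dataclass
class Statement:
    text: str
    must_have: bool  # hypothetical flag: is this statement essential?
    label: Label     # assumed NLI verdict of the LLM answer for this statement


def comprehensiveness(statements: List[Statement]) -> float:
    """Fraction of essential statements entailed by the answer (recall-like)."""
    essential = [s for s in statements if s.must_have]
    if not essential:
        return 0.0
    entailed = sum(s.label == "entailment" for s in essential)
    return entailed / len(essential)


def hallucination_count(statements: List[Statement]) -> int:
    """Number of physician statements contradicted by the answer."""
    return sum(s.label == "contradiction" for s in statements)


# Toy input for illustration only; not data from K-QA.
example = [
    Statement("Statement A", must_have=True, label="entailment"),
    Statement("Statement B", must_have=True, label="entailment"),
    Statement("Statement C", must_have=True, label="neutral"),
    Statement("Statement D", must_have=False, label="contradiction"),
]

print(comprehensiveness(example))    # 2 of 3 essential statements are entailed
print(hallucination_count(example))  # 1 statement is contradicted
```

Under these assumptions, comprehensiveness behaves like recall over the essential statements, while the contradiction count penalises answers that conflict with the physician response.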
ManiFeSt: Manifold-based Feature Selection for Small Data Sets
Cohen, David, Shnitzer, Tal, Kluger, Yuval, Talmon, Ronen
In this paper, we present a new method for few-sample supervised feature selection (FS). Our method first learns the manifold of the feature space of each class using kernels capturing multi-feature associations. Then, based on Riemannian geometry, a composite kernel is computed, extracting the differences between the learned feature associations. Finally, an FS score based on spectral analysis is proposed. Considering multi-feature associations makes our method multivariate by design. This in turn allows for the extraction of the hidden manifold underlying the features and avoids overfitting, facilitating few-sample FS. We showcase the efficacy of our method on illustrative examples and several benchmarks, where our method demonstrates higher accuracy in selecting the informative features compared to competing methods. In addition, we show that our FS leads to improved classification and better generalization when applied to test data.
On the Existence of Synchrostates in Multichannel EEG Signals during Face-perception Tasks
Jamal, Wasifa, Das, Saptarshi, Maharatna, Koushik, Apicella, Fabio, Chronaki, Georgia, Sicca, Federico, Cohen, David, Muratori, Filippo
Phase synchronisation in multichannel EEG is known as the manifestation of functional brain connectivity. Traditional phase synchronisation studies are mostly based on time-averaged synchrony measures and hence do not preserve the temporal evolution of the phase difference. Here we propose a new method to show the existence of a small set of unique phase-synchronised patterns or "states" in multichannel EEG recordings, each "state" being stable on the order of milliseconds, from typical and pathological subjects during face perception tasks. The proposed methodology bridges the concepts of EEG microstates and phase synchronisation in the time and frequency domains, respectively. The analysis is reported for four groups of children, including typical, Autism Spectrum Disorder (ASD), low-anxiety and high-anxiety subjects, for a total of 44 subjects. In all cases, we observe the consistent existence of these states, termed synchrostates, within specific cognition-related frequency bands (beta and gamma bands), though the topographies of these synchrostates differ for different subject groups with different pathological conditions. The inter-synchrostate switching follows a well-defined sequence capturing the underlying inter-electrode phase relation dynamics in a stimulus- and person-centric manner. Our study is motivated by the well-known EEG microstates, which exhibit stable potential maps over the scalp. However, here we report a similar observation of quasi-stable phase-synchronised states in multichannel EEG. The existence of the synchrostates, coupled with their unique switching sequence characteristics, could be considered a potentially new field beyond contemporary EEG phase synchronisation studies.
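The contrast between a time-averaged synchrony measure and a time-resolved phase difference can be illustrated with a small sketch. The abstract does not say how phase is extracted, so an FFT-based analytic signal (a Hilbert-transform construction) is used here purely as a stand-in, and the phase-locking value is used as one common example of a time-averaged measure. All function names and the synthetic signals are assumptions for illustration.

```python
import numpy as np


def analytic_signal(x: np.ndarray) -> np.ndarray:
    """Analytic signal via the FFT (stand-in phase extractor, an assumption)."""
    n = x.shape[-1]
    spectrum = np.fft.fft(x, axis=-1)
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    # Zero the negative frequencies and double the positive ones.
    return np.fft.ifft(spectrum * weights, axis=-1)


def phase_difference(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Instantaneous phase difference between two channels, wrapped to (-pi, pi]."""
    return np.angle(analytic_signal(x) * np.conj(analytic_signal(y)))


def phase_locking_value(dphi: np.ndarray) -> float:
    """Time-averaged synchrony: magnitude of the mean unit phasor."""
    return float(np.abs(np.mean(np.exp(1j * dphi))))


# Two synthetic 20 Hz channels with a constant lag of pi/4 (toy data).
fs = 256
t = np.arange(512) / fs
x = np.cos(2 * np.pi * 20 * t)
y = np.cos(2 * np.pi * 20 * t - np.pi / 4)
dphi_const = phase_difference(x, y)
plv_const = phase_locking_value(dphi_const)

# A phase-difference series that switches between two stable values.
dphi_switch = np.concatenate([np.full(256, np.pi / 4), np.full(256, -np.pi / 4)])
plv_switch = phase_locking_value(dphi_switch)

print(plv_const, plv_switch)
```

In the switching series, the single averaged value reflects neither of the two stable phase relations nor the moment of the switch, which is the kind of temporal information the time-resolved series retains.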
An Oral Exam for Measuring a Dialog System’s Capabilities
Cohen, David (Carnegie Mellon University) | Lane, Ian (Carnegie Mellon University)
This paper suggests a model and methodology for measuring the breadth and flexibility of a dialog system's capabilities. The approach relies on having human evaluators administer a targeted oral exam to a system and provide their subjective views of that system's performance on each test problem. We present results from one instantiation of this test being performed on two publicly-accessible dialog systems and a human, and show that the suggested metrics do provide useful insights into the relative strengths and weaknesses of these systems. Results suggest that this approach can be performed with reasonable reliability and with reasonable amounts of effort. We hope that authors will augment their reporting with this approach to improve clarity and make more direct progress toward broadly-capable dialog systems.