serum
ICD Codes are Insufficient to Create Datasets for Machine Learning: An Evaluation Using All of Us Data for Coccidioidomycosis and Myocardial Infarction
Whitlock, Abigail E., Leroy, Gondy, Donovan, Fariba M., Galgiani, John N.
In medicine, machine learning (ML) datasets are often built using International Classification of Diseases (ICD) codes. As new models are developed, there is a need for larger datasets. However, ICD codes are intended for billing. We aim to determine how suitable ICD codes are for creating datasets to train ML models. We focused on one rare and one common disease using the All of Us database. First, we compared the patient cohort created using ICD codes for Valley fever (coccidioidomycosis, CM) with that identified via serological confirmation. Second, we compared two similarly created patient cohorts for myocardial infarction (MI). In both cases we identified significant discrepancies between the two groups, and the patient overlap was small. The CM cohort had 811 patients in the ICD-10 group, 619 patients in the positive-serology group, and 24 in both. The MI cohort had 14,875 patients in the ICD-10 group, 23,598 in the MI laboratory-confirmed group, and 6,531 in both. Demographics, rates of disease symptoms, and other clinical data varied across our case study cohorts.
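The overlap figures reported above can be turned into simple agreement measures. The following is a minimal sketch, not the authors' analysis: the function name `overlap_stats` and the choice of the Jaccard index are assumptions made for illustration, and only the cohort counts come from the abstract.

```python
def overlap_stats(n_icd, n_lab, n_both):
    """Summarize agreement between an ICD-defined cohort and a lab-confirmed cohort.

    n_icd  -- patients identified by ICD-10 code
    n_lab  -- patients identified by laboratory confirmation
    n_both -- patients present in both groups
    """
    union = n_icd + n_lab - n_both  # patients in at least one group
    return {
        "jaccard": n_both / union,       # overlap relative to the union
        "share_of_icd": n_both / n_icd,  # ICD-coded patients who also have lab confirmation
        "share_of_lab": n_both / n_lab,  # lab-confirmed patients who also have the ICD code
    }


# Counts taken from the abstract.
cm = overlap_stats(n_icd=811, n_lab=619, n_both=24)
mi = overlap_stats(n_icd=14875, n_lab=23598, n_both=6531)

for name, stats in (("CM", cm), ("MI", mi)):
    print(name, {key: round(value, 3) for key, value in stats.items()})
```

With these counts, roughly 3% of ICD-coded CM patients also have positive serology, compared with about 44% of ICD-coded MI patients who also have laboratory confirmation.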
Modeling Comparative Logical Relation with Contrastive Learning for Text Generation
Dan, Yuhao, Tian, Junfeng, Zhou, Jie, Yan, Ming, Zhang, Ji, Chen, Qin, He, Liang
Data-to-Text Generation (D2T), a classic natural language generation problem, aims at producing fluent descriptions for structured input data, such as a table. Existing D2T works mainly focus on describing the superficial associative relations among entities, while ignoring the deep comparative logical relations, such as A is better than B in a certain aspect with a corresponding opinion, which is quite common in our daily life. In this paper, we introduce a new D2T task named comparative logical relation generation (CLRG). Additionally, we propose a Comparative Logic (CoLo) based text generation method, which generates texts following specific comparative logical relations with contrastive learning. Specifically, we first construct various positive and negative samples by fine-grained perturbations in entities, aspects and opinions. Then, we perform contrastive learning in the encoder layer to have a better understanding of the comparative logical relations, and integrate it in the decoder layer to guide the model to correctly generate the relations. Noting the data scarcity problem, we construct a Chinese Comparative Logical Relation Dataset (CLRD), which is a high-quality human-annotated dataset and challenging for text generation with descriptions of multiple entities and annotations on their comparative logical relations. Extensive experiments show that our method achieves impressive performance in both automatic and human evaluations.
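To illustrate the general idea of contrastive learning with positive and negative samples, here is a minimal sketch. The abstract does not state which loss CoLo uses, so the InfoNCE-style objective, the cosine similarity, the temperature value, and the toy vectors below are all assumptions for illustration only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss: low when the anchor is closer to the positive
    than to every negative (assumed objective, not taken from the paper)."""
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# Hypothetical encoder outputs: the positive keeps the comparative relation,
# the negatives stand in for perturbed entities, aspects, or opinions.
anchor = [1.0, 0.0]
positive = [0.9, 0.1]
negatives = [[0.0, 1.0], [-1.0, 0.0]]

loss_aligned = info_nce(anchor, positive, negatives)
loss_swapped = info_nce(anchor, negatives[0], [positive, negatives[1]])
print(loss_aligned < loss_swapped)
```

Minimizing such a loss pulls the representation of an input toward samples that preserve its comparative relation and pushes it away from perturbed ones.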
General-Purpose Retrieval-Enhanced Medical Prediction Model Using Near-Infinite History
Kim, Junu, Shim, Chaeeun, Yang, Bosco Seong Kyu, Im, Chami, Lim, Sung Yoon, Jeong, Han-Gil, Choi, Edward
Developing clinical prediction models (e.g., mortality prediction) based on electronic health records (EHRs) typically relies on expert opinion for feature selection and adjusting observation window size. This burdens experts and creates a bottleneck in the development process. We propose Retrieval-Enhanced Medical prediction model (REMed) to address such challenges. REMed can essentially evaluate an unlimited number of clinical events, select the relevant ones, and make predictions. This approach effectively eliminates the need for manual feature selection and enables an unrestricted observation window. We verified these properties through experiments on 27 clinical tasks and two independent cohorts from publicly available EHR datasets, where REMed outperformed other contemporary architectures that aim to handle as many events as possible. Notably, we found that the preferences of REMed align closely with those of medical experts. We expect our approach to significantly expedite the development of EHR prediction models by minimizing clinicians' need for manual involvement.
Attention Where It Matters: Rethinking Visual Document Understanding with Selective Region Concentration
Cao, Haoyu, Bao, Changcun, Liu, Chaohu, Chen, Huang, Yin, Kun, Liu, Hao, Liu, Yinsong, Jiang, Deqiang, Sun, Xing
We propose a novel end-to-end document understanding model called SeRum (SElective Region Understanding Model) for extracting meaningful information from document images, with applications including document analysis, retrieval, and office automation. Unlike state-of-the-art approaches that rely on multi-stage technical schemes and are computationally expensive, SeRum converts document image understanding and recognition tasks into a local decoding process of the visual tokens of interest, using a content-aware token merge module. This mechanism enables the model to pay more attention to regions of interest generated by the query decoder, improving the model's effectiveness and accelerating decoding in the generative scheme. We also designed several pre-training tasks to enhance the understanding and local awareness of the model. Experimental results demonstrate that SeRum achieves state-of-the-art performance on document understanding tasks and competitive results on text spotting tasks. SeRum represents a substantial advancement towards enabling efficient and effective end-to-end document understanding.
Sound Explanation for Trustworthy Machine Learning
Jia, Kai, Saowakon, Pasapol, Appelbaum, Limor, Rinard, Martin
We take a formal approach to the explainability problem of machine learning systems. We argue against the practice of interpreting black-box models by attributing scores to input components, because the goals of attribution-based interpretation inherently conflict. We prove that no attribution algorithm satisfies specificity, additivity, completeness, and baseline invariance simultaneously. We then formalize the concept of sound explanation, which has been informally adopted in prior work. A sound explanation entails providing sufficient information to causally explain the predictions made by a system. Finally, we present the application of feature selection as a sound explanation for cancer prediction models to cultivate trust among clinicians.
EVOTER: Evolution of Transparent Explainable Rule-sets
Shahrzad, Hormoz, Hodjat, Babak, Miikkulainen, Risto
Most AI systems are black boxes generating reasonable outputs for given inputs. Some domains, however, have explainability and trustworthiness requirements that cannot be directly met by these approaches. Various methods have therefore been developed to interpret black-box models after training. This paper advocates an alternative approach where the models are transparent and explainable to begin with. This approach, EVOTER, evolves rule-sets based on simple logical expressions. The approach is evaluated in several prediction/classification and prescription/policy search domains with and without a surrogate. It is shown to discover meaningful rule sets that perform similarly to black-box models. The rules can provide insight into the domain, and make biases hidden in the data explicit. It may also be possible to edit them directly to remove biases and add constraints. EVOTER thus forms a promising foundation for building trustworthy AI systems for real-world applications in the future.
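To show what a transparent rule-set built from simple logical expressions might look like, here is a minimal sketch. The rule encoding, the first-match semantics, the feature names, and the accuracy-based fitness are all hypothetical choices for illustration. They are not EVOTER's actual representation, and the evolutionary search itself is omitted.

```python
import operator

# Comparison operators a condition may use (illustrative subset).
OPS = {"<": operator.lt, ">=": operator.ge}


def rule_fires(conditions, sample):
    """A rule fires when every (feature, op, threshold) condition holds."""
    return all(OPS[op](sample[feat], thr) for feat, op, thr in conditions)


def predict(rule_set, default, sample):
    """Return the label of the first rule that fires, else the default."""
    for conditions, label in rule_set:
        if rule_fires(conditions, sample):
            return label
    return default


def accuracy(rule_set, default, data):
    """Fraction of samples labeled correctly; a candidate fitness measure."""
    hits = sum(predict(rule_set, default, x) == y for x, y in data)
    return hits / len(data)


# Hypothetical rule-set and toy data.
rules = [
    ([("age", ">=", 60), ("bp", ">=", 140)], "high"),
    ([("bp", "<", 120)], "low"),
]
data = [
    ({"age": 70, "bp": 150}, "high"),
    ({"age": 30, "bp": 110}, "low"),
    ({"age": 40, "bp": 130}, "medium"),
    ({"age": 65, "bp": 130}, "high"),
]

score = accuracy(rules, "medium", data)
print(score)
```

Because each rule is a readable conjunction of threshold tests, a reviewer can inspect or edit it directly, which is the property the abstract emphasizes.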
Citizen Sleeper review – an evocative cyberpunk survival sim
If your brain were copied and placed in a robot body, would it have human rights? That's the thorny issue at the heart of Citizen Sleeper, a game set on a run-down space station called Erlin's Eye in the far-flung future. In this reality, AI is strictly controlled and artificial beings that achieve sentience are hunted down and destroyed, Blade Runner-style. But "emulated" humans known as sleepers offer a loophole, being neither fully artificial nor fully human. Nefarious megacorporations will pay desperate volunteers handsomely for the right to emulate their brain.
Wellness Watch
Best Buy is placing a heavy bet on the growing need for home-health tech services as it aims to reach 5 million seniors (up from 1 million today) in the next few years. The push into home monitoring will include technologies based on remote response and predictive health systems. Products might "include algorithm-driven pendants that track how a senior is walking and predict the risk of falling, refrigerators with sensors that measure whether an individual has been eating, and wireless scales that monitor patients with congestive heart failure." The retailer would also partner with insurance companies that want to preventively "monitor patients at home to avoid hospital stays." Atolla is an MIT-based start-up that uses artificial intelligence (AI) and machine learning to create custom facial skin-care creams for its customers.
AI Detects Brain Cancer from a Blood Test
Imagine being able to know the probability of whether a persistent headache that you are experiencing is a symptom of something much worse through a simple blood test. Researchers affiliated with ClinSpec Diagnostics Limited, a spin-off from the University of Strathclyde in Glasgow, Scotland, and their colleagues developed patented technology that can detect brain cancer from blood samples. Using an innovative combination of artificial intelligence (AI) and spectroscopy, the U.K. researchers developed a method to detect brain cancer from a blood biopsy, and published their study on October 8, 2019 in Nature Communications. Headaches are one of the most common symptoms of brain tumors, according to the American Brain Tumor Association. But while headaches are very common, brain cancer is not.