log odds ratio
From Narratives to Numbers: Valid Inference Using Language Model Predictions from Verbal Autopsy Narratives
Fan, Shuxian, Visokay, Adam, Hoffman, Kentaro, Salerno, Stephen, Liu, Li, Leek, Jeffrey T., McCormick, Tyler H.
In settings where most deaths occur outside the healthcare system, verbal autopsies (VAs) are a common tool to monitor trends in causes of death (COD). VAs are interviews with a surviving caregiver or relative that are used to predict the decedent's COD. Turning VAs into actionable insights for researchers and policymakers requires two steps: (i) predicting likely COD using the VA interview and (ii) performing inference with predicted CODs (e.g. modeling the breakdown of causes by demographic factors using a sample of deaths). In this paper, we develop a method for valid inference using outcomes (in our case COD) predicted from free-form text using state-of-the-art NLP techniques. This method, which we call multiPPI++, extends recent work in "prediction-powered inference" to multinomial classification. We leverage a suite of NLP techniques for COD prediction and, through empirical analysis of VA data, demonstrate the effectiveness of our approach in handling transportability issues. multiPPI++ recovers ground truth estimates, regardless of which NLP model produced predictions and regardless of whether they were produced by a more accurate predictor like GPT-4-32k or a less accurate predictor like KNN. Our findings demonstrate the practical importance of inference correction for public health decision-making and suggest that if inference tasks are the end goal, having a small amount of contextually relevant, high quality labeled data is essential regardless of the NLP algorithm.
- North America > United States > Washington > King County > Seattle (0.14)
- Africa > Mozambique > Cabo Delgado Province > Pemba (0.07)
- Asia > India > Uttar Pradesh (0.05)
- Health & Medicine > Therapeutic Area > Oncology (1.00)
- Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
- Health & Medicine > Public Health (1.00)
- Health & Medicine > Epidemiology (0.94)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
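The abstract above describes correcting inference on predicted causes of death with a small labeled sample. As a rough illustration of the underlying idea only, the sketch below applies the basic prediction-powered point estimate to a single class proportion: the proportion among predictions on unlabeled data, plus the average prediction error measured on labeled data. It is not multiPPI++ and does not reproduce the paper's multinomial method; the function name `ppi_proportion`, its arguments, and the toy labels are all hypothetical.

```python
from statistics import mean


def ppi_proportion(pred_unlabeled, pred_labeled, true_labeled, cause):
    """Basic prediction-powered estimate of P(COD == cause).

    Hypothetical helper for illustration; not the multiPPI++ estimator.
    pred_unlabeled: predicted causes for deaths without ground truth
    pred_labeled:   predicted causes for deaths that also have ground truth
    true_labeled:   ground-truth causes for those same deaths
    """
    def indicator(labels):
        return [1.0 if y == cause else 0.0 for y in labels]

    # Proportion implied by predictions alone (biased if the model is biased).
    naive = mean(indicator(pred_unlabeled))
    # Rectifier: average (truth - prediction) on the labeled sample.
    rectifier = mean(
        t - p for t, p in zip(indicator(true_labeled), indicator(pred_labeled))
    )
    return naive + rectifier


# Toy data (made up): the model over-predicts "malaria" on the labeled sample.
pred_unlabeled = ["malaria"] * 6 + ["hiv"] * 4
pred_labeled = ["malaria", "malaria", "hiv", "hiv"]
true_labeled = ["malaria", "hiv", "hiv", "hiv"]

estimate = ppi_proportion(pred_unlabeled, pred_labeled, true_labeled, "malaria")
print(round(estimate, 3))
```

The correction pulls the estimate below the prediction-only proportion because the labeled sample shows the predictor assigning this cause too often. A full treatment would also need a variance estimate to form confidence intervals, which this sketch omits.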
Bayesian inference in spiking neurons
We propose a new interpretation of spiking neurons as Bayesian integrators accumulating evidence over time about events in the external world or the body, and communicating to other neurons their certainties about these events. In this model, spikes signal the occurrence of new information, i.e. what cannot be predicted from the past activity. As a result, firing statistics are close to Poisson, albeit providing a deterministic representation of probabilities. We proceed to develop a theory of Bayesian inference in spiking neural networks, recurrent interactions implementing a variant of belief propagation. Many perceptual and motor tasks performed by the central nervous system are probabilistic, and can be described in a Bayesian framework [4, 3].
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.50)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.35)
Don't PANIC: Prototypical Additive Neural Network for Interpretable Classification of Alzheimer's Disease
Wolf, Tom Nuno, Pölsterl, Sebastian, Wachinger, Christian
Alzheimer's disease (AD) has a complex and multifactorial etiology, which requires integrating information about neuroanatomy, genetics, and cerebrospinal fluid biomarkers for accurate diagnosis. Hence, recent deep learning approaches combined image and tabular information to improve diagnostic performance. However, the black-box nature of such neural networks is still a barrier for clinical applications, in which understanding the decision of a heterogeneous model is integral. We propose PANIC, a prototypical additive neural network for interpretable AD classification that integrates 3D image and tabular data. It is interpretable by design and, thus, avoids the need for post-hoc explanations that try to approximate the decision of a network. Our results demonstrate that PANIC achieves state-of-the-art performance in AD classification, while directly providing local and global explanations. Finally, we show that PANIC extracts biologically meaningful signatures of AD, and satisfies a set of desiderata for trustworthy machine learning. Our implementation is available at https://github.com/ai-med/PANIC.
Analysis of Male and Female Speakers' Word Choices in Public Speeches
Hossain, Md Zobaer, Samin, Ahnaf Mozib
The extent to which men and women use language differently has been questioned previously. Evidence for clear and consistent gender differences in language is generally inconclusive, and findings are heavily influenced by the context and the method employed to identify the difference. In addition, the majority of the research has examined written language. Therefore, we compared the word choices of male and female presenters in public addresses such as TED lectures. The frequencies of numerous types of words, such as parts of speech (POS) and linguistic, psychological, and cognitive terms, were analyzed statistically to determine how male and female speakers use words differently. Based on our data, we determined that male speakers use specific types of linguistic, psychological, cognitive, and social words considerably more frequently than female speakers.
- Europe > Netherlands (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.93)
Evaluation of Local Model-Agnostic Explanations Using Ground Truth
Rahnama, Amir Hossein Akhavan, Butepage, Judith, Geurts, Pierre, Bostrom, Henrik
Explanation techniques are commonly evaluated using human-grounded methods, limiting the possibilities for large-scale evaluations and rapid progress in the development of new techniques. We propose a functionally-grounded evaluation procedure for local model-agnostic explanation techniques. In our approach, we generate ground truth for explanations when the black-box model is Logistic Regression and Gaussian Naive Bayes and compare how similar each explanation is to the extracted ground truth. In our empirical study, explanations of Local Interpretable Model-agnostic Explanations (LIME), SHapley Additive exPlanations (SHAP), and Local Permutation Importance (LPI) are compared in terms of how similar they are to the extracted ground truth. In the case of Logistic Regression, we find that the performance of the explanation techniques is highly dependent on the normalization of the data. In contrast, Local Permutation Importance outperforms the other techniques on Naive Bayes, irrespective of normalization. We hope that this work lays the foundation for further research into functionally-grounded evaluation methods for explanation techniques.
- Information Technology (0.68)
- Health & Medicine > Therapeutic Area (0.33)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.60)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.59)
Covid-19 risk factors: Statistical learning from German healthcare claims data
Jucknewitz, Roland, Weidinger, Oliver, Schramm, Anja
We analyse prior risk factors for severe, critical or fatal courses of Covid-19 based on a retrospective cohort using claims data of the AOK Bayern. As our main methodological contribution, we avoid prior grouping and pre-selection of candidate risk factors. Instead, fine-grained hierarchical information from medical classification systems for diagnoses, pharmaceuticals and procedures is used, resulting in more than 33,000 covariates. Our approach has better predictive ability than well-specified morbidity groups but does not need prior subject-matter knowledge. The methodology and estimated coefficients are made available to decision makers to prioritize protective measures towards vulnerable subpopulations and to researchers who would like to adjust for a large set of confounders in studies of individual risk factors.
- North America > United States (0.04)
- Europe > United Kingdom > Scotland (0.04)
- Europe > Germany > Bavaria > Regensburg (0.04)
- Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
- Health & Medicine > Therapeutic Area > Immunology (1.00)
- Health & Medicine > Epidemiology (1.00)
Why Do So Many Practicing Data Scientists Not Understand Logistic Regression?
The U.S. Weather Service has always phrased rain forecasts as probabilities. I do not want a classification of "it will rain today." There is a slight loss/disutility of carrying an umbrella, and I want to be the one to make the tradeoff. This is coming from personal experience and from multiple contexts, but it seems that many data scientists simply do not understand logistic regression, or binomials and multinomials in general. The problem arises from logistic regression often being taught as a "classification" algorithm in the machine learning world.
- Research Report > New Finding (0.98)
- Research Report > Experimental Study (0.98)
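The excerpt above argues that logistic regression yields probabilities, and that the decision belongs to whoever bears the costs. A minimal sketch of that separation follows: the model's linear predictor is a log odds, the logistic function maps it to a probability, and a separate cost-based rule makes the decision. The function names and the cost values are hypothetical and chosen only for illustration.

```python
import math


def sigmoid(log_odds):
    """Map a log odds (the logistic regression linear predictor) to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))


def logit(p):
    """Inverse of sigmoid: the log odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def carry_umbrella(p_rain, cost_carry, cost_wet):
    """Decide from the forecast probability and the user's own costs.

    Carry the umbrella when the expected loss of going without it
    (p_rain * cost_wet) exceeds the fixed cost of carrying it.
    """
    return p_rain * cost_wet > cost_carry


# Hypothetical costs: carrying costs 1 unit, getting soaked costs 10 units,
# so the break-even forecast probability is 0.1 rather than a fixed 0.5.
p_rain = sigmoid(-0.85)  # a forecast expressed as a log odds
print(round(p_rain, 2), carry_umbrella(p_rain, cost_carry=1.0, cost_wet=10.0))
```

Thresholding the probability at 0.5 inside the model would hide exactly this tradeoff, since two users with different costs can reasonably act differently on the same forecast.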
Confidence-based Tuning of Nomogram Predictions
Mancill, Tony (Washington State University Vancouver) | Wallace, Scott A (Washington State University Vancouver)
Instance classification using machine learning techniques has numerous applications, from automation to medical diagnosis. In many problem domains, such as spam filtering, classification must be performed quickly across large datasets. In this paper we begin with machine learning techniques based on the naive Bayes classification and attempt to improve classification performance by taking into account attribute confidence intervals. Our prediction functions operate over nominal datasets and retain the asymptotic complexity of one-pass learning and prediction functions. We present preliminary results indicating a modest, albeit inconsistent, improvement over the naive Bayes classifier alone.
- North America > United States > Washington > Clark County > Vancouver (0.15)
- South America > Paraguay > Asunción > Asunción (0.05)
- North America > United States > New York > New York County > New York City (0.05)
- Europe > Slovenia > Central Slovenia > Municipality of Ljubljana > Ljubljana (0.05)
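Since this listing is about the log odds ratio and the entry above mentions attribute confidence intervals, a short worked sketch of the standard quantity may help. It computes the log odds ratio from a 2x2 table of counts for one nominal attribute value against a class, with a Wald interval. This is a textbook formula, not the prediction function from the paper above; the function name, the argument layout, and the 0.5 zero-cell correction are assumptions made for illustration.

```python
import math


def log_odds_ratio(a, b, c, d, z=1.96):
    """Log odds ratio and Wald interval from a 2x2 table of counts.

    a: attribute value present, target class
    b: attribute value present, other class
    c: attribute value absent,  target class
    d: attribute value absent,  other class
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    # Assumed convention: add 0.5 to every cell when any count is zero,
    # so the logarithm and the standard error stay finite.
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    lor = math.log((a * d) / (b * c))
    se = math.sqrt(1.0 / a + 1.0 / b + 1.0 / c + 1.0 / d)
    return lor, (lor - z * se, lor + z * se)


# Made-up counts: the value appears in 20 of 30 target-class instances
# and in 10 of 30 instances of the other class.
lor, (low, high) = log_odds_ratio(20, 10, 10, 20)
print(round(lor, 3), round(low, 3), round(high, 3))
```

An interval that excludes zero suggests the attribute value shifts the odds of the class in a consistent direction; a wide interval signals that the counts are too small to trust the point estimate.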