Nteasee: A mixed methods study of expert and general population perspectives on deploying AI for health in African countries

arXiv.org Artificial Intelligence

Artificial Intelligence (AI) for health has the potential to significantly change and improve healthcare. However, in most African countries, culturally and contextually attuned approaches for deploying these solutions are not well understood. To bridge this gap, we conduct a qualitative study to investigate the best practices, fairness indicators, and potential biases to mitigate when deploying AI for health in African countries, as well as to explore opportunities where artificial intelligence could make a positive impact in health. We use a mixed-methods approach combining in-depth interviews (IDIs) and surveys. We conduct 1.5- to 2-hour IDIs with 50 experts in health, policy, and AI across 17 countries, and through an inductive approach we conduct a qualitative thematic analysis of expert IDI responses. We administer a blinded 30-minute survey with case studies to 672 general population participants across 5 countries in Africa and analyze responses on quantitative scales, statistically comparing responses by country, age, gender, and level of familiarity with AI. We thematically summarize open-ended responses from the surveys. Our results show generally positive attitudes and high levels of trust, accompanied by moderate levels of concern, among general population participants toward AI usage for health in Africa. This contrasts with expert responses, where major themes revolved around trust/mistrust, ethical concerns, and systemic barriers to integration, among others. This work presents a first-of-its-kind qualitative study of the potential of AI for health in Africa from an algorithmic fairness angle, with perspectives from both experts and the general population. We hope that this work guides policymakers and drives home the need for further research and for the inclusion of general population perspectives in decision-making around AI usage.
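
The statistical comparison described above (Likert-scale responses compared by country, age, gender, and AI familiarity) is not accompanied by published analysis code; the sketch below shows one plausible form such a comparison could take, using a Kruskal-Wallis test on hypothetical data. All column names and values are illustrative.

```python
# Hypothetical sketch: comparing Likert-scale trust ratings across countries,
# in the spirit of the survey analysis described above. Column names and data
# are illustrative, not from the study.
import pandas as pd
from scipy import stats

# Toy survey data: one row per participant.
df = pd.DataFrame({
    "country": ["NG", "NG", "KE", "KE", "ZA", "ZA", "GH", "GH", "RW", "RW"],
    "trust_in_ai_for_health": [4, 5, 3, 4, 2, 3, 5, 4, 3, 4],  # 1-5 Likert
})

# Kruskal-Wallis: a nonparametric test suited to ordinal Likert responses,
# comparing distributions across more than two groups.
groups = [g["trust_in_ai_for_health"].values for _, g in df.groupby("country")]
h_stat, p_value = stats.kruskal(*groups)
print(f"H = {h_stat:.2f}, p = {p_value:.3f}")
```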


A Comprehensive Survey on Evaluating Large Language Model Applications in the Medical Industry

arXiv.org Artificial Intelligence

Since the inception of the Transformer architecture in 2017, Large Language Models (LLMs) such as GPT and BERT have evolved significantly, impacting various industries with their advanced capabilities in language understanding and generation. These models have shown potential to transform the medical field, highlighting the necessity for specialized evaluation frameworks to ensure their effective and ethical deployment. This comprehensive survey delineates the extensive application and requisite evaluation of LLMs within healthcare, emphasizing the critical need for empirical validation to fully exploit their capabilities in enhancing healthcare outcomes. Our survey is structured to provide an in-depth analysis of LLM applications across clinical settings, medical text data processing, research, education, and public health awareness. We begin by exploring the roles of LLMs in various medical applications, detailing their evaluation based on performance in tasks such as clinical diagnosis, medical text data processing, information retrieval, data analysis, and educational content generation. The subsequent sections offer a comprehensive discussion on the evaluation methods and metrics employed, including models, evaluators, and comparative experiments. We further examine the benchmarks and datasets utilized in these evaluations, providing a categorized description of benchmarks for tasks like question answering, summarization, information extraction, bioinformatics, information retrieval, and general comprehensive benchmarks. This structure ensures a thorough understanding of how LLMs are assessed for their effectiveness, accuracy, usability, and ethical alignment in the medical domain. ...
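
As a concrete instance of the task-level metrics such surveys catalog, the sketch below implements SQuAD-style token-overlap F1, a standard score for extractive question answering; the clinical example strings are hypothetical.

```python
# Minimal sketch of one common evaluation metric referenced by surveys like
# this: SQuAD-style token-overlap F1 for question answering. The example
# answers below are hypothetical, not from any benchmark.
from collections import Counter

def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1 between a predicted and a reference answer."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

print(token_f1("metformin 500 mg twice daily", "metformin 500 mg daily"))  # ~0.89
```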


Introduction to Algogens

arXiv.org Artificial Intelligence

This book introduces the concept of Algogens, a promising integration of generative AI with traditional algorithms aimed at improving problem-solving techniques across various fields. It provides an accessible overview of how Algogens combine AI's innovative potential with algorithms' reliability to tackle complex challenges more effectively than either could alone. The text explores the basics of Algogens, their development, applications, and advantages, such as better adaptability and efficiency. Through examples and case studies, readers will learn about Algogens' practical uses today and their potential for future innovation in cybersecurity, healthcare, and environmental science. Acknowledging the challenges and ethical considerations that accompany new technologies, the book offers a balanced look at the prospects and obstacles facing Algogens. It invites a broad audience, including experts and newcomers, to engage with the topic and consider Algogens' role in advancing our problem-solving capabilities. This work is presented as a starting point for anyone interested in the intersection of AI and algorithms, encouraging further exploration and discussion of this emerging field. It aims to spark curiosity and contribute to the ongoing conversation about how technology can evolve to meet the complex demands of the AI era.
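
The book's central pattern, a generative component proposing candidates that a traditional algorithm then verifies, can be sketched in a few lines. The random proposer below stands in for a generative model, and the subset-sum task is purely illustrative.

```python
# Minimal sketch of the Algogen pattern described above: a generative
# component proposes candidate solutions, and a deterministic algorithm
# verifies them. The random proposer stands in for a generative model;
# the task (finding a subset summing to a target) is illustrative.
import random

def propose_candidates(items, n_candidates=1000, seed=0):
    """Stand-in generator: randomly proposes subsets of `items`."""
    rng = random.Random(seed)
    for _ in range(n_candidates):
        yield [x for x in items if rng.random() < 0.5]

def verify(candidate, target):
    """Traditional algorithmic check: exact, cheap, and reliable."""
    return sum(candidate) == target

items, target = [3, 7, 12, 5, 8, 21], 20
solution = next((c for c in propose_candidates(items) if verify(c, target)), None)
print(solution)  # a verified subset such as [12, 8], or None if none was proposed
```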


Benchmarking Distribution Shift in Tabular Data with TableShift
Josh Gardner, Ludwig Schmidt (University of Washington)

Neural Information Processing Systems / arXiv.org Artificial Intelligence

Robustness to distribution shift has become a growing concern for text and image models as they transition from research subjects to deployment in the real world. However, high-quality benchmarks for distribution shift in tabular machine learning tasks are still lacking despite the widespread real-world use of tabular data and differences in the models used for tabular data in comparison to text and images. As a consequence, the robustness of tabular models to distribution shift is poorly understood. To address this issue, we introduce TableShift, a distribution shift benchmark for tabular data. TableShift contains 15 binary classification tasks in total, each with an associated shift, and includes a diverse set of data sources, prediction targets, and distribution shifts. The benchmark covers domains including finance, education, public policy, healthcare, and civic participation, and is accessible using only a few lines of Python code via the TableShift API. We conduct a large-scale study comparing several state-of-the-art tabular data models alongside robust learning and domain generalization methods on the benchmark tasks. Our study demonstrates (1) a linear trend between in-distribution (ID) and out-of-distribution (OOD) accuracy; (2) that domain robustness methods can reduce shift gaps, but at the cost of reduced ID accuracy; and (3) a strong relationship between the shift gap (the difference between ID and OOD performance) and shifts in the label distribution. The benchmark data, Python package, model implementations, and more information about TableShift are available at https://github.com/mlfoundations/tableshift and https://tableshift.org.
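
The TableShift API itself is documented in the linked repository; rather than reproduce its exact calls, the self-contained sketch below uses synthetic data and scikit-learn to illustrate the study's central quantity, the shift gap between ID and OOD accuracy. The data-generating process and model choice are assumptions made for illustration.

```python
# Self-contained sketch of the "shift gap" the study analyzes: the difference
# between in-distribution (ID) and out-of-distribution (OOD) accuracy. The
# synthetic data stands in for a TableShift task; see the repo for real tasks.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

def make_split(n, mean_shift=0.0):
    """Generate a tabular split; `mean_shift` moves the covariate distribution."""
    X = rng.normal(loc=mean_shift, size=(n, 5))
    y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n) > 0).astype(int)
    return X, y

X_train, y_train = make_split(2000)             # ID training data
X_id, y_id = make_split(500)                    # ID test split
X_ood, y_ood = make_split(500, mean_shift=1.5)  # shifted (OOD) test split

model = GradientBoostingClassifier().fit(X_train, y_train)
id_acc = accuracy_score(y_id, model.predict(X_id))
ood_acc = accuracy_score(y_ood, model.predict(X_ood))
print(f"ID acc: {id_acc:.3f}  OOD acc: {ood_acc:.3f}  shift gap: {id_acc - ood_acc:.3f}")
```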


Machine Intelligence in Africa: a survey

arXiv.org Artificial Intelligence

In the last 5 years, the availability of large audio datasets in African countries has opened significant opportunities to build machine intelligence (MI) technologies that are closer to the people and that speak, learn, understand, and do business in local languages, including for those who cannot read and write. Unfortunately, these audio datasets are not fully exploited by current MI tools, leaving many Africans out of MI business opportunities. Additionally, many state-of-the-art MI models are not culture-aware, and the indexes used to assess the ethics of their adoption are questionable. This lack of cultural awareness is a major drawback in many applications in Africa. This paper summarizes recent developments in machine intelligence in Africa from a multi-layer, multiscale, and culture-aware ethics perspective, showcasing MI use cases in 54 African countries through 400 articles on MI research, industry, and government actions, as well as uses in art, music, the informal economy, and small businesses in Africa. The survey also opens discussions on the reliability of MI rankings and indexes on the African continent, as well as algorithmic definitions of unclear terms used in MI.


Towards the Human Digital Twin: Definition and Design -- A survey

arXiv.org Artificial Intelligence

Digital Twins (DTs) are a critical technology for digitalizing physical entities in domains ranging from industry to city planning [1, 2]. DTs' ability to continuously adapt to a physical entity's state, simulate future events, and actively influence feedback and decision processes goes significantly beyond traditional digital models, which are merely representations [3]. Thus, Industry 4.0 has started using DTs, along with other cutting-edge technologies such as the Internet of Things (IoT), Big Data, and Artificial Intelligence (AI), to significantly increase the efficiency and safety of both products and processes [3]. Further, due to DTs' real-time monitoring and simulation capabilities, they are increasingly being adopted in domains such as healthcare to meet demands for individualized diagnostics and treatment [4].
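
The loop that distinguishes a DT from a static model (sync with the physical entity, simulate forward, feed a decision back) can be sketched minimally; the machine-temperature scenario, class name, and thresholds below are all hypothetical.

```python
# Minimal, hypothetical sketch of the digital-twin loop described above:
# continuously sync with a physical entity's state, simulate future events,
# and feed a decision back. The thermal model and thresholds are illustrative.
from dataclasses import dataclass

@dataclass
class MachineTwin:
    temperature: float        # last synced state of the physical machine (deg C)
    cooling_rate: float = 0.1

    def sync(self, sensor_reading: float) -> None:
        """Adapt the twin to the physical entity's current state."""
        self.temperature = sensor_reading

    def simulate(self, heat_input: float, steps: int) -> float:
        """Simulate future temperature without touching the real machine."""
        t = self.temperature
        for _ in range(steps):
            t += heat_input - self.cooling_rate * t
        return t

    def recommend(self, heat_input: float) -> str:
        """Feed a decision back: throttle if overheating is predicted."""
        return "throttle" if self.simulate(heat_input, steps=10) > 90.0 else "continue"

twin = MachineTwin(temperature=70.0)
twin.sync(sensor_reading=82.5)          # real-time monitoring
print(twin.recommend(heat_input=10.0))  # simulation-driven decision: "throttle"
```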


NLP for Maternal Healthcare: Perspectives and Guiding Principles in the Age of LLMs

arXiv.org Artificial Intelligence

Ethical frameworks for the use of natural language processing (NLP) are urgently needed to shape how large language models (LLMs) and similar tools are used for healthcare applications. Healthcare faces existing challenges, including the balance of power in clinician-patient relationships, systemic health disparities, historical injustices, and economic constraints. Drawing directly from the voices of those most affected, and focusing on a case study of a specific healthcare setting, we propose a set of guiding principles for the use of NLP in maternal healthcare. We led an interactive session centered on an LLM-based chatbot demonstration during a full-day workshop with 39 participants, and additionally surveyed 30 healthcare workers and 30 birthing people about their values, needs, and perceptions of NLP tools in the context of maternal health. We conducted quantitative and qualitative analyses of the survey results and interactive discussions to consolidate our findings into a set of guiding principles. We propose nine principles for the ethical use of NLP in maternal healthcare, grouped into three themes: (i) recognizing contextual significance, (ii) holistic measurements, and (iii) who/what is valued. For each principle, we describe its underlying rationale and provide practical advice. This set of principles can provide a methodological pattern for other researchers and serve as a resource for practitioners working in maternal health and other healthcare fields, emphasizing the importance of technical nuance, historical context, and inclusive design when developing NLP technologies for clinical use.


A Survey of Deep Causal Models and Their Industrial Applications

arXiv.org Machine Learning

The notion of causality occupies a paramount position in human cognition. Over the past few decades, there has been significant advancement in causal effect estimation across various disciplines, including but not limited to computer science, medicine, economics, and industrial applications. Given the continued advances in deep learning, there has been a notable surge in its use for the estimation of causal effects from counterfactual data. Typically, deep causal models map covariates to a representation space and then design various objective functions to estimate counterfactual outcomes without bias. Unlike existing surveys on causal models in machine learning, this review focuses on deep causal models, and its core contributions are as follows: 1) we provide a comprehensive overview of deep causal models from both a development-timeline and a method-classification perspective; 2) we outline some typical applications of causal effect estimation in industry; 3) we present a detailed categorization and analysis of relevant datasets, source code, and experiments.
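
As a minimal instance of the covariates-to-representation pattern described above, the sketch below follows the style of TARNet (Shalit et al., 2017): a shared representation network with one outcome head per treatment arm, trained on a factual loss over synthetic data. It is an illustration under these assumptions, not the survey's code.

```python
# TARNet-style sketch of a deep causal model: covariates are mapped to a
# shared representation, and separate heads predict outcomes under treatment
# and control. Trained only on factual outcomes; synthetic data throughout.
import torch
import torch.nn as nn

torch.manual_seed(0)
n, d = 512, 10
x = torch.randn(n, d)                           # covariates
t = (torch.rand(n) < 0.5).float()               # binary treatment assignment
y = x[:, 0] + 2.0 * t + 0.1 * torch.randn(n)    # outcome; true effect = 2.0

phi = nn.Sequential(nn.Linear(d, 32), nn.ReLU())  # representation network
h0 = nn.Linear(32, 1)                             # control-outcome head
h1 = nn.Linear(32, 1)                             # treated-outcome head
params = [*phi.parameters(), *h0.parameters(), *h1.parameters()]
opt = torch.optim.Adam(params, lr=1e-2)

for _ in range(500):
    r = phi(x)
    # Each sample is scored by the head matching its observed treatment.
    y_hat = torch.where(t.bool(), h1(r).squeeze(-1), h0(r).squeeze(-1))
    loss = nn.functional.mse_loss(y_hat, y)  # factual loss only
    opt.zero_grad()
    loss.backward()
    opt.step()

# Counterfactual prediction: estimated individual treatment effects.
with torch.no_grad():
    ite = (h1(phi(x)) - h0(phi(x))).squeeze(-1)
print(f"estimated ATE: {ite.mean():.2f} (true effect: 2.0)")
```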


Designing Interpretable ML System to Enhance Trustworthy AI in Healthcare: A Systematic Review of the Last Decade to A Proposed Robust Framework

arXiv.org Artificial Intelligence

AI-based medical technologies, including wearables, telemedicine, LLMs, and digital care twins, significantly impact healthcare. Ensuring AI results are accurate and interpretable is crucial, especially for clinicians. This paper reviews the processes and challenges of interpretable ML (IML) and explainable AI (XAI) in healthcare. Its objectives include reviewing XAI processes, methods, applications, and challenges, with a focus on quality control. The IML process is classified into data pre-processing interpretability, interpretable modeling, and post-processing interpretability. The paper aims to establish the importance of robust interpretability in healthcare through experimental results, providing insights for creating communicable clinician-AI tools. Research questions, eligibility criteria, and goals were identified following the PRISMA and PICO methods. PubMed, Scopus, and Web of Science were systematically searched using specific search strings. The survey introduces a step-by-step roadmap for implementing XAI in clinical applications, addressing existing gaps and acknowledging the limitations of XAI models.
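
As one concrete example of the post-processing interpretability stage named above, the sketch below applies permutation importance to a fitted model; the scikit-learn diabetes dataset stands in for real clinical data, and nothing here reproduces the paper's pipeline.

```python
# Sketch of "post-processing interpretability": a post-hoc explanation of a
# fitted clinical-risk model via permutation importance. The diabetes dataset
# is a stand-in for real clinical data; the model choice is illustrative.
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestRegressor(random_state=0).fit(X_train, y_train)

# Post-hoc explanation: how much does shuffling each feature hurt performance?
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
ranked = sorted(zip(X.columns, result.importances_mean), key=lambda p: -p[1])
for name, score in ranked:
    print(f"{name:>6}: {score:.3f}")
```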