health data
. Compared to the baseline γ = 0
Clearly, this does not provide a meaningful relaxation of the categorical constraint. We closely follow Fischer et al. [15]. With these variables, each term can be directly encoded as it consists of a linear function. In this section, we provide a detailed overview of the datasets considered in Section 6. Adult, German, Health, and Law School have a highly skewed distribution of positive labels. Note that the percentages do not sum to 100% as the labels are aggregated by patient and year.
- Education > Educational Setting > Higher Education (0.59)
- Education > Curriculum > Subject-Specific Education (0.59)
- North America > United States > Florida > Orange County > Orlando (0.14)
- Europe > Slovakia (0.04)
- Europe > Poland (0.04)
- Research Report > Promising Solution (1.00)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Information Technology (1.00)
- Leisure & Entertainment (0.67)
- Health & Medicine > Pharmaceuticals & Biotechnology (0.46)
Critical Challenges and Guidelines in Evaluating Synthetic Tabular Data: A Systematic Review
Nafis, Nazia, Esnaola, Inaki, Martinez-Perez, Alvaro, Villa-Uriol, Maria-Cruz, Osmani, Venet
Generating synthetic tabular data can be challenging; however, evaluating its quality is just as challenging, if not more so. This systematic review sheds light on the critical importance of rigorous evaluation of synthetic health data to ensure its reliability, relevance, and appropriate use. Based on screening of 1766 papers and a detailed review of 101 papers, we identified key challenges, including lack of consensus on evaluation methods, improper use of evaluation metrics, limited input from domain experts, inadequate reporting of dataset characteristics, and limited reproducibility of results. In response, we provide several guidelines on the generation and evaluation of synthetic data, to allow the community to unlock and fully harness the transformative potential of synthetic data and accelerate innovation.
- Europe > United Kingdom > England > South Yorkshire > Sheffield (0.05)
- Europe > United Kingdom > England > Greater London > London (0.04)
- Europe > Switzerland > Basel-City > Basel (0.04)
A Structured Dataset of Disease-Symptom Associations to Improve Diagnostic Accuracy
Shafi, Abdullah Al, Zannat, Rowzatul, Muntakim, Abdul, Hasan, Mahmudul
Disease-symptom datasets are important and in demand for medical research, disease diagnosis, clinical decision-making, and AI-driven health management applications. These datasets help identify symptom patterns associated with specific diseases, thus improving diagnostic accuracy and enabling early detection. The dataset presented in this study systematically compiles disease-symptom relationships from various online sources, medical literature, and publicly available health databases. The data was gathered by analyzing peer-reviewed medical articles, clinical case studies, and disease-symptom association reports. Only verified medical sources were included in the dataset, while non-peer-reviewed and anecdotal sources were excluded. The dataset is structured in a tabular format, where the first column represents diseases and the remaining columns represent symptoms. Each symptom cell contains a binary value indicating whether the symptom is associated with the disease. This structured representation makes the dataset useful for a wide range of applications, including machine learning-based disease prediction, clinical decision support systems, and epidemiological studies. Although there have been some advances in the field of disease-symptom datasets, there is a significant gap in structured datasets for the Bangla language. This dataset aims to bridge that gap by facilitating the development of multilingual medical informatics tools and improving disease prediction models for underrepresented linguistic communities. Further developments should include region-specific diseases and further fine-tuning of symptom associations for better diagnostic performance.
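The tabular layout described in this abstract (one disease column followed by binary symptom columns) can be illustrated with a minimal sketch. The column names, diseases, and symptom flags below are hypothetical and are not taken from the dataset, and the Jaccard-overlap ranking is only one simple way such a table might be queried, not the authors' method.

```python
import csv
import io

# Hypothetical rows in the layout the abstract describes: the first column
# names a disease and every other column is a 0/1 symptom flag.
# These names and values are illustrative, not taken from the dataset.
SAMPLE_CSV = """disease,fever,cough,headache,rash
influenza,1,1,1,0
measles,1,1,0,1
migraine,0,0,1,0
"""


def load_table(text):
    """Return {disease: set of associated symptoms} from a binary CSV."""
    reader = csv.DictReader(io.StringIO(text))
    table = {}
    for row in reader:
        disease = row.pop("disease")
        # Keep only the symptoms whose cell is flagged with "1".
        table[disease] = {name for name, flag in row.items() if flag == "1"}
    return table


def rank_diseases(table, observed):
    """Rank diseases by Jaccard overlap between observed and known symptoms."""
    observed = set(observed)
    scores = []
    for disease, symptoms in table.items():
        union = symptoms | observed
        score = len(symptoms & observed) / len(union) if union else 0.0
        scores.append((disease, score))
    # Highest overlap first.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


table = load_table(SAMPLE_CSV)
ranking = rank_diseases(table, ["fever", "cough", "rash"])
print(ranking)
```

With these toy rows, the disease whose flagged symptoms exactly match the observed set ranks first. A real application would more likely feed the binary matrix to a trained classifier.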
Exploring approaches to computational representation and classification of user-generated meal logs
Hu, Guanlan, Anand, Adit, Desai, Pooja M., Urteaga, Iñigo, Mamykina, Lena
This study examined the use of machine learning and domain-specific enrichment on patient-generated health data, in the form of free-text meal logs, to classify meals by their alignment with different nutritional goals. We used a dataset of over 3000 meal records collected by 114 individuals from a diverse, low-income community in a major US city using a mobile app. Registered dietitians provided expert judgement of meal-to-goal alignment, which was used as the gold standard for evaluation. Using text embeddings, including TF-IDF and BERT, and domain-specific enrichment information, including ontologies, ingredient parsers, and macronutrient contents, as inputs, we evaluated the performance of logistic regression and multilayer perceptron classifiers using accuracy, precision, recall, and F1 score against the gold standard and self-assessment. Even without enrichment, ML outperformed the self-assessments of the individuals who logged meals, and the best-performing combination of ML classifier with enrichment achieved even higher accuracies. In general, ML classifiers enriched with Parsed Ingredients, Food Entities, and Macronutrients information performed well across multiple nutritional goals, but the impact of enrichment and classification algorithm on classification accuracy varied across nutritional goals. In conclusion, ML can utilize unstructured free-text meal logs and reliably classify whether meals align with specific nutritional goals, exceeding self-assessments, especially when incorporating nutrition domain knowledge. Our findings highlight the potential of ML analysis of patient-generated health data to support patient-centered nutrition guidance in precision healthcare.
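The four evaluation metrics named in this abstract can be sketched as follows for a binary "meal aligns with goal" label. All label vectors below are made-up toy values chosen for illustration. They are not results from the study, and the function name `binary_metrics` is an assumption.

```python
def binary_metrics(gold, predicted):
    """Compute accuracy, precision, recall, and F1 for 0/1 labels."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == 1 and p == 1)  # true positives
    tn = sum(1 for g, p in pairs if g == 0 and p == 0)  # true negatives
    fp = sum(1 for g, p in pairs if g == 0 and p == 1)  # false positives
    fn = sum(1 for g, p in pairs if g == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels only (1 = meal aligns with the goal); NOT data from the study.
gold_standard = [1, 1, 1, 0, 0, 0, 1, 0]    # stand-in for dietitian judgement
classifier_out = [1, 1, 0, 0, 0, 1, 1, 0]   # stand-in for a classifier
self_assessed = [1, 0, 0, 1, 0, 1, 1, 0]    # stand-in for self-assessment

model_scores = binary_metrics(gold_standard, classifier_out)
self_scores = binary_metrics(gold_standard, self_assessed)
print(model_scores)
print(self_scores)
```

Scoring both the classifier output and the self-assessment against the same gold standard is what makes the two comparable, which is the comparison the abstract reports.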
- Asia > Bangladesh (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- North America > Canada (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (1.00)
- Health & Medicine > Consumer Health (1.00)
- Education > Health & Safety > School Nutrition (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.55)
FairCauseSyn: Towards Causally Fair LLM-Augmented Synthetic Data Generation
Nagesh, Nitish, Wang, Ziyu, Rahmani, Amir M.
Synthetic data generation creates data based on real-world data using generative models. In health applications, generating high-quality data while maintaining fairness for sensitive attributes is essential for equitable outcomes. Existing GAN-based and LLM-based methods focus on counterfactual fairness and are primarily applied in finance and legal domains. Causal fairness provides a more comprehensive evaluation framework by preserving causal structure, but current synthetic data generation methods do not address it in health settings. To fill this gap, we develop the first LLM-augmented synthetic data generation method to enhance causal fairness using real-world tabular health data. Our generated data deviates by less than 10% from real data on causal fairness metrics. When causally fair predictors are trained on it, the synthetic data reduces bias on the sensitive attribute by 70% compared to real data. This work improves access to fair synthetic data, supporting equitable health research and healthcare delivery.
Concerns raised over AI trained on 57 million NHS medical records
An artificial intelligence model trained on the medical data of 57 million people who have used the National Health Service in England could one day assist doctors in predicting disease or forecasting hospitalisation rates, its creators have claimed. However, other researchers say there are still significant privacy and data protection concerns around such large-scale use of health data, while even the AI's architects say they can't guarantee that it won't inadvertently reveal sensitive patient data. The model, called Foresight, was first developed in 2023. That initial version used OpenAI's GPT-3, the large language model (LLM) behind the first version of ChatGPT, and trained on 1.5 million real patient records from two London hospitals. Now, Chris Tomlinson at University College London and his colleagues have scaled up Foresight to create what they say is the world's first "national-scale generative AI model of health data" and the largest of its kind.
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.56)
Privacy is All You Need: Revolutionizing Wearable Health Data with Advanced PETs
Barma, Karthik, Barma, Seshu Babu
In a world where data is the new currency, wearable health devices offer unprecedented insights into daily life, continuously monitoring vital signs and metrics. However, this convenience raises privacy concerns, as these devices collect sensitive data that can be misused or breached. Traditional measures often fail due to real-time data processing needs and limited device power. Users also lack awareness and control over data sharing and usage. We propose a Privacy-Enhancing Technology (PET) framework for wearable devices, integrating federated learning, lightweight cryptographic methods, and selectively deployed blockchain technology. The blockchain acts as a secure ledger triggered only upon data transfer requests, granting users real-time notifications and control. By dismantling data monopolies, this approach returns data sovereignty to individuals. Through real-world applications like secure medical data sharing, privacy-preserving fitness tracking, and continuous health monitoring, our framework reduces privacy risks by up to 70 percent while preserving data utility and performance. This innovation sets a new benchmark for wearable privacy and can scale to broader IoT ecosystems, including smart homes and industry. As data continues to shape our digital landscape, our research underscores the critical need to maintain privacy and user control at the forefront of technological progress.
- Information Technology > Security & Privacy (1.00)
- Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (1.00)
- Health & Medicine > Health Care Technology (1.00)
- Information Technology > Security & Privacy (1.00)
- Information Technology > Hardware (1.00)
- Information Technology > Communications (1.00)
What does AI plan mean for NHS patient data and is there cause for concern?
Personal health data is by its nature highly sensitive and its vulnerability in a digital environment has already been underlined by recent ransomware attacks that have affected NHS trusts. Andrew Duncan, the director of foundational AI at the UK's Alan Turing Institute, says even anonymised health data can be manipulated to identify a patient through a process known as "re-identification" whereby "de-identified" data can be matched to other available information to identify someone. "Once you start to narrow things down you can start to re-identify people easily," he says. Duncan adds that AI models can be trained in a way that prevents re-identification, although "the caveat is that all of this has to be done very carefully". MedConfidential, which campaigns for confidentiality in healthcare, also wants clarity on whether a health dataset will respect patients who have signed an opt-out that prevents their data being used for research and planning in England.