health indicator
I-GLIDE: Input Groups for Latent Health Indicators in Degradation Estimation
Thil, Lucas, Read, Jesse, Kaddah, Rim, Doquet, Guillaume
Accurate remaining useful life (RUL) prediction hinges on the quality of health indicators (HIs), yet existing methods often fail to disentangle complex degradation mechanisms in multi-sensor systems or quantify uncertainty in HI reliability. This paper introduces a novel framework for HI construction, advancing three key contributions. First, we adapt Reconstruction along Projected Pathways (RaPP) as a health indicator (HI) for RUL prediction for the first time, showing that it outperforms traditional reconstruction error metrics. Second, we show that augmenting RaPP-derived HIs with aleatoric and epistemic uncertainty quantification (UQ)--via Monte Carlo dropout and probabilistic latent spaces-- significantly improves RUL-prediction robustness. Third, and most critically, we propose indicator groups, a paradigm that isolates sensor subsets to model system-specific degradations, giving rise to our novel method, I-GLIDE which enables interpretable, mechanism-specific diagnostics. Evaluated on data sourced from aerospace and manufacturing systems, our approach achieves marked improvements in accuracy and generalizability compared to state-of-the-art HI methods while providing actionable insights into system failure pathways.
Quantifying Climate Policy Action and Its Links to Development Outcomes: A Cross-National Data-Driven Analysis
Addressing climate change effectively requires more than cataloguing the number of policies in place; it calls for tools that can reveal their thematic priorities and their tangible impacts on development outcomes. Existing assessments often rely on qualitative descriptions or composite indices, which can mask crucial differences between key domains such as mitigation, adaptation, disaster risk management, and loss and damage. To bridge this gap, we develop a quantitative indicator of climate policy orientation by applying a multilingual transformer-based language model to official national policy documents, achieving a classification accuracy of 0.90 (F1-score). Linking these indicators with World Bank development data in panel regressions reveals that mitigation policies are associated with higher GDP and GNI; disaster risk management correlates with greater GNI and debt but reduced foreign direct investment; adaptation and loss and damage show limited measurable effects. This integrated NLP-econometric framework enables comparable, theme-specific analysis of climate governance, offering a scalable method to monitor progress, evaluate trade-offs, and align policy emphasis with development goals.
Application and Validation of Geospatial Foundation Model Data for the Prediction of Health Facility Programmatic Outputs -- A Case Study in Malawi
Metz, Lynn, Haggard, Rachel, Moszczynski, Michael, Asbah, Samer, Mwase, Chris, Khomani, Patricia, Smith, Tyler, Cooper, Hannah, Mwale, Annie, Muslim, Arbaaz, Prasad, Gautam, Sun, Mimi, Shekel, Tomer, Paul, Joydeep, Carter, Anna, Shetty, Shravya, Green, Dylan
The reliability of routine health data in low and middle-income countries (LMICs) is often constrained by reporting delays and incomplete coverage, necessitating the exploration of novel data sources and analytics. Geospatial Foundation Models (GeoFMs) offer a promising avenue by synthesizing diverse spatial, temporal, and behavioral data into mathematical embeddings that can be efficiently used for downstream prediction tasks. This study evaluated the predictive performance of three GeoFM embedding sources - Google Population Dynamics Foundation Model (PDFM), Google AlphaEarth (derived from satellite imagery), and mobile phone call detail records (CDR) - for modeling 15 routine health programmatic outputs in Malawi, and compared their utility to traditional geospatial interpolation methods. We used XGBoost models on data from 552 health catchment areas (January 2021-May 2023), assessing performance with R2, and using an 80/20 training and test data split with 5-fold cross-validation used in training. While predictive performance was mixed, the embedding-based approaches improved upon baseline geostatistical methods in 13 of 15 (87%) indicators tested. A Multi-GeoFM model integrating all three embedding sources produced the most robust predictions, achieving average 5-fold cross validated R2 values for indicators like population density (0.63), new HIV cases (0.57), and child vaccinations (0.47) and test set R2 of 0.64, 0.68, and 0.55, respectively. Prediction was poor for prediction targets with low primary data availability, such as TB and malnutrition cases. These results demonstrate that GeoFM embeddings imbue a modest predictive improvement for select health and demographic outcomes in an LMIC context. We conclude that the integration of multiple GeoFM sources is an efficient and valuable tool for supplementing and strengthening constrained routine health information systems.
Machine learning algorithms to predict stroke in China based on causal inference of time series analysis
Zheng, Qizhi, Zhao, Ayang, Wang, Xinzhu, Bai, Yanhong, Wang, Zikun, Wang, Xiuying, Zeng, Xianzhang, Dong, Guanghui
Participants: This study employed a combination of Vector Autoregression (VAR) model and Graph Neural Networks (GNN) to systematically construct dynamic causal inference. Multiple classic classification algorithms were compared, including Random Forest, Logistic Regression, XGBoost, Support Vector Machine (SVM), K-Nearest Neighbor (KNN), Gradient Boosting, and Multi Layer Perceptron (MLP). The SMOTE algorithm was used to undersample a small number of samples and employed Stratified K-fold Cross Validation. Results: This study included a total of 11,789 participants, including 6,334 females (53.73%) and 5,455 males (46.27%), with an average age of 65 years. Introduction of dynamic causal inference features has significantly improved the performance of almost all models. The area under the ROC curve of each model ranged from 0.78 to 0.83, indicating significant difference (P < 0.01). Among all the models, the Gradient Boosting model demonstrated the highest performance and stability. Model explanation and feature importance analysis generated model interpretation that illustrated significant contributors associated with risks of stroke. Conclusions and Relevance: This study proposes a stroke risk prediction method that combines dynamic causal inference with machine learning models, significantly improving prediction accuracy and revealing key health factors that affect stroke. The research results indicate that dynamic causal inference features have important value in predicting stroke risk, especially in capturing the impact of changes in health status over time on stroke risk. By further optimizing the model and introducing more variables, this study provides theoretical basis and practical guidance for future stroke prevention and intervention strategies.
Classifier-Free Diffusion-Based Weakly-Supervised Approach for Health Indicator Derivation in Rotating Machines: Advancing Early Fault Detection and Condition Monitoring
Hu, Wenyang, Frusque, Gaetan, Wang, Tianyang, Chu, Fulei, Fink, Olga
Deriving health indicators of rotating machines is crucial for their maintenance. However, this process is challenging for the prevalent adopted intelligent methods since they may take the whole data distributions, not only introducing noise interference but also lacking the explainability. To address these issues, we propose a diffusion-based weakly-supervised approach for deriving health indicators of rotating machines, enabling early fault detection and continuous monitoring of condition evolution. This approach relies on a classifier-free diffusion model trained using healthy samples and a few anomalies. This model generates healthy samples. and by comparing the differences between the original samples and the generated ones in the envelope spectrum, we construct an anomaly map that clearly identifies faults. Health indicators are then derived, which can explain the fault types and mitigate noise interference. Comparative studies on two cases demonstrate that the proposed method offers superior health monitoring effectiveness and robustness compared to baseline models.
Recommender Algorithm for Supporting Self-Management of CVD Risk Factors in an Adult Population at Home
Afanasieva, Tatiana V., Platov, Pavel V., Medvedeva, Anastasia I.
One of the new trends in the development of recommendation algorithms is the dissemination of their capabilities to support the population in managing their health. This article focuses on the problem of improving the effectiveness of cardiovascular diseases (CVD) prevention, since CVD is the leading cause of death worldwide. To address this issue, a knowledge-based recommendation algorithm was proposed to support self-management of CVD risk factors in adults at home. The proposed algorithm is based on the original multidimensional recommendation model and on a new user profile model, which includes predictive assessments of CVD health in addition to its current ones as outlined in official guidelines. The main feature of the proposed algorithm is the combination of rule-based logic with the capabilities of a large language model in generating human-like text for explanatory component of multidimensional recommendation. The verification and evaluation of the proposed algorithm showed the usefulness of the proposed recommendation algorithm for supporting adults in self-management of their CVD risk factors at home. As follows from the comparison with similar knowledge-based recommendation algorithms, the proposed algorithm evaluates a larger number of CVD risk factors and has a greater information and semantic capacity of the generated recommendations.
Autonomous LLM-driven research from data to human-verifiable research papers
Ifargan, Tal, Hafner, Lukas, Kern, Maor, Alcalay, Ori, Kishony, Roy
As AI promises to accelerate scientific discovery, it remains unclear whether fully AI-driven research is possible and whether it can adhere to key scientific values, such as transparency, traceability and verifiability. Mimicking human scientific practices, we built data-to-paper, an automation platform that guides interacting LLM agents through a complete stepwise research process, while programmatically back-tracing information flow and allowing human oversight and interactions. In autopilot mode, provided with annotated data alone, data-to-paper raised hypotheses, designed research plans, wrote and debugged analysis codes, generated and interpreted results, and created complete and information-traceable research papers. Even though research novelty was relatively limited, the process demonstrated autonomous generation of de novo quantitative insights from data. For simple research goals, a fully-autonomous cycle can create manuscripts which recapitulate peer-reviewed publications without major errors in about 80-90%, yet as goal complexity increases, human co-piloting becomes critical for assuring accuracy. Beyond the process itself, created manuscripts too are inherently verifiable, as information-tracing allows to programmatically chain results, methods and data. Our work thereby demonstrates a potential for AI-driven acceleration of scientific discovery while enhancing, rather than jeopardizing, traceability, transparency and verifiability.
Predicting Diabetes with Machine Learning Analysis of Income and Health Factors
Horestani, Fariba Jafari, O, M. Mehdi Owrang
In this study, we delve into the intricate relationships between diabetes and a range of health indicators, with a particular focus on the newly added variable of income. Utilizing data from the 2015 Behavioral Risk Factor Surveillance System (BRFSS), we analyze the impact of various factors such as blood pressure, cholesterol, BMI, smoking habits, and more on the prevalence of diabetes. Our comprehensive analysis not only investigates each factor in isolation but also explores their interdependencies and collective influence on diabetes. A novel aspect of our research is the examination of income as a determinant of diabetes risk, which to the best of our knowledge has been relatively underexplored in previous studies. We employ statistical and machine learning techniques to unravel the complex interplay between socio-economic status and diabetes, providing new insights into how financial well-being influences health outcomes. Our research reveals a discernible trend where lower income brackets are associated with a higher incidence of diabetes. In analyzing a blend of 33 variables, including health factors and lifestyle choices, we identified that features such as high blood pressure, high cholesterol, cholesterol checks, income, and Body Mass Index (BMI) are of considerable significance. These elements stand out among the myriad of factors examined, suggesting that they play a pivotal role in the prevalence and management of diabetes.
Utilizing Multiple Inputs Autoregressive Models for Bearing Remaining Useful Life Prediction
Wang, Junliang, Zhang, Qinghua, Zhu, Guanhua, Sun, Guoxi
Accurate prediction of the Remaining Useful Life (RUL) of rolling bearings is crucial in industrial production, yet existing models often struggle with limited generalization capabilities due to their inability to fully process all vibration signal patterns. We introduce a novel multi-input autoregressive model to address this challenge in RUL prediction for bearings. Our approach uniquely integrates vibration signals with previously predicted Health Indicator (HI) values, employing feature fusion to output current window HI values. Through autoregressive iterations, the model attains a global receptive field, effectively overcoming the limitations in generalization. Furthermore, we innovatively incorporate a segmentation method and multiple training iterations to mitigate error accumulation in autoregressive models. Empirical evaluation on the PMH2012 dataset demonstrates that our model, compared to other backbone networks using similar autoregressive approaches, achieves significantly lower Root Mean Square Error (RMSE) and Score. Notably, it outperforms traditional autoregressive models that use label values as inputs and non-autoregressive networks, showing superior generalization abilities with a marked lead in RMSE and Score metrics.
Learning Informative Health Indicators Through Unsupervised Contrastive Learning
Rombach, Katharina, Michau, Gabriel, Bürzle, Wilfried, Koller, Stefan, Fink, Olga
Condition monitoring is essential to operate industrial assets safely and efficiently. To achieve this goal, the development of robust health indicators has recently attracted significant attention. These indicators, which provide quantitative real-time insights into the health status of industrial assets over time, serve as valuable tools for fault detection and prognostics. In this study, we propose a novel and universal approach to learn health indicators based on unsupervised contrastive learning. Operational time acts as a proxy for the asset's degradation state, enabling the learning of a contrastive feature space that facilitates the construction of a health indicator by measuring the distance to the healthy condition. To highlight the universality of the proposed approach, we assess the proposed contrastive learning framework in two distinct tasks - wear assessment and fault detection - across two different case studies: a milling machines case study and a real condition monitoring case study of railway wheels from operating trains. First, we evaluate if the health indicator is able to learn the real health condition on a milling machine case study where the ground truth wear condition is continuously measured. Second, we apply the proposed method on a real case study of railway wheels where the ground truth health condition is not known. Here, we evaluate the suitability of the learned health indicator for fault detection of railway wheel defects. Our results demonstrate that the proposed approach is able to learn the ground truth health evolution of milling machines and the learned health indicator is suited for fault detection of railway wheels operated under various operating conditions by outperforming state-of-the-art methods. Further, we demonstrate that our proposed approach is universally applicable to different systems and different health conditions.