Copula Based Fusion of Clinical and Genomic Machine Learning Risk Scores for Breast Cancer Risk Stratification
Aich, Agnideep, Hewage, Sameera, Murshed, Md Monzur
Clinical and genomic models are both used to predict breast cancer outcomes, but they are often combined using simple linear rules that do not account for how their risk scores relate, especially at the extremes. Using the METABRIC breast cancer cohort, we studied whether directly modeling the joint relationship between clinical and genomic machine learning risk scores could improve risk stratification for 5-year cancer-specific mortality. We created a binary 5-year cancer-death outcome and defined two sets of predictors: a clinical set (demographic, tumor, and treatment variables) and a genomic set (gene-expression $z$-scores). We trained several supervised classifiers, such as Random Forest and XGBoost, and used 5-fold cross-validated predicted probabilities as unbiased risk scores. These scores were converted to pseudo-observations on $(0,1)^2$ to fit Gaussian, Clayton, and Gumbel copulas. Clinical models showed good discrimination (AUC 0.783), while genomic models had moderate performance (AUC 0.681). The joint distribution was best captured by a Gaussian copula (bootstrap $p=0.997$), which suggests a symmetric, moderately strong positive relationship. When we grouped patients based on this relationship, Kaplan-Meier curves showed clear differences: patients who were high-risk in both clinical and genomic scores had much poorer survival than those high-risk in only one set. These results show that copula-based fusion works in real-world cohorts and that considering dependencies between scores can better identify patient subgroups with the worst prognosis.
- North America > United States > New York (0.04)
- North America > United States > Minnesota > Blue Earth County > Mankato (0.04)
- North America > United States > Louisiana > Lafayette Parish > Lafayette (0.04)
- (2 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Health & Medicine > Therapeutic Area > Oncology > Breast Cancer (0.83)
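The core fusion step described in the abstract can be sketched in a few lines: rank-transform two risk-score vectors to pseudo-observations on $(0,1)^2$, then estimate the Gaussian copula parameter. This is a minimal illustration on synthetic scores (the paper uses METABRIC-derived cross-validated probabilities); the Spearman-to-Gaussian conversion $\rho = 2\sin(\pi\rho_S/6)$ is a standard shortcut, not necessarily the paper's estimator.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Synthetic stand-ins for cross-validated clinical and genomic risk scores,
# correlated through a shared latent factor (illustrative only).
latent = rng.normal(size=n)
clinical = 1 / (1 + np.exp(-(latent + rng.normal(size=n))))
genomic = 1 / (1 + np.exp(-(0.7 * latent + 1.2 * rng.normal(size=n))))

def pseudo_obs(x):
    """Rank-transform a score vector to pseudo-observations on (0,1)."""
    return (np.argsort(np.argsort(x)) + 1) / (len(x) + 1)

u, v = pseudo_obs(clinical), pseudo_obs(genomic)

# Spearman correlation of the pseudo-observations, converted to the
# Gaussian-copula parameter via rho = 2*sin(pi*rho_S/6).
rho_s = np.corrcoef(u, v)[0, 1]
rho = 2 * np.sin(np.pi * rho_s / 6)
print(f"Gaussian copula correlation: {rho:.3f}")

# Joint stratification in the spirit of the paper: patients in the
# upper quartile of both scores form the "high-risk on both" group.
both_high = (u > 0.75) & (v > 0.75)
print(f"high-risk on both scores: {both_high.sum()} / {n}")
```

The joint-quartile grouping is what drives the Kaplan-Meier separation the abstract reports: concordantly high scores carry more information than either margin alone.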
A Goemans-Williamson type algorithm for identifying subcohorts in clinical trials
We design an efficient algorithm that outputs tests for identifying predominantly homogeneous subcohorts of patients from large inhomogeneous datasets. Our theoretical contribution is a rounding technique, similar to that of Goemans and Williamson (1995), that approximates the optimal solution within a factor of $0.82$. As an application, we use our algorithm to trade off sensitivity for specificity and systematically identify clinically interesting homogeneous subcohorts of patients in the RNA microarray dataset for breast cancer from Curtis et al. (2012). One such clinically interesting subcohort suggests a link between LXR over-expression and BRCA2 and MSH6 methylation levels for patients in that subcohort.
- North America > United States (0.04)
- Asia > Middle East > Israel (0.04)
- Africa > Tanzania > Arusha Region > Arusha (0.04)
- Health & Medicine > Therapeutic Area > Oncology (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- (2 more...)
- Health & Medicine > Therapeutic Area > Oncology (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
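The rounding idea referenced above can be illustrated with the classic Goemans-Williamson random-hyperplane trick on max-cut. This toy sketch skips the SDP solve (random unit vectors stand in for the SDP solution) and uses a hypothetical 4-vertex graph, so it shows only the rounding mechanics, not the paper's $0.82$-factor subcohort algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy graph: symmetric weighted adjacency matrix with zero diagonal.
W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
n = W.shape[0]

# Stand-in for the SDP solution: one unit vector per vertex.
# (A real implementation would first solve the max-cut SDP relaxation.)
V = rng.normal(size=(n, n))
V /= np.linalg.norm(V, axis=1, keepdims=True)

def round_hyperplane(V, rng):
    """Goemans-Williamson rounding: sign of projection onto a random hyperplane."""
    r = rng.normal(size=V.shape[1])
    return np.sign(V @ r)

def cut_value(W, s):
    """Total weight of edges crossing the cut defined by signs s."""
    return sum(W[i, j] for i in range(n) for j in range(i + 1, n) if s[i] != s[j])

best = max(cut_value(W, round_hyperplane(V, rng)) for _ in range(50))
print(f"best rounded cut over 50 hyperplanes: {best}")
```

Repeating the rounding and keeping the best cut is standard practice, since each hyperplane draw achieves the approximation guarantee only in expectation.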
SurvDiff: A Diffusion Model for Generating Synthetic Data in Survival Analysis
Brockschmidt, Marie, Schröder, Maresa, Feuerriegel, Stefan
Survival analysis is a cornerstone of clinical research, modeling time-to-event outcomes such as metastasis, disease relapse, or patient death. Unlike standard tabular data, survival data often come with incomplete event information due to dropout or loss to follow-up. This poses unique challenges for synthetic data generation, where it is crucial for clinical research to faithfully reproduce both the event-time distribution and the censoring mechanism. In this paper, we propose SurvDiff, an end-to-end diffusion model specifically designed for generating synthetic data in survival analysis. SurvDiff is tailored to capture the data-generating mechanism by jointly generating mixed-type covariates, event times, and right-censoring, guided by a survival-tailored loss function. The loss encodes the time-to-event structure and directly optimizes for downstream survival tasks, which ensures that SurvDiff (i) reproduces realistic event-time distributions and (ii) preserves the censoring mechanism. We show that SurvDiff consistently outperforms state-of-the-art generative baselines in both distributional fidelity and downstream evaluation metrics across multiple medical datasets. To the best of our knowledge, SurvDiff is the first diffusion model explicitly designed for generating synthetic survival data.
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- Africa > Tanzania > Arusha Region > Arusha (0.04)
- Health & Medicine > Therapeutic Area > Oncology (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
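The right-censoring structure a survival-tailored loss must respect can be shown with a minimal example: observed time is the minimum of event and censoring times, and the likelihood treats events (density) and censored cases (survival function) differently. This sketch uses a toy exponential model, not SurvDiff's diffusion loss; the rates and sample size are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000

# Toy data-generating process: exponential event times with independent
# right-censoring (a stand-in for dropout / loss to follow-up).
event_time = rng.exponential(scale=2.0, size=n)
censor_time = rng.exponential(scale=3.0, size=n)
observed = np.minimum(event_time, censor_time)
delta = (event_time <= censor_time).astype(float)  # 1 = event, 0 = censored

def neg_log_lik(rate, t, d):
    """Censoring-aware negative log-likelihood for an exponential model:
    events contribute the log-density, censored cases the log-survival."""
    return -(d * (np.log(rate) - rate * t) + (1 - d) * (-rate * t)).sum()

# Grid search for the rate; the closed-form MLE is sum(delta) / sum(t).
rates = np.linspace(0.05, 2.0, 400)
best_rate = rates[np.argmin([neg_log_lik(r, observed, delta) for r in rates])]
mle = delta.sum() / observed.sum()
print(f"grid-search rate {best_rate:.3f} vs closed-form MLE {mle:.3f}")
```

Ignoring the censoring indicator here would bias the rate upward, which is exactly the failure mode a generator must avoid when it is asked to preserve the censoring mechanism.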
Beyond N-Grams: Rethinking Evaluation Metrics and Strategies for Multilingual Abstractive Summarization
Mondshine, Itai, Paz-Argaman, Tzuf, Tsarfaty, Reut
Automatic n-gram-based metrics such as ROUGE are widely used for evaluating generative tasks such as summarization. While these metrics are considered indicative (even if imperfect) of human evaluation for English, their suitability for other languages remains unclear. To address this, we systematically assess evaluation metrics for generation, both n-gram-based and neural-based, to evaluate their effectiveness across languages and tasks. Specifically, we design a large-scale evaluation suite across eight languages from four typological families (agglutinative, isolating, low-fusional, and high-fusional), spanning both low- and high-resource settings, to analyze their correlation with human judgments. Our findings highlight the sensitivity of evaluation metrics to language type. For example, in fusional languages, n-gram-based metrics show lower correlation with human assessments than in isolating and agglutinative languages. We also demonstrate that proper tokenization can significantly mitigate this issue for morphologically rich fusional languages, sometimes even reversing negative trends. Additionally, we show that neural metrics specifically trained for evaluation, such as COMET, consistently outperform other neural metrics and better correlate with human judgments in low-resource languages. Overall, our analysis highlights the limitations of n-gram metrics for fusional languages and advocates for greater investment in neural metrics trained for evaluation tasks.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > Middle East > Israel (0.04)
- Europe > Ukraine (0.04)
- (13 more...)
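The metric-vs-human-judgment analysis described above boils down to scoring candidate texts with an n-gram metric and rank-correlating those scores with human ratings. This sketch uses a simplified ROUGE-2-style recall (single reference, no stemming) on hypothetical toy summaries with invented ratings, purely to show the mechanics.

```python
import numpy as np

def ngrams(tokens, n=2):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def rouge_n(candidate, reference, n=2):
    """Simplified ROUGE-N recall: fraction of reference n-grams found
    in the candidate (single reference, whitespace tokenization)."""
    ref = ngrams(reference.split(), n)
    if not ref:
        return 0.0
    cand = set(ngrams(candidate.split(), n))
    return sum(g in cand for g in ref) / len(ref)

# Toy candidates with hypothetical human quality ratings (illustrative only).
reference = "the model improves summarization quality across languages"
candidates = [
    "the model improves summarization quality across languages",
    "the model improves quality across several languages",
    "a completely different sentence about something else",
]
human = [5.0, 4.0, 1.0]

scores = [rouge_n(c, reference) for c in candidates]

def ranks(x):
    """Rank vector (no tie handling needed for this toy data)."""
    return np.argsort(np.argsort(x)).astype(float)

# Spearman correlation between metric scores and human ratings.
rho = np.corrcoef(ranks(scores), ranks(human))[0, 1]
print(f"ROUGE-2 scores: {[round(s, 2) for s in scores]}, Spearman rho: {rho:.2f}")
```

For a morphologically rich language, the whitespace tokenization above is precisely what breaks: inflected forms of the same word produce disjoint n-grams, which is why the paper finds that proper tokenization can recover much of the lost correlation.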
The Reflexive Integrated Information Unit: A Differentiable Primitive for Artificial Consciousness
N'guessan, Gnankan Landry Regis, Karambal, Issa
Research on artificial consciousness lacks the equivalent of the perceptron: a small, trainable module that can be copied, benchmarked, and iteratively improved. We introduce the Reflexive Integrated Information Unit (RIIU), a recurrent cell that augments its hidden state $h$ with two additional vectors: (i) a meta-state $\mu$ that records the cell's own causal footprint, and (ii) a broadcast buffer $B$ that exposes that footprint to the rest of the network. A sliding-window covariance and a differentiable Auto-$\Phi$ surrogate let each RIIU maximize local information integration online. We prove that RIIUs (1) are end-to-end differentiable, (2) compose additively, and (3) perform $\Phi$-monotone plasticity under gradient ascent. In an eight-way Grid-world, a four-layer RIIU agent restores $>90\%$ reward within 13 steps after actuator failure, twice as fast as a parameter-matched GRU, while maintaining a non-zero Auto-$\Phi$ signal. By shrinking "consciousness-like" computation down to unit scale, RIIUs turn a philosophical debate into an empirical mathematical problem.
- North America > United States > New York (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Africa > Tanzania > Arusha Region > Arusha (0.04)
- Africa > Rwanda > Kigali > Kigali (0.04)
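The sliding-window covariance over a cell's own hidden states can be sketched concretely. This toy cell is not the paper's RIIU: the update rule, the window length, and the mean-absolute-off-diagonal-covariance "integration signal" are all illustrative assumptions standing in for the meta-state and the differentiable Auto-$\Phi$ surrogate.

```python
import numpy as np

rng = np.random.default_rng(3)

class ToyReflexiveCell:
    """Toy recurrent cell that tracks a sliding-window covariance of its
    own hidden states, a crude stand-in for a self-monitoring meta-state."""

    def __init__(self, dim, window=20):
        self.W = rng.normal(scale=0.3, size=(dim, dim))  # recurrent weights
        self.U = rng.normal(scale=0.3, size=(dim, dim))  # input weights
        self.history = []
        self.window = window
        self.h = np.zeros(dim)

    def step(self, x):
        self.h = np.tanh(self.W @ self.h + self.U @ x)
        # Keep only the most recent `window` hidden states.
        self.history = (self.history + [self.h])[-self.window:]
        return self.h

    def integration_signal(self):
        """Mean absolute off-diagonal covariance of recent hidden states:
        a rough proxy for how strongly the units co-vary."""
        H = np.array(self.history)
        if len(H) < 2:
            return 0.0
        C = np.cov(H.T)
        off = C - np.diag(np.diag(C))
        return float(np.abs(off).mean())

cell = ToyReflexiveCell(dim=8)
for _ in range(50):
    cell.step(rng.normal(size=8))
print(f"integration signal after 50 steps: {cell.integration_signal():.4f}")
```

In the paper's framing this quantity would be maximized by gradient ascent rather than merely observed, which is what makes the unit trainable end to end.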
Open-Source Large Language Models as Multilingual Crowdworkers: Synthesizing Open-Domain Dialogues in Several Languages With No Examples in Targets and No Machine Translation
Njifenjou, Ahmed, Sucal, Virgile, Jabaian, Bassam, Lefèvre, Fabrice
The prevailing paradigm in the domain of Open-Domain Dialogue agents predominantly focuses on the English language, encompassing both models and datasets. Furthermore, the financial and temporal investments required for crowdsourcing such datasets for finetuning are substantial, particularly when multiple languages are involved. Fortunately, advancements in Large Language Models (LLMs) have unveiled a plethora of possibilities across diverse tasks. Specifically, instruction-tuning has enabled LLMs to execute tasks based on natural language instructions, occasionally surpassing the performance of human crowdworkers. Additionally, these models possess the capability to function in various languages within a single thread. Consequently, to generate new samples in different languages, we propose leveraging these capabilities to replicate the data collection process. We introduce a pipeline for generating Open-Domain Dialogue data in multiple Target Languages using LLMs, with demonstrations provided in a unique Source Language. By eschewing explicit Machine Translation in this approach, we enhance the adherence to language-specific nuances. We apply this methodology to the PersonaChat dataset. To enhance the openness of generated dialogues and mimic real-life scenarios, we added the notion of speech events, corresponding to the type of conversation the speakers are involved in, and that of common ground, which represents the premises of a conversation.
- North America > United States > California (0.14)
- North America > United States > Oregon > Multnomah County > Portland (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- (29 more...)
- Personal (0.67)
- Research Report (0.63)
- Media > Film (1.00)
- Leisure & Entertainment (1.00)
- Health & Medicine > Therapeutic Area > Neurology (1.00)
- (10 more...)
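The pipeline's key move is prompt construction: source-language demonstrations followed by a target-language generation instruction, with the speech event and common ground as extra slots. The builder below is a hypothetical sketch; the slot names, wording, and example content are invented for illustration and are not the paper's actual prompts.

```python
# Hypothetical prompt builder sketching the pipeline idea: demonstrations in
# one Source Language, generation instructions for a Target Language, plus
# "speech event" and "common ground" slots (all names are illustrative).
def build_prompt(demos, target_language, persona_a, persona_b,
                 speech_event, common_ground):
    demo_text = "\n\n".join(
        f"Example dialogue {i + 1}:\n{d}" for i, d in enumerate(demos)
    )
    return (
        f"{demo_text}\n\n"
        f"Now write a new open-domain dialogue in {target_language}.\n"
        f"Speaker A persona: {persona_a}\n"
        f"Speaker B persona: {persona_b}\n"
        f"Speech event: {speech_event}\n"
        f"Common ground: {common_ground}\n"
        f"Do not translate the examples; produce a natural dialogue "
        f"directly in {target_language}."
    )

prompt = build_prompt(
    demos=["A: Hi, I love hiking.\nB: Me too, mostly on weekends."],
    target_language="French",
    persona_a="enjoys hiking",
    persona_b="works as a baker",
    speech_event="small talk between strangers on a train",
    common_ground="both are travelling to the same city",
)
print(prompt)
```

Instructing the model to generate directly in the target language, rather than to translate the demonstrations, is what lets the approach sidestep explicit machine translation.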
Optimized Quality of Service prediction in FSO Links over South Africa using Ensemble Learning
Adebusola, S. O., Owolawi, P. A., Ojo, J. S., Maswikaneng, P. S.
Fibre-optic communication systems are expected to grow exponentially in application owing to their numerous advantages over copper wires. Optical networks offer long-distance transmission, low power requirements, higher carrying capacity, and high bandwidth, among other benefits; such bandwidth surpasses transmission methods that include copper cables and microwaves. Despite these benefits, free-space optical communications are severely impacted by harsh weather conditions such as mist, precipitation, blizzards, fumes, soil, and drizzle debris in the atmosphere, all of which affect the Quality of Service (QoS) rendered by the systems. The primary goal of this article is to optimize the QoS using the ensemble learning models Random Forest, AdaBoost Regression, Stacking Regression, Gradient Boost Regression, and a Multilayer Neural Network. To accomplish this goal, meteorological data, visibility, wind speed, and altitude were obtained from the South African Weather Service archive over a ten-year period (2010 to 2019) at four locations: Polokwane, Kimberley, Bloemfontein, and George. We estimated the data rate, received power, fog-induced attenuation, bit error rate, and power penalty from the collected and processed data. The RMSE and R-squared values of the model across the study locations, Polokwane, Kimberley, Bloemfontein, and George, are 0.0073 and 0.9951, 0.0065 and 0.9998, 0.0060 and 0.9941, and 0.0032 and 0.9906, respectively. The results show that ensemble learning techniques in transmission modeling can significantly enhance service quality and help meet customer service-level agreements; the ensemble method efficiently optimized the signal-to-noise ratio, which in turn enhanced the QoS at the point of reception.
- Africa > South Africa > Limpopo > Polokwane (0.46)
- Africa > South Africa > Free State > Bloemfontein (0.46)
- Africa > South Africa > Gauteng > Pretoria (0.04)
- (5 more...)
- Government > Regional Government (0.34)
- Telecommunications > Networks (0.34)
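The ensemble-regression-plus-RMSE/R² workflow the abstract reports can be sketched end to end on synthetic data. This minimal bagging ensemble averages linear fits on bootstrap resamples as a stand-in for the paper's Random Forest and stacking models; the predictors, coefficients, and noise level are invented, not the SAWS archive data.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 300

# Synthetic stand-ins for meteorological predictors (e.g. visibility,
# wind speed, altitude) and a QoS target variable.
X = rng.normal(size=(n, 3))
y = 1.5 * X[:, 0] - 0.8 * X[:, 1] + 0.3 * X[:, 2] + rng.normal(scale=0.2, size=n)

# Simple train/test split with an intercept column added.
train, test = slice(0, 200), slice(200, n)
Xtr = np.c_[np.ones(200), X[train]]
Xte = np.c_[np.ones(n - 200), X[test]]

# Minimal bagging ensemble: average predictions of least-squares fits
# on bootstrap resamples of the training set.
preds = []
for _ in range(25):
    idx = rng.integers(0, 200, size=200)
    beta, *_ = np.linalg.lstsq(Xtr[idx], y[train][idx], rcond=None)
    preds.append(Xte @ beta)
pred = np.mean(preds, axis=0)

# The two metrics reported per location in the abstract.
rmse = np.sqrt(np.mean((y[test] - pred) ** 2))
r2 = 1 - np.sum((y[test] - pred) ** 2) / np.sum((y[test] - y[test].mean()) ** 2)
print(f"RMSE: {rmse:.4f}, R^2: {r2:.4f}")
```

Swapping the bootstrap-averaged linear fits for tree ensembles or a stacked meta-learner changes only the base model; the evaluation loop and the RMSE/R² reporting stay identical.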
Artificial Intelligence for Public Health Surveillance in Africa: Applications and Opportunities
Tshimula, Jean Marie, Kalengayi, Mitterrand, Makenga, Dieumerci, Lilonge, Dorcas, Asumani, Marius, Madiya, Déborah, Kalonji, Élie Nkuba, Kanda, Hugues, Galekwa, René Manassé, Kumbu, Josias, Mikese, Hardy, Tshimula, Grace, Muabila, Jean Tshibangu, Mayemba, Christian N., Nkashama, D'Jeff K., Kalala, Kalonji, Ataky, Steve, Basele, Tighana Wenge, Didier, Mbuyi Mukendi, Kasereka, Selain K., Dialufuma, Maximilien V., Kumwita, Godwill Ilunga Wa, Muyuku, Lionel, Kimpesa, Jean-Paul, Muteba, Dominique, Abedi, Aaron Aruna, Ntobo, Lambert Mukendi, Bundutidi, Gloria M., Mashinda, Désiré Kulimba, Mpinga, Emmanuel Kabengele, Kasoro, Nathanaël M.
Artificial Intelligence (AI) is revolutionizing various fields, including public health surveillance. In Africa, where health systems frequently encounter challenges such as limited resources, inadequate infrastructure, failing health information systems, and a shortage of skilled health professionals, AI offers a transformative opportunity. This paper investigates the applications of AI in public health surveillance across the continent, presenting successful case studies and examining the benefits, opportunities, and challenges of implementing AI technologies in African healthcare settings. Our paper highlights AI's potential to enhance disease monitoring and health outcomes and to support effective public health interventions. The findings presented in the paper demonstrate that AI can significantly improve the accuracy and timeliness of disease detection and prediction, optimize resource allocation, and facilitate targeted public health strategies. Additionally, our paper identifies key barriers to the widespread adoption of AI in African public health systems and proposes actionable recommendations to overcome these challenges.
- Africa > Middle East > Morocco (0.14)
- Africa > Middle East > Egypt (0.14)
- Europe > Middle East (0.14)
- (78 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Overview (1.00)