Copula Based Fusion of Clinical and Genomic Machine Learning Risk Scores for Breast Cancer Risk Stratification
Aich, Agnideep, Hewage, Sameera, Murshed, Md Monzur
Clinical and genomic models are both used to predict breast cancer outcomes, but they are often combined using simple linear rules that do not account for how their risk scores relate, especially at the extremes. Using the METABRIC breast cancer cohort, we studied whether directly modeling the joint relationship between clinical and genomic machine learning risk scores could improve risk stratification for 5-year cancer-specific mortality. We created a binary 5-year cancer-death outcome and defined two sets of predictors: a clinical set (demographic, tumor, and treatment variables) and a genomic set (gene-expression $z$-scores). We trained several supervised classifiers, such as Random Forest and XGBoost, and used 5-fold cross-validated predicted probabilities as unbiased risk scores. These scores were converted to pseudo-observations on $(0,1)^2$ to fit Gaussian, Clayton, and Gumbel copulas. Clinical models showed good discrimination (AUC 0.783), while genomic models had moderate performance (AUC 0.681). The joint distribution was best captured by a Gaussian copula (bootstrap $p=0.997$), which suggests a symmetric, moderately strong positive relationship. When we grouped patients based on this relationship, Kaplan-Meier curves showed clear differences: patients who were high-risk in both clinical and genomic scores had much poorer survival than those high-risk in only one set. These results show that copula-based fusion works in real-world cohorts and that considering dependencies between scores can better identify patient subgroups with the worst prognosis.
- North America > United States > New York (0.04)
- North America > United States > Minnesota > Blue Earth County > Mankato (0.04)
- North America > United States > Louisiana > Lafayette Parish > Lafayette (0.04)
- (2 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Health & Medicine > Therapeutic Area > Oncology > Breast Cancer (0.83)
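The core fusion step described in the abstract can be sketched in a few lines: rank-transform two risk-score vectors to pseudo-observations on $(0,1)^2$, then estimate the Gaussian copula parameter. This is a minimal illustration on synthetic scores (the paper uses METABRIC-derived cross-validated probabilities); the Spearman-to-Gaussian conversion $\rho = 2\sin(\pi\rho_S/6)$ is a standard shortcut, not necessarily the paper's estimator.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Synthetic stand-ins for cross-validated clinical and genomic risk scores,
# correlated through a shared latent factor (illustrative only).
latent = rng.normal(size=n)
clinical = 1 / (1 + np.exp(-(latent + rng.normal(size=n))))
genomic = 1 / (1 + np.exp(-(0.7 * latent + 1.2 * rng.normal(size=n))))

def pseudo_obs(x):
    """Rank-transform a score vector to pseudo-observations on (0,1)."""
    return (np.argsort(np.argsort(x)) + 1) / (len(x) + 1)

u, v = pseudo_obs(clinical), pseudo_obs(genomic)

# Spearman correlation of the pseudo-observations, converted to the
# Gaussian-copula parameter via rho = 2*sin(pi*rho_S/6).
rho_s = np.corrcoef(u, v)[0, 1]
rho = 2 * np.sin(np.pi * rho_s / 6)
print(f"Gaussian copula correlation: {rho:.3f}")

# Joint stratification in the spirit of the paper: patients in the
# upper quartile of both scores form the "high-risk on both" group.
both_high = (u > 0.75) & (v > 0.75)
print(f"high-risk on both scores: {both_high.sum()} / {n}")
```

The joint-quartile grouping is what drives the Kaplan-Meier separation the abstract reports: concordantly high scores carry more information than either margin alone.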
A Goemans-Williamson type algorithm for identifying subcohorts in clinical trials
We design an efficient algorithm that outputs tests for identifying predominantly homogeneous subcohorts of patients from large inhomogeneous datasets. Our theoretical contribution is a rounding technique, similar to that of Goemans and Williamson (1995), that approximates the optimal solution within a factor of $0.82$. As an application, we use our algorithm to trade off sensitivity for specificity and systematically identify clinically interesting homogeneous subcohorts of patients in the RNA microarray dataset for breast cancer from Curtis et al. (2012). One such clinically interesting subcohort suggests a link between LXR over-expression and BRCA2 and MSH6 methylation levels for patients in that subcohort.
- North America > United States (0.04)
- Asia > Middle East > Israel (0.04)
- Africa > Tanzania > Arusha Region > Arusha (0.04)
- Health & Medicine > Therapeutic Area > Oncology (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- (2 more...)
- Health & Medicine > Therapeutic Area > Oncology (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
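The rounding idea referenced above can be illustrated with the classic Goemans-Williamson random-hyperplane trick on max-cut. This toy sketch skips the SDP solve (random unit vectors stand in for the SDP solution) and uses a hypothetical 4-vertex graph, so it shows only the rounding mechanics, not the paper's $0.82$-factor subcohort algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy graph: symmetric weighted adjacency matrix with zero diagonal.
W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
n = W.shape[0]

# Stand-in for the SDP solution: one unit vector per vertex.
# (A real implementation would first solve the max-cut SDP relaxation.)
V = rng.normal(size=(n, n))
V /= np.linalg.norm(V, axis=1, keepdims=True)

def round_hyperplane(V, rng):
    """Goemans-Williamson rounding: sign of projection onto a random hyperplane."""
    r = rng.normal(size=V.shape[1])
    return np.sign(V @ r)

def cut_value(W, s):
    """Total weight of edges crossing the cut defined by signs s."""
    return sum(W[i, j] for i in range(n) for j in range(i + 1, n) if s[i] != s[j])

best = max(cut_value(W, round_hyperplane(V, rng)) for _ in range(50))
print(f"best rounded cut over 50 hyperplanes: {best}")
```

Repeating the rounding and keeping the best cut is standard practice, since each hyperplane draw achieves the approximation guarantee only in expectation.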
SurvDiff: A Diffusion Model for Generating Synthetic Data in Survival Analysis
Brockschmidt, Marie, Schröder, Maresa, Feuerriegel, Stefan
Survival analysis is a cornerstone of clinical research, modeling time-to-event outcomes such as metastasis, disease relapse, or patient death. Unlike standard tabular data, survival data often come with incomplete event information due to dropout or loss to follow-up. This poses unique challenges for synthetic data generation, where it is crucial for clinical research to faithfully reproduce both the event-time distribution and the censoring mechanism. In this paper, we propose SurvDiff, an end-to-end diffusion model specifically designed for generating synthetic data in survival analysis. SurvDiff is tailored to capture the data-generating mechanism by jointly generating mixed-type covariates, event times, and right-censoring, guided by a survival-tailored loss function. The loss encodes the time-to-event structure and directly optimizes for downstream survival tasks, which ensures that SurvDiff (i) reproduces realistic event-time distributions and (ii) preserves the censoring mechanism. We show that SurvDiff consistently outperforms state-of-the-art generative baselines in both distributional fidelity and downstream evaluation metrics across multiple medical datasets. To the best of our knowledge, SurvDiff is the first diffusion model explicitly designed for generating synthetic survival data.
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- Africa > Tanzania > Arusha Region > Arusha (0.04)
- Health & Medicine > Therapeutic Area > Oncology (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
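The right-censoring structure a survival-tailored loss must respect can be shown with a minimal example: observed time is the minimum of event and censoring times, and the likelihood treats events (density) and censored cases (survival function) differently. This sketch uses a toy exponential model, not SurvDiff's diffusion loss; the rates and sample size are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000

# Toy data-generating process: exponential event times with independent
# right-censoring (a stand-in for dropout / loss to follow-up).
event_time = rng.exponential(scale=2.0, size=n)
censor_time = rng.exponential(scale=3.0, size=n)
observed = np.minimum(event_time, censor_time)
delta = (event_time <= censor_time).astype(float)  # 1 = event, 0 = censored

def neg_log_lik(rate, t, d):
    """Censoring-aware negative log-likelihood for an exponential model:
    events contribute the log-density, censored cases the log-survival."""
    return -(d * (np.log(rate) - rate * t) + (1 - d) * (-rate * t)).sum()

# Grid search for the rate; the closed-form MLE is sum(delta) / sum(t).
rates = np.linspace(0.05, 2.0, 400)
best_rate = rates[np.argmin([neg_log_lik(r, observed, delta) for r in rates])]
mle = delta.sum() / observed.sum()
print(f"grid-search rate {best_rate:.3f} vs closed-form MLE {mle:.3f}")
```

Ignoring the censoring indicator here would bias the rate upward, which is exactly the failure mode a generator must avoid when it is asked to preserve the censoring mechanism.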
Beyond N-Grams: Rethinking Evaluation Metrics and Strategies for Multilingual Abstractive Summarization
Mondshine, Itai, Paz-Argaman, Tzuf, Tsarfaty, Reut
Automatic n-gram-based metrics such as ROUGE are widely used for evaluating generative tasks such as summarization. While these metrics are considered indicative (even if imperfect) of human evaluation for English, their suitability for other languages remains unclear. To address this, we systematically assess evaluation metrics for generation, both n-gram-based and neural-based, to evaluate their effectiveness across languages and tasks. Specifically, we design a large-scale evaluation suite across eight languages from four typological families (agglutinative, isolating, low-fusional, and high-fusional), spanning both low- and high-resource settings, to analyze their correlation with human judgments. Our findings highlight the sensitivity of evaluation metrics to language type. For example, in fusional languages, n-gram-based metrics show lower correlation with human assessments than in isolating and agglutinative languages. We also demonstrate that proper tokenization can significantly mitigate this issue for morphologically rich fusional languages, sometimes even reversing negative trends. Additionally, we show that neural metrics specifically trained for evaluation, such as COMET, consistently outperform other neural metrics and better correlate with human judgments in low-resource languages. Overall, our analysis highlights the limitations of n-gram metrics for fusional languages and advocates for greater investment in neural metrics trained for evaluation tasks.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > Middle East > Israel (0.04)
- Europe > Ukraine (0.04)
- (13 more...)
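The metric-vs-human-judgment analysis described above boils down to scoring candidate texts with an n-gram metric and rank-correlating those scores with human ratings. This sketch uses a simplified ROUGE-2-style recall (single reference, no stemming) on hypothetical toy summaries with invented ratings, purely to show the mechanics.

```python
import numpy as np

def ngrams(tokens, n=2):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def rouge_n(candidate, reference, n=2):
    """Simplified ROUGE-N recall: fraction of reference n-grams found
    in the candidate (single reference, whitespace tokenization)."""
    ref = ngrams(reference.split(), n)
    if not ref:
        return 0.0
    cand = set(ngrams(candidate.split(), n))
    return sum(g in cand for g in ref) / len(ref)

# Toy candidates with hypothetical human quality ratings (illustrative only).
reference = "the model improves summarization quality across languages"
candidates = [
    "the model improves summarization quality across languages",
    "the model improves quality across several languages",
    "a completely different sentence about something else",
]
human = [5.0, 4.0, 1.0]

scores = [rouge_n(c, reference) for c in candidates]

def ranks(x):
    """Rank vector (no tie handling needed for this toy data)."""
    return np.argsort(np.argsort(x)).astype(float)

# Spearman correlation between metric scores and human ratings.
rho = np.corrcoef(ranks(scores), ranks(human))[0, 1]
print(f"ROUGE-2 scores: {[round(s, 2) for s in scores]}, Spearman rho: {rho:.2f}")
```

For a morphologically rich language, the whitespace tokenization above is precisely what breaks: inflected forms of the same word produce disjoint n-grams, which is why the paper finds that proper tokenization can recover much of the lost correlation.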
The Reflexive Integrated Information Unit: A Differentiable Primitive for Artificial Consciousness
N'guessan, Gnankan Landry Regis, Karambal, Issa
Research on artificial consciousness lacks the equivalent of the perceptron: a small, trainable module that can be copied, benchmarked, and iteratively improved. We introduce the Reflexive Integrated Information Unit (RIIU), a recurrent cell that augments its hidden state $h$ with two additional vectors: (i) a meta-state $\mu$ that records the cell's own causal footprint, and (ii) a broadcast buffer $B$ that exposes that footprint to the rest of the network. A sliding-window covariance and a differentiable Auto-$\Phi$ surrogate let each RIIU maximize local information integration online. We prove that RIIUs (1) are end-to-end differentiable, (2) compose additively, and (3) perform $\Phi$-monotone plasticity under gradient ascent. In an eight-way Grid-world, a four-layer RIIU agent restores $>90\%$ reward within 13 steps after actuator failure, twice as fast as a parameter-matched GRU, while maintaining a non-zero Auto-$\Phi$ signal. By shrinking "consciousness-like" computation down to unit scale, RIIUs turn a philosophical debate into an empirical mathematical problem.
- North America > United States > New York (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Africa > Tanzania > Arusha Region > Arusha (0.04)
- Africa > Rwanda > Kigali > Kigali (0.04)
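The sliding-window covariance over a cell's own hidden states can be sketched concretely. This toy cell is not the paper's RIIU: the update rule, the window length, and the mean-absolute-off-diagonal-covariance "integration signal" are all illustrative assumptions standing in for the meta-state and the differentiable Auto-$\Phi$ surrogate.

```python
import numpy as np

rng = np.random.default_rng(3)

class ToyReflexiveCell:
    """Toy recurrent cell that tracks a sliding-window covariance of its
    own hidden states, a crude stand-in for a self-monitoring meta-state."""

    def __init__(self, dim, window=20):
        self.W = rng.normal(scale=0.3, size=(dim, dim))  # recurrent weights
        self.U = rng.normal(scale=0.3, size=(dim, dim))  # input weights
        self.history = []
        self.window = window
        self.h = np.zeros(dim)

    def step(self, x):
        self.h = np.tanh(self.W @ self.h + self.U @ x)
        # Keep only the most recent `window` hidden states.
        self.history = (self.history + [self.h])[-self.window:]
        return self.h

    def integration_signal(self):
        """Mean absolute off-diagonal covariance of recent hidden states:
        a rough proxy for how strongly the units co-vary."""
        H = np.array(self.history)
        if len(H) < 2:
            return 0.0
        C = np.cov(H.T)
        off = C - np.diag(np.diag(C))
        return float(np.abs(off).mean())

cell = ToyReflexiveCell(dim=8)
for _ in range(50):
    cell.step(rng.normal(size=8))
print(f"integration signal after 50 steps: {cell.integration_signal():.4f}")
```

In the paper's framing this quantity would be maximized by gradient ascent rather than merely observed, which is what makes the unit trainable end to end.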
Open-Source Large Language Models as Multilingual Crowdworkers: Synthesizing Open-Domain Dialogues in Several Languages With No Examples in Targets and No Machine Translation
Njifenjou, Ahmed, Sucal, Virgile, Jabaian, Bassam, Lefèvre, Fabrice
The prevailing paradigm in the domain of Open-Domain Dialogue agents predominantly focuses on the English language, encompassing both models and datasets. Furthermore, the financial and temporal investments required for crowdsourcing such datasets for finetuning are substantial, particularly when multiple languages are involved. Fortunately, advancements in Large Language Models (LLMs) have unveiled a plethora of possibilities across diverse tasks. Specifically, instruction-tuning has enabled LLMs to execute tasks based on natural language instructions, occasionally surpassing the performance of human crowdworkers. Additionally, these models possess the capability to function in various languages within a single thread. Consequently, to generate new samples in different languages, we propose leveraging these capabilities to replicate the data collection process. We introduce a pipeline for generating Open-Domain Dialogue data in multiple Target Languages using LLMs, with demonstrations provided in a unique Source Language. By eschewing explicit Machine Translation in this approach, we enhance the adherence to language-specific nuances. We apply this methodology to the PersonaChat dataset. To enhance the openness of generated dialogues and mimic real-life scenarios, we added the notion of speech events, corresponding to the type of conversation the speakers are involved in, and that of common ground, which represents the premises of a conversation.
- North America > United States > California (0.14)
- North America > United States > Oregon > Multnomah County > Portland (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- (29 more...)
- Personal (0.67)
- Research Report (0.63)
- Media > Film (1.00)
- Leisure & Entertainment (1.00)
- Health & Medicine > Therapeutic Area > Neurology (1.00)
- (10 more...)
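The pipeline's key move is prompt construction: source-language demonstrations followed by a target-language generation instruction, with the speech event and common ground as extra slots. The builder below is a hypothetical sketch; the slot names, wording, and example content are invented for illustration and are not the paper's actual prompts.

```python
# Hypothetical prompt builder sketching the pipeline idea: demonstrations in
# one Source Language, generation instructions for a Target Language, plus
# "speech event" and "common ground" slots (all names are illustrative).
def build_prompt(demos, target_language, persona_a, persona_b,
                 speech_event, common_ground):
    demo_text = "\n\n".join(
        f"Example dialogue {i + 1}:\n{d}" for i, d in enumerate(demos)
    )
    return (
        f"{demo_text}\n\n"
        f"Now write a new open-domain dialogue in {target_language}.\n"
        f"Speaker A persona: {persona_a}\n"
        f"Speaker B persona: {persona_b}\n"
        f"Speech event: {speech_event}\n"
        f"Common ground: {common_ground}\n"
        f"Do not translate the examples; produce a natural dialogue "
        f"directly in {target_language}."
    )

prompt = build_prompt(
    demos=["A: Hi, I love hiking.\nB: Me too, mostly on weekends."],
    target_language="French",
    persona_a="enjoys hiking",
    persona_b="works as a baker",
    speech_event="small talk between strangers on a train",
    common_ground="both are travelling to the same city",
)
print(prompt)
```

Instructing the model to generate directly in the target language, rather than to translate the demonstrations, is what lets the approach sidestep explicit machine translation.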
Optimized Quality of Service prediction in FSO Links over South Africa using Ensemble Learning
Adebusola, S. O., Owolawi, P. A., Ojo, J. S., Maswikaneng, P. S.
Fibre-optic communication systems are expected to grow exponentially in application owing to their numerous advantages over copper wires. Optical networks offer long-distance transmission, low power requirements, higher carrying capacity, and high bandwidth, among other benefits; such bandwidth surpasses transmission methods that include copper cables and microwaves. Despite these benefits, free-space optical communications are severely impacted by harsh weather conditions such as mist, precipitation, blizzards, fumes, soil, and drizzle debris in the atmosphere, all of which affect the Quality of Service (QoS) rendered by the systems. The primary goal of this article is to optimize the QoS using the ensemble learning models Random Forest, AdaBoost Regression, Stacking Regression, Gradient Boost Regression, and a Multilayer Neural Network. To accomplish this goal, meteorological data, visibility, wind speed, and altitude were obtained from the South African Weather Service archive over a ten-year period (2010 to 2019) at four locations: Polokwane, Kimberley, Bloemfontein, and George. We estimated the data rate, received power, fog-induced attenuation, bit error rate, and power penalty from the collected and processed data. The RMSE and R-squared values of the model across the study locations, Polokwane, Kimberley, Bloemfontein, and George, are 0.0073 and 0.9951, 0.0065 and 0.9998, 0.0060 and 0.9941, and 0.0032 and 0.9906, respectively. The results show that ensemble learning techniques in transmission modeling can significantly enhance service quality and help meet customer service-level agreements; the ensemble method efficiently optimized the signal-to-noise ratio, which in turn enhanced the QoS at the point of reception.
- Africa > South Africa > Limpopo > Polokwane (0.46)
- Africa > South Africa > Free State > Bloemfontein (0.46)
- Africa > South Africa > Gauteng > Pretoria (0.04)
- (5 more...)
- Government > Regional Government (0.34)
- Telecommunications > Networks (0.34)
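The ensemble-regression-plus-RMSE/R² workflow the abstract reports can be sketched end to end on synthetic data. This minimal bagging ensemble averages linear fits on bootstrap resamples as a stand-in for the paper's Random Forest and stacking models; the predictors, coefficients, and noise level are invented, not the SAWS archive data.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 300

# Synthetic stand-ins for meteorological predictors (e.g. visibility,
# wind speed, altitude) and a QoS target variable.
X = rng.normal(size=(n, 3))
y = 1.5 * X[:, 0] - 0.8 * X[:, 1] + 0.3 * X[:, 2] + rng.normal(scale=0.2, size=n)

# Simple train/test split with an intercept column added.
train, test = slice(0, 200), slice(200, n)
Xtr = np.c_[np.ones(200), X[train]]
Xte = np.c_[np.ones(n - 200), X[test]]

# Minimal bagging ensemble: average predictions of least-squares fits
# on bootstrap resamples of the training set.
preds = []
for _ in range(25):
    idx = rng.integers(0, 200, size=200)
    beta, *_ = np.linalg.lstsq(Xtr[idx], y[train][idx], rcond=None)
    preds.append(Xte @ beta)
pred = np.mean(preds, axis=0)

# The two metrics reported per location in the abstract.
rmse = np.sqrt(np.mean((y[test] - pred) ** 2))
r2 = 1 - np.sum((y[test] - pred) ** 2) / np.sum((y[test] - y[test].mean()) ** 2)
print(f"RMSE: {rmse:.4f}, R^2: {r2:.4f}")
```

Swapping the bootstrap-averaged linear fits for tree ensembles or a stacked meta-learner changes only the base model; the evaluation loop and the RMSE/R² reporting stay identical.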
Artificial Intelligence for Public Health Surveillance in Africa: Applications and Opportunities
Tshimula, Jean Marie, Kalengayi, Mitterrand, Makenga, Dieumerci, Lilonge, Dorcas, Asumani, Marius, Madiya, Déborah, Kalonji, Élie Nkuba, Kanda, Hugues, Galekwa, René Manassé, Kumbu, Josias, Mikese, Hardy, Tshimula, Grace, Muabila, Jean Tshibangu, Mayemba, Christian N., Nkashama, D'Jeff K., Kalala, Kalonji, Ataky, Steve, Basele, Tighana Wenge, Didier, Mbuyi Mukendi, Kasereka, Selain K., Dialufuma, Maximilien V., Kumwita, Godwill Ilunga Wa, Muyuku, Lionel, Kimpesa, Jean-Paul, Muteba, Dominique, Abedi, Aaron Aruna, Ntobo, Lambert Mukendi, Bundutidi, Gloria M., Mashinda, Désiré Kulimba, Mpinga, Emmanuel Kabengele, Kasoro, Nathanaël M.
Artificial Intelligence (AI) is revolutionizing various fields, including public health surveillance. In Africa, where health systems frequently encounter challenges such as limited resources, inadequate infrastructure, failing health information systems, and a shortage of skilled health professionals, AI offers a transformative opportunity. This paper investigates the applications of AI in public health surveillance across the continent, presenting successful case studies and examining the benefits, opportunities, and challenges of implementing AI technologies in African healthcare settings. Our paper highlights AI's potential to enhance disease monitoring and health outcomes and to support effective public health interventions. The findings presented in the paper demonstrate that AI can significantly improve the accuracy and timeliness of disease detection and prediction, optimize resource allocation, and facilitate targeted public health strategies. Additionally, our paper identifies key barriers to the widespread adoption of AI in African public health systems and proposes actionable recommendations to overcome these challenges.
- Africa > Middle East > Morocco (0.14)
- Africa > Middle East > Egypt (0.14)
- Europe > Middle East (0.14)
- (78 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Overview (1.00)