AITopics | causal gene

Collaborating Authors

causal gene

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

LA-MARRVEL: A Knowledge-Grounded and Language-Aware LLM Reranker for AI-MARRVEL in Rare Disease Diagnosis

Lee, Jaeyeon, Jeong, Hyun-Hwan, Liu, Zhandong

arXiv.org Artificial IntelligenceNov-7-2025

Diagnosing rare diseases requires linking gene findings with often unstructured reference text. Current pipelines collect many candidate genes, but clinicians still spend a lot of time filtering false positives and combining evidence from papers and databases. A key challenge is language: phenotype descriptions and inheritance patterns are written in prose, not fully captured by tables. Large language models (LLMs) can read such text, but clinical use needs grounding in citable knowledge and stable, repeatable behavior. We explore a knowledge-grounded and language-aware reranking layer on top of a high-recall first-stage pipeline. The goal is to improve precision and explainability, not to replace standard bioinformatics steps. We use expert-built context and a consensus method to reduce LLM variability, producing shorter, better-justified gene lists for expert review. LA-MARRVEL achieves the highest accuracy, outperforming other methods -- including traditional bioinformatics diagnostic tools (AI-MARRVEL, Exomiser, LIRICAL) and naive large language models (e.g., Anthropic Claude) -- with an average Recall@5 of 94.10%, a +3.65 percentage-point improvement over AI-MARRVEL. The LLM-generated reasoning provides clear prose on phenotype matching and inheritance patterns, making clinical review faster and easier. LA-MARRVEL has three parts: expert-engineered context that enriches phenotype and disease information; a ranked voting algorithm that combines multiple LLM runs to choose a consensus ranked gene list; and the AI-MARRVEL pipeline that provides first-stage ranks and gene annotations, already known as a state-of-the-art method in Rare Disease Diagnosis on BG, DDD, and UDN cohorts. The online AI-MARRVEL includes LA-MARRVEL as an LLM feature at https://ai.marrvel.org . We evaluate LA-MARRVEL on three datasets from independent cohorts of real-world diagnosed patients.

artificial intelligence, large language model, natural language, (18 more...)

arXiv.org Artificial Intelligence

2511.02263

Country: North America > United States > Texas (0.15)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Knowledge Graph Sparsification for GNN-based Rare Disease Diagnosis

Cara, Premt, Zaripova, Kamilia, Bani-Harouni, David, Navab, Nassir, Farshad, Azade

arXiv.org Artificial IntelligenceOct-13-2025

Rare genetic disease diagnosis faces critical challenges: insufficient patient data, inaccessible full genome sequencing, and the immense number of possible causative genes. These limitations cause prolonged diagnostic journeys, inappropriate treatments, and critical delays, disproportionately affecting patients in resource-limited settings where diagnostic tools are scarce. We propose RareNet, a subgraph-based Graph Neural Network that requires only patient phenotypes to identify the most likely causal gene and retrieve focused patient subgraphs for targeted clinical investigation. RareNet can function as a standalone method or serve as a pre-processing or post-processing filter for other candidate gene prioritization methods, consistently enhancing their performance while potentially enabling explainable insights. Through comprehensive evaluation on two biomedical datasets, we demonstrate competitive and robust causal gene prediction and significant performance gains when integrated with other frameworks. By requiring only phenotypic data, which is readily available in any clinical setting, RareNet democratizes access to sophisticated genetic analysis, offering particular value for underserved populations lacking advanced genomic infrastructure.

causal gene, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2510.08655

Country: North America > United States (0.28)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.89)
Health & Medicine > Therapeutic Area > Genetic Disease (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Survey and Improvement Strategies for Gene Prioritization with Large Language Models

Neeley, Matthew, Qi, Guantong, Wang, Guanchu, Tang, Ruixiang, Mao, Dongxue, Liu, Chaozhong, Pasupuleti, Sasidhar, Yuan, Bo, Xia, Fan, Liu, Pengfei, Liu, Zhandong, Hu, Xia

arXiv.org Artificial IntelligenceJan-30-2025

Rare diseases are challenging to diagnose due to limited patient data and genetic diversity. Despite advances in variant prioritization, many cases remain undiagnosed. While large language models (LLMs) have performed well in medical exams, their effectiveness in diagnosing rare genetic diseases has not been assessed. To identify causal genes, we benchmarked various LLMs for gene prioritization. Using multi-agent and Human Phenotype Ontology (HPO) classification, we categorized patients based on phenotypes and solvability levels. As gene set size increased, LLM performance deteriorated, so we used a divide-and-conquer strategy to break the task into smaller subsets. At baseline, GPT-4 outperformed other LLMs, achieving near 30% accuracy in ranking causal genes correctly. The multi-agent and HPO approaches helped distinguish confidently solved cases from challenging ones, highlighting the importance of known gene-phenotype associations and phenotype specificity. We found that cases with specific phenotypes or clear associations were more accurately solved. However, we observed biases toward well-studied genes and input order sensitivity, which hindered gene prioritization. Our divide-and-conquer strategy improved accuracy by overcoming these biases. By utilizing HPO classification, novel multi-agent techniques, and our LLM strategy, we improved causal gene identification accuracy compared to our baseline evaluation. This approach streamlines rare disease diagnosis, facilitates reanalysis of unsolved cases, and accelerates gene discovery, supporting the development of targeted diagnostics and therapies.

causal gene, large language model, machine learning, (21 more...)

arXiv.org Artificial Intelligence

2501.18794

Country:

North America > United States > Texas (0.04)
North America > United States > New Jersey (0.04)

Genre:

Research Report > New Finding (0.46)
Research Report > Experimental Study (0.46)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Genetic Disease (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Handling high correlations in the feature gene selection using Single-Cell RNA sequencing data

Xing, Li, Joun, Songwan, Mackey, Kurt, Lesperance, Mary, Zhang, Xuekui

arXiv.org Machine LearningJul-30-2020

Motivation: Selecting feature genes and predicting cells' phenotype are typical tasks in the analysis of scRNA-seq data. Many algorithms were developed for these tasks, but high correlations among genes create challenges specifically in scRNA-seq analysis, which are not well addressed. Highly correlated genes lead to unreliable prediction models due to technical problems, such as multi-collinearity. Most importantly, when a causal gene (whose variants have a true biological effect on the phenotype) is highly correlated with other genes, most algorithms select one of them in a data-driven manner. The correlation structure among genes could change substantially. Hence, it is critical to build a prediction model based on causal genes. Results: To address the issues discussed above, we propose a grouping algorithm that can be integrated into prediction models. Using real benchmark scRNA-seq data and simulated cell phenotypes, we show our novel method significantly outperforms standard models in both prediction and feature selection. Our algorithm reports the whole group of correlated genes, allowing researchers to either use their common pattern as a more robust predictor or conduct follow-up studies to identify the causal genes in the group. Availability: An R package is being developed and will be available on the Comprehensive R Archive Network (CRAN) when the paper is published.

artificial intelligence, machine learning, modeling & simulation, (19 more...)

arXiv.org Machine Learning

2007.02455

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > Canada > Saskatchewan > Saskatoon (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > Switzerland > Zürich > Zürich (0.04)

Genre:

Research Report > Experimental Study (0.46)
Research Report > New Finding (0.46)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Oncology (0.46)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Causal Mediation Analysis Leveraging Multiple Types of Summary Statistics Data

Park, Yongjin, Sarkar, Abhishek, Nguyen, Khoi, Kellis, Manolis

arXiv.org Machine LearningJan-24-2019

Summary statistics of genome-wide association studies (GWAS) teach causal relationship between millions of genetic markers and tens and thousands of phenotypes. However, underlying biological mechanisms are yet to be elucidated. We can achieve necessary interpretation of GWAS in a causal mediation framework, looking to establish a sparse set of mediators between genetic and downstream variables, but there are several challenges. Unlike existing methods rely on strong and unrealistic assumptions, we tackle practical challenges within a principled summary-based causal inference framework. We analyzed the proposed methods in extensive simulations generated from real-world genetic data. We demonstrated only our approach can accurately redeem causal genes, even without knowing actual individual-level data, despite the presence of competing non-causal trails.

causal gene, phenotype, statistics, (16 more...)

arXiv.org Machine Learning

1901.0854

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > New York > New York County > New York City (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > Experimental Study (0.66)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Biomedical Informatics (0.68)

Add feedback

ProDiGe: PRioritization Of Disease Genes with multitask machine learning from positive and unlabeled examples

Mordelet, Fantine, Vert, Jean-Philippe

arXiv.org Machine LearningJun-1-2011

Elucidating the genetic basis of human diseases is a central goal of genetics and molecular biology. While traditional linkage analysis and modern high-throughput techniques often provide long lists of tens or hundreds of disease gene candidates, the identification of disease genes among the candidates remains time-consuming and expensive. Efficient computational methods are therefore needed to prioritize genes within the list of candidates, by exploiting the wealth of information available about the genes in various databases. Here we propose ProDiGe, a novel algorithm for Prioritization of Disease Genes. ProDiGe implements a novel machine learning strategy based on learning from positive and unlabeled examples, which allows to integrate various sources of information about the genes, to share information about known disease genes across diseases, and to perform genome-wide searches for new disease genes. Experiments on real data show that ProDiGe outperforms state-of-the-art methods for the prioritization of genes in human diseases.

artificial intelligence, information, machine learning, (19 more...)

arXiv.org Machine Learning

1106.0134

Country: North America > United States (1.00)

Genre: Research Report > Promising Solution (0.34)

Industry: Health & Medicine > Therapeutic Area > Genetic Disease (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.46)

Add feedback