Interpretable RNA-Seq Clustering with an LLM-Based Agentic Evidence-Grounded Framework
Hossain, Elias, Shoeibi, Mehrdad, Garibay, Ivan, Yousefi, Niloofar
arXiv.org Artificial Intelligence
While clustering methods such as spectral clustering and K-means effectively group genes by expression similarity, downstream interpretation is typically performed using enrichment-based statistics. These approaches provide high-level functional summaries but often fail to yield cluster-specific mechanistic insight or explicit links to supporting literature. As a result, biological interpretation frequently relies on manual curation, limiting reproducibility and scalability. Large language models (LLMs) have recently emerged as powerful tools for biomedical text mining and knowledge synthesis. Although LLMs can generate fluent biological narratives, they are optimized for linguistic coherence rather than evidential accountability. When applied directly to transcriptomic interpretation, they may produce plausible but unverifiable statements, omit explicit citations, or hallucinate unsupported claims. While retrieval-augmented and agentic systems partially address this issue, systematic verification and critic-based validation remain underexplored. This limitation is particularly consequential for antimicrobial resistance research in Salmonella enterica, a major foodborne pathogen responsible for substantial global morbidity.
Oct-21-2025