BioAgents: Democratizing Bioinformatics Analysis with Multi-Agent Systems

Mehandru, Nikita, Hall, Amanda K., Melnichenko, Olesya, Dubinina, Yulia, Tsirulnikov, Daniel, Bamman, David, Alaa, Ahmed, Saponas, Scott, Malladi, Venkat S.

Jan-10-2025–arXiv.org Artificial Intelligence

Creating end-to-end bioinformatics workflows requires diverse domain expertise. This poses challenges for junior and senior researchers alike, because it demands a deep understanding of both genomics concepts and computational techniques. While large language models (LLMs) provide some assistance, they often fall short of the nuanced guidance needed to execute complex bioinformatics tasks, and they require expensive computing resources to achieve high performance. We therefore propose a multi-agent system built on small language models, fine-tuned on bioinformatics data, and enhanced with retrieval-augmented generation (RAG). Our system, BioAgents, enables local operation and personalization using proprietary data. We observe performance comparable to human experts on conceptual genomics tasks, and we suggest next steps to enhance its code generation capabilities.

LLMs have been applied to various domain-specific contexts, including scientific discovery in medicine [45, 49, 56], chemistry [6, 7], and biotechnology [31]. Recent advances in LLMs have spurred their use in bioinformatics [13], a field encompassing data-intensive tasks such as genome sequencing, protein structure prediction, and pathway analysis. One of the most significant applications has been AlphaFold3, which uses a transformer-based architecture with triangular attention to predict a protein's three-dimensional (3-D) structure from its amino acid sequence [2].
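The abstract states that BioAgents pairs small language models with retrieval-augmented generation (RAG) so they can draw on local or proprietary data. This summary does not describe how retrieval is done, so the following is only a generic sketch of the RAG retrieval step, not the authors' pipeline. It assumes a plain bag-of-words cosine similarity in place of whatever embedding model the system actually uses, and `DOCS`, `tokenize`, `cosine`, `retrieve`, and `build_prompt` are hypothetical names introduced for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical local knowledge base (assumption): in a real deployment this
# could be tool documentation or a lab's proprietary protocols.
DOCS = [
    "FastQC reports per-base sequence quality for FASTQ files.",
    "BWA-MEM aligns sequencing reads to a reference genome.",
    "GATK HaplotypeCaller calls germline SNPs and indels from aligned reads.",
]


def tokenize(text: str) -> Counter:
    """Lowercase bag-of-words term counts (a stand-in for learned embeddings)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    q = tokenize(query)
    ranked = sorted(docs, key=lambda d: cosine(q, tokenize(d)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, docs: list[str], k: int = 2) -> str:
    """Prepend retrieved context to the question before it is sent to a model."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


if __name__ == "__main__":
    print(build_prompt("How do I align reads to a reference genome?", DOCS, k=1))
```

Because retrieval runs over a local document store and the prompt is assembled before any model call, the same pattern can work with a small, locally hosted model, which is the kind of local, personalized operation the abstract describes.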

large language model, machine learning, natural language

Genre:
- Research Report > New Finding (0.46)

Industry:
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
  - Natural Language > Large Language Model (1.00)
  - Representation & Reasoning > Agents (1.00)