 antibiotic resistance gene


Can Large Language Models Predict Antimicrobial Resistance Gene?

arXiv.org Artificial Intelligence

This study demonstrates that generative large language models can be applied more flexibly to DNA sequence analysis and classification than traditional transformer encoder-based models. While recent encoder-based models such as DNABERT and Nucleotide Transformer have shown strong performance in DNA sequence classification, transformer decoder-based generative models have not yet been extensively explored in this field. This study evaluates how effectively generative large language models handle DNA sequences with various labels and analyzes how performance changes when additional textual information is provided. Experiments were conducted on antimicrobial resistance genes, and the results show that generative large language models can offer comparable or potentially better predictions, demonstrating flexibility and accuracy when incorporating both sequence and textual information. The code and data used in this work are available at the following GitHub repository: https://github.com/biocomgit/llm4dna.
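To make the idea of pairing a sequence with textual information concrete, here is a minimal sketch of how a classification prompt for a generative model might be assembled. The prompt wording, the `build_prompt` helper, and the label names are assumptions made for illustration; they are not taken from the paper or its repository.

```python
from typing import Optional, Sequence


def build_prompt(sequence: str, labels: Sequence[str], context: Optional[str] = None) -> str:
    """Format a classification prompt for a generative LLM (hypothetical layout)."""
    seq = sequence.strip().upper()
    # Reject anything that is not a nucleotide code, so bad input fails early.
    invalid = set(seq) - set("ACGTN")
    if not seq or invalid:
        raise ValueError(f"not a DNA sequence: invalid characters {sorted(invalid)}")
    lines = [
        "Classify the drug resistance class of the following gene.",
        "Options: " + ", ".join(labels),
        "DNA sequence: " + seq,
    ]
    if context:
        # Optional textual information, e.g. a short gene description.
        lines.append("Additional information: " + context.strip())
    lines.append("Answer:")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical labels and description, for illustration only.
    print(build_prompt("atggcgt", ["beta-lactam", "tetracycline"], context="efflux pump"))
```

Because the text field is optional, the same helper can produce sequence-only and sequence-plus-text prompts, which is one way to compare the two settings the abstract describes.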


Predicting Anti-microbial Resistance using Large Language Models

arXiv.org Artificial Intelligence

At a time of increasing antibiotic resistance and the spread of infectious diseases like COVID-19, it is important to classify genes related to antibiotic resistance. As natural language processing has advanced with transformer-based language models, many language models that learn the characteristics of nucleotide sequences have also emerged. These models perform well in classifying various features of nucleotide sequences. Classifying a nucleotide sequence draws not only on the sequence itself but also on various kinds of background knowledge. In this study, we therefore use both a nucleotide sequence-based language model and a text language model based on PubMed articles, so that the model reflects more biological background knowledge. We propose a method to fine-tune the nucleotide sequence language model and the text language model on various databases of antibiotic resistance genes, an LLM-based augmentation technique to supplement the data, an ensemble method to combine the two models effectively, and a benchmark for evaluating the model. Our method achieved better performance than the nucleotide sequence language model in drug resistance class prediction.
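The abstract does not say how the two models are combined. As one common possibility, the sketch below averages the class probabilities from a sequence model and a text model with a tunable weight. The `ensemble_predict` helper, the weighting scheme, and the example class names are all assumptions for illustration, not the authors' method.

```python
from typing import Dict, Tuple


def ensemble_predict(
    seq_probs: Dict[str, float],
    text_probs: Dict[str, float],
    seq_weight: float = 0.5,
) -> Tuple[str, Dict[str, float]]:
    """Combine two models' class probabilities by weighted averaging (assumed scheme)."""
    if not 0.0 <= seq_weight <= 1.0:
        raise ValueError("seq_weight must be in [0, 1]")
    if seq_probs.keys() != text_probs.keys():
        raise ValueError("both models must score the same classes")
    # Weighted average per class; the text model gets the remaining weight.
    combined = {
        label: seq_weight * seq_probs[label] + (1.0 - seq_weight) * text_probs[label]
        for label in seq_probs
    }
    # The predicted class is the one with the highest combined score.
    best = max(combined, key=combined.get)
    return best, combined


if __name__ == "__main__":
    # Made-up probabilities for two hypothetical drug resistance classes.
    seq = {"beta-lactam": 0.6, "tetracycline": 0.4}
    text = {"beta-lactam": 0.2, "tetracycline": 0.8}
    print(ensemble_predict(seq, text, seq_weight=0.5))
```

In practice the weight would be chosen on a validation set; setting `seq_weight=1.0` recovers the sequence model alone, which gives a direct baseline for comparison.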


Machine Learning IDs Antibiotic-resistance Genes in TB-causing Bacteria

#artificialintelligence

Researchers at the University of California San Diego have developed an approach that uses machine learning to identify and predict which genes make infectious bacteria resistant to antibiotics. The approach was tested on strains of Mycobacterium tuberculosis, the bacterium that causes tuberculosis (TB) in humans. It identified 33 known and 24 new antibiotic resistance genes in these bacteria. The researchers say the approach can be used on other infection-causing pathogens, including staph and the bacteria that cause urinary tract infections, pneumonia and meningitis. The work was recently published in Nature Communications.


Machine learning leveraging genomes from metagenomes identifies influential antibiotic resistance genes in the infant gut microbiome

#artificialintelligence

Antibiotic resistance in pathogens is extensively studied, yet little is known about how antibiotic resistance genes of typical gut bacteria influence microbiome dynamics. Here, we leverage genomes from metagenomes to investigate how genes of the premature infant gut resistome correspond to the ability of bacteria to survive under certain environmental and clinical conditions. We find that formula feeding impacts the resistome. Random forest models corroborated by statistical tests revealed that the gut resistome of formula-fed infants is enriched in class D beta-lactamase genes. Interestingly, Clostridium difficile strains harboring this gene are more abundant in formula-fed infants than C. difficile strains lacking it.