rna
RiboFlow: Conditional De Novo RNA Co-Design via Synergistic Flow Matching
Ribonucleic acid (RNA) binds to molecules to achieve specific biological functions. While generative models are advancing biomolecule design, existing methods for designing RNA that targets specific ligands face limitations in capturing RNA's conformational flexibility, ensuring structural validity, and overcoming data scarcity. To address these challenges, we introduce RiboFlow, a synergistic flow matching model to co-design RNA structures and sequences based on target molecules. By integrating RNA backbone frames, torsion angles, and sequence features in a unified architecture, RiboFlow explicitly models RNA's dynamic conformations while enforcing sequence-structure consistency to improve validity. Additionally, we curate RiboBind, a large-scale dataset of RNA-molecule interactions, to resolve the scarcity of high-quality structural data. Extensive experiments reveal that RiboFlow not only outperforms state-of-the-art RNA design methods by a large margin but also showcases controllable capabilities for achieving high binding affinity to target ligands.
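For readers unfamiliar with the term, the generic idea behind flow matching can be sketched in a few lines. This is not RiboFlow's implementation: the abstract describes a model over backbone frames, torsion angles, and sequence features conditioned on a target molecule, whereas the sketch below assumes plain Euclidean vectors, a straight-line path between noise and data, and an oracle velocity field in place of a trained network. All function names are hypothetical and chosen for illustration only.

```python
import random

def interpolate(x0, x1, t):
    """Point on the straight-line path from noise x0 (t=0) to data x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]

def target_velocity(x0, x1):
    """Time derivative of the straight-line path: the constant x1 - x0."""
    return [b - a for a, b in zip(x0, x1)]

def flow_matching_loss(velocity_fn, x0, x1, t):
    """Mean squared error between predicted and target velocity at one (x0, x1, t)."""
    xt = interpolate(x0, x1, t)
    pred = velocity_fn(xt, t)
    target = target_velocity(x0, x1)
    return sum((p - u) ** 2 for p, u in zip(pred, target)) / len(target)

def euler_sample(velocity_fn, x0, steps=50):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed-step Euler."""
    x = list(x0)
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt  # always strictly below 1, so the oracle never divides by zero
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

def make_oracle_velocity(x1):
    """Exact velocity field for the toy case where the data is the single point x1.

    A trained neural network would take the place of this function (assumption).
    """
    def velocity(x, t):
        return [(b - a) / (1.0 - t) for a, b in zip(x, x1)]
    return velocity

if __name__ == "__main__":
    rng = random.Random(0)
    noise = [rng.gauss(0.0, 1.0) for _ in range(3)]
    data = [1.0, -2.0, 0.5]
    v = make_oracle_velocity(data)
    print("loss at t=0.3:", flow_matching_loss(v, noise, data, 0.3))
    print("sample:", euler_sample(v, noise))
```

In training, the loss would be averaged over random draws of noise, data, and time and minimized with respect to the network's parameters. In sampling, the learned velocity field is integrated from noise toward data, as `euler_sample` does here with the oracle.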
Multi-modal Transfer Learning between Biological Foundation Models
Modeling these sequences is key to understanding disease mechanisms and is an active research area in computational biology. Recently, large language models have shown great promise in solving certain biological tasks, but current approaches are limited to a single sequence modality (DNA, RNA, or protein). Key problems in genomics intrinsically involve multiple modalities, but it remains unclear how to adapt general-purpose sequence models to those cases. In this work, we propose a multi-modal model that connects DNA, RNA, and proteins by leveraging information from different pre-trained modality-specific encoders. We demonstrate its capabilities by applying it to the largely unsolved problem of predicting how multiple RNA transcript isoforms originate from the same gene (i.e.
De-extinction of the woolly mammoth takes a MAJOR step forward: Scientists extract the RNA from a creature that lived 40,000 years ago - and it could allow them to resurrect the lost species
The world's oldest RNA - an essential nucleic acid present in all living cells - has been extracted from the extinct woolly mammoth, a new study reveals.
Oldest known RNA found in 40,000-year-old woolly mammoth leg
Cave lions likely killed 'Yuka' when she was around 8 years old. A 40,000-year-old juvenile woolly mammoth named Yuka is remarkable not only because she was uncovered nearly intact or because of her grisly cause of death. Her muscles provided paleogeneticists with the oldest known RNA sequences ever recovered. Detailed in a study published on November 14 in the journal, the samples contradict previous assumptions about the genetic material's resilience while furthering our understanding of the famous, extinct megafauna.
A Comparative Review of RNA Language Models
Wang, He, Zhang, Yikun, Chen, Jie, Zhan, Jian, Zhou, Yaoqi
Given the usefulness of protein language models (LMs) in structure and function inference, RNA LMs have received increased attention in the last few years. However, these RNA models are often not compared against the same standard. Here, we divided RNA LMs into three classes (LMs pretrained on multiple RNA types (especially noncoding RNAs), LMs pretrained on specific-purpose RNAs, and LMs that unify RNA with DNA or proteins or both) and compared 13 RNA LMs, along with 3 DNA LMs and 1 protein LM as controls, in zero-shot prediction of RNA secondary structure and functional classification. Results show that the models doing well on secondary structure prediction often perform worse in function classification or vice versa, suggesting that more balanced unsupervised training is needed.
Large Language Models in Bioinformatics: A Survey
Wang, Zhenyu, Wang, Zikang, Jiang, Jiyue, Chen, Pengan, Shi, Xiangyu, Li, Yu
Large Language Models (LLMs) are revolutionizing bioinformatics, enabling advanced analysis of DNA, RNA, proteins, and single-cell data. This survey provides a systematic review of recent advancements, focusing on genomic sequence modeling, RNA structure prediction, protein function inference, and single-cell transcriptomics. We also discuss several key challenges, including data scarcity, computational complexity, and cross-omics integration, and explore future directions such as multimodal learning, hybrid AI models, and clinical applications. By offering a comprehensive perspective, this paper underscores the transformative potential of LLMs in driving innovations in bioinformatics and precision medicine.
Biological Sequence with Language Model Prompting: A Survey
Jiang, Jiyue, Wang, Zikang, Shan, Yuheng, Chai, Heyan, Li, Jiayi, Ma, Zixian, Zhang, Xinrui, Li, Yu
Large language models (LLMs) have emerged as powerful tools for addressing challenges across diverse domains. Notably, recent studies have demonstrated that LLMs significantly enhance the efficiency of biomolecular analysis and synthesis, attracting widespread attention from academia and medicine. In this paper, we systematically investigate the application of prompt-based methods with LLMs to biological sequences, including DNA, RNA, proteins, and drug discovery tasks. Specifically, we focus on how prompt engineering enables LLMs to tackle domain-specific problems, such as promoter sequence prediction, protein structure modeling, and drug-target binding affinity prediction, often with limited labeled data. Furthermore, our discussion highlights the transformative potential of prompting in bioinformatics while addressing key challenges such as data scarcity, multimodal fusion, and computational resource limitations. Our aim is for this paper to function both as a foundational primer for newcomers and as a catalyst for continued innovation within this dynamic field of study.