AITopics | bioinformatics

Collaborating Authors

bioinformatics

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Model Decides How to Tokenize: Adaptive DNA Sequence Tokenization with MxDNA

Neural Information Processing SystemsMay-30-2025, 06:19:26 GMT

Foundation models have made significant strides in understanding the genomic language of DNA sequences. However, previous models typically adopt the tokenization methods designed for natural language, which are unsuitable for DNA sequences due to their unique characteristics. In addition, the optimal approach to tokenize DNA remains largely under-explored, and may not be intuitively understood by humans even if discovered. To address these challenges, we introduce MxDNA, a novel framework where the model autonomously learns an effective DNA tokenization strategy through gradient decent. MxDNA employs a sparse Mixture of Convolution Experts coupled with a deformable convolution to model the tokenization process, with the discontinuous, overlapping, and ambiguous nature of meaningful genomic segments explicitly considered. On Nucleotide Transformer Benchmarks and Genomic Benchmarks, MxDNA demonstrates superior performance to existing methods with less pretraining data and time, highlighting its effectiveness. Finally, we show that MxDNA learns unique tokenization strategy distinct to those of previous methods and captures genomic functionalities at a token level during self-supervised pretraining. Our MxDNA aims to provide a new perspective on DNA tokenization, potentially offering broad applications in various domains and yielding profound insights.

bioinformatics, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country: Asia > China (0.28)

Genre: Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Biomedical Informatics > Translational Bioinformatics (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(3 more...)

Add feedback

MMSite: A Multi-modal Framework for the Identification of Active Sites in Proteins

Neural Information Processing SystemsMay-29-2025, 13:01:26 GMT

The accurate identification of active sites in proteins is essential for the advancement of life sciences and pharmaceutical development, as these sites are of critical importance for enzyme activity and drug design. Recent advancements in protein language models (PLMs), trained on extensive datasets of amino acid sequences, have significantly improved our understanding of proteins. However, compared to the abundant protein sequence data, functional annotations, especially precise per-residue annotations, are scarce, which limits the performance of PLMs. On the other hand, textual descriptions of proteins, which could be annotated by human experts or a pretrained protein sequence-to-text model, provide meaningful context that could assist in the functional annotations, such as the localization of active sites. This motivates us to construct a ProTein-Attribute text Dataset (ProTAD), comprising over 570,000 pairs of protein sequences and multi-attribute textual descriptions.

bioinformatics, large language model, machine learning, (23 more...)

Neural Information Processing Systems

Country:

Asia > China > Hubei Province (0.14)
North America > Canada > Quebec (0.14)
North America > United States > Louisiana (0.14)

Genre:

Research Report > New Finding (0.93)
Research Report > Experimental Study (0.93)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)
(5 more...)

Add feedback

Co-evolution Transformer for Protein Contact Prediction

Neural Information Processing SystemsMay-29-2025, 05:32:01 GMT

Proteins are the main machinery of life and protein functions are largely determined by their 3D structures. The measurement of the pairwise proximity between amino acids of a protein, known as inter-residue contact map, well characterizes the structural information of a protein. Protein contact prediction (PCP) is an essential building block of many protein structure related applications. The prevalent approach to contact prediction is based on estimating the inter-residue contacts using hand-crafted coevolutionary features derived from multiple sequence alignments (MSAs). To mitigate the information loss caused by hand-crafted features, some recently proposed methods try to learn residue co-evolutions directly from MSAs.

artificial intelligence, bioinformatics, machine learning, (17 more...)

Neural Information Processing Systems

Country: Asia > China (0.28)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Biomedical Informatics > Translational Bioinformatics (0.67)

Add feedback

Supplementary Materials of Feature Importance Ranking for Deep Learning

Neural Information Processing SystemsMay-28-2025, 23:58:53 GMT

Figure X: Accuracy and feature importance ranking (FIR) scores yielded by different methods on the Enhancer-Promoter dataset: GM12878 Cell line (200 bp). While DFS and RF yield only one result with all the 102 features, RFE and ours produce the results at different subset sizes for s = 15, 25, 35, 45, 55. Note that the results yielded RFE and ours for s = 35 above are not specified deliberately with the subset size to indicate that those have been reported in the main text.

artificial intelligence, bioinformatics, machine learning, (19 more...)

Neural Information Processing Systems

Country: North America > Canada (0.14)

Genre: Research Report (0.70)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Oncology (0.68)

Technology:

Information Technology > Biomedical Informatics (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

247e592848391fe01f153f179c595090-Paper-Conference.pdf

Neural Information Processing SystemsMay-28-2025, 22:31:42 GMT

artificial intelligence, bioinformatics, machine learning, (19 more...)

Neural Information Processing Systems

Country: North America > United States > New York > New York County > New York City (0.14)

Genre: Research Report > New Finding (0.48)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Biomedical Informatics (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

52aaa62e71f829d41d74892a18a11d59-Paper.pdf

Neural Information Processing SystemsMay-28-2025, 21:53:32 GMT

artificial intelligence, bioinformatics, machine learning, (18 more...)

Neural Information Processing Systems

Country:

North America > United States > Indiana (0.14)
North America > United States > New York (0.14)

Genre: Research Report (0.69)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Biomedical Informatics (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.46)

Add feedback

A benchmark for prediction of transcriptomic responses to chemical perturbations across cell types

Neural Information Processing SystemsMay-28-2025, 19:18:00 GMT

However, associating small molecules with changes in cell state is challenging.

bioinformatics, cloud computing, machine learning, (22 more...)

Neural Information Processing Systems

Country: North America > United States > New York > New York County > New York City (0.14)

Genre: Research Report > New Finding (0.93)

Industry:

Law (1.00)
Information Technology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
(4 more...)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(4 more...)

Add feedback

A Supplementary materials

Neural Information Processing SystemsMay-28-2025, 10:17:45 GMT

A.1 Access instructions OpenProteinSet is hosted by the Registry of Open Data on AWS (RODA) and can be accessed at the following link: registry.opendata.aws/openfold/. A.2 Documentation and intended uses We include a datasheet [1] in Section B. Detailed documentation on the precise structure and content of the dataset is provided on the dataset's landing page. A.3 Data format All OpenProteinSet files are in standard plaintext formats (A3M for MSAs, HHSearch format for template hits, and PDB for structure files) that can be read by a wide variety of bioinformatics software. A.5 License OpenProteinSet is made available under the CC BY 4.0 license. A copy of the license is provided with the dataset.

artificial intelligence, bioinformatics, dataset, (15 more...)

Neural Information Processing Systems

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.72)
Law (0.47)

Technology:

Information Technology > Artificial Intelligence (0.70)
Information Technology > Biomedical Informatics (0.50)

Add feedback

OpenProteinSet: Training data for structural biology at scale

Neural Information Processing SystemsMay-28-2025, 10:17:41 GMT

Multiple sequence alignments (MSAs) of proteins encode rich biological information and have been workhorses in bioinformatic methods for tasks like protein design and protein structure prediction for decades. Recent breakthroughs like AlphaFold2 that use transformers to attend directly over large quantities of raw MSAs have reaffirmed their importance. Generation of MSAs is highly computationally intensive, however, and no datasets comparable to those used to train AlphaFold2 have been made available to the research community, hindering progress in machine learning for proteins. To remedy this problem, we introduce OpenProteinSet, an open-source corpus of more than 16 million MSAs, associated structural homologs from the Protein Data Bank, and AlphaFold2 protein structure predictions. We have previously demonstrated the utility of OpenProteinSet by successfully retraining AlphaFold2 on it. We expect OpenProteinSet to be broadly useful as training and validation data for 1) diverse tasks focused on protein structure, function, and design and 2) large-scale multimodal machine learning research.

artificial intelligence, bioinformatics, machine learning, (17 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Biomedical Informatics > Translational Bioinformatics (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Add feedback

Unified Insights: Harnessing Multi-modal Data for Phenotype Imputation via View Decoupling

Neural Information Processing SystemsMay-28-2025, 08:23:31 GMT

Phenotype imputation plays a crucial role in improving comprehensive and accurate medical evaluation, which in turn can optimize patient treatment and bolster the reliability of clinical research. Despite the adoption of various techniques, multi-modal biological data, which can provide crucial insights into a patient's overall health, is often overlooked. With multi-modal biological data, patient characterization can be enriched from two distinct views: the biological view and the phenotype view. However, the heterogeneity and imprecise nature of the multimodal data still pose challenges in developing an effective method to model from two views. In this paper, we propose a novel framework to incorporate multi-modal biological data via view decoupling.

bioinformatics, biological data, machine learning, (20 more...)

Neural Information Processing Systems

Country: Europe > United Kingdom (0.14)

Genre: