AITopics | Translational Bioinformatics

China races to build record biobank to rival U.S. drugs research

Biobanks store masses of biomedical data such as clinical records, genome sequences and other long-term health metrics that research and drug development depend on. As a fledgling researcher in the U.S., Zhang Li was struck by the efficiency of extracting human tissue in the morning and mining it for data the same afternoon. Such a streamlined process had been missing from his years of training as a bio data scientist in China. Inspired, he returned home to Beijing to join the Chinese Institute for Brain Research and launch a national database that will collect blood and DNA samples from 33,000 children to help identify patterns of brain disease and their risk factors. "Biomedical data is extremely valuable and is fundamental for us to find solutions to diseases and to delay aging," said Zhang, surrounded by robotic arms carefully organizing blood samples.

artificial intelligence, bioinformatics, social media, (9 more...)

The Japan Times

Country:

Asia > Middle East > Iran (0.41)
Asia > China > Beijing > Beijing (0.27)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Communications > Social Media (0.78)
Information Technology > Artificial Intelligence > Robots (0.71)
Information Technology > Biomedical Informatics > Translational Bioinformatics (0.56)

PROSPECT: Labeled Tandem Mass Spectrometry Dataset for Machine Learning in Proteomics

Neural Information Processing Systems | Apr-27-2026, 23:06:00 GMT

Proteomics is the interdisciplinary field focusing on the large-scale study of proteins. Proteins essentially organize and execute all functions within organisms. Today, the bottom-up analysis approach is the most commonly used workflow, where proteins are digested into peptides and subsequently analyzed using Tandem Mass Spectrometry (MS/MS). MS-based proteomics has transformed various fields in life sciences, such as drug discovery and biomarker identification. Proteomics is now entering a phase where it is helpful for clinical decision-making. Computational methods are vital in turning large amounts of acquired raw MS data into information and, ultimately, knowledge.

bioinformatics, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country:

North America > United States (0.46)
Europe > Germany (0.46)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Immunology (0.46)

Technology:

Information Technology > Biomedical Informatics > Translational Bioinformatics (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

OpenProteinSet: Training data for structural biology at scale

Neural Information Processing Systems | Apr-24-2026, 20:49:30 GMT

Multiple sequence alignments (MSAs) of proteins encode rich biological information and have been workhorses in bioinformatic methods for tasks like protein design and protein structure prediction for decades. Recent breakthroughs like AlphaFold2 that use transformers to attend directly over large quantities of raw MSAs have reaffirmed their importance. Generation of MSAs is highly computationally intensive, however, and no datasets comparable to those used to train AlphaFold2 have been made available to the research community, hindering progress in machine learning for proteins. To remedy this problem, we introduce OpenProteinSet, an open-source corpus of more than 16 million MSAs, associated structural homologs from the Protein Data Bank, and AlphaFold2 protein structure predictions. We have previously demonstrated the utility of OpenProteinSet by successfully retraining AlphaFold2 on it. We expect OpenProteinSet to be broadly useful as training and validation data for 1) diverse tasks focused on protein structure, function, and design and 2) large-scale multimodal machine learning research.

artificial intelligence, bioinformatics, machine learning, (17 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Biomedical Informatics > Translational Bioinformatics (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Biconvex Biclustering

Rosen, Sam, Chi, Eric C., Xu, Jason

arXiv.org Machine Learning | Apr-13-2026

This article proposes a biconvex modification to convex biclustering in order to improve its performance in high-dimensional settings. In contrast to heuristics that discard a subset of noisy features a priori, our method jointly learns and accordingly weighs informative features while discovering biclusters. Moreover, the method is adaptive to the data, and is accompanied by an efficient algorithm based on proximal alternating minimization, complete with detailed guidance on hyperparameter tuning and efficient solutions to optimization subproblems. These contributions are theoretically grounded; we establish finite-sample bounds on the objective function under sub-Gaussian errors, and generalize these guarantees to cases where input affinities need not be uniform. Extensive simulation results reveal our method consistently recovers underlying biclusters while weighing and selecting features appropriately, outperforming peer methods. An application to a gene microarray dataset of lymphoma samples recovers biclusters matching an underlying classification, while giving additional interpretation to the mRNA samples via the column groupings and fitted weights.

artificial intelligence, bioinformatics, machine learning, (18 more...)

arXiv.org Machine Learning

2604.03936

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > Minnesota (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Oncology (0.66)

Technology:

Information Technology > Biomedical Informatics > Translational Bioinformatics (0.86)
Information Technology > Data Science (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Multi-modal Transfer Learning between Biological Foundation Models

Neural Information Processing Systems | Mar-21-2026, 14:23:02 GMT

Modeling these sequences is key to understanding disease mechanisms and is an active research area in computational biology. Recently, Large Language Models have shown great promise in solving certain biological tasks, but current approaches are limited to a single sequence modality (DNA, RNA, or protein). Key problems in genomics intrinsically involve multiple modalities, but it remains unclear how to adapt general-purpose sequence models to those cases. In this work we propose a multi-modal model that connects DNA, RNA, and proteins by leveraging information from different pre-trained modality-specific encoders. We demonstrate its capabilities by applying it to the largely unsolved problem of predicting how multiple RNA transcript isoforms originate from the same gene (i.e.

bioinformatics, machine learning, natural language, (10 more...)

Neural Information Processing Systems

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.97)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.59)
Information Technology > Biomedical Informatics > Translational Bioinformatics (0.39)
Information Technology > Artificial Intelligence > Machine Learning (0.36)

05a7ad45d75a3082d7a3a70de8743140-Paper-Datasets_and_Benchmarks_Track.pdf

Neural Information Processing Systems | Feb-18-2026, 21:54:34 GMT

ec number, reaction, sequence, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > California (0.04)
Europe > France (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > Promising Solution (0.46)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Government > Regional Government > North America Government > United States Government (0.93)
Materials > Chemicals (0.93)
Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.95)
Information Technology > Biomedical Informatics > Translational Bioinformatics (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Appendix ProteinShake: Building datasets and benchmarks for deep learning on protein structures

Neural Information Processing Systems | Feb-16-2026, 17:10:21 GMT

Table 3: Comparison of models trained with different representations of protein structure across various tasks, on a random data split. The optimal choice of representation depends on the task. Shown are mean and standard deviation across four runs with different seeds. Table 4: Comparison of models trained with different representations of protein structure across various tasks, on a sequence data split. Table 5: Comparison of models trained with different representations of protein structure across various tasks, on a structure data split.

artificial intelligence, bioinformatics, machine learning, (15 more...)

Neural Information Processing Systems

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology: