NanoBaseLib: A Multi-Task Benchmark Dataset for Nanopore Sequencing

Mar-21-2026, 12:42:04 GMT – Neural Information Processing Systems

Nanopore sequencing is a third-generation sequencing technology that generates long reads and directly measures modifications on DNA/RNA molecules, which makes it ideal for biological applications such as human Telomere-to-Telomere (T2T) genome assembly, Ebola virus surveillance, and COVID-19 mRNA vaccine development. However, the accuracy of computational methods across the various tasks of Nanopore sequencing data analysis remains far from satisfactory. For instance, the base calling accuracy of Nanopore RNA sequencing is ~90%, while the aim is ~99.9%. This highlights an urgent need for contributions from the machine learning community. A bottleneck that prevents machine learning researchers from entering this field is the lack of a large integrated benchmark dataset.

artificial intelligence, machine learning, proceedings

Industry:
- Health & Medicine > Therapeutic Area
  - Infections and Infectious Diseases (1.00)
  - Immunology (1.00)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning (1.00)