NanoBaseLib: A Multi-Task Benchmark Dataset for Nanopore Sequencing
Lu Cheng
Department of Computer Science, Aalto University, Finland
Neural Information Processing Systems
Nanopore sequencing is a third-generation sequencing technology that can generate long reads and directly measure modifications on DNA/RNA molecules. These capabilities make it ideal for biological applications such as human Telomere-to-Telomere (T2T) genome assembly, Ebola virus surveillance, and COVID-19 mRNA vaccine development. However, the accuracy of computational methods on many Nanopore data analysis tasks is far from satisfactory. For instance, the base calling accuracy of Nanopore RNA sequencing is 90%, while the target is 99.9%. This highlights an urgent need for contributions from the machine learning community. A key bottleneck preventing machine learning researchers from entering this field is the lack of a large, integrated benchmark dataset.
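To make the size of the base calling gap concrete, the short sketch below converts the two accuracy figures into expected miscalled bases per read. It assumes accuracy is a per-base rate with independent errors, and the 1,000-nucleotide read length is a hypothetical value chosen only for illustration; the names `expected_errors` and `READ_LENGTH` are likewise illustrative.

```python
def expected_errors(read_length: int, accuracy: float) -> float:
    """Expected miscalled bases in one read.

    Assumes (for illustration) that accuracy is a per-base rate and
    that errors occur independently at each position.
    """
    return read_length * (1.0 - accuracy)


# Hypothetical read length, chosen only to illustrate the arithmetic.
READ_LENGTH = 1_000

current = expected_errors(READ_LENGTH, 0.90)   # reported RNA base calling accuracy
target = expected_errors(READ_LENGTH, 0.999)   # stated target accuracy

print(f"current: {current:.1f} errors per read")
print(f"target:  {target:.1f} errors per read")
print(f"required error reduction: {current / target:.0f}x")
```

Under these assumptions, moving from 90% to 99.9% accuracy means cutting the error rate a hundredfold, from roughly 100 miscalled bases to roughly 1 in a read of this length.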
Mar-23-2025, 16:14:48 GMT