Reviews: Clustering Billions of Reads for DNA Data Storage
–Neural Information Processing Systems
The paper presents a solution to a new type of clustering problem that has emerged from studies of DNA-based storage. Information is encoded within DNA sequences and retrieved using short-read sequencing technology. The short-read sequencer will create multiple short overlapping sequence reads and these have to be clustered to establish whether they are from the same place in the original sequence. The characteristics of the clustering problem is that the clusters are pretty tight in terms of edit distance (25 max diameter here - that seems quite broad given current sequencing error rates) but well separated from each other (much larger distance between them than diameter). I thought this was an interesting and timely application.
Neural Information Processing Systems
Oct-8-2024, 06:23:05 GMT
- Technology: