A Limitations and future work We believe that the

Neural Information Processing Systems 

All real-world datasets analysed consist of sequence reads of the same part of the genome. This is a widespread set-up for sequence analysis but not ubiquitous. In this project, we work with edit distances between sequences, these are too expensive for large-scale analysis, but it is feasible to produce a large enough training set. We describe here the methods that are most closely related to our work. However, these are bound to a quadratic complexity w.r.t. the length of the input sequence, the best algorithm [ Experiments were also run on synthetic datasets formed by sequences randomly generated.