Tree edit distance for hierarchical data compatible with HMIL paradigm

Šopík, Břetislav, Strenáčik, Tomáš

arXiv.org Artificial Intelligence 

In contemporary data analysis, both in industrial and in academic research, we often need to work with data that has a hierarchical structure. The analysis of such data is naturally more difficult than the analysis of data with a flat structure, because the schema of a hierarchically organized dataset may carry important information that is lost if we ignore it. A common task in dataset analysis is to evaluate the difference between two of its instances. If the dataset has a flat structure and consists of numerical vectors, an appropriate distance function from vector spaces can be used. Similarly, we can use the Levenshtein edit distance[1] to compare two strings.
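To make the two flat-data cases mentioned above concrete, the following is a minimal Python sketch of a vector-space distance and of the Levenshtein edit distance. The choice of the Euclidean metric and the function names `euclidean` and `levenshtein` are assumptions made for illustration only; the text does not prescribe a particular metric or implementation.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length numerical vectors.

    Illustrative choice of metric; any other vector-space distance
    could be substituted.
    """
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def levenshtein(s, t):
    """Levenshtein edit distance between two strings.

    Counts the minimum number of single-character insertions,
    deletions, and substitutions needed to turn s into t, using the
    standard dynamic programme with one row kept in memory.
    """
    # prev[j] holds the distance between the empty prefix of s and t[:j].
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, start=1):
        # curr[0] is the cost of deleting the first i characters of s.
        curr = [i]
        for j, b in enumerate(t, start=1):
            curr.append(
                min(
                    prev[j] + 1,             # delete a from s
                    curr[j - 1] + 1,         # insert b into s
                    prev[j - 1] + (a != b),  # substitute a with b, or match
                )
            )
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(euclidean([0.0, 0.0], [3.0, 4.0]))
    print(levenshtein("kitten", "sitting"))
```

Both functions operate on flat inputs only: neither of them sees any nesting in the data, which is the limitation that motivates a distance defined on trees.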