Detection of Common Subtrees with Identical Label Distribution
Azaïs, Romain, Ingels, Florian
Tree data are ubiquitous, especially in biology and computer science, but they are also non-Euclidean [9], which prevents their analysis with classical statistical methods designed for multidimensional data. They therefore require specific tools that take their structured nature into account. Among such techniques, frequent pattern mining [1] consists of identifying patterns, i.e. substructures, that appear often in the data. The more elaborate the patterns sought, the harder the problem: the challenge is to keep the algorithmic complexity low enough that a given family of patterns can be searched in reasonable time. Different types of patterns have been considered in the literature for analysing tree data (see the survey [16] and the references therein), with a strong interest in a specific family of patterns called subtrees [3, 23]. In these two papers, only subtrees that appear more often than a given threshold are considered. Reverse search [5] is a generic approach for enumerating frequent patterns in a dataset; it consists of (i) building an enumeration tree of substructures and then (ii) pruning it to keep only the frequent patterns.
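As a rough illustration of the threshold-based notion of frequent subtrees described above, the following Python sketch counts how often each subtree occurs in a small forest and keeps those that reach a threshold. It rests on several assumptions that are not taken from the paper: a tree is a hypothetical `(label, children)` pair, a "subtree" means a node together with all of its descendants, children are unordered, and labels are simple strings without parentheses. It counts occurrences directly through a canonical string form; it is not the reverse search procedure of [5], which builds and prunes an enumeration tree.

```python
from collections import Counter

def subtree_counts(forest):
    """Count occurrences of every subtree (a node plus all its descendants).

    Each tree is a (label, children) pair; this representation is an
    assumption made for the sketch. Children are treated as unordered, so
    two subtrees that differ only by child order get the same canonical form.
    """
    counts = Counter()

    def visit(tree):
        label, children = tree
        # Sorting the children's canonical forms removes the child order.
        forms = sorted(visit(child) for child in children)
        form = "(" + str(label) + "".join(forms) + ")"
        counts[form] += 1
        return form

    for tree in forest:
        visit(tree)
    return counts

def frequent_subtrees(forest, threshold):
    """Keep only the subtrees that occur at least `threshold` times."""
    return {form: n for form, n in subtree_counts(forest).items() if n >= threshold}

# Toy forest: the first two trees are equal up to the order of their children.
forest = [
    ("a", [("b", []), ("c", [])]),
    ("a", [("c", []), ("b", [])]),
    ("b", []),
]
print(frequent_subtrees(forest, threshold=2))
```

With this toy forest, the leaf labelled `b` occurs three times, while the leaf labelled `c` and the whole tree rooted at `a` occur twice each, so all three pass a threshold of 2 and only the `b` leaf passes a threshold of 3.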
Jul-24-2023