Tree Learning: Optimal Algorithms and Sample Complexity

Avdiukhin, Dmitrii, Yaroslavtsev, Grigory, Vainstein, Danny, Fischer, Orr, Das, Sauman, Mirza, Faraz

arXiv.org Artificial Intelligence 

The algorithmic problem of constructing hierarchical data representations has been of major importance for many decades, due to its applications in statistics [Ward Jr, 1963, Gower and Ross, 1969], entomology [Michener and Sokal, 1957], plant biology [Sorensen, 1948], genomics [Eisen et al., 1998] and other domains. Efficient collection of labeled data is a problem of key importance for construction of hierarchical data representations. In this paper we consider the problem of constructing tree representations of data from labeled samples, focusing on understanding the optimal number of samples required for this task. The most basic type of a label that allows one to construct a tree representation consists of a triplet of points (x, y, z) labeled according to the induced hierarchical structure within the triplet. For example, given images of a cat, a dog, and a plane, the label would describe the cat and the dog as being more similar to each other than to the plane. This "odd one out" type of label is the simplest one to collect in a crowdsourcing setting. In this paper, we focus on understanding the number of such labeled samples required in order to construct a tree representation of the underlying data, which enables one to accurately predict subsequent labels in the future.

Duplicate Docs Excel Report

Title
None found

Similar Docs  Excel Report  more

TitleSimilaritySource
None found