Tree Learning: Optimal Algorithms and Sample Complexity

Avdiukhin, Dmitrii, Yaroslavtsev, Grigory, Vainstein, Danny, Fischer, Orr, Das, Sauman, Mirza, Faraz

Feb-9-2023–arXiv.org Artificial Intelligence

The algorithmic problem of constructing hierarchical data representations has been of major importance for many decades, due to its applications in statistics [Ward Jr, 1963, Gower and Ross, 1969], entomology [Michener and Sokal, 1957], plant biology [Sorensen, 1948], genomics [Eisen et al., 1998] and other domains. Efficient collection of labeled data is a problem of key importance for construction of hierarchical data representations. In this paper we consider the problem of constructing tree representations of data from labeled samples, focusing on understanding the optimal number of samples required for this task. The most basic type of a label that allows one to construct a tree representation consists of a triplet of points (x, y, z) labeled according to the induced hierarchical structure within the triplet. For example, given images of a cat, a dog, and a plane, the label would describe the cat and the dog as being more similar to each other than to the plane. This "odd one out" type of label is the simplest one to collect in a crowdsourcing setting. In this paper, we focus on understanding the number of such labeled samples required in order to construct a tree representation of the underlying data, which enables one to accurately predict subsequent labels in the future.

artificial intelligence, constraint, machine learning, (19 more...)

arXiv.org Artificial Intelligence

Feb-9-2023

arXiv.org PDF

Add feedback

Country:
- North America > United States (0.46)
- Asia (0.46)

Genre:
- Research Report (0.82)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.40)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found