approximation bound
Approximation Bounds for Hierarchical Clustering: Average Linkage, Bisecting K-means, and Local Search
Hierarchical clustering is a data analysis method that has been used for decades. Despite its widespread use, the method has an underdeveloped analytical foundation. Having a well understood foundation would both support the currently used methods and help guide future improvements. The goal of this paper is to give an analytic framework to better understand observations seen in practice. This paper considers the dual of a problem framework for hierarchical clustering introduced by Dasgupta.
Reviews: Approximation Bounds for Hierarchical Clustering: Average Linkage, Bisecting K-means, and Local Search
The paper extends the work of Dasgupta towards defining a theoretical framework for evaluating hierarchical clustering. The paper defines an objective function that is equivalent to that of Dasgupta in the sense that an optimal solution for the cost function defined by Dasgupta will also be optimal for the one defined in this paper and vice versa. The behaviour of approximate solution will differ though because the cost function is of the form D(G) - C(T), where D is a constant (depending on the input G) and C(T) is the cost (defined by Dasgupta) and T is the hierarchical solution. The authors show a constant approximation guarantee w.r.t. They complement this approximation guarantee by showing that this approximation guarantee is almost tight.
Approximation Bounds for Hierarchical Clustering: Average Linkage, Bisecting K-means, and Local Search
Moseley, Benjamin, Wang, Joshua
Hierarchical clustering is a data analysis method that has been used for decades. Despite its widespread use, the method has an underdeveloped analytical foundation. Having a well understood foundation would both support the currently used methods and help guide future improvements. The goal of this paper is to give an analytic framework to better understand observations seen in practice. This paper considers the dual of a problem framework for hierarchical clustering introduced by Dasgupta. The main result is that one of the most popular algorithms used in practice, average linkage agglomerative clustering, has a small constant approximation ratio for this objective.
Revisiting the Approximation Bound for Stochastic Submodular Cover
Hellerstein, Lisa, Kletenik, Devorah
Deshpande et al. presented a k(ln R + 1) approximation bound for Stochastic Submodular Cover, where k is the state set size, R is the maximum utility of a single item, and the utility function is integer-valued. This bound is similar to the ln Q/(eta+1) bound given by Golovin and Krause, whose analysis was recently found to have an error. Here Q >= R is the goal utility and eta is the minimum gap between Q and any attainable utility Q' < Q. We revisit the proof of the k(ln R + 1) bound of Deshpande et al., fill in the details of the proof of a key lemma, and prove two bounds for real-valued utility functions: k(ln R_1 + 1) and (ln R_E + 1). Here R_1 equals the maximum ratio between the largest increase in utility attainable from a single item, and the smallest non-zero increase attainable from that same item (in the same state). The quantity R_E equals the maximum ratio between the largest expected increase in utility from a single item, and the smallest non-zero expected increase in utility from that same item. Our bounds apply only to the stochastic setting with independent states.
- North America > United States > New York > Kings County > New York City (0.04)
- North America > United States > Texas > Travis County > Austin (0.04)
- Europe > Denmark > Central Jutland > Aarhus (0.04)