Reviews: Foundations of Comparison-Based Hierarchical Clustering
In this work, the authors study hierarchical clustering under a quadruplet comparison framework. They show that single and complete linkage are inherently comparison-based and propose two variants of average linkage clustering that exploit quadruplet comparisons. An exact hierarchy recovery guarantee is provided under a planted hierarchical partition model, along with an empirical evaluation. The meaning of variables such as \mu and \delta is hard to interpret from the description; however, they are nicely summarized and explained in Appendix A.1.
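To illustrate the remark that single linkage is inherently comparison-based, here is a minimal sketch (not the authors' implementation). It assumes a hypothetical oracle `more_similar(i, j, k, l)` that answers whether objects i and j are more similar than objects k and l, and that every quadruplet can be queried consistently. Under those assumptions, single linkage reduces to ordering all pairs with the oracle and merging greedily with a union-find structure. The function name, the oracle, and the toy data are all illustrative choices.

```python
from functools import cmp_to_key
from itertools import combinations


def single_linkage_from_comparisons(n, more_similar):
    """Single linkage over n objects using only quadruplet comparisons.

    more_similar(i, j, k, l) is an assumed oracle returning True when
    the pair (i, j) is more similar than the pair (k, l).
    Returns the merges in order, each as a pair of frozensets.
    """
    def compare_pairs(p, q):
        # Order pairs from most similar to least similar.
        if more_similar(p[0], p[1], q[0], q[1]):
            return -1
        if more_similar(q[0], q[1], p[0], p[1]):
            return 1
        return 0

    pairs = sorted(combinations(range(n), 2), key=cmp_to_key(compare_pairs))

    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    members = {i: frozenset([i]) for i in range(n)}
    merges = []
    for i, j in pairs:
        ri, rj = find(i), find(j)
        if ri != rj:
            # The most similar pair that crosses two clusters merges them.
            merges.append((members[ri], members[rj]))
            parent[rj] = ri
            members[ri] = members[ri] | members.pop(rj)
    return merges


# Toy data: hidden 1-D positions, used only to simulate the oracle.
points = [0.0, 0.1, 5.0, 5.2, 20.0]


def oracle(i, j, k, l):
    return abs(points[i] - points[j]) < abs(points[k] - points[l])


merges = single_linkage_from_comparisons(len(points), oracle)
for left, right in merges:
    print(sorted(left), "+", sorted(right))
```

The algorithm never sees the positions themselves, only the oracle's answers. Sorting all pairs is a simplification that suits small inputs; the average linkage variants discussed in the review are a different construction and are not covered by this sketch.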
Interactive Log Parsing via Light-weight User Feedback
Wang, Liming, Xie, Hong, Li, Ye, Tan, Jian, Lui, John C. S.
Template mining is one of the foundational tasks in log analysis, which supports the diagnosis and troubleshooting of large-scale Web applications. This paper develops a human-in-the-loop template mining framework to support interactive log analysis, which is highly desirable in real-world diagnosis and troubleshooting of Web applications, yet previous template mining algorithms fail to support it. We formulate three types of light-weight user feedback and, based on them, design three atomic human-in-the-loop template mining algorithms. We derive mild conditions under which the outputs of our proposed algorithms are provably correct. We also derive upper bounds on the computational complexity and query complexity of each algorithm. We demonstrate the versatility of our proposed algorithms by combining them to improve the template mining accuracy of five representative algorithms over sixteen widely used benchmark datasets.
- North America > United States > Texas > Travis County > Austin (0.05)
- Asia > China > Chongqing Province > Chongqing (0.04)
- North America > United States > New York > New York County > New York City (0.04)
Foundations of Comparison-Based Hierarchical Clustering
Ghoshdastidar, Debarghya, Perrot, Michaël, von Luxburg, Ulrike
We address the classical problem of hierarchical clustering, but in a framework where one does not have access to a representation of the objects or their pairwise similarities. Instead we assume that only a set of comparisons between objects are available in terms of statements of the form "objects $i$ and $j$ are more similar than objects $k$ and $l$". Such a scenario is commonly encountered in crowdsourcing applications. The focus of this work is to develop comparison-based hierarchical clustering algorithms that do not rely on the principles of ordinal embedding. We propose comparison-based variants of average linkage clustering. We provide statistical guarantees for the proposed methods under a planted partition model for hierarchical clustering. We also empirically demonstrate the performance of the proposed methods on several datasets.
Predicting litigation likelihood and time to litigation for patents
Wongchaisuwat, Papis, Klabjan, Diego, McGinnis, John O.
Patent lawsuits are costly and time-consuming. The ability to forecast patent litigation and the time to litigation allows companies to better allocate budget and time in managing their patent portfolios. We develop predictive models for estimating the likelihood of litigation for patents and the expected time to litigation based on both textual and non-textual features. Our work focuses on improving the state of the art by relying on a different set of features and employing more sophisticated algorithms with more realistic data. The rate of patent litigation is very low, which makes the problem difficult. The initial model for predicting the likelihood is further modified to capture a time-to-litigation perspective.
- Europe > Germany (0.14)
- North America > United States > Texas (0.04)
- North America > United States > Illinois > Cook County > Evanston (0.04)
- Law > Litigation (1.00)
- Government > Regional Government > North America Government > United States Government (0.96)