formulation and approximation
The Cluster Description Problem - Complexity Results, Formulations and Approximations
Consider the situation where you are given an existing $k$-way clustering $\pi$. A challenge for explainable AI is to find a compact and distinct explanations of each cluster which in this paper is using instance-level descriptors/tags from a common dictionary. Since the descriptors/tags were not given to the clustering method, this is not a semi-supervised learning situation. We show that the \emph{feasibility} problem of just testing whether any distinct description (not the most compact) exists is generally intractable for just two clusters. This means that unless \textbf{P} = \cnp, there cannot exist an efficient algorithm for the cluster description problem. Hence, we explore ILP formulations for smaller problems and a relaxed but restricted setting that leads to a polynomial time algorithm for larger problems. We explore several extension to the basic setting such as the ability to ignore some instances and composition constraints on the descriptions of the clusters. We show our formulation's usefulness on Twitter data where the communities were found using social connectivity (i.e.
Reviews: The Cluster Description Problem - Complexity Results, Formulations and Approximations
The main contributions of the paper are: - Formulating a cluster description problem as a combinatorial optimisation problem to find descriptions for clusters that are descriptive and non-overlapping - Showing NP-completeness of the problem, even if there are only two clusters - Providing ILP encodings modelling (also extended) variants of the problem - Providing a tractable subclass of the problem, and together with a polynomial time algorithm for this subclass Pros: - A novel problem that is relevant by providing explanations for previously computed clusterings - Well-written paper with a clear focus and clear contributions Cons: - In lines 102–103, it is stated that some version of DTDF can be reduced to SAT. Should not this be 2-SAT? Since DTDF is shown to be NP-complete, all versions of DTDF can be reduced to SAT, right? The NP-completeness should hold in general (if not, it should not be possible to devise and ILP encoding for the problem). While I did not check all the details in the supplementary, the constructions for reductions seem sound.
The Cluster Description Problem - Complexity Results, Formulations and Approximations
Davidson, Ian, Gourru, Antoine, Ravi, S
Consider the situation where you are given an existing $k$-way clustering $\pi$. A challenge for explainable AI is to find a compact and distinct explanations of each cluster which in this paper is using instance-level descriptors/tags from a common dictionary. Since the descriptors/tags were not given to the clustering method, this is not a semi-supervised learning situation. We show that the \emph{feasibility} problem of just testing whether any distinct description (not the most compact) exists is generally intractable for just two clusters. This means that unless \textbf{P} \cnp, there cannot exist an efficient algorithm for the cluster description problem.