Goto

Collaborating Authors

Interpretable multiclass classification by MDL-based rule lists

arXiv.org Artificial Intelligence

Interpretable classifiers have recently witnessed an increase in attention from the data mining community because they are inherently easier to understand and explain than their more complex counterparts. Examples of interpretable classification models include decision trees, rule sets, and rule lists. Learning such models often involves optimizing hyperparameters, which typically requires substantial amounts of data and may result in relatively large models. In this paper, we consider the problem of learning compact yet accurate probabilistic rule lists for multiclass classification. Specifically, we propose a novel formalization based on probabilistic rule lists and the minimum description length (MDL) principle. This results in virtually parameter-free model selection that naturally allows to trade-off model complexity with goodness of fit, by which overfitting and the need for hyperparameter tuning are effectively avoided. Finally, we introduce the Classy algorithm, which greedily finds rule lists according to the proposed criterion. We empirically demonstrate that Classy selects small probabilistic rule lists that outperform state-of-the-art classifiers when it comes to the combination of predictive performance and interpretability. We show that Classy is insensitive to its only parameter, i.e., the candidate set, and that compression on the training set correlates with classification performance, validating our MDL-based selection criterion.



A tutorial on MDL hypothesis testing for graph analysis

arXiv.org Machine Learning

When analysing graph structure, it can be difficult to determine whether patterns found are due to chance, or due to structural aspects of the process that generated the data. Hypothesis tests are often used to support such analyses. These allow us to make statistical inferences about which null models are responsible for the data, and they can be used as a heuristic in searching for meaningful patterns. The minimum description length (MDL) principle [6, 4] allows us to build such hypothesis tests, based on efficient descriptions of the data. Broadly: we translate the regularity we are interested in into a code for the data, and if this code describes the data more efficiently than a code corresponding to the null model, by a sufficient margin, we may reject the null model. This is a frequentist approach to MDL, based on hypothesis testing. Bayesian approaches to MDL for model selection rather than model rejection are more common, but for the purposes of pattern analysis, a hypothesis testing approach provides a more natural fit with existing literature. 1 We provide a brief illustration of this principle based on the running example of analysing the size of the largest clique in a graph. We illustrate how a code can be constructed to efficiently represent graphs with large cliques, and how the description length of the data under this code can be compared to the description length under a code corresponding to a null model to show that the null model is highly unlikely to have generated the data.


A Generalization of the Chow-Liu Algorithm and its Application to Statistical Learning

arXiv.org Artificial Intelligence

We extend the Chow-Liu algorithm for general random variables while the previous versions only considered finite cases. In particular, this paper applies the generalization to Suzuki's learning algorithm that generates from data forests rather than trees based on the minimum description length by balancing the fitness of the data to the forest and the simplicity of the forest. As a result, we successfully obtain an algorithm when both of the Gaussian and finite random variables are present.


Inferring Latent dimension of Linear Dynamical System with Minimum Description Length

arXiv.org Machine Learning

Time-invariant linear dynamical system arises in many real-world applications,and its usefulness is widely acknowledged. A practical limitation with this model is that its latent dimension that has a large impact on the model capability needs to be manually specified. It can be demonstrated that a lower-order model class could be totally nested into a higher-order class, and the corresponding likelihood is nondecreasing. Hence, criterion built on the likelihood is not appropriate for model selection. This paper addresses the issue and proposes a criterion for linear dynamical system based on the principle of minimum description length. The latent structure, which is omitted in previous work, is explicitly considered in this newly proposed criterion. Our work extends the principle of minimum description length and demonstrates its effectiveness in the tasks of model training. The experiments on both univariate and multivariate sequences confirm the good performance of our newly proposed method.