
Collaborating Authors

 lučić


Applying explainable AI algorithms to healthcare

AIHub

Saliency map explanation for an ECG exam that is predicted to be low-quality. Red highlights the part of the image most important to the model's prediction, while purple indicates the least important area.

Ana Lucic, a PhD candidate at the Information Retrieval Lab (IRLab) of the Informatics Institute of UvA, has developed a framework for explaining predictions of machine learning models that could improve heart examinations for underserved communities. Lucic's work belongs to the subfield of AI called explainable artificial intelligence (XAI). "We need explainable AI," says Lucic, "because machine learning models are often difficult to interpret. They have complex architectures and large numbers of parameters, so it's not clear how the input contributes to the output."


Training Gaussian Mixture Models at Scale via Coresets

Lucic, Mario, Faulkner, Matthew, Krause, Andreas, Feldman, Dan

arXiv.org Machine Learning

How can we train a statistical mixture model on a massive data set? In this work we show how to construct coresets for mixtures of Gaussians. A coreset is a weighted subset of the data, which guarantees that models fitting the coreset also provide a good fit for the original data set. We show that, perhaps surprisingly, Gaussian mixtures admit coresets of size polynomial in dimension and the number of mixture components, while being independent of the data set size. Hence, one can harness computationally intensive algorithms to compute a good approximation on a significantly smaller data set. More importantly, such coresets can be efficiently constructed in both distributed and streaming settings and do not impose restrictions on the data generating process. Our results rely on a novel reduction of statistical estimation to problems in computational geometry and new combinatorial complexity results for mixtures of Gaussians. Empirical evaluation on several real-world datasets suggests that our coreset-based approach enables a significant reduction in training time with negligible approximation error.
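
To make the idea of a weighted subset concrete, here is a minimal Python sketch. It is not the construction from the paper: it swaps in a simple importance-sampling scheme that mixes uniform sampling with sampling proportional to squared distance from the data mean. The function names `build_coreset` and `mixture_nll`, the synthetic two-component data, and the fixed model with equal mixing weights and identity covariance are all assumptions made for illustration.

```python
import numpy as np


def build_coreset(X, m, rng):
    """Sample m weighted points from X (illustrative scheme, not the paper's)."""
    n = X.shape[0]
    sq_dist = ((X - X.mean(axis=0)) ** 2).sum(axis=1)
    # Half uniform mass, half mass proportional to squared distance from the mean.
    q = 0.5 / n + 0.5 * sq_dist / sq_dist.sum()
    idx = rng.choice(n, size=m, replace=True, p=q)
    # Inverse-probability weights keep weighted sums unbiased for full-data sums.
    weights = 1.0 / (m * q[idx])
    return X[idx], weights


def mixture_nll(X, means, weights=None):
    """Weighted negative log-likelihood under a Gaussian mixture with
    equal mixing proportions and identity covariance (assumed for simplicity)."""
    if weights is None:
        weights = np.ones(X.shape[0])
    d = X.shape[1]
    sq = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)  # shape (n, k)
    log_comp = -0.5 * sq - 0.5 * d * np.log(2.0 * np.pi) - np.log(len(means))
    # Log-sum-exp over components, stabilised by subtracting the row maximum.
    row_max = log_comp.max(axis=1, keepdims=True)
    log_lik = row_max[:, 0] + np.log(np.exp(log_comp - row_max).sum(axis=1))
    return -(weights * log_lik).sum()


rng = np.random.default_rng(0)
means = np.array([[-3.0, 0.0], [3.0, 0.0]])
# Synthetic data: 10,000 points around each of the two component means.
X = np.vstack([rng.normal(loc=mu, scale=1.0, size=(10_000, 2)) for mu in means])

C, w = build_coreset(X, m=1_000, rng=rng)

full_nll = mixture_nll(X, means)
coreset_nll = mixture_nll(C, means, weights=w)
rel_err = abs(coreset_nll - full_nll) / full_nll
print(f"coreset size: {len(C)} of {len(X)} points, relative NLL error: {rel_err:.3f}")
```

The weighted likelihood on the 1,000 sampled points stands in for the likelihood on all 20,000, which is the property a coreset is meant to provide. A proper coreset guarantees this approximation uniformly over all candidate models, whereas this sketch only checks it for one fixed model.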