
Collaborating Authors: Gan, Luqin


Interpretable Machine Learning for Discovery: Statistical Challenges & Opportunities

arXiv.org Artificial Intelligence

Machine learning systems have gained widespread use in science, technology, and society. Given the increasing number of high-stakes machine learning applications and the growing complexity of machine learning models, many have advocated for interpretability and explainability to promote understanding of and trust in machine learning results (Rasheed et al., 2022, Toreini et al., 2020, Broderick et al., 2023). In response, there has been a recent explosion of research on Interpretable Machine Learning (IML), mostly focusing on new techniques to interpret black-box systems; see Molnar (2022), Lipton (2018), Guidotti et al. (2018), Doshi-Velez & Kim (2017), Du et al. (2019), Murdoch et al. (2019), and Carvalho et al. (2019) for recent reviews of the IML and explainable artificial intelligence literature. While most of these interpretability techniques were not necessarily designed for this purpose, they are increasingly being used to mine large and complex data sets to generate new insights (Roscher et al., 2020). These so-called data-driven discoveries are especially important for advancing data-rich fields in science, technology, and medicine. While prior reviews focus mainly on IML techniques, we primarily review how IML methods promote data-driven discoveries, the challenges associated with this task, and related new research opportunities at the intersection of machine learning and statistics.

In the sciences and beyond, IML techniques are routinely employed to make new discoveries from large and complex data sets; to motivate our review of this topic, we highlight several examples. First, feature importance and feature selection in supervised learning are popular forms of interpretation that have led to major discoveries, such as new genomic biomarkers of disease (Guyon et al., 2002), physical laws governing dynamical systems (Brunton et al., 2016), and lesions and other abnormalities in radiology (Borjali et al., 2020, Reyes et al., 2020). While most of the IML literature focuses on supervised learning (Molnar, 2022, Lipton, 2018, Guidotti et al., 2018, Doshi-Velez & Kim, 2017), many major scientific discoveries have been made via unsupervised techniques, and we argue that these approaches
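The feature-importance style of interpretation mentioned above can be sketched with a simple occlusion (permutation) score: shuffle one feature at a time and measure how much the model's error grows. This is a minimal, hedged illustration only; the synthetic data, coefficients, and linear model are hypothetical stand-ins, not any particular method from the review.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Hypothetical synthetic data: only the first two of ten features carry signal.
X = rng.normal(size=(200, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

model = LinearRegression().fit(X, y)
base_error = np.mean((model.predict(X) - y) ** 2)

def occlusion_importance(j):
    """Increase in mean squared error when feature j is shuffled (occluded)."""
    X_perm = X.copy()
    X_perm[:, j] = rng.permutation(X_perm[:, j])
    return np.mean((model.predict(X_perm) - y) ** 2) - base_error

importances = np.array([occlusion_importance(j) for j in range(X.shape[1])])
```

On data like this, the two signal features receive large scores and the noise features scores near zero, which is the pattern a discovery-oriented analysis would act on.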


Model-Agnostic Confidence Intervals for Feature Importance: A Fast and Powerful Approach Using Minipatch Ensembles

arXiv.org Artificial Intelligence

Feature importance inference, which promotes new scientific discoveries from complex data sets, has been a long-standing statistical problem. Instead of testing for parameters that are only interpretable for specific models, there has been increasing interest in model-agnostic methods, often in the form of feature occlusion or leave-one-covariate-out (LOCO) inference. Existing approaches often make distributional assumptions, which can be difficult to verify in practice, or require model refitting and data splitting, which are computationally intensive and lead to losses in power. In this work, we develop a novel, mostly model-agnostic and distribution-free inference framework for feature importance that is computationally efficient and statistically powerful. Our approach is fast, as we avoid model refitting by leveraging a form of random observation and feature subsampling called minipatch ensembles; this approach also improves statistical power by avoiding data splitting. Our framework applies to tabular data with any machine learning algorithm used within minipatch ensembles, for both regression and classification tasks. Despite the dependencies induced by using minipatch ensembles, we show that our approach provides asymptotic coverage for the feature importance score of any model under mild assumptions. Finally, our same procedure can also be leveraged to provide valid confidence intervals for predictions, hence providing fast, simultaneous quantification of the uncertainty of both predictions and feature importance. We validate our intervals on a series of synthetic and real data examples, including non-linear settings, showing that our approach detects the correct important features and exhibits many computational and statistical advantages over existing methods.
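The core minipatch idea behind this LOCO-style inference can be sketched as follows: fit many cheap models on tiny random subsets of observations and features, score each on its left-out observations, and contrast the average error of patches that exclude a feature against those that include it. This is a sketch under illustrative assumptions (synthetic data, arbitrary patch counts and sizes, a linear base learner); it omits the paper's confidence-interval construction entirely.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)

# Illustrative data: feature 0 is the only important one.
n, p = 300, 8
X = rng.normal(size=(n, p))
y = 4.0 * X[:, 0] + rng.normal(scale=0.1, size=n)

n_patches, n_obs, n_feat = 200, 60, 4   # minipatch count and sizes (illustrative)
err_sum = np.zeros((2, p))              # row 0: feature included, row 1: excluded
err_cnt = np.zeros((2, p))

for _ in range(n_patches):
    # Draw a minipatch: a tiny random subset of observations AND features.
    obs = rng.choice(n, size=n_obs, replace=False)
    feats = rng.choice(p, size=n_feat, replace=False)
    model = LinearRegression().fit(X[np.ix_(obs, feats)], y[obs])
    # Score on observations left out of this minipatch (no refitting needed).
    out = np.setdiff1d(np.arange(n), obs)
    err = np.mean((model.predict(X[np.ix_(out, feats)]) - y[out]) ** 2)
    included = np.isin(np.arange(p), feats)
    err_sum[0, included] += err
    err_cnt[0, included] += 1
    err_sum[1, ~included] += err
    err_cnt[1, ~included] += 1

# LOCO-style score: average error without feature j minus average error with it.
importance = err_sum[1] / err_cnt[1] - err_sum[0] / err_cnt[0]
```

Because each feature is randomly in or out of each patch, the leave-one-covariate-out contrast comes for free from one ensemble, with no refitting and no data splitting.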


Fast and Interpretable Consensus Clustering via Minipatch Learning

arXiv.org Machine Learning

Consensus clustering has been widely used in bioinformatics and other applications to improve the accuracy, stability, and reliability of clustering results. This approach ensembles cluster co-occurrences from multiple clustering runs on subsampled observations. For large-scale bioinformatics applications, such as discovering cell types from single-cell sequencing data, consensus clustering has two significant drawbacks: (i) computational inefficiency due to repeatedly applying clustering algorithms, and (ii) lack of interpretability into the features important for differentiating clusters. In this paper, we address these two challenges by developing IMPACC: Interpretable MiniPatch Adaptive Consensus Clustering. Our approach adopts three major innovations. First, we ensemble cluster co-occurrences from tiny subsets of both observations and features, termed minipatches, thus dramatically reducing computation time. Second, we develop adaptive sampling schemes for observations, which yield both improved reliability and computational savings. Third, we develop adaptive sampling schemes for features, which lead to interpretable solutions by quickly learning the most relevant features that differentiate clusters. We study our approach on synthetic data and a variety of real large-scale bioinformatics data sets; the results show that our approach not only yields more accurate and interpretable cluster solutions but also substantially improves computational efficiency compared to standard consensus clustering approaches.
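The minipatch consensus step described above, clustering tiny random subsets of observations and features and aggregating co-cluster frequencies, can be sketched in a few lines. This is a hedged illustration with hypothetical two-group data and non-adaptive uniform sampling; it is not IMPACC itself, which additionally uses adaptive sampling schemes for both observations and features.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)

# Illustrative data: two well-separated groups of 30 observations, 20 features.
X = np.vstack([rng.normal(0.0, 0.5, size=(30, 20)),
               rng.normal(4.0, 0.5, size=(30, 20))])
n = X.shape[0]

co_clustered = np.zeros((n, n))   # times a pair landed in the same cluster
co_sampled = np.zeros((n, n))     # times a pair appeared in the same minipatch

for _ in range(100):
    # Minipatch: a random subset of observations AND features.
    obs = rng.choice(n, size=30, replace=False)
    feats = rng.choice(X.shape[1], size=5, replace=False)
    labels = KMeans(n_clusters=2, n_init=5, random_state=0).fit_predict(
        X[np.ix_(obs, feats)])
    co_sampled[np.ix_(obs, obs)] += 1
    co_clustered[np.ix_(obs, obs)] += (labels[:, None] == labels[None, :])

# Consensus: fraction of co-samplings in which a pair was co-clustered.
consensus = co_clustered / np.maximum(co_sampled, 1)
# Read off the cluster containing observation 0 by thresholding its row.
cluster_of_0 = consensus[0] > 0.5
```

Each base clustering touches only a 30-by-5 minipatch rather than the full matrix, which is where the computational savings over standard consensus clustering come from.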