AITopics | Bhattacharjee, Robi

Collaborating Authors

Bhattacharjee, Robi

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

How to safely discard features based on aggregate SHAP values

Bhattacharjee, Robi, Frohnapfel, Karolin, von Luxburg, Ulrike

arXiv.org Machine LearningMar-29-2025

SHAP is one of the most popular local feature-attribution methods. Given a function f and an input x, it quantifies each feature's contribution to f(x). Recently, SHAP has been increasingly used for global insights: practitioners average the absolute SHAP values over many data points to compute global feature importance scores, which are then used to discard unimportant features. In this work, we investigate the soundness of this practice by asking whether small aggregate SHAP values necessarily imply that the corresponding feature does not affect the function. Unfortunately, the answer is no: even if the i-th SHAP value is 0 on the entire data support, there exist functions that clearly depend on Feature i. The issue is that computing SHAP values involves evaluating f on points outside of the data support, where f can be strategically designed to mask its dependence on Feature i. To address this, we propose to aggregate SHAP values over the extended support, which is the product of the marginals of the underlying distribution. With this modification, we show that a small aggregate SHAP value implies that we can safely discard the corresponding feature. We then extend our results to KernelSHAP, the most popular method to approximate SHAP values in practice. We show that if KernelSHAP is computed over the extended distribution, a small aggregate value justifies feature removal. This result holds independently of whether KernelSHAP accurately approximates true SHAP values, making it one of the first theoretical results to characterize the KernelSHAP algorithm itself. Our findings have both theoretical and practical implications. We introduce the Shapley Lie algebra, which offers algebraic insights that may enable a deeper investigation of SHAP and we show that randomly permuting each column of the data matrix enables safely discarding features based on aggregate SHAP and KernelSHAP values.

artificial intelligence, machine learning, shap value, (17 more...)

arXiv.org Machine Learning

2503.23111

Genre: Research Report > New Finding (0.54)

Industry:

Health & Medicine > Therapeutic Area > Neurology (0.46)
Water & Waste Management > Water Management > Water Supplies & Services (0.46)

Technology:

Information Technology > Data Science (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

Auditing Local Explanations is Hard

Bhattacharjee, Robi, von Luxburg, Ulrike

arXiv.org Artificial IntelligenceJul-18-2024

In sensitive contexts, providers of machine learning algorithms are increasingly required to give explanations for their algorithms' decisions. However, explanation receivers might not trust the provider, who potentially could output misleading or manipulated explanations. In this work, we investigate an auditing framework in which a third-party auditor or a collective of users attempts to sanity-check explanations: they can query model decisions and the corresponding local explanations, pool all the information received, and then check for basic consistency properties. We prove upper and lower bounds on the amount of queries that are needed for an auditor to succeed within this framework. Our results show that successful auditing requires a potentially exorbitant number of queries -- particularly in high dimensional cases. Our analysis also reveals that a key property is the ``locality'' of the provided explanations -- a quantity that so far has not been paid much attention to in the explainability literature. Looking forward, our results suggest that for complex high-dimensional settings, merely providing a pointwise prediction and explanation could be insufficient, as there is no way for the users to verify that the provided explanations are not completely made-up.

explanation, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2407.13281

Country: Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Beyond Discrepancy: A Closer Look at the Theory of Distribution Shift

Bhattacharjee, Robi, Rittler, Nick, Chaudhuri, Kamalika

arXiv.org Artificial IntelligenceMay-29-2024

Many machine learning models appear to deploy effortlessly under distribution shift, and perform well on a target distribution that is considerably different from the training distribution. Yet, learning theory of distribution shift bounds performance on the target distribution as a function of the discrepancy between the source and target, rarely guaranteeing high target accuracy. Motivated by this gap, this work takes a closer look at the theory of distribution shift for a classifier from a source to a target distribution. Instead of relying on the discrepancy, we adopt an Invariant-Risk-Minimization (IRM)-like assumption connecting the distributions, and characterize conditions under which data from a source distribution is sufficient for accurate classification of the target. When these conditions are not met, we show when only unlabeled data from the target is sufficient, and when labeled target data is needed. In all cases, we provide rigorous theoretical guarantees in the large sample regime.

artificial intelligence, feature map, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2405.19156

Country:

North America > United States > California (0.14)
Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)
North America > Canada > Quebec (0.14)

Genre: Research Report (0.63)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

Add feedback

Structure from Voltage

Bhattacharjee, Robi, Cloninger, Alex, Freund, Yoav, Oslandsbotn, Andreas

arXiv.org Artificial IntelligenceJun-28-2023

Effective resistance (ER) is an attractive way to interrogate the structure of graphs. It is an alternative to computing the eigen-vectors of the graph Laplacian. Graph laplacians are used to find low dimensional structures in high dimensional data. Here too, ER based analysis has advantages over eign-vector based methods. Unfortunately Von Luxburg et al. (2010) show that, when vertices correspond to a sample from a distribution over a metric space, the limit of the ER between distant points converges to a trivial quantity that holds no information about the structure of the graph. We show that by using scaling resistances in a graph with $n$ vertices by $n^2$, one gets a meaningful limit of the voltages and of effective resistances. We also show that by adding a "ground" node to a metric graph one gets a simple and natural way to compute all of the distances from a chosen point to all other points.

artificial intelligence, graph, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2203.00063

Country: North America > United States > California (0.28)

Genre: Research Report (0.50)

Industry:

Energy (0.68)
Government > Regional Government > North America Government > United States Government (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Effective resistance in metric spaces

Bhattacharjee, Robi, Cloninger, Alexander, Freund, Yoav, Oslandsbotn, Andreas

arXiv.org Artificial IntelligenceJun-27-2023

Effective resistance (ER) is an attractive way to interrogate the structure of graphs. It is an alternative to computing the eigenvectors of the graph Laplacian. One attractive application of ER is to point clouds, i.e. graphs whose vertices correspond to IID samples from a distribution over a metric space. Unfortunately, it was shown that the ER between any two points converges to a trivial quantity that holds no information about the graph's structure as the size of the sample increases to infinity. In this study, we show that this trivial solution can be circumvented by considering a region-based ER between pairs of small regions rather than pairs of points and by scaling the edge weights appropriately with respect to the underlying density in each region. By keeping the regions fixed, we show analytically that the region-based ER converges to a non-trivial limit as the number of points increases to infinity. Namely the ER on a metric space. We support our theoretical findings with numerical experiments.

artificial intelligence, effective resistance, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2306.15649

Country: North America > United States > California (0.28)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.81)

Add feedback

Data-Copying in Generative Models: A Formal Framework

Bhattacharjee, Robi, Dasgupta, Sanjoy, Chaudhuri, Kamalika

arXiv.org Artificial IntelligenceMar-1-2023

There has been some recent interest in detecting and addressing memorization of training data by deep neural networks. A formal framework for memorization in generative models, called "data-copying," was proposed by Meehan et. al. (2020). We build upon their work to show that their framework may fail to detect certain kinds of blatant memorization. Motivated by this and the theory of non-parametric methods, we provide an alternative definition of data-copying that applies more locally. We provide a method to detect data-copying, and provably show that it works with high probability when enough data is available. We also provide lower bounds that characterize the sample requirement for reliable detection.

algorithm, artificial intelligence, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2302.13181

Country:

Europe (0.67)
North America > United States > California (0.28)

Genre:

Research Report > Experimental Study (0.67)
Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Add feedback

Robust Empirical Risk Minimization with Tolerance

Bhattacharjee, Robi, Hopkins, Max, Kumar, Akash, Yu, Hantao, Chaudhuri, Kamalika

arXiv.org Artificial IntelligenceFeb-4-2023

Developing simple, sample-efficient learning algorithms for robust classification is a pressing issue in today's tech-dominated world, and current theoretical techniques requiring exponential sample complexity and complicated improper learning rules fall far from answering the need. In this work we study the fundamental paradigm of (robust) $\textit{empirical risk minimization}$ (RERM), a simple process in which the learner outputs any hypothesis minimizing its training error. RERM famously fails to robustly learn VC classes (Montasser et al., 2019a), a bound we show extends even to `nice' settings such as (bounded) halfspaces. As such, we study a recent relaxation of the robust model called $\textit{tolerant}$ robust learning (Ashtiani et al., 2022) where the output classifier is compared to the best achievable error over slightly larger perturbation sets. We show that under geometric niceness conditions, a natural tolerant variant of RERM is indeed sufficient for $\gamma$-tolerant robust learning VC classes over $\mathbb{R}^d$, and requires only $\tilde{O}\left( \frac{VC(H)d\log \frac{D}{\gamma\delta}}{\epsilon^2}\right)$ samples for robustness regions of (maximum) diameter $D$.

artificial intelligence, machine learning, robustness region, (15 more...)

arXiv.org Artificial Intelligence

2210.00635

Country:

Europe (1.00)
North America > United States > California (0.28)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

Consistent Non-Parametric Methods for Maximizing Robustness

Bhattacharjee, Robi, Chaudhuri, Kamalika

arXiv.org Artificial IntelligenceJan-18-2023

Learning classifiers that are robust to adversarial examples has received a great deal of recent attention. A major drawback of the standard robust learning framework is there is an artificial robustness radius $r$ that applies to all inputs. This ignores the fact that data may be highly heterogeneous, in which case it is plausible that robustness regions should be larger in some regions of data, and smaller in others. In this paper, we address this limitation by proposing a new limit classifier, called the neighborhood optimal classifier, that extends the Bayes optimal classifier outside its support by using the label of the closest in-support point. We then argue that this classifier maximizes the size of its robustness regions subject to the constraint of having accuracy equal to the Bayes optimal. We then present sufficient conditions under which general non-parametric methods that can be represented as weight functions converge towards this limit, and show that both nearest neighbors and kernel classifiers satisfy them under certain conditions.

artificial intelligence, classifier, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2102.09086

Country:

Europe (1.00)
North America > Canada (0.93)
North America > United States > California (0.93)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

No-substitution k-means Clustering with Adversarial Order

Bhattacharjee, Robi, Moshkovitz, Michal

arXiv.org Artificial IntelligenceJan-18-2023

We investigate $k$-means clustering in the online no-substitution setting when the input arrives in \emph{arbitrary} order. In this setting, points arrive one after another, and the algorithm is required to instantly decide whether to take the current point as a center before observing the next point. Decisions are irrevocable. The goal is to minimize both the number of centers and the $k$-means cost. Previous works in this setting assume that the input's order is random, or that the input's aspect ratio is bounded. It is known that if the order is arbitrary and there is no assumption on the input, then any algorithm must take all points as centers. Moreover, assuming a bounded aspect ratio is too restrictive -- it does not include natural input generated from mixture models. We introduce a new complexity measure that quantifies the difficulty of clustering a dataset arriving in arbitrary order. We design a new random algorithm and prove that if applied on data with complexity $d$, the algorithm takes $O(d\log(n) k\log(k))$ centers and is an $O(k^3)$-approximation. We also prove that if the data is sampled from a ``natural" distribution, such as a mixture of $k$ Gaussians, then the new complexity measure is equal to $O(k^2\log(n))$. This implies that for data generated from those distributions, our new algorithm takes only $\text{poly}(k\log(n))$ centers and is a $\text{poly}(k)$-approximation. In terms of negative results, we prove that the number of centers needed to achieve an $\alpha$-approximation is at least $\Omega\left(\frac{d}{k\log(n\alpha)}\right)$.

algorithm, artificial intelligence, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2012.14512

Country: North America > Canada (0.28)

Genre: Research Report (0.64)

Industry: Health & Medicine (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.83)

Add feedback

Online $k$-means Clustering on Arbitrary Data Streams

Bhattacharjee, Robi, Imola, Jacob, Moshkovitz, Michal, Dasgupta, Sanjoy

arXiv.org Artificial IntelligenceJul-31-2022

We consider online $k$-means clustering where each new point is assigned to the nearest cluster center, after which the algorithm may update its centers. The loss incurred is the sum of squared distances from new points to their assigned cluster centers. The goal over a data stream $X$ is to achieve loss that is a constant factor of $L(X, OPT_k)$, the best possible loss using $k$ fixed points in hindsight. We propose a data parameter, $\Lambda(X)$, such that for any algorithm maintaining $O(k\text{poly}(\log n))$ centers at time $n$, there exists a data stream $X$ for which a loss of $\Omega(\Lambda(X))$ is inevitable. We then give a randomized algorithm that achieves clustering loss $O(\Lambda(X) + L(X, OPT_k))$. Our algorithm uses $O(k\text{poly}(\log n))$ memory and maintains $O(k\text{poly}(\log n))$ cluster centers. Our algorithm also enjoys a running time of $O(k\text{poly}(\log n))$ and is the first algorithm to achieve polynomial space and time complexity in this setting. It also is the first to have provable guarantees without making any assumptions on the input data.

algorithm, artificial intelligence, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2102.09101

Country: North America > United States (0.67)

Genre: Research Report (0.63)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.72)

Add feedback