Dwivedi, Raaz
Low-Rank Thinning
Carrell, Annabelle Michael, Gong, Albert, Shetty, Abhishek, Dwivedi, Raaz, Mackey, Lester
The goal in thinning is to summarize a dataset using a small set of representative points. Remarkably, sub-Gaussian thinning algorithms like Kernel Halving and Compress can match the quality of uniform subsampling while substantially reducing the number of summary points. However, existing guarantees cover only a restricted range of distributions and kernel-based quality measures and suffer from pessimistic dimension dependence. To address these deficiencies, we introduce a new low-rank analysis of sub-Gaussian thinning that applies to any distribution and any kernel, guaranteeing high-quality compression whenever the kernel or data matrix is approximately low-rank. To demonstrate the broad applicability of the techniques, we design practical sub-Gaussian thinning approaches that improve upon the best known guarantees for approximating attention in transformers, accelerating stochastic gradient training through reordering, and distinguishing distributions in near-linear time.
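As a point of reference for the kernel-based quality measures discussed above, here is a minimal sketch (with an illustrative Gaussian kernel) of the kernel maximum mean discrepancy between a dataset and a square-root-sized uniform subsample; a thinned coreset from Kernel Halving or Compress would be evaluated the same way.

```python
# Minimal sketch: measuring coreset quality with the kernel maximum mean discrepancy (MMD).
# The Gaussian kernel, bandwidth, and uniform-subsample baseline are illustrative assumptions.
import numpy as np

def gaussian_kernel(X, Y, bandwidth=1.0):
    """Gaussian kernel matrix k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2))."""
    sq_dists = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-sq_dists / (2 * bandwidth**2))

def mmd(X, Y, bandwidth=1.0):
    """Kernel MMD between the empirical distributions of X and Y."""
    kxx = gaussian_kernel(X, X, bandwidth).mean()
    kyy = gaussian_kernel(Y, Y, bandwidth).mean()
    kxy = gaussian_kernel(X, Y, bandwidth).mean()
    return np.sqrt(max(kxx + kyy - 2 * kxy, 0.0))

rng = np.random.default_rng(0)
n, d = 4096, 2
X = rng.standard_normal((n, d))                                   # full dataset
coreset = X[rng.choice(n, size=int(np.sqrt(n)), replace=False)]   # uniform-subsample baseline
print(f"MMD(full data, size-{len(coreset)} subsample) = {mmd(X, coreset):.4f}")
```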
Cheap Permutation Testing
Domingo-Enrich, Carles, Dwivedi, Raaz, Mackey, Lester
Permutation tests are a popular choice for distinguishing distributions and testing independence, due to their exact, finite-sample control of false positives and their minimax optimality when paired with U-statistics. However, standard permutation tests are also expensive, requiring a test statistic to be computed hundreds or thousands of times to detect a separation between distributions. In this work, we offer a simple approach to accelerate testing: group your datapoints into bins and permute only those bins. For U- and V-statistics, we prove that these cheap permutation tests have two remarkable properties. First, by storing appropriate sufficient statistics, a cheap test can be run in time comparable to evaluating a single test statistic. Second, cheap permutation power closely approximates standard permutation power. As a result, cheap tests inherit the exact false positive control and minimax optimality of standard permutation tests while running in a fraction of the time. We complement these findings with improved power guarantees for standard permutation testing and experiments demonstrating the benefits of cheap permutations over standard maximum mean discrepancy (MMD), Hilbert-Schmidt independence criterion (HSIC), random Fourier feature, Wilcoxon-Mann-Whitney, cross-MMD, and cross-HSIC tests.
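A minimal sketch of the binned idea for an MMD two-sample test, under illustrative assumptions (Gaussian kernel, equal sample sizes, V-statistic form of the MMD): block sums of the kernel matrix serve as the stored sufficient statistics, so each permuted statistic is recomputed from bin-level quantities rather than from scratch.

```python
# Minimal sketch of a "cheap" permutation MMD test: datapoints are grouped into bins and
# only bin labels are permuted, so each permuted statistic is recomputed from bin-level
# kernel sums. The kernel, bandwidth, and bin layout are illustrative assumptions.
import numpy as np

def gaussian_kernel(X, Y, bw=1.0):
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-sq / (2 * bw**2))

def cheap_mmd_permutation_test(X, Y, num_bins=16, num_perms=200, bw=1.0, seed=0):
    rng = np.random.default_rng(seed)
    Z = np.vstack([X, Y])                       # pooled sample; X occupies the first rows
    K = gaussian_kernel(Z, Z, bw)
    bins = np.array_split(np.arange(len(Z)), num_bins)   # first half of bins from X, rest from Y
    S = np.array([[K[np.ix_(a, b)].sum() for b in bins] for a in bins])  # bin-level sufficient stats
    sizes = np.array([len(a) for a in bins])
    half = num_bins // 2

    def mmd_from_blocks(order):
        a, b = order[:half], order[half:]
        na, nb = sizes[a].sum(), sizes[b].sum()
        return (S[np.ix_(a, a)].sum() / na**2 + S[np.ix_(b, b)].sum() / nb**2
                - 2 * S[np.ix_(a, b)].sum() / (na * nb))

    observed = mmd_from_blocks(np.arange(num_bins))
    null = [mmd_from_blocks(rng.permutation(num_bins)) for _ in range(num_perms)]
    p_value = (1 + sum(s >= observed for s in null)) / (1 + num_perms)
    return observed, p_value

rng = np.random.default_rng(1)
X = rng.standard_normal((512, 2))
Y = rng.standard_normal((512, 2)) + 0.3        # mean-shifted alternative
print(cheap_mmd_permutation_test(X, Y))
```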
Learning Counterfactual Distributions via Kernel Nearest Neighbors
Choi, Kyuseong, Feitelberg, Jacob, Chin, Caleb, Agarwal, Anish, Dwivedi, Raaz
Consider a setting with multiple units (e.g., individuals, cohorts, geographic locations) and outcomes (e.g., treatments, times, items), where the goal is to learn a multivariate distribution for each unit-outcome entry, such as the distribution of a user's weekly spend and engagement under a specific mobile app version. A common challenge is the prevalence of missing not at random data, where observations are available only for certain unit-outcome combinations and observation availability can be correlated with the properties of the distributions themselves, i.e., there is unobserved confounding. An additional challenge is that for any observed unit-outcome entry, we only have a finite number of samples from the underlying distribution. We tackle these two challenges by casting the problem into a novel distributional matrix completion framework and introducing a kernel-based distributional generalization of nearest neighbors to estimate the underlying distributions. By leveraging maximum mean discrepancies and a suitable factor model on the kernel mean embeddings of the underlying distributions, we establish consistent recovery of the underlying distributions even when data is missing not at random and positivity constraints are violated. Furthermore, we demonstrate that our nearest neighbors approach is robust to heteroscedastic noise, provided we have access to two or more measurements for the observed unit-outcome entries, a robustness not present in prior works on nearest neighbors with single measurements.
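A minimal sketch of the kernel nearest neighbors idea, under illustrative assumptions (one-dimensional samples, a Gaussian kernel, and a fixed distance threshold, none of which are the paper's exact choices): row distances are average squared MMDs over commonly observed entries, and a missing entry's distribution is estimated by pooling the neighbors' samples at that entry.

```python
# Minimal sketch of kernel nearest-neighbors imputation for distributional matrix completion.
# The kernel, bandwidth, threshold, and data layout are illustrative assumptions.
import numpy as np

def gaussian_kernel(x, y, bw=1.0):
    sq = np.subtract.outer(x, y) ** 2            # 1-D samples for simplicity
    return np.exp(-sq / (2 * bw**2))

def mmd_sq(x, y, bw=1.0):
    """Squared MMD (V-statistic) between two 1-D sample arrays."""
    return (gaussian_kernel(x, x, bw).mean() + gaussian_kernel(y, y, bw).mean()
            - 2 * gaussian_kernel(x, y, bw).mean())

def knn_impute(samples, target, radius=0.1):
    """samples[i][j] is an array of draws for entry (i, j), or None if missing."""
    i, j = target
    dists = np.full(len(samples), np.inf)
    for k in range(len(samples)):
        if k == i or samples[k][j] is None:
            continue
        common = [c for c in range(len(samples[i]))
                  if samples[i][c] is not None and samples[k][c] is not None]
        if common:
            dists[k] = np.mean([mmd_sq(samples[i][c], samples[k][c]) for c in common])
    neighbors = np.where(dists <= radius)[0]
    if len(neighbors) == 0:
        neighbors = [int(np.argmin(dists))]      # fall back to the single closest row
    return np.concatenate([samples[k][j] for k in neighbors])   # pooled samples for entry (i, j)

# Toy usage: 3 rows x 2 columns; entry (0, 1) is unobserved.
rng = np.random.default_rng(2)
samples = [[rng.normal(0, 1, 50), None],
           [rng.normal(0, 1, 50), rng.normal(2, 1, 50)],
           [rng.normal(5, 1, 50), rng.normal(7, 1, 50)]]
print(knn_impute(samples, target=(0, 1)).mean())   # expected to be near 2 (row 1's distribution)
```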
On adaptivity and minimax optimality of two-sided nearest neighbors
Sadhukhan, Tathagata, Paul, Manit, Dwivedi, Raaz
Nearest neighbor (NN) algorithms have been extensively used for missing data problems in recommender systems and sequential decision-making systems. Prior theoretical analysis has established favorable guarantees for NN when the underlying data is sufficiently smooth and the missingness probabilities are lower bounded. Here we analyze NN for non-smooth, non-linear functions with vast amounts of missingness. In particular, we consider matrix completion settings where the entries of the underlying matrix follow a latent non-linear factor model, with the non-linearity belonging to a Hölder function class that is less smooth than Lipschitz. Our results establish the following favorable properties for a suitable two-sided NN: (1) the mean squared error (MSE) of NN adapts to the smoothness of the non-linearity, (2) under certain regularity conditions, the NN error rate matches the rate obtained by an oracle equipped with knowledge of both the row and column latent factors, and finally (3) NN's MSE is non-trivial for a wide range of settings even when several matrix entries might be missing deterministically. We support our theoretical findings via extensive numerical simulations and a case study with data from a mobile health study, HeartSteps.
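A minimal sketch of a two-sided NN estimate for a single matrix entry, with illustrative distance thresholds: rows and columns are both matched on mean squared differences over commonly observed entries, and the estimate averages the observations in the resulting row-column block.

```python
# Minimal sketch of two-sided nearest neighbors for matrix completion.
# The distance thresholds and rank-one toy data are illustrative assumptions.
import numpy as np

def two_sided_nn(A, mask, i, j, row_radius=0.02, col_radius=0.02):
    """A: data matrix; mask: boolean matrix of observed entries. Returns an estimate of A[i, j]."""
    n, m = A.shape

    def dissimilarity(u, v, obs_u, obs_v):
        common = obs_u & obs_v
        return np.mean((u[common] - v[common]) ** 2) if common.any() else np.inf

    row_nbrs = [k for k in range(n)
                if dissimilarity(A[i], A[k], mask[i], mask[k]) <= row_radius]
    col_nbrs = [l for l in range(m)
                if dissimilarity(A[:, j], A[:, l], mask[:, j], mask[:, l]) <= col_radius]
    block = [(k, l) for k in row_nbrs for l in col_nbrs if mask[k, l]]
    if not block:
        return np.nan
    return np.mean([A[k, l] for k, l in block])   # average over the observed row-column block

# Toy usage: rank-one signal with noise and roughly 30% of entries missing at random.
rng = np.random.default_rng(3)
u, v = rng.uniform(size=40), rng.uniform(size=30)
A = np.outer(u, v) + 0.05 * rng.standard_normal((40, 30))
mask = rng.uniform(size=A.shape) > 0.3
print(two_sided_nn(A, mask, i=0, j=0), np.outer(u, v)[0, 0])   # NN estimate vs. noiseless entry
```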
Supervised Kernel Thinning
Gong, Albert, Choi, Kyuseong, Dwivedi, Raaz
The kernel thinning (KT) algorithm of Dwivedi & Mackey (2024) provides a better-than-i.i.d. compression of a generic set of points. By generating high-fidelity coresets of size significantly smaller than the number of input points, KT is known to speed up unsupervised tasks like Monte Carlo integration, uncertainty quantification, and non-parametric hypothesis testing, with minimal loss in statistical accuracy. In this work, we generalize the KT algorithm to speed up supervised learning problems involving kernel methods. Specifically, we combine two classical algorithms--Nadaraya-Watson (NW) regression or kernel smoothing, and kernel ridge regression (KRR)--with KT to provide a quadratic speed-up in both training and inference times. We show how distribution compression with KT in each setting reduces to constructing an appropriate kernel, and introduce the Kernel-Thinned NW and Kernel-Thinned KRR estimators. We prove that the KT-based regression estimators enjoy significantly superior computational efficiency over the full-data estimators and improved statistical efficiency over i.i.d. subsampling of the training data. En route, we also provide a novel multiplicative error guarantee for compressing with KT. We validate our design choices with both simulations and real data experiments.
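To illustrate how a coreset plugs into NW regression, here is a minimal one-dimensional sketch with a Gaussian kernel; a plain uniform subsample stands in for a kernel-thinned coreset, which would be substituted with no other changes.

```python
# Minimal sketch of coreset-based Nadaraya-Watson (NW) regression: the estimator is evaluated
# on a small coreset instead of the full training set, cutting per-query cost from O(n) to
# O(sqrt(n)). The uniform subsample, Gaussian kernel, and bandwidth are illustrative choices.
import numpy as np

def nw_predict(x_query, X_train, y_train, bw=0.2):
    """Nadaraya-Watson estimate: kernel-weighted average of training responses."""
    weights = np.exp(-(x_query[:, None] - X_train[None, :]) ** 2 / (2 * bw**2))
    return (weights @ y_train) / weights.sum(axis=1)

rng = np.random.default_rng(4)
n = 4096
X = rng.uniform(-3, 3, n)
y = np.sin(X) + 0.1 * rng.standard_normal(n)

coreset_idx = rng.choice(n, size=int(np.sqrt(n)), replace=False)   # stand-in for kernel thinning
x_query = np.linspace(-3, 3, 5)
full = nw_predict(x_query, X, y)                               # O(n) kernel evaluations per query
thinned = nw_predict(x_query, X[coreset_idx], y[coreset_idx])  # O(sqrt(n)) per query
print(np.round(full, 3), np.round(thinned, 3), np.round(np.sin(x_query), 3))
```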
Distributional Matrix Completion via Nearest Neighbors in the Wasserstein Space
Feitelberg, Jacob, Choi, Kyuseong, Agarwal, Anish, Dwivedi, Raaz
We introduce the problem of distributional matrix completion: given a sparsely observed matrix of empirical distributions, we seek to impute the true distributions associated with both observed and unobserved matrix entries. This is a generalization of traditional matrix completion, where the observations per matrix entry are scalar-valued. To do so, we utilize tools from optimal transport to generalize the nearest neighbors method to the distributional setting. Under a suitable latent factor model on probability distributions, we establish that our method recovers the distributions in the Wasserstein distance. We demonstrate through simulations that our method is able to (i) provide better distributional estimates for an entry compared to using observed samples for that entry alone, (ii) yield accurate estimates of distributional quantities such as standard deviation and value-at-risk, and (iii) inherently support heteroscedastic noise. We also prove novel asymptotic results for Wasserstein barycenters over one-dimensional distributions.
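Two one-dimensional building blocks of this setting admit simple closed forms: the 2-Wasserstein distance between equal-size empirical distributions is an L2 distance between sorted samples, and their Wasserstein barycenter averages the sorted samples. A minimal sketch (equal sample sizes assumed for simplicity):

```python
# Minimal sketch of 1-D Wasserstein distance and barycenter computations used for imputation.
import numpy as np

def wasserstein2(x, y):
    """2-Wasserstein distance between two equal-size 1-D empirical distributions."""
    return np.sqrt(np.mean((np.sort(x) - np.sort(y)) ** 2))

def wasserstein_barycenter(samples):
    """Barycenter of 1-D empirical distributions: average of sorted samples (quantiles)."""
    return np.mean([np.sort(s) for s in samples], axis=0)

rng = np.random.default_rng(5)
neighbors = [rng.normal(1.0, 1.0, 200), rng.normal(1.2, 1.0, 200), rng.normal(0.8, 1.0, 200)]
estimate = wasserstein_barycenter(neighbors)               # imputed distribution for a missing entry
print(wasserstein2(estimate, rng.normal(1.0, 1.0, 200)))   # distance to a fresh draw from the "truth"
```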
Debiased Distribution Compression
Li, Lingxiao, Dwivedi, Raaz, Mackey, Lester
Modern compression methods can summarize a target distribution $\mathbb{P}$ more succinctly than i.i.d. sampling but require access to a low-bias input sequence like a Markov chain converging quickly to $\mathbb{P}$. We introduce a new suite of compression methods suitable for compression with biased input sequences. Given $n$ points targeting the wrong distribution and quadratic time, Stein Kernel Thinning (SKT) returns $\sqrt{n}$ equal-weighted points with $\widetilde{O}(n^{-1/2})$ maximum mean discrepancy (MMD) to $\mathbb{P}$. For larger-scale compression tasks, Low-rank SKT achieves the same feat in sub-quadratic time using an adaptive low-rank debiasing procedure that may be of independent interest. For downstream tasks that support simplex or constant-preserving weights, Stein Recombination and Stein Cholesky achieve even greater parsimony, matching the guarantees of SKT with as few as $\operatorname{poly-log}(n)$ weighted points. Underlying these advances are new guarantees for the quality of simplex-weighted coresets, the spectral decay of kernel matrices, and the covering numbers of Stein kernel Hilbert spaces. In our experiments, our techniques provide succinct and accurate posterior summaries while overcoming biases due to burn-in, approximate Markov chain Monte Carlo, and tempering.
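A key ingredient here is the Stein kernel, which measures discrepancy to $\mathbb{P}$ using only its score function. A minimal sketch, assuming a Gaussian base kernel and a standard normal target (illustrative choices, not the paper's): the Langevin Stein kernel and the resulting kernel Stein discrepancy can be computed from the points alone, and the biased sample should register a visibly larger discrepancy.

```python
# Minimal sketch of a Langevin Stein kernel and kernel Stein discrepancy (KSD).
# Base kernel, bandwidth, and standard normal target are illustrative assumptions.
import numpy as np

def stein_kernel(X, Y, score, bw=1.0):
    """Langevin Stein kernel from a Gaussian base kernel k and target score s_p:
    k_p(x, y) = div_x div_y k + s_p(x).grad_y k + s_p(y).grad_x k + s_p(x).s_p(y) k."""
    d = X.shape[1]
    diff = X[:, None, :] - Y[None, :, :]                     # pairwise differences, shape (n, m, d)
    sq = np.sum(diff**2, axis=-1)
    K = np.exp(-sq / (2 * bw**2))
    sx, sy = score(X), score(Y)
    term1 = (d / bw**2 - sq / bw**4) * K                     # div_x div_y k
    term2 = np.einsum('id,ijd->ij', sx, diff) / bw**2 * K    # s_p(x) . grad_y k
    term3 = -np.einsum('jd,ijd->ij', sy, diff) / bw**2 * K   # s_p(y) . grad_x k
    term4 = (sx @ sy.T) * K                                  # s_p(x) . s_p(y) k
    return term1 + term2 + term3 + term4

rng = np.random.default_rng(6)
score = lambda x: -x                        # score function of the standard normal target
exact = rng.standard_normal((500, 2))       # sample from the target
biased = exact + 0.5                        # biased input sequence (wrong mean)
ksd = lambda Z: np.sqrt(stein_kernel(Z, Z, score).mean())
print(f"KSD of exact sample: {ksd(exact):.3f}, KSD of biased sample: {ksd(biased):.3f}")
```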
FairPair: A Robust Evaluation of Biases in Language Models through Paired Perturbations
Dwivedi-Yu, Jane, Dwivedi, Raaz, Schick, Timo
Accurate evaluation of a language model's differential treatment of specific groups is critical to ensuring a positive and safe user experience. An ideal evaluation should be robust, extendable to new groups or attributes, and able to capture biases that appear in typical usage (rather than just extreme, rare cases). Relatedly, bias evaluation should surface not only egregious biases but also ones that are subtle and commonplace, such as a higher likelihood of discussing appearances with regard to women. We present FairPair, an evaluation framework for assessing differential treatment that occurs during ordinary usage. FairPair operates through counterfactual pairs, but crucially, the paired continuations are grounded in the same demographic group, which ensures an equivalent comparison. Additionally, unlike prior work, our method accounts for the inherent variability of the generation process itself by measuring sampling variability. We present an evaluation of several commonly used generative models and a qualitative analysis that indicates a preference for discussing family and hobbies with regard to women.
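A minimal structural sketch of a paired-perturbation evaluation loop in this spirit, with hypothetical placeholder `generate` and `score` functions and omitting FairPair's grounding of both continuations in the same group: the group-wise gap in scores is compared against the sampling variability of repeated generations.

```python
# Minimal sketch of a paired-perturbation bias evaluation loop. `generate` and `score` are
# hypothetical placeholders (a language-model sampler and a bias-relevant scorer); the key
# structure is comparing the group-wise score gap against sampling variability.
import numpy as np

def evaluate_pair(generate, score, prompt, counterfactual_prompt, num_samples=20):
    """Compare score distributions for a prompt and its demographic perturbation."""
    scores_a = np.array([score(generate(prompt)) for _ in range(num_samples)])
    scores_b = np.array([score(generate(counterfactual_prompt)) for _ in range(num_samples)])
    gap = scores_a.mean() - scores_b.mean()
    # Sampling variability: spread of the gap attributable to the generation process alone.
    within = np.sqrt(scores_a.var(ddof=1) / num_samples + scores_b.var(ddof=1) / num_samples)
    return gap, within   # a gap well outside the sampling variability suggests differential treatment

# Toy usage with stub functions standing in for a real model and scorer.
rng = np.random.default_rng(7)
generate = lambda prompt: prompt + " [continuation]"   # placeholder generation
score = lambda text: rng.normal(0.0, 1.0)              # placeholder score
print(evaluate_pair(generate, score, "John is a nurse.", "Jane is a nurse."))
```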
Doubly Robust Inference in Causal Latent Factor Models
Abadie, Alberto, Agarwal, Anish, Dwivedi, Raaz, Shah, Abhin
This article presents a novel framework for the estimation of average treatment effects in modern data-rich environments in the presence of unobserved confounding. Modern data-rich environments are characterized by repeated measurements of outcomes, such as clinical metrics or purchase history, across a substantial number of units--be it patients in medical contexts or customers in online retail. As an example, consider an internet-retail platform where customers interact with various product categories. For each consumer-category pair, the platform makes decisions to either offer a discount or not, and records whether the consumer purchased a product in the category. Given an observational dataset capturing such interactions, our objective is to infer the causal effect of offering the discount on consumer purchase behavior. More specifically, we aim to infer two kinds of treatment effects: (a) tailored to product categories, the average impact of the discount on a product across consumers, and (b) tailored to consumers, the average impact of the discount on a consumer across product categories. This task is challenging due to unobserved confounding that may cause spurious associations between discount allocation and product purchase.
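For intuition on the doubly robust construction, here is a minimal sketch of an AIPW-style estimate of the category-level effect (the average impact of the discount on one product category across consumers), assuming estimated outcome and treatment-probability matrices are supplied by some upstream procedure; the inputs below are illustrative placeholders rather than the article's estimators.

```python
# Minimal sketch of a doubly robust (AIPW-style) column-level treatment effect estimate.
# m1_hat, m0_hat, and p_hat are assumed to be supplied estimates; the toy data are illustrative.
import numpy as np

def doubly_robust_column_effect(Y, A, m1_hat, m0_hat, p_hat, j):
    """Average treatment effect for column (product category) j across rows (consumers)."""
    y, a = Y[:, j], A[:, j]
    m1, m0, p = m1_hat[:, j], m0_hat[:, j], p_hat[:, j]
    scores = (m1 - m0
              + a * (y - m1) / p                  # correction on treated entries
              - (1 - a) * (y - m0) / (1 - p))     # correction on control entries
    return scores.mean()

# Toy usage with a known effect of 1.0 in every entry.
rng = np.random.default_rng(8)
n, m = 200, 10
p_true = np.full((n, m), 0.5)
A = rng.binomial(1, p_true)
baseline = rng.normal(0, 1, (n, m))
Y = baseline + A * 1.0 + 0.1 * rng.standard_normal((n, m))
print(doubly_robust_column_effect(Y, A, m1_hat=baseline + 1.0, m0_hat=baseline,
                                  p_hat=p_true, j=0))   # close to the true effect 1.0
```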
On counterfactual inference with unobserved confounding
Shah, Abhin, Dwivedi, Raaz, Shah, Devavrat, Wornell, Gregory W.
Given an observational study with $n$ independent but heterogeneous units, our goal is to learn the counterfactual distribution for each unit using only one $p$-dimensional sample per unit containing covariates, interventions, and outcomes. Specifically, we allow for unobserved confounding that introduces statistical biases between interventions and outcomes as well as exacerbates the heterogeneity across units. Modeling the conditional distribution of the outcomes as an exponential family, we reduce learning the unit-level counterfactual distributions to learning $n$ exponential family distributions with heterogeneous parameters and only one sample per distribution. We introduce a convex objective that pools all $n$ samples to jointly learn all $n$ parameter vectors, and provide a unit-wise mean squared error bound that scales linearly with the metric entropy of the parameter space. For example, when the parameters are $s$-sparse linear combinations of $k$ known vectors, the error is $O(s\log k/p)$. En route, we derive sufficient conditions for compactly supported distributions to satisfy the logarithmic Sobolev inequality. As an application of the framework, our results enable consistent imputation of sparsely missing covariates.
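A heavily simplified sketch of the pooling idea, not the paper's exact objective: each unit's parameter is an $s$-sparse combination of $k$ known vectors, one Gaussian sample is observed per unit, and a single convex program (negative log-likelihood plus an l1 penalty, solved by proximal gradient descent) is minimized over all units' coefficients at once.

```python
# Minimal, heavily simplified sketch of "pool samples, learn per-unit parameters": each unit i has
# one p-dimensional sample x_i ~ N(theta_i, I) with theta_i = V c_i for k known vectors (columns of
# V) and a sparse coefficient vector c_i. The pooled convex objective (sum of Gaussian negative
# log-likelihoods plus an l1 penalty) is minimized by proximal gradient descent (ISTA).
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def fit_pooled(X, V, lam=0.1, steps=500):
    """X: (n, p), one sample per unit; V: (p, k), known vectors. Returns (n, k) coefficients."""
    n, k = X.shape[0], V.shape[1]
    C = np.zeros((n, k))
    step = 1.0 / np.linalg.norm(V, 2) ** 2           # 1 / Lipschitz constant of the gradient
    for _ in range(steps):
        grad = (C @ V.T - X) @ V                     # gradient of 0.5 * ||X - C V^T||_F^2
        C = soft_threshold(C - step * grad, step * lam)
    return C

# Toy usage: n units, p-dimensional samples, 2-sparse combinations of k known vectors.
rng = np.random.default_rng(9)
n, p, k, s = 100, 50, 20, 2
V = rng.standard_normal((p, k)) / np.sqrt(p)
C_true = np.zeros((n, k))
for i in range(n):
    C_true[i, rng.choice(k, s, replace=False)] = rng.normal(0, 3, s)
X = C_true @ V.T + rng.standard_normal((n, p))       # one noisy sample per unit
C_hat = fit_pooled(X, V)
print(np.mean((C_hat @ V.T - C_true @ V.T) ** 2))    # average per-coordinate parameter error
```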