AITopics

2410.06615

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

arXiv.org Machine LearningOct-8-2024

Benign Overfitting for Regression with Trained Two-Layer ReLU Networks

Park, Junhyung, Bloebaum, Patrick, Kasiviswanathan, Shiva Prasad

We study the least-square regression problem with a two-layer fully-connected neural network, with ReLU activation function, trained by gradient flow. Our first result is a generalization result, that requires no assumptions on the underlying regression function or the noise other than that they are bounded. We operate in the neural tangent kernel regime, and our generalization result is developed via a decomposition of the excess risk into estimation and approximation errors, viewing gradient flow as an implicit regularizer. This decomposition in the context of neural networks is a novel perspective of gradient descent, and helps us avoid uniform convergence traps. In this work, we also establish that under the same setting, the trained network overfits to the data. Together, these results, establishes the first result on benign overfitting for finite-width ReLU networks for arbitrary regression functions.

artificial intelligence, gradient flow, machine learning, (16 more...)

2410.06191

Country:

North America > United States (0.27)
Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.92)

arXiv.org Machine LearningDec-3-2023

Debiasing Conditional Stochastic Optimization

He, Lie, Kasiviswanathan, Shiva Prasad

In this paper, we study the conditional stochastic optimization (CSO) problem which covers a variety of applications including portfolio selection, reinforcement learning, robust learning, causal inference, etc. The sample-averaged gradient of the CSO objective is biased due to its nested structure, and therefore requires a high sample complexity for convergence. We introduce a general stochastic extrapolation technique that effectively reduces the bias. We show that for nonconvex smooth objectives, combining this extrapolation with variance reduction techniques can achieve a significantly better sample complexity than the existing bounds. Additionally, we develop new algorithms for the finite-sum variant of the CSO problem that also significantly improve upon existing results. Finally, we believe that our debiasing technique has the potential to be a useful tool for addressing similar challenges in other stochastic optimization problems.

artificial intelligence, machine learning, null null 2 2, (12 more...)

2304.10613

Country:

Europe (0.92)
North America > United States > New York (0.14)
North America > Canada > Quebec (0.14)

Genre: Research Report > New Finding (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

arXiv.org Machine LearningOct-24-2023

Differentially Private Conditional Independence Testing

Kalemaj, Iden, Kasiviswanathan, Shiva Prasad, Ramdas, Aaditya

Conditional independence (CI) tests are widely used in statistical data analysis, e.g., they are the building block of many algorithms for causal graph discovery. The goal of a CI test is to accept or reject the null hypothesis that $X \perp \!\!\! \perp Y \mid Z$, where $X \in \mathbb{R}, Y \in \mathbb{R}, Z \in \mathbb{R}^d$. In this work, we investigate conditional independence testing under the constraint of differential privacy. We design two private CI testing procedures: one based on the generalized covariance measure of Shah and Peters (2020) and another based on the conditional randomization test of Cand\`es et al. (2016) (under the model-X assumption). We provide theoretical guarantees on the performance of our tests and validate them empirically. These are the first private CI tests with rigorous theoretical guarantees that work for the general case when $Z$ is continuous.

artificial intelligence, machine learning, sensitivity, (16 more...)

2306.06721

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)

Genre: Research Report > Experimental Study (1.00)

Industry:

Health & Medicine (0.67)
Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

arXiv.org Artificial IntelligenceJul-19-2023

Sequential Kernelized Independence Testing

Podkopaev, Aleksandr, Blöbaum, Patrick, Kasiviswanathan, Shiva Prasad, Ramdas, Aaditya

Independence testing is a classical statistical problem that has been extensively studied in the batch setting when one fixes the sample size before collecting data. However, practitioners often prefer procedures that adapt to the complexity of a problem at hand instead of setting sample size in advance. Ideally, such procedures should (a) stop earlier on easy tasks (and later on harder tasks), hence making better use of available resources, and (b) continuously monitor the data and efficiently incorporate statistical evidence after collecting new data, while controlling the false alarm rate. Classical batch tests are not tailored for streaming data: valid inference after data peeking requires correcting for multiple testing which results in low power. Following the principle of testing by betting, we design sequential kernelized independence tests that overcome such shortcomings. We exemplify our broad framework using bets inspired by kernelized dependence measures, e.g., the Hilbert-Schmidt independence criterion. Our test is also valid under non-i.i.d., time-varying settings. We demonstrate the power of our approaches on both simulated and real data.

artificial intelligence, machine learning, skit, (16 more...)

2212.07383

Country:

Europe (0.67)
Africa > South Africa (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.67)

arXiv.org Artificial IntelligenceJun-6-2023

Interventional and Counterfactual Inference with Diffusion Models

Chao, Patrick, Blöbaum, Patrick, Kasiviswanathan, Shiva Prasad

We consider the problem of answering observational, interventional, and counterfactual queries in a causally sufficient setting where only observational data and the causal graph are available. Utilizing the recent developments in diffusion models, we introduce diffusion-based causal models (DCM) to learn causal mechanisms, that generate unique latent encodings. These encodings enable us to directly sample under interventions and perform abduction for counterfactuals. Diffusion models are a natural fit here, since they can encode each node to a latent representation that acts as a proxy for exogenous noise. Our empirical evaluations demonstrate significant improvements over existing state-of-the-art methods for answering causal queries. Furthermore, we provide theoretical results that offer a methodology for analyzing counterfactual estimation in general encoder-decoder models, which could be useful in settings beyond our proposed approach.

artificial intelligence, machine learning, query, (18 more...)

2302.0086

Country: Europe > France (0.14)

Genre: Research Report > New Finding (0.67)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.67)

arXiv.org Artificial IntelligenceJan-30-2023

Thompson Sampling with Diffusion Generative Prior

Hsieh, Yu-Guan, Kasiviswanathan, Shiva Prasad, Kveton, Branislav, Blöbaum, Patrick

In this work, we initiate the idea of using denoising diffusion models to learn priors for online decision making problems. Our special focus is on the meta-learning for bandit framework, with the goal of learning a strategy that performs well across bandit tasks of a same class. To this end, we train a diffusion model that learns the underlying task distribution and combine Thompson sampling with the learned prior to deal with new tasks at test time. Our posterior sampling algorithm is designed to carefully balance between the learned prior and the noisy observations that come from the learner's interaction with the environment. To capture realistic bandit scenarios, we also propose a novel diffusion model training procedure that trains even from incomplete and/or noisy data, which could be of independent interest. Finally, our extensive experimental evaluations clearly demonstrate the potential of the proposed approach.

data mining, diffusion model, machine learning, (19 more...)

2301.05182

Country: Europe (0.45)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.67)
Information Technology > Data Science > Data Mining > Big Data (0.47)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.45)

arXiv.org Machine LearningJun-8-2022

Uplifting Bandits

Hsieh, Yu-Guan, Kasiviswanathan, Shiva Prasad, Kveton, Branislav

We introduce a multi-armed bandit model where the reward is a sum of multiple random variables, and each action only alters the distributions of some of them. After each action, the agent observes the realizations of all the variables. This model is motivated by marketing campaigns and recommender systems, where the variables represent outcomes on individual customers, such as clicks. We propose UCB-style algorithms that estimate the uplifts of the actions over a baseline. We study multiple variants of the problem, including when the baseline and affected variables are unknown, and prove sublinear regret bounds for all of these. We also provide lower bounds that justify the necessity of our modeling assumptions. Experiments on synthetic and real-world datasets show the benefit of methods that estimate the uplifts over policies that do not use this structure.

artificial intelligence, uplifting bandit

2206.04091

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence (0.53)

arXiv.org Artificial IntelligenceJun-6-2021

Collaborative Causal Discovery with Atomic Interventions

Addanki, Raghavendra, Kasiviswanathan, Shiva Prasad

We introduce a new Collaborative Causal Discovery problem, through which we model a common scenario in which we have multiple independent entities each with their own causal graph, and the goal is to simultaneously learn all these causal graphs. We study this problem without the causal sufficiency assumption, using Maximal Ancestral Graphs (MAG) to model the causal graphs, and assuming that we have the ability to actively perform independent single vertex (or atomic) interventions on the entities. If the $M$ underlying (unknown) causal graphs of the entities satisfy a natural notion of clustering, we give algorithms that leverage this property and recovers all the causal graphs using roughly logarithmic in $M$ number of atomic interventions per entity. These are significantly fewer than $n$ atomic interventions per entity required to learn each causal graph separately, where $n$ is the number of observable nodes in the causal graph. We complement our results with a lower bound and discuss various extensions of our collaborative setting.

artificial intelligence, health & medicine, intervention, (18 more...)

2106.03028

Country: North America > United States > Massachusetts (0.14)

Genre: Research Report > New Finding (0.34)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

arXiv.org Machine LearningJul-12-2020

Efficient Intervention Design for Causal Discovery with Latents

Addanki, Raghavendra, Kasiviswanathan, Shiva Prasad, McGregor, Andrew, Musco, Cameron

We consider recovering a causal graph in presence of latent variables, where we seek to minimize the cost of interventions used in the recovery process. We consider two intervention cost models: (1) a linear cost model where the cost of an intervention on a subset of variables has a linear form, and (2) an identity cost model where the cost of an intervention is the same, regardless of what variables it is on, i.e., the goal is just to minimize the number of interventions. Under the linear cost model, we give an algorithm to identify the ancestral relations of the underlying causal graph, achieving within a $2$-factor of the optimal intervention cost. This approximation factor can be improved to $1+\epsilon$ for any $\epsilon > 0$ under some mild restrictions. Under the identity cost model, we bound the number of interventions needed to recover the entire causal graph, including the latent variables, using a parameterization of the causal graph through a special type of colliders. In particular, we introduce the notion of $p$-colliders, that are colliders between pair of nodes arising from a specific type of conditioning in the causal graph, and provide an upper bound on the number of interventions as a function of the maximum number of $p$-colliders between any two nodes in the causal graph.

artificial intelligence, intervention, machine learning, (18 more...)

2005.11736

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.45)