
Collaborating Authors

 Mazaheri, Bijan


Meta-Dependence in Conditional Independence Testing

arXiv.org Machine Learning

Constraint-based causal discovery algorithms utilize many statistical tests for conditional independence to uncover networks of causal dependencies. These approaches rely on an assumed correspondence between the graphical properties of a causal structure and the conditional independence properties of observed variables, known as the causal Markov condition and faithfulness. Finite data yields an empirical distribution that is only "close" to the actual distribution. Across the many possible empirical distributions, this correspondence can break down for different conditional independencies, and multiple violations can occur at the same time. We study this "meta-dependence" between conditional independence properties using the following geometric intuition: each conditional independence property constrains the space of possible joint distributions to a manifold. The "meta-dependence" between conditional independencies is informed by the position of these manifolds relative to the true probability distribution. We provide a simple-to-compute measure of this meta-dependence using information projections and corroborate our findings empirically using both synthetic and real-world data.
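
A rough illustration of the geometric picture (a sketch under our own assumptions, not the paper's implementation): for discrete variables, the distribution closest to P in the KL(P‖Q) sense that satisfies X ⟂ Y | Z is P(z)P(x|z)P(y|z), and the divergence to it equals the conditional mutual information, giving a simple projection-style distance to the conditional-independence manifold.

```python
import numpy as np

def ci_projection(p_xyz):
    """Project a joint table P(X, Y, Z) onto the manifold {Q : X is independent of Y given Z}.

    For discrete variables, the minimizer of KL(P || Q) over that manifold is
    Q(x, y, z) = P(z) P(x|z) P(y|z).  Whether this matches the paper's exact
    projection direction is an assumption of this sketch.
    """
    p_z = p_xyz.sum(axis=(0, 1))      # P(z)
    p_xz = p_xyz.sum(axis=1)          # P(x, z)
    p_yz = p_xyz.sum(axis=0)          # P(y, z)
    return p_xz[:, None, :] * p_yz[None, :, :] / p_z[None, None, :]

def kl(p, q):
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# Toy joint distribution over binary X, Y, Z (axes: x, y, z).
rng = np.random.default_rng(0)
p = rng.random((2, 2, 2))
p /= p.sum()

q = ci_projection(p)
# KL(P || Q) equals the conditional mutual information I(X; Y | Z):
# it measures how far P sits from the conditional-independence manifold.
print("distance to the conditional-independence manifold:", kl(p, q))
```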


Synthetic Potential Outcomes for Mixtures of Treatment Effects

arXiv.org Artificial Intelligence

Modern data analysis frequently relies on large datasets, often constructed as amalgamations of diverse populations or data sources. Heterogeneity across these smaller datasets poses two major challenges for causal inference: (1) the source of each sample can introduce latent confounding between treatment and effect, and (2) diverse populations may respond differently to the same treatment, giving rise to heterogeneous treatment effects (HTEs). The issues of latent confounding and HTEs have been studied separately, but not in conjunction. In particular, previous works only report the conditional average treatment effect (CATE) among similar individuals (with respect to the measured covariates). CATEs cannot resolve mixtures of potential treatment effects driven by latent heterogeneity, which we call mixtures of treatment effects (MTEs). Inspired by method-of-moments approaches to mixture models, we propose "synthetic potential outcomes" (SPOs). Our new approach deconfounds heterogeneity while also guaranteeing the identifiability of MTEs. This technique bypasses full recovery of the mixture, which significantly simplifies its requirements for identifiability. We demonstrate the efficacy of SPOs on synthetic data.
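
To see why a CATE can hide an MTE, here is a toy simulation (our own numbers, not from the paper) with two latent groups whose treatment effects cancel:

```python
import numpy as np

# Two latent groups respond oppositely to the same treatment.  The
# covariate-conditional average (CATE) mixes them and reports roughly 0,
# hiding the two true effects of +1 and -1.
rng = np.random.default_rng(1)
n = 100_000
group = rng.integers(0, 2, n)              # latent class U, unobserved
treat = rng.integers(0, 2, n)              # randomized treatment for simplicity
effect = np.where(group == 0, +1.0, -1.0)  # group-specific treatment effect
y = effect * treat + rng.normal(0, 0.1, n)

cate = y[treat == 1].mean() - y[treat == 0].mean()
print(f"CATE = {cate:.3f}")                # close to 0, even though the true MTEs are +1 and -1
```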


Causal Discovery under Latent Class Confounding

arXiv.org Artificial Intelligence

Directed acyclic graphs are used to model the causal structure of a system. "Causal discovery" describes the problem of learning this structure from data. When data is an aggregate from multiple sources (populations or environments), global confounding obscures conditional independence properties that drive many causal discovery algorithms. For this reason, existing causal discovery algorithms are not suitable for the multiple-source setting. We demonstrate that, if the confounding is of bounded cardinality (i.e., the data comes from a limited number of sources), causal discovery can still be achieved. The feasibility of this problem is governed by a trade-off between the cardinality of the global confounder, the cardinalities of the observed variables, and the sparsity of the causal structure.
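
A minimal sketch, with an invented two-source example, of how a latent class can obscure conditional independence in pooled data:

```python
import numpy as np

# Within each source, X and Y are independent, but the source (a latent
# class) shifts both, so pooled data shows a spurious dependence that
# misleads CI-based discovery.
rng = np.random.default_rng(2)
n = 50_000
source = rng.integers(0, 2, n)                 # latent class / data source
x = rng.normal(loc=2.0 * source, scale=1.0)    # X depends only on the source
y = rng.normal(loc=2.0 * source, scale=1.0)    # Y depends only on the source

pooled_corr = np.corrcoef(x, y)[0, 1]
within_corr = [np.corrcoef(x[source == s], y[source == s])[0, 1] for s in (0, 1)]
print(f"pooled corr = {pooled_corr:.2f}, within-source corr = {within_corr}")
# Pooled correlation is clearly nonzero; within each source it is close to 0.
```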


Identification of Mixtures of Discrete Product Distributions in Near-Optimal Sample and Time Complexity

arXiv.org Machine Learning

We consider the problem of identifying, from statistics, a distribution of discrete random variables $X_1,\ldots,X_n$ that is a mixture of $k$ product distributions. The best previous sample complexity for $n \in O(k)$ was $(1/\zeta)^{O(k^2 \log k)}$ (under a mild separation assumption parameterized by $\zeta$). The best known lower bound was $\exp(\Omega(k))$. It is known that $n\geq 2k-1$ is necessary and sufficient for identification. We show, for any $n\geq 2k-1$, how to achieve sample complexity and run-time complexity $(1/\zeta)^{O(k)}$. We also extend the known lower bound of $e^{\Omega(k)}$ to match our upper bound across a broad range of $\zeta$. Our results are obtained by combining (a) a classic method for robust tensor decomposition with (b) a novel way of bounding the condition number of key matrices, called Hadamard extensions, obtained by studying their action only on flattened rank-1 tensors.
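
A sketch of how a Hadamard extension might be formed, assuming it is the matrix whose rows are the entrywise products of subsets of the input rows (our reading of the term; consult the paper for the exact definition and the conditioning argument):

```python
import numpy as np
from itertools import combinations

def hadamard_extension(rows):
    """Matrix whose rows are the entrywise (Hadamard) products of all subsets
    of the input rows (the empty subset gives the all-ones row).

    NOTE: this is our reading of "Hadamard extension" for illustration only.
    """
    rows = np.asarray(rows, dtype=float)
    n, k = rows.shape
    out = [np.ones(k)]
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            out.append(np.prod(rows[list(subset)], axis=0))
    return np.vstack(out)

# Toy example: 3 bits, k = 2 mixture components; columns hold the
# per-component success probabilities P(X_i = 1 | U = j).
M = np.array([[0.1, 0.9],
              [0.2, 0.7],
              [0.8, 0.3]])
H = hadamard_extension(M)
print(H.shape)                                  # (2**3, 2), rows indexed by subsets of bits
s = np.linalg.svd(H, compute_uv=False)
print(s[0] / s[-1])                             # a condition number of this kind is what the analysis bounds
```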


Causal Information Splitting: Engineering Proxy Features for Robustness to Distribution Shifts

arXiv.org Artificial Intelligence

Statistical prediction models are often trained on data drawn from probability distributions that differ from those of their eventual use cases. One approach to proactively prepare for these shifts harnesses the intuition that causal mechanisms should remain invariant between environments. Here we focus on a challenging setting in which the causal and anticausal variables of the target are unobserved. Leaning on information theory, we develop feature selection and engineering techniques for the observed downstream variables that act as proxies. We identify proxies that help build stable models and, moreover, utilize auxiliary training tasks to answer counterfactual questions, extracting stability-enhancing information from the proxies. We demonstrate the effectiveness of our techniques on synthetic and real data.
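
A generic illustration of information-theoretic proxy scoring on discrete data (the scoring rule below is illustrative and is not the paper's criterion):

```python
import numpy as np

def mutual_information(x, y):
    """Plug-in mutual information estimate for two discrete arrays."""
    n = len(x)
    mi = 0.0
    for a in np.unique(x):
        for b in np.unique(y):
            p_ab = np.mean((x == a) & (y == b))
            p_a, p_b = np.mean(x == a), np.mean(y == b)
            if p_ab > 0:
                mi += p_ab * np.log(p_ab / (p_a * p_b))
    return mi

# Score candidate proxy features by how much information they carry about
# the target; the paper's actual selection criterion is more refined.
rng = np.random.default_rng(3)
target = rng.integers(0, 2, 10_000)
proxies = {
    "noisy_copy": (target ^ (rng.random(10_000) < 0.1)).astype(int),
    "pure_noise": rng.integers(0, 2, 10_000),
}
for name, feat in proxies.items():
    print(name, round(mutual_information(feat, target), 3))
```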


Identifying Mixtures of Bayesian Network Distributions

arXiv.org Machine Learning

A Bayesian Network is a directed acyclic graph (DAG) on a set of $n$ random variables (identified with the vertices); a Bayesian Network Distribution (BND) is a probability distribution on the random variables that is Markovian on the graph. A finite mixture of such models is the projection onto these variables of a BND on the larger graph which has an additional "hidden" (or "latent") random variable $U$, ranging in $\{1,\ldots,k\}$, and a directed edge from $U$ to every other vertex. Models of this type are fundamental to research in causal inference, where $U$ models a confounding effect. One extremely special case has been of longstanding interest in the theory literature: the empty graph. Such a distribution is simply a mixture of $k$ product distributions. A longstanding problem has been, given the joint distribution of a mixture of $k$ product distributions, to identify each of the product distributions, and their mixture weights. Our results are: (1) We improve the sample complexity (and runtime) for identifying mixtures of $k$ product distributions from $\exp(O(k^2))$ to $\exp(O(k \log k))$. This is almost best possible in view of a known $\exp(\Omega(k))$ lower bound. (2) We give the first algorithm for the case of non-empty graphs. The complexity for a graph of maximum degree $\Delta$ is $\exp(O(k(\Delta^2 + \log k)))$. (The above complexities are approximate and suppress dependence on secondary parameters.)
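
A toy sampler for this model class (our own parameters, purely illustrative): a hidden class U with an edge into every observed vertex of a two-vertex DAG X1 -> X2.

```python
import numpy as np

rng = np.random.default_rng(4)
k = 2
p_u = np.array([0.6, 0.4])                     # mixture weights
p_x1 = np.array([0.2, 0.8])                    # P(X1 = 1 | U)
p_x2 = np.array([[0.1, 0.7],                   # P(X2 = 1 | X1 = 0, U)
                 [0.5, 0.9]])                  # P(X2 = 1 | X1 = 1, U)

def sample(n):
    u = rng.choice(k, size=n, p=p_u)           # latent class, never observed
    x1 = (rng.random(n) < p_x1[u]).astype(int)
    x2 = (rng.random(n) < p_x2[x1, u]).astype(int)
    return np.stack([x1, x2], axis=1)          # only X1, X2 are released

data = sample(100_000)
print(data.mean(axis=0))   # observed marginals mix the two component networks
```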


Expert Graphs: Synthesizing New Expertise via Collaboration

arXiv.org Artificial Intelligence

Consider multiple experts with overlapping expertise working on a classification problem under uncertain input. What constitutes a consistent set of opinions? How can we predict the opinions of experts on missing sub-domains? In this paper, we define a framework to analyze this problem, termed "expert graphs." In an expert graph, vertices represent classes and edges represent binary opinions on the topics of their vertices. We derive necessary conditions for expert graph validity and use them to create "synthetic experts" that describe opinions consistent with the observed opinions of other experts. We show this framework to be equivalent to the well-studied linear ordering polytope. We show our conditions are not sufficient for describing all expert graphs on cliques, but are sufficient for cycles.
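
A minimal sketch of the expert-graph data structure, using a standard cycle inequality of the linear ordering polytope as one necessary condition (we are not asserting this is exactly the paper's validity test):

```python
# Vertices are classes; a directed edge (i, j) carries the expert's
# probability that class i beats class j, with p[j, i] = 1 - p[i, j].

def cycle_consistent(p, cycle):
    """For a directed cycle (v0 -> v1 -> ... -> v0), any mixture of linear
    orders puts the sum of edge probabilities in [1, len(cycle) - 1]."""
    total = sum(p[(cycle[t], cycle[(t + 1) % len(cycle)])] for t in range(len(cycle)))
    return 1.0 <= total <= len(cycle) - 1.0

# Pairwise opinions on three classes (a, b, c).
p = {("a", "b"): 0.9, ("b", "c"): 0.9, ("c", "a"): 0.9}
p.update({(j, i): 1.0 - v for (i, j), v in list(p.items())})

print(cycle_consistent(p, ["a", "b", "c"]))   # False: 2.7 > 2, so the opinions are inconsistent
```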


Source Identification for Mixtures of Product Distributions

arXiv.org Machine Learning

We give an algorithm for source identification of a mixture of $k$ product distributions on $n$ bits. This is a fundamental problem in machine learning with many applications. Our algorithm identifies the source parameters of an identifiable mixture, given, as input, approximate values of multilinear moments (derived, for instance, from a sufficiently large sample), using $2^{O(k^2)} n^{O(k)}$ arithmetic operations. Our result is the first explicit bound on the computational complexity of source identification of such mixtures. The running time improves previous results by Feldman, O'Donnell, and Servedio (FOCS 2005) and Chen and Moitra (STOC 2019) that guaranteed only learning the mixture (without parametric identification of the source). Our analysis gives a quantitative version of a qualitative characterization of identifiable sources that is due to Tahmasebi, Motahari, and Maddah-Ali (ISIT 2018).
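
A sketch of the multilinear moments the algorithm takes as input, estimated from samples of a toy mixture (the identification algorithm itself is not reproduced here):

```python
import numpy as np
from itertools import combinations

def multilinear_moments(samples, max_order=3):
    """Empirical multilinear moments E[prod_{i in S} X_i] for subsets S of bits."""
    n = samples.shape[1]
    moments = {}
    for order in range(1, max_order + 1):
        for subset in combinations(range(n), order):
            moments[subset] = samples[:, subset].prod(axis=1).mean()
    return moments

# Draw from a toy mixture of k = 2 product distributions on n = 4 bits.
rng = np.random.default_rng(5)
weights = np.array([0.5, 0.5])
probs = np.array([[0.1, 0.2, 0.8, 0.9],      # component 0: P(X_i = 1)
                  [0.9, 0.8, 0.2, 0.1]])     # component 1
u = rng.choice(2, size=200_000, p=weights)
samples = (rng.random((200_000, 4)) < probs[u]).astype(int)

moments = multilinear_moments(samples)
print(moments[(0, 1)], "vs exact", (weights * probs[:, 0] * probs[:, 1]).sum())
```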


Robust Correction of Sampling Bias Using Cumulative Distribution Functions

arXiv.org Machine Learning

Varying domains and biased datasets can lead to differences between the training and target distributions, known as covariate shift. Current approaches for alleviating this often rely on estimating the ratio of the training and target probability density functions. These techniques require parameter tuning and can be unstable across different datasets. We present a new method for handling covariate shift using empirical cumulative distribution function estimates of the target distribution, via a rigorous generalization of a recent idea proposed by Vapnik and Izmailov. Further, we show experimentally that our method is more robust in its predictions, does not rely on parameter tuning, and achieves classification performance comparable to current state-of-the-art techniques on synthetic and real datasets.
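
For contrast, a small sketch (illustrative only; the paper's actual correction is not reproduced here) of a histogram-based density-ratio weight versus an empirical CDF of the target:

```python
import numpy as np

rng = np.random.default_rng(6)
train = rng.normal(0.0, 1.0, 5_000)       # training covariates
target = rng.normal(0.5, 1.2, 5_000)      # shifted target covariates

def ecdf(sample):
    """Empirical CDF: parameter-free and always well defined."""
    xs = np.sort(sample)
    return lambda t: np.searchsorted(xs, t, side="right") / len(xs)

F_target = ecdf(target)
print(F_target(0.0))                      # estimate of P_target(X <= 0), no tuning required

# Density-ratio weights via histograms: note the sensitivity to bin choice.
bins = np.linspace(-4, 5, 30)
p_tr, _ = np.histogram(train, bins=bins, density=True)
p_tg, _ = np.histogram(target, bins=bins, density=True)
weights = p_tg / np.clip(p_tr, 1e-6, None)
print(weights.max())                      # tail bins can blow up the ratio
```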


The Sparse Hausdorff Moment Problem, with Application to Topic Models

arXiv.org Machine Learning

We consider the problem of identifying, from its first $m$ noisy moments, a probability distribution on $[0,1]$ of support $k<\infty$. This is equivalent to the problem of learning a distribution on $m$ observable binary random variables $X_1,X_2,\dots,X_m$ that are iid conditional on a hidden random variable $U$ taking values in $\{1,2,\dots,k\}$. Our focus is on accomplishing this with $m=2k$, which is the minimum $m$ for which verifying that the source is a $k$-mixture is possible (even with exact statistics). This problem, so simply stated, is quite useful: e.g., by a known reduction, any algorithm for it lifts to an algorithm for learning pure topic models. We give an algorithm that identifies a $k$-mixture of $m=2k$ iid binary random variables from a sample of size $\left(1/w_{\min}\right)^2 \cdot\left(1/\zeta\right)^{O(k)}$ with post-sampling runtime of only $O(k^{2+o(1)})$ arithmetic operations. Here $w_{\min}$ is the minimum probability of an outcome of $U$, and $\zeta$ is the minimum separation between the distinct success probabilities of the $X_i$s. Stated in terms of the moment problem, it suffices to know the moments to additive accuracy $w_{\min}\cdot\zeta^{O(k)}$. It is known that the sample complexity of any solution to the identification problem must be at least exponential in $k$. Previous results demonstrated either worse sample complexity and worse $O(k^c)$ runtime for some $c$ substantially larger than $2$, or similar sample complexity and much worse $k^{O(k^2)}$ runtime.
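
A small sketch of the stated equivalence, with invented toy numbers: conditional on U the bits are iid Bernoulli(p_U), so the probability that the first j bits are all 1 equals the j-th moment of the discrete distribution on [0,1].

```python
import numpy as np

rng = np.random.default_rng(7)
w = np.array([0.3, 0.7])                   # mixing weights (k = 2)
p = np.array([0.2, 0.9])                   # support points in [0, 1]
k, m = len(w), 4                           # m = 2k observable bits

u = rng.choice(k, size=500_000, p=w)
bits = (rng.random((500_000, m)) < p[u, None]).astype(int)

for j in range(1, m + 1):
    empirical = bits[:, :j].prod(axis=1).mean()   # noisy j-th moment from samples
    exact = (w * p ** j).sum()                    # j-th moment of the distribution on [0, 1]
    print(j, round(empirical, 4), round(exact, 4))
```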