AITopics | Sen, Subhabrata

Collaborating Authors

Sen, Subhabrata

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Provable Benefits of Unsupervised Pre-training and Transfer Learning via Single-Index Models

Jones-McCormick, Taj, Jagannath, Aukosh, Sen, Subhabrata

arXiv.org Machine Learning, Feb-24-2025

Unsupervised pre-training and transfer learning are commonly used techniques to initialize training algorithms for neural networks, particularly in settings with limited labeled data. In this paper, we study the effects of unsupervised pre-training and transfer learning on the sample complexity of high-dimensional supervised learning. Specifically, we consider the problem of training a single-layer neural network via online stochastic gradient descent. We establish that pre-training and transfer learning (under concept shift) reduce sample complexity by polynomial factors (in the dimension) under very general assumptions. We also uncover some surprising settings where pre-training grants exponential improvement over random initialization in terms of sample complexity.
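The following is a minimal sketch of why initialization matters in this setting. It runs online SGD for a single-index model under several assumptions: Gaussian inputs, a teacher and student that share the link He_2(z) = z^2 - 1, and weights kept on the unit sphere. A "pre-trained" start is stood in for by a hypothetical warm start with overlap 0.6 with the true direction. Neither the link nor the warm start comes from the paper, and the code does not reproduce its sample-complexity results. It only shows the mechanics of comparing a random start with an informed one.

```python
import numpy as np

def he2(z):
    """Second Hermite polynomial, used here as an assumed link function."""
    return z**2 - 1.0

def online_sgd(w_init, w_star, n_steps, lr, rng):
    """Online SGD on the unit sphere for y = he2(<w_star, x>), x ~ N(0, I_d).

    Each step draws one fresh sample (online setting), takes a spherical
    gradient step on the squared loss, and renormalizes.
    """
    d = w_star.size
    w = w_init / np.linalg.norm(w_init)
    for _ in range(n_steps):
        x = rng.standard_normal(d)
        z = w @ x
        residual = he2(w_star @ x) - he2(z)
        grad = -4.0 * residual * z * x      # gradient of residual**2 w.r.t. w
        grad -= (grad @ w) * w              # keep only the tangential component
        w = w - lr * grad
        w /= np.linalg.norm(w)              # retract back onto the sphere
    return w

d = 100
rng = np.random.default_rng(0)
w_star = np.zeros(d)
w_star[0] = 1.0

# Random initialization: overlap with w_star is O(1/sqrt(d)).
w_rand = rng.standard_normal(d)

# Hypothetical "pre-trained" initialization with overlap 0.6.
noise = rng.standard_normal(d)
noise[0] = 0.0
w_warm = 0.6 * w_star + 0.8 * noise / np.linalg.norm(noise)

lr = 0.02 / d
w_rand_final = online_sgd(w_rand, w_star, 1000, lr, rng)
w_warm_final = online_sgd(w_warm, w_star, 1000, lr, rng)
print("random-init overlap:", abs(w_star @ w_rand_final))
print("warm-start overlap: ", abs(w_star @ w_warm_final))
```

Because He_2 is even, the model cannot distinguish w from -w, so overlaps are reported in absolute value.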

artificial intelligence, initialization, machine learning, (17 more...)

2502.16849

Country:

North America > United States (0.46)
North America > Canada > Ontario (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Fundamental limits of community detection from multi-view data: multi-layer, dynamic and partially labeled block models

Yang, Xiaodong, Lin, Buyu, Sen, Subhabrata

arXiv.org Machine Learning, Jan-16-2024

Multi-view data arise frequently in modern network analysis, e.g., relations of multiple types among individuals in social network analysis, longitudinal measurements of interactions among observational units, and annotated networks with noisy partial labeling of vertices. We study community detection in these disparate settings via a unified theoretical framework, and investigate the fundamental thresholds for community recovery. We characterize the mutual information between the data and the latent parameters, provided the degrees are sufficiently large. Based on this general result, (i) we derive a sharp threshold for community detection in an inhomogeneous multilayer block model (Chen et al., 2022), (ii) characterize a sharp threshold for weak recovery in a dynamic stochastic block model (Matias and Miele, 2017), and (iii) identify the limiting mutual information in an unbalanced partially labeled block model. Our first two results are derived modulo coordinate-wise convexity assumptions on specific functions; we provide extensive numerical evidence for their correctness. Finally, we introduce iterative algorithms based on Approximate Message Passing for community detection in these problems.
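To make the multilayer setting concrete, here is a minimal sketch of an inhomogeneous two-community multilayer block model, where each layer has its own within- and across-community edge probabilities. It pairs the model with a simple baseline that sums the layers and clusters nodes by the sign of the second eigenvector of the aggregated adjacency matrix. The layer parameters and helper names are hypothetical. Aggregated spectral clustering is a swapped-in heuristic, not the Approximate Message Passing algorithm introduced in the paper.

```python
import numpy as np

def sample_sbm_layer(labels, p_in, p_out, rng):
    """One undirected SBM layer: edge prob p_in within, p_out across communities."""
    n = labels.size
    same = labels[:, None] == labels[None, :]
    probs = np.where(same, p_in, p_out)
    upper = np.triu(rng.random((n, n)) < probs, k=1)   # sample each pair once
    return (upper | upper.T).astype(float)             # symmetrize, no self-loops

def aggregate_spectral(layers):
    """Sum the layers and split nodes by the sign of the second eigenvector."""
    A = sum(layers)
    _, vecs = np.linalg.eigh(A)          # eigenvalues in ascending order
    return (vecs[:, -2] > 0).astype(int)

def accuracy(pred, labels):
    """Fraction of nodes correctly clustered, up to swapping the two labels."""
    agree = np.mean(pred == labels)
    return max(agree, 1.0 - agree)

rng = np.random.default_rng(1)
labels = rng.permutation(np.repeat([0, 1], 200))
# Inhomogeneous layers: each layer has its own (p_in, p_out) pair.
params = [(0.10, 0.04), (0.08, 0.03), (0.12, 0.06)]
layers = [sample_sbm_layer(labels, pi, po, rng) for pi, po in params]
for k, A in enumerate(layers):
    print(f"layer {k}: accuracy {accuracy(aggregate_spectral([A]), labels):.3f}")
print(f"aggregated: accuracy {accuracy(aggregate_spectral(layers), labels):.3f}")
```

Summing with equal weights ignores the fact that the layers carry different amounts of signal. Treating layers asymmetrically is exactly the kind of refinement a model-based analysis can account for.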

data mining, information, machine learning, (19 more...)

2401.08167

Country: North America > United States (0.14)

Genre: Research Report (0.63)

Industry: Information Technology (0.48)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Bayes optimal learning in high-dimensional linear regression with network side information

Nandy, Sagnik, Sen, Subhabrata

arXiv.org Machine Learning, Oct-31-2023

Supervised learning problems with side information in the form of a network arise frequently in applications in genomics, proteomics and neuroscience. For example, in genetic applications, the network side information can accurately capture background biological information on the intricate relations among the relevant genes. In this paper, we initiate a study of Bayes optimal learning in high-dimensional linear regression with network side information. To this end, we first introduce a simple generative model (called the Reg-Graph model) which posits a joint distribution for the supervised data and the observed network through a common set of latent parameters. Next, we introduce an iterative algorithm based on Approximate Message Passing (AMP) which is provably Bayes optimal under very general conditions. In addition, we characterize the limiting mutual information between the latent signal and the data observed, and thus precisely quantify the statistical impact of the network side information. Finally, supporting numerical experiments suggest that the introduced algorithm has excellent performance in finite samples.
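AMP is easiest to see in its textbook form. The sketch below shows the two ingredients that every AMP iteration shares: a denoising step applied to pseudo-data, and an Onsager correction term in the residual. It assumes a noiseless sparse linear model, an i.i.d. Gaussian design with entries N(0, 1/n), and a soft-thresholding denoiser in the style of Donoho, Maleki and Montanari. It is not the Bayes-optimal AMP from the paper, which also incorporates the network side information. The threshold multiplier `alpha` is an assumed tuning choice.

```python
import numpy as np

def soft_threshold(u, theta):
    """Componentwise soft-thresholding denoiser."""
    return np.sign(u) * np.maximum(np.abs(u) - theta, 0.0)

def amp_soft_threshold(A, y, alpha=1.5, n_iter=100):
    """AMP for y = A x with A having i.i.d. N(0, 1/n) entries."""
    n, p = A.shape
    delta = n / p
    x = np.zeros(p)
    z = y.copy()
    for _ in range(n_iter):
        theta = alpha * np.linalg.norm(z) / np.sqrt(n)   # threshold ~ estimated noise level
        r = x + A.T @ z                                   # pseudo-data: approx. x_true + Gaussian noise
        x = soft_threshold(r, theta)
        # Residual with Onsager correction: (1/delta) * old z * mean denoiser derivative.
        z = y - A @ x + (z / delta) * np.mean(np.abs(r) > theta)
    return x

rng = np.random.default_rng(2)
n, p, k = 500, 1000, 50
A = rng.standard_normal((n, p)) / np.sqrt(n)
x_true = np.zeros(p)
support = rng.choice(p, size=k, replace=False)
x_true[support] = rng.standard_normal(k)
y = A @ x_true

x_hat = amp_soft_threshold(A, y)
rel_err = np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)
print(f"relative error: {rel_err:.2e}")
```

Without the Onsager term, the pseudo-data `r` would no longer behave like the signal plus Gaussian noise. That property is what makes AMP's performance trackable by a scalar recursion.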

artificial intelligence, information, machine learning, (15 more...)

2306.05679

Country: North America > United States (0.45)

Genre: Research Report > Experimental Study (0.66)

Industry: Health & Medicine > Therapeutic Area > Oncology (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.84)

A Mean Field Approach to Empirical Bayes Estimation in High-dimensional Linear Regression

Mukherjee, Sumit, Sen, Bodhisattva, Sen, Subhabrata

arXiv.org Machine Learning, Oct-25-2023

We study empirical Bayes estimation in high-dimensional linear regression. To facilitate computationally efficient estimation of the underlying prior, we adopt a variational empirical Bayes approach, introduced originally in Carbonetto and Stephens (2012) and Kim et al. (2022). We establish asymptotic consistency of the nonparametric maximum likelihood estimator (NPMLE) and its (computable) naive mean field variational surrogate under mild assumptions on the design and the prior. Assuming, in addition, that the naive mean field approximation has a dominant optimizer, we develop a computationally efficient approximation to the oracle posterior distribution, and establish its accuracy under the 1-Wasserstein metric. This enables computationally feasible Bayesian inference; e.g., construction of posterior credible intervals with an average coverage guarantee, Bayes optimal estimation for the regression coefficients, estimation of the proportion of non-nulls, etc. Our analysis covers both deterministic and random designs, and accommodates correlations among the features. To the best of our knowledge, this provides the first rigorous nonparametric empirical Bayes method in a high-dimensional regression setting without sparsity.
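For intuition about the naive mean-field surrogate, here is a sketch of coordinate-ascent variational inference with a spike-and-slab product prior. It follows Carbonetto-Stephens-style updates, but with an unscaled slab variance. The prior is kept fixed: `pi` and `sigma2_beta` are assumed inputs, and the noise variance is assumed known. So this illustrates only the mean-field step, not the nonparametric prior estimation (NPMLE) studied in the paper.

```python
import numpy as np

def cavi_spike_slab(X, y, sigma2, sigma2_beta, pi, n_sweeps=100):
    """Naive mean-field (CAVI) for y = X b + N(0, sigma2) noise, with prior
    b_j ~ pi * N(0, sigma2_beta) + (1 - pi) * delta_0, independently over j.

    Each factor is q(b_j) = alpha_j * N(mu_j, s2_j) + (1 - alpha_j) * delta_0.
    """
    n, p = X.shape
    xtx = np.sum(X**2, axis=0)
    s2 = sigma2 / (xtx + sigma2 / sigma2_beta)   # slab variances (do not change)
    alpha = np.full(p, pi)
    mu = np.zeros(p)
    Xr = X @ (alpha * mu)                         # fitted values under E_q[b]
    log_prior_odds = np.log(pi / (1.0 - pi))
    for _ in range(n_sweeps):
        for j in range(p):
            Xr -= X[:, j] * (alpha[j] * mu[j])   # remove coordinate j
            mu[j] = s2[j] / sigma2 * (X[:, j] @ (y - Xr))
            logit = (log_prior_odds + 0.5 * np.log(s2[j] / sigma2_beta)
                     + mu[j] ** 2 / (2.0 * s2[j]))
            alpha[j] = 0.5 * (1.0 + np.tanh(0.5 * logit))   # numerically stable sigmoid
            Xr += X[:, j] * (alpha[j] * mu[j])   # add coordinate j back
    return alpha, mu, s2

rng = np.random.default_rng(3)
n, p = 200, 50
X = rng.standard_normal((n, p))
b_true = np.zeros(p)
b_true[:3] = 2.0
y = X @ b_true + rng.standard_normal(n)

alpha, mu, _ = cavi_spike_slab(X, y, sigma2=1.0, sigma2_beta=1.0, pi=0.1)
print("inclusion probabilities (first 6):", np.round(alpha[:6], 3))
```

The posterior mean of b_j under this surrogate is `alpha[j] * mu[j]`.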

approximation, artificial intelligence, machine learning, (17 more...)

2309.16843

Country: North America > United States (0.67)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Sparse Signal Detection in Heteroscedastic Gaussian Sequence Models: Sharp Minimax Rates

Chhor, Julien, Mukherjee, Rajarshi, Sen, Subhabrata

arXiv.org Machine Learning, Aug-1-2023

Given a heterogeneous Gaussian sequence model with unknown mean $\theta \in \mathbb R^d$ and known covariance matrix $\Sigma = \operatorname{diag}(\sigma_1^2,\dots, \sigma_d^2)$, we study the signal detection problem against sparse alternatives, for known sparsity $s$. Namely, we characterize how large $\epsilon^*>0$ must be in order to distinguish, with high probability, the null hypothesis $\theta=0$ from the alternative composed of $s$-sparse vectors in $\mathbb R^d$ separated from $0$ in $L^t$ norm ($t \in [1,\infty]$) by at least $\epsilon^*$. We find minimax upper and lower bounds on the minimax separation radius $\epsilon^*$ and prove that they always match. We also derive the corresponding minimax tests achieving these bounds. Our results reveal new phase transitions in the behavior of $\epsilon^*$ with respect to the level of sparsity, the $L^t$ metric, and the heteroscedasticity profile of $\Sigma$. In the case of Euclidean (i.e., $L^2$) separation, we bridge the remaining gaps in the literature.
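As a point of reference, a simple baseline for this testing problem is a Bonferroni-calibrated max test on the standardized coordinates x_i / sigma_i. The sketch below assumes the noise levels sigma_i are known, as in the abstract. It is not one of the minimax tests derived in the paper. The level 0.05, the noise profile, and the simulated 3-sparse signal are illustrative choices.

```python
import numpy as np
from statistics import NormalDist

def max_test(x, sigma, level=0.05):
    """Reject H0: theta = 0 when max_i |x_i| / sigma_i exceeds a Bonferroni cutoff.

    Under H0 each x_i / sigma_i is N(0, 1), so a union bound over the d
    coordinates keeps the type-I error at most `level`.
    """
    d = x.size
    stat = np.max(np.abs(x) / sigma)
    cutoff = NormalDist().inv_cdf(1.0 - level / (2.0 * d))
    return stat > cutoff, stat, cutoff

d = 1000
sigma = np.linspace(0.5, 2.0, d)          # known heteroscedastic noise levels
rng = np.random.default_rng(4)

idx = [10, 500, 900]
theta = np.zeros(d)
theta[idx] = 8.0 * sigma[idx]              # a 3-sparse alternative
x_alt = theta + sigma * rng.standard_normal(d)
x_null = sigma * rng.standard_normal(d)

print("null:       ", max_test(x_null, sigma))
print("alternative:", max_test(x_alt, sigma))
```

Standardizing by sigma_i before taking the maximum is what adapts the test to the heteroscedastic noise profile. How the optimal separation depends on that profile is the subject of the paper.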

artificial intelligence, minimax separation, separation, (16 more...)

2211.0858

Genre: Research Report > Experimental Study (0.45)

Industry: Health & Medicine (1.00)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)

A New Central Limit Theorem for the Augmented IPW Estimator: Variance Inflation, Cross-Fit Covariance and Beyond

Jiang, Kuanhao, Mukherjee, Rajarshi, Sen, Subhabrata, Sur, Pragya

arXiv.org Machine Learning, Oct-28-2022

Estimation of the average treatment effect (ATE) is a central problem in causal inference. In recent times, inference for the ATE in the presence of high-dimensional covariates has been extensively studied. Among the diverse approaches that have been proposed, augmented inverse probability weighting (AIPW) with cross-fitting has emerged as a popular choice in practice. In this work, we study this cross-fit AIPW estimator under well-specified outcome regression and propensity score models in a high-dimensional regime where the numbers of features and samples are both large and comparable. Under assumptions on the covariate distribution, we establish a new central limit theorem for the suitably scaled cross-fit AIPW that applies without any sparsity assumptions on the underlying high-dimensional parameters. Our CLT uncovers two crucial phenomena, among others: (i) the AIPW exhibits a substantial variance inflation that can be precisely quantified in terms of the signal-to-noise ratio and other problem parameters, and (ii) the asymptotic covariance between the pre-cross-fit estimators is non-negligible even on the root-n scale. These findings are strikingly different from their classical counterparts. On the technical front, our work utilizes a novel interplay between three distinct tools: approximate message passing theory, the theory of deterministic equivalents, and the leave-one-out approach. We believe our proof techniques should be useful for analyzing other two-stage estimators in this high-dimensional regime. Finally, we complement our theoretical results with simulations that demonstrate both the finite sample efficacy of our CLT and its robustness to our assumptions.
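Here is a minimal sketch of the cross-fit AIPW estimator itself. It assumes linear outcome regressions fit separately per treatment arm, a logistic propensity model fit by Newton's method, two folds, and propensity clipping to [0.01, 0.99]. The helper names and clipping bounds are hypothetical choices. The simulation uses only p = 5 covariates with n = 4000, which is the classical regime. The returned standard error is the usual plug-in one. Per the abstract, the estimator's variance behaves very differently when p and n are comparable, so this plug-in value need not be accurate there.

```python
import numpy as np

def add_intercept(X):
    return np.column_stack([np.ones(len(X)), X])

def fit_logistic(X, t, n_iter=25, ridge=1e-6):
    """Logistic regression by Newton's method (tiny ridge for numerical stability)."""
    Xb = add_intercept(X)
    beta = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-Xb @ beta))
        grad = Xb.T @ (t - prob) - ridge * beta
        hess = Xb.T @ (Xb * (prob * (1.0 - prob))[:, None]) + ridge * np.eye(Xb.shape[1])
        beta += np.linalg.solve(hess, grad)
    return beta

def fit_ols(X, y):
    coef, *_ = np.linalg.lstsq(add_intercept(X), y, rcond=None)
    return coef

def aipw_crossfit(X, t, y, n_folds=2, seed=0):
    """Cross-fit AIPW estimate of the ATE and its classical plug-in standard error."""
    n = len(y)
    folds = np.random.default_rng(seed).permutation(n) % n_folds
    psi = np.empty(n)
    for k in range(n_folds):
        train, test = folds != k, folds == k
        # Nuisance models are fit on the other folds only.
        beta = fit_logistic(X[train], t[train])
        coef1 = fit_ols(X[train & (t == 1)], y[train & (t == 1)])
        coef0 = fit_ols(X[train & (t == 0)], y[train & (t == 0)])
        Xt = add_intercept(X[test])
        e = np.clip(1.0 / (1.0 + np.exp(-Xt @ beta)), 0.01, 0.99)
        m1, m0 = Xt @ coef1, Xt @ coef0
        tt, yt = t[test], y[test]
        # AIPW influence-function values on the held-out fold.
        psi[test] = m1 - m0 + tt * (yt - m1) / e - (1 - tt) * (yt - m0) / (1 - e)
    return psi.mean(), psi.std(ddof=1) / np.sqrt(n)

rng = np.random.default_rng(5)
n, p = 4000, 5
X = rng.standard_normal((n, p))
e_true = 1.0 / (1.0 + np.exp(-(0.5 * X[:, 0] - 0.5 * X[:, 1])))
t = (rng.random(n) < e_true).astype(int)
y = 1.0 + X @ np.ones(p) + 2.0 * t + rng.standard_normal(n)   # true ATE = 2

ate, se = aipw_crossfit(X, t, y)
print(f"ATE estimate {ate:.3f} (plug-in se {se:.3f})")
```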

artificial intelligence, machine learning, null 1, (17 more...)

2205.10198

Genre: Research Report (1.00)

Industry: Health & Medicine (0.92)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

The TAP free energy for high-dimensional linear regression

Qiu, Jiaze, Sen, Subhabrata

arXiv.org Machine Learning, Mar-14-2022

The analysis of high-dimensional probability distributions is a central challenge in modern Statistics and Machine Learning. This is particularly true in the context of Bayesian Statistics, where scientists carry out inference based on the posterior distribution. In modern applications, the posterior distribution is typically high-dimensional, and analytically intractable. Variational Inference (VI) has emerged as an attractive option to approximate these intractable distributions, facilitating fast, parallel computations in state-of-the-art applications [32, 10]. In this approach, the distribution of interest is approximated (in KL divergence) by distributions from a pre-specified, more tractable collection. The simplest version of VI is the Naive Mean-field approximation (NMF), where the distribution of interest is approximated by a product distribution.
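The product-distribution idea can be worked out exactly for a Gaussian target N(mu, Lam^{-1}). For that target, the optimal NMF factor for coordinate i is N(m_i, 1/Lam_ii). The sketch below assumes a bivariate Gaussian with correlation 0.9 and runs coordinate ascent. It recovers the exact means, while the NMF variances (1/Lam_ii = 0.19 here) understate the true marginal variances (1.0). This is a textbook illustration, not the TAP correction studied in the paper.

```python
import numpy as np

def naive_mean_field_gaussian(mu, Lam, n_sweeps=200):
    """Coordinate-ascent NMF for a Gaussian target N(mu, Lam^{-1}).

    The optimal product factor for coordinate i is N(m_i, 1 / Lam[i, i]), with
    m_i = mu_i - sum_{j != i} Lam[i, j] * (m_j - mu_j) / Lam[i, i].
    """
    m = np.zeros_like(mu)
    for _ in range(n_sweeps):
        for i in range(mu.size):
            off_diag = Lam[i] @ (m - mu) - Lam[i, i] * (m[i] - mu[i])
            m[i] = mu[i] - off_diag / Lam[i, i]
    return m, 1.0 / np.diag(Lam)

mu = np.array([1.0, -1.0])
Sigma = np.array([[1.0, 0.9], [0.9, 1.0]])
m, v = naive_mean_field_gaussian(mu, np.linalg.inv(Sigma))
print("NMF means:", m)
print("NMF variances:", v, "vs true marginal variances:", np.diag(Sigma))
```

The variance shrinkage grows with the correlation between coordinates. This is one reason corrections beyond the plain product approximation are of interest.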

artificial intelligence, machine learning, nullr 1, (16 more...)

2203.07539

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.51)

Variational Inference in high-dimensional linear regression

Mukherjee, Sumit, Sen, Subhabrata

arXiv.org Machine Learning, Apr-25-2021

We study high-dimensional Bayesian linear regression with product priors. Using the nascent theory of non-linear large deviations (Chatterjee and Dembo, 2016), we derive sufficient conditions for the leading-order correctness of the naive mean-field approximation to the log-normalizing constant of the posterior distribution. Subsequently, assuming a true linear model for the observed data, we derive a limiting infinite-dimensional variational formula for the log-normalizing constant of the posterior. Furthermore, we establish that under an additional "separation" condition, the variational problem has a unique optimizer, and this optimizer governs the probabilistic properties of the posterior distribution. We provide intuitive sufficient conditions for the validity of this "separation" condition. Finally, we illustrate our results on concrete examples with specific design matrices.
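A fully explicit special case helps here. When the product prior is itself Gaussian (b_j ~ N(0, tau2)), the posterior is Gaussian with precision Lam = X^T X / sigma2 + I / tau2. The gap between the log-normalizing constant and the best NMF lower bound then equals KL(q* || posterior) = 0.5 * (sum_i log Lam_ii - log det Lam). This gap is nonnegative by Hadamard's inequality and does not depend on y. The sketch below computes the gap for an assumed Gaussian design. It is only a special case for intuition, since the paper treats general product priors.

```python
import numpy as np

def nmf_gap_gaussian_prior(X, sigma2=1.0, tau2=1.0):
    """log Z minus the best naive mean-field lower bound for y = X b + N(0, sigma2),
    with b_j ~ N(0, tau2) i.i.d.

    The posterior is Gaussian with precision Lam, the optimal product factors are
    N(m_i, 1 / Lam_ii), and the gap is 0.5 * (sum_i log Lam_ii - log det Lam).
    """
    p = X.shape[1]
    Lam = X.T @ X / sigma2 + np.eye(p) / tau2
    _, logdet = np.linalg.slogdet(Lam)
    return 0.5 * (np.sum(np.log(np.diag(Lam))) - logdet)

rng = np.random.default_rng(6)
n = 400
for p in (50, 100, 200):
    X = rng.standard_normal((n, p)) / np.sqrt(n)   # columns with norm about 1
    gap = nmf_gap_gaussian_prior(X)
    print(f"p={p}: gap={gap:.3f}, gap/p={gap / p:.4f}")

X_orth = np.eye(n)[:, :50]                          # orthonormal columns: X^T X = I
print("orthogonal design gap:", nmf_gap_gaussian_prior(X_orth))
```

The gap vanishes exactly when Lam is diagonal, i.e., when the design columns are orthogonal. Correlations among the features are what make the product approximation lossy.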

approximation, bayesian inference, survey article, (19 more...)

2104.12232

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.34)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.85)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)

Contextual Stochastic Block Model: Sharp Thresholds and Contiguity

Lu, Chen, Sen, Subhabrata

arXiv.org Machine Learning, Nov-15-2020

In the simplest version of the community detection problem, given access to a graph, one seeks to cluster the vertices into interpretable communities or groups of vertices, which are believed to reflect latent similarities among the nodes. From a theoretical standpoint, this problem has been extensively analyzed under specific generative assumptions on the observed graph; the most popular generative model in this context is the stochastic block model (SBM) [22]. Inspired by intriguing conjectures arising from the statistical physics community [29], community detection under the stochastic block model has been studied extensively. As a consequence, the precise information-theoretic limits for recovering the underlying communities have been derived, and optimal algorithms have been identified in this setting (for a survey of these recent breakthroughs, see [1]). In reality, the practitioner often has access to additional information in the form of node covariates, which complements the graph information.

artificial intelligence, null, survey article, (20 more...)

2011.09841

Country:

North America > United States > Massachusetts (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Contextual Stochastic Block Models

Deshpande, Yash, Sen, Subhabrata, Montanari, Andrea, Mossel, Elchanan

Neural Information Processing Systems, Dec-31-2018

We provide the first information-theoretically tight analysis of inference of latent community structure given a sparse graph along with high-dimensional node covariates correlated with the same latent communities. Our work bridges recent theoretical breakthroughs in the detection of latent community structure without node covariates and a large body of empirical work using diverse heuristics for combining node covariates with graphs for inference. In particular, the tightness of our analysis implies the information-theoretic necessity of combining the different sources of information. Our analysis holds for networks of large degrees as well as for a Gaussian version of the model.
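Here is a minimal sketch of a two-community contextual SBM with Gaussian covariates. It uses an assumed scaling B_i = sqrt(mu/n) * sigma_i * u + Z_i, where u and each Z_i have i.i.d. N(0, 1/p) entries. It pairs the model with a naive estimator that adds two spectral estimates: the leading eigenvector of the centered adjacency matrix and the leading left singular vector of B, after aligning their signs. The parameter values and this combination rule are illustrative assumptions. The paper's analysis is information-theoretic, and this heuristic is not its method.

```python
import numpy as np

def sample_csbm(n, p, p_in, p_out, mu, rng):
    """Graph A (two balanced communities) plus covariates B sharing the labels sigma."""
    sigma = rng.permutation(np.repeat([1, -1], n // 2))
    same = sigma[:, None] == sigma[None, :]
    upper = np.triu(rng.random((n, n)) < np.where(same, p_in, p_out), k=1)
    A = (upper | upper.T).astype(float)
    u = rng.standard_normal(p) / np.sqrt(p)
    B = np.sqrt(mu / n) * np.outer(sigma, u) + rng.standard_normal((n, p)) / np.sqrt(p)
    return A, B, sigma

def combined_spectral(A, B):
    """Return sign estimates from graph + covariates, graph only, covariates only."""
    n = A.shape[0]
    d_hat = A.sum() / n                  # average degree
    A_c = A - d_hat / n                  # remove the constant (average-degree) component
    _, vecs = np.linalg.eigh(A_c)
    v_graph = vecs[:, -1]                # leading eigenvector of the centered graph
    U, _, _ = np.linalg.svd(B, full_matrices=False)
    v_cov = U[:, 0]                      # leading left singular vector of the covariates
    if v_graph @ v_cov < 0:              # singular vectors are defined up to sign
        v_cov = -v_cov
    return np.sign(v_graph + v_cov), np.sign(v_graph), np.sign(v_cov)

def accuracy(pred, sigma):
    agree = np.mean(pred == sigma)
    return max(agree, 1.0 - agree)       # labels are identifiable only up to a swap

rng = np.random.default_rng(7)
A, B, sigma = sample_csbm(n=400, p=200, p_in=0.15, p_out=0.03, mu=25.0, rng=rng)
pred_comb, pred_graph, pred_cov = combined_spectral(A, B)
acc_comb = accuracy(pred_comb, sigma)
print(f"graph only: {accuracy(pred_graph, sigma):.3f}")
print(f"covariates only: {accuracy(pred_cov, sigma):.3f}")
print(f"combined: {acc_comb:.3f}")
```

Adding the two unit vectors with equal weight is the crudest possible combination. A principled method would weight each source by its signal strength.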

data mining, information, machine learning, (18 more...)

Country: North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Research Report (0.46)

Industry: Government (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.69)
