Balasubramanian, Krishnakumar
Riemannian Proximal Sampler for High-accuracy Sampling on Manifolds
Guan, Yunrui, Balasubramanian, Krishnakumar, Ma, Shiqian
We introduce the Riemannian Proximal Sampler, a method for sampling from densities defined on Riemannian manifolds. The performance of this sampler critically depends on two key oracles: the Manifold Brownian Increments (MBI) oracle and the Riemannian Heat-kernel (RHK) oracle. We establish high-accuracy sampling guarantees for the Riemannian Proximal Sampler, showing that generating samples that are $\varepsilon$-accurate in Kullback-Leibler divergence requires $O(\log(1/\varepsilon))$ iterations assuming access to exact oracles, and that $\varepsilon$-accuracy in the total variation metric requires $O(\log^2(1/\varepsilon))$ iterations assuming access to sufficiently accurate inexact oracles. Furthermore, we present practical implementations of these oracles by leveraging heat-kernel truncation and Varadhan's asymptotics. In the latter case, we interpret the Riemannian Proximal Sampler as a discretization of the entropy-regularized Riemannian Proximal Point Method on the associated Wasserstein space. We provide preliminary numerical results that illustrate the effectiveness of the proposed methodology.
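As a rough illustration of the first oracle, the Python sketch below approximates a Manifold Brownian Increment on the unit sphere by drawing a tangent-space Gaussian and applying the exponential map; the choice of the sphere, the step size h, and this geodesic-random-walk approximation are all illustrative assumptions, not the paper's exact heat-kernel construction.

```python
import numpy as np

def mbi_step_sphere(x, h, rng):
    """One approximate Manifold Brownian Increment on the unit sphere.

    Crude tangent-space approximation of an MBI oracle (not the exact
    heat-kernel sampler): draw an isotropic Gaussian of variance h in the
    tangent space at x and map it back to the sphere via the exponential map.
    """
    d = x.shape[0]
    g = rng.normal(scale=np.sqrt(h), size=d)
    v = g - np.dot(g, x) * x                 # project onto the tangent space at x
    norm_v = np.linalg.norm(v)
    if norm_v < 1e-12:
        return x
    # Exponential map on the sphere: geodesic of length ||v|| in direction v/||v||.
    return np.cos(norm_v) * x + np.sin(norm_v) * (v / norm_v)

rng = np.random.default_rng(0)
x = np.array([1.0, 0.0, 0.0])
for _ in range(1000):
    x = mbi_step_sphere(x, h=0.01, rng=rng)
print(x, np.linalg.norm(x))                  # the iterate stays on the unit sphere
```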
Statistical-Computational Trade-offs for Recursive Adaptive Partitioning Estimators
Tan, Yan Shuo, Klusowski, Jason M., Balasubramanian, Krishnakumar
Models based on recursive adaptive partitioning such as decision trees and their ensembles are popular for high-dimensional regression as they can potentially avoid the curse of dimensionality. Because empirical risk minimization (ERM) is computationally infeasible, these models are typically trained using greedy algorithms. Although effective in many cases, these algorithms have been empirically observed to get stuck at local optima. We explore this phenomenon in the context of learning sparse regression functions over $d$ binary features, showing that when the true regression function $f^*$ does not satisfy Abbe et al. (2022)'s Merged Staircase Property (MSP), greedy training requires $\exp(\Omega(d))$ samples to achieve low estimation error. Conversely, when $f^*$ does satisfy MSP, greedy training can attain small estimation error with only $O(\log d)$ samples. This dichotomy mirrors that of two-layer neural networks trained with stochastic gradient descent (SGD) in the mean-field regime, thereby establishing a head-to-head comparison between SGD-trained neural networks and greedy recursive partitioning estimators. Furthermore, ERM-trained recursive partitioning estimators achieve low estimation error with $O(\log d)$ samples irrespective of whether $f^*$ satisfies MSP, thereby demonstrating a statistical-computational trade-off for greedy training. Our proofs are based on a novel interpretation of greedy recursive partitioning using stochastic process theory and a coupling technique that may be of independent interest.
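As a toy illustration of the greedy mechanism discussed above, the sketch below implements a single CART-style split search over binary features (a simplified criterion of our own, not the paper's exact estimator): for a parity-type target that violates the MSP, no single coordinate shows an appreciable impurity gain, whereas a staircase-type target exposes its leading coordinate immediately.

```python
import numpy as np

def greedy_split(X, y):
    """Pick the binary feature whose split most reduces squared error.

    Illustrative CART-style greedy criterion (not the paper's exact training
    procedure): for a parity-type target, every coordinate-wise split leaves
    the impurity essentially unchanged, which is the kind of obstruction
    behind the exponential-in-d lower bound for greedy training.
    """
    n, d = X.shape
    base = np.var(y)
    gains = np.empty(d)
    for j in range(d):
        left, right = y[X[:, j] == 0], y[X[:, j] == 1]
        if len(left) == 0 or len(right) == 0:
            gains[j] = 0.0
            continue
        gains[j] = base - (len(left) * np.var(left) + len(right) * np.var(right)) / n
    return int(np.argmax(gains)), gains

rng = np.random.default_rng(1)
n, d = 2000, 10
X = rng.integers(0, 2, size=(n, d))
y_parity = (X[:, 0] ^ X[:, 1]).astype(float)             # parity: no first-order signal
y_stair = (X[:, 0] + X[:, 0] * X[:, 1]).astype(float)    # staircase: feature 0 is visible
print(greedy_split(X, y_parity)[1][:3])                  # all gains close to 0
print(greedy_split(X, y_stair)[1][:3])                   # gain for feature 0 stands out
```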
Provable In-context Learning for Mixture of Linear Regressions using Transformers
Jin, Yanhao, Balasubramanian, Krishnakumar, Lai, Lifeng
We theoretically investigate the in-context learning capabilities of transformers in the context of learning mixtures of linear regression models. For the case of a two-component mixture, we demonstrate the existence of transformers that can achieve an accuracy, relative to the oracle predictor, of order $\tilde{\mathcal{O}}((d/n)^{1/4})$ in the low signal-to-noise ratio (SNR) regime and $\tilde{\mathcal{O}}(\sqrt{d/n})$ in the high SNR regime, where $n$ is the length of the prompt and $d$ is the dimension of the problem. Additionally, we derive in-context excess risk bounds of order $\mathcal{O}(L/\sqrt{B})$, where $B$ denotes the number of (training) prompts and $L$ represents the number of attention layers. The required order of $L$ depends on whether the SNR is low or high. In the high SNR regime, we extend the results to $K$-component mixture models for finite $K$. Extensive simulations also highlight the advantages of transformers for this task, outperforming other baselines such as the Expectation-Maximization algorithm.
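As a point of reference for the rates above, the following sketch builds a prompt from a symmetric two-component mixture of linear regressions and evaluates one natural oracle predictor that knows the mixture parameters; the data model, noise level, and this particular notion of oracle are illustrative assumptions, and no transformer is implemented here.

```python
import numpy as np

def oracle_predict(xs, ys, x_query, betas, sigma):
    """Posterior-weighted prediction with known mixture parameters.

    One simple notion of oracle predictor for a prompt drawn from a single
    component of a mixture of linear regressions (all parameters assumed
    known); it serves only as an illustrative benchmark of the kind the
    transformer is compared against.
    """
    # Log-likelihood of the whole prompt under each component (Gaussian noise).
    lls = np.array([-0.5 * np.sum((ys - xs @ b) ** 2) / sigma**2 for b in betas])
    w = np.exp(lls - lls.max())
    w /= w.sum()
    return sum(wk * (x_query @ b) for wk, b in zip(w, betas))

rng = np.random.default_rng(2)
d, n, sigma = 8, 64, 0.5
beta = rng.normal(size=d)
betas = [beta, -beta]                       # symmetric two-component mixture
k = rng.integers(2)                         # latent component of this prompt
xs = rng.normal(size=(n, d))
ys = xs @ betas[k] + sigma * rng.normal(size=n)
x_query = rng.normal(size=d)
print(oracle_predict(xs, ys, x_query, betas, sigma), x_query @ betas[k])
```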
Transformers Handle Endogeneity in In-Context Linear Regression
Liang, Haodong, Balasubramanian, Krishnakumar, Lai, Lifeng
We explore the capability of transformers to address endogeneity in in-context linear regression. Our main finding is that transformers inherently possess a mechanism to handle endogeneity effectively using instrumental variables (IV). First, we demonstrate that the transformer architecture can emulate a gradient-based bi-level optimization procedure that converges to the widely used two-stage least squares $(\textsf{2SLS})$ solution at an exponential rate. Next, we propose an in-context pretraining scheme and provide theoretical guarantees showing that the global minimizer of the pretraining loss achieves a small excess loss. Our extensive experiments validate these theoretical findings, showing that the trained transformer provides more robust and reliable in-context predictions and coefficient estimates than the $\textsf{2SLS}$ method in the presence of endogeneity.
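For readers unfamiliar with the benchmark, here is a minimal sketch of the classical two-stage least squares (2SLS) estimator that the transformer is shown to emulate; the simulated design with a hidden confounder is our own illustrative choice.

```python
import numpy as np

def two_stage_least_squares(Z, X, y):
    """Classical 2SLS: regress X on the instruments Z, then y on the fitted X.

    This is the textbook estimator referenced in the abstract; the in-context
    version operates on a prompt of (z_i, x_i, y_i) triples rather than on
    fixed design matrices.
    """
    # Stage 1: project the endogenous regressors onto the instrument space.
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    # Stage 2: ordinary least squares of y on the projected regressors.
    return np.linalg.lstsq(X_hat, y, rcond=None)[0]

rng = np.random.default_rng(3)
n, d = 500, 3
Z = rng.normal(size=(n, d))
u = rng.normal(size=n)                            # unobserved confounder
X = Z @ rng.normal(size=(d, d)) + 0.8 * u[:, None] + 0.1 * rng.normal(size=(n, d))
beta = np.array([1.0, -2.0, 0.5])
y = X @ beta + u                                  # endogeneity: error correlated with X
print(two_stage_least_squares(Z, X, y))           # close to beta, unlike plain OLS
```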
Stochastic Optimization Algorithms for Instrumental Variable Regression with Streaming Data
Chen, Xuxing, Roy, Abhishek, Hu, Yifan, Balasubramanian, Krishnakumar
We develop and analyze algorithms for instrumental variable regression by viewing the problem as a conditional stochastic optimization problem. In the context of least-squares instrumental variable regression, our algorithms require neither matrix inversions nor mini-batches, and provide a fully online approach for performing instrumental variable regression with streaming data. When the true model is linear, we derive rates of convergence in expectation of order $\mathcal{O}(\log T/T)$ and $\mathcal{O}(1/T^{1-\iota})$ for any $\iota>0$, under the availability of two-sample and one-sample oracles, respectively, where $T$ is the number of iterations. Importantly, under the availability of the two-sample oracle, our procedure avoids explicitly modeling and estimating the relationship between the confounder and the instrumental variables, demonstrating the benefit of the proposed approach over recent works based on reformulating the problem as minimax optimization problems. Numerical experiments are provided to corroborate the theoretical results.
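The following is a schematic Python sketch of a fully online update in the spirit of the two-sample-oracle setting described above; the data-generating process, step sizes, and the exact form of the stochastic gradient are illustrative assumptions rather than the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(4)
d, T = 3, 20000
beta_true = np.array([1.0, -1.0, 0.5])
beta = np.zeros(d)

def two_sample_oracle(rng):
    """Draw one instrument z and two conditionally independent (x, y) pairs.

    Hypothetical data-generating process used only for illustration: x depends
    on z and on an unobserved confounder u, and u also enters y.
    """
    z = rng.normal(size=d)
    samples = []
    for _ in range(2):
        u = rng.normal()
        x = z + 0.8 * u + 0.1 * rng.normal(size=d)
        y = x @ beta_true + u
        samples.append((x, y))
    return samples

# Schematic streaming update: with two conditionally independent samples sharing
# the same z, x2 * (y1 - beta @ x1) gives, up to a constant factor, an unbiased
# ascent direction toward the solution of the conditional stochastic objective.
for t in range(1, T + 1):
    (x1, y1), (x2, y2) = two_sample_oracle(rng)
    step = 1.0 / (t + 100)
    beta += step * x2 * (y1 - beta @ x1)

print(beta)   # approaches beta_true without any matrix inversion or mini-batch
```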
A Separation in Heavy-Tailed Sampling: Gaussian vs. Stable Oracles for Proximal Samplers
He, Ye, Mousavi-Hosseini, Alireza, Balasubramanian, Krishnakumar, Erdogdu, Murat A.
We study the complexity of heavy-tailed sampling and present a separation result in terms of obtaining high-accuracy versus low-accuracy guarantees, i.e., samplers that require only $O(\log(1/\varepsilon))$ versus $\Omega(\text{poly}(1/\varepsilon))$ iterations to output a sample that is $\varepsilon$-close to the target in $\chi^2$-divergence. Our results are presented for proximal samplers that are based on Gaussian versus stable oracles. We show that proximal samplers based on the Gaussian oracle face a fundamental barrier: they necessarily achieve only low-accuracy guarantees when sampling from a class of heavy-tailed targets. In contrast, proximal samplers based on the stable oracle exhibit high-accuracy guarantees, thereby overcoming this limitation. We also prove lower bounds for samplers under the stable oracle and show that our upper bounds cannot be fundamentally improved.
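For concreteness, here is a one-dimensional sketch of the Gaussian-oracle proximal sampler applied to a heavy-tailed Student-t target, with the restricted Gaussian oracle implemented by rejection sampling; the target, step size, and rejection-based oracle are illustrative choices and do not reproduce the paper's stable-oracle sampler or its lower bounds.

```python
import numpy as np

def gaussian_proximal_sampler(n_iter, eta, nu, rng):
    """Proximal sampler with a Gaussian oracle for a heavy-tailed Student-t target.

    Forward step: y ~ N(x, eta). Backward step: sample from
    pi(x) * exp(-(x - y)^2 / (2*eta)) by rejection, using N(y, eta) as the
    proposal and the bounded unnormalized Student-t density as the acceptance
    probability. A 1-D illustration of the Gaussian-oracle sampler whose
    accuracy limitation for heavy tails is analyzed in the paper.
    """
    def unnormalized_t(x):                 # (1 + x^2/nu)^(-(nu+1)/2), bounded by 1
        return (1.0 + x * x / nu) ** (-(nu + 1) / 2)

    x, out = 0.0, []
    for _ in range(n_iter):
        y = x + np.sqrt(eta) * rng.normal()            # Gaussian forward step
        while True:                                    # restricted Gaussian oracle
            x_prop = y + np.sqrt(eta) * rng.normal()
            if rng.random() < unnormalized_t(x_prop):
                x = x_prop
                break
        out.append(x)
    return np.array(out)

rng = np.random.default_rng(5)
samples = gaussian_proximal_sampler(5000, eta=0.5, nu=3.0, rng=rng)
print(np.median(np.abs(samples)))   # compare with the Student-t(3) median absolute value
```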
Minimax Optimal Goodness-of-Fit Testing with Kernel Stein Discrepancy
Hagrass, Omar, Sriperumbudur, Bharath, Balasubramanian, Krishnakumar
We explore the minimax optimality of goodness-of-fit tests on general domains using the kernelized Stein discrepancy (KSD). The KSD framework offers a flexible approach for goodness-of-fit testing, avoiding strong distributional assumptions, accommodating diverse data structures beyond Euclidean spaces, and relying only on partial knowledge of the reference distribution, while maintaining computational efficiency. We establish a general framework and an operator-theoretic representation of the KSD, encompassing many existing KSD tests in the literature, which vary depending on the domain. We reveal the characteristics and limitations of the KSD and demonstrate its non-optimality over a certain space of alternatives defined on general domains, when the $\chi^2$-divergence is used as the separation metric. To address this non-optimality, we propose a modified, minimax optimal test by incorporating a spectral regularizer, thereby overcoming the shortcomings of standard KSD tests. Our results are established under a weak moment condition on the Stein kernel, which relaxes the bounded-kernel assumption required by prior work on kernel-based hypothesis testing. Additionally, we introduce an adaptive test capable of achieving minimax optimality up to a logarithmic factor by adapting to unknown parameters. Through numerical experiments, we illustrate the superior performance of our proposed tests across various domains compared to their unregularized counterparts.
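To make the object under study concrete, the following sketch computes the plain (unregularized) KSD statistic with a Gaussian base kernel and the score of a standard Gaussian reference; the kernel, bandwidth, and sample sizes are illustrative choices, and the spectrally regularized test proposed in the paper is not reproduced here.

```python
import numpy as np

def ksd_vstat(X, score, h=1.0):
    """V-statistic estimate of the squared kernel Stein discrepancy.

    Uses the Langevin Stein kernel built from a Gaussian RBF base kernel with
    bandwidth h; `score` is the score function of the reference distribution.
    This is the plain KSD statistic whose non-optimality (and spectrally
    regularized fix) the paper studies.
    """
    n, d = X.shape
    S = score(X)                                       # (n, d) score evaluations
    diffs = X[:, None, :] - X[None, :, :]              # pairwise x_i - x_j
    sq = np.sum(diffs ** 2, axis=-1)
    K = np.exp(-sq / (2 * h))
    cross = S @ S.T                                    # s(x_i) . s(x_j)
    diff_term = (np.einsum('id,ijd->ij', S, diffs)
                 - np.einsum('jd,ijd->ij', S, diffs)) / h
    Hp = K * (cross + diff_term + d / h - sq / h ** 2)
    return Hp.mean()

rng = np.random.default_rng(6)
X0 = rng.normal(size=(300, 2))                         # matches the N(0, I) reference
X1 = rng.normal(loc=0.8, size=(300, 2))                # shifted alternative
gaussian_score = lambda X: -X                          # score of N(0, I)
print(ksd_vstat(X0, gaussian_score), ksd_vstat(X1, gaussian_score))
```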
Meta-Learning with Generalized Ridge Regression: High-dimensional Asymptotics, Optimality and Hyper-covariance Estimation
Jin, Yanhao, Balasubramanian, Krishnakumar, Paul, Debashis
Meta-learning involves training models on a variety of training tasks in a way that enables them to generalize well on new, unseen test tasks. In this work, we consider meta-learning within the framework of high-dimensional multivariate random-effects linear models and study generalized ridge-regression based predictions. The statistical intuition for using generalized ridge regression in this setting is that the covariance structure of the random regression coefficients can be leveraged to make better predictions on new tasks. Accordingly, we first characterize the precise asymptotic behavior of the predictive risk for a new test task when the data dimension grows proportionally to the number of samples per task. We next show that this predictive risk is optimal when the weight matrix in generalized ridge regression is chosen to be the inverse of the covariance matrix of the random coefficients. Finally, we propose and analyze an estimator of the inverse covariance matrix of the random regression coefficients based on data from the training tasks. As opposed to intractable MLE-type estimators, the proposed estimators can be computed efficiently, as they are obtained by solving (globally) geodesically convex optimization problems. Our analysis and methodology use tools from random matrix theory and Riemannian optimization. Simulation results demonstrate the improved generalization performance of the proposed method on new unseen test tasks within the considered framework.
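Below is a minimal sketch of the generalized ridge predictor discussed above, under a toy random-effects model of our own choosing; the regularization level and hyper-covariance are illustrative assumptions, and the paper's estimator of the inverse hyper-covariance is not implemented here.

```python
import numpy as np

def generalized_ridge(X, y, Omega, lam):
    """Generalized ridge: argmin_b ||y - X b||^2 + lam * b' Omega b.

    Per the abstract, choosing Omega as the inverse covariance of the random
    task coefficients yields the optimal predictive risk; lam and the toy
    random-effects model below are illustrative, not the paper's tuning.
    """
    return np.linalg.solve(X.T @ X + lam * Omega, X.T @ y)

rng = np.random.default_rng(7)
d, n = 20, 40
A = rng.normal(size=(d, d)) / np.sqrt(d)
Sigma_beta = A @ A.T + 0.1 * np.eye(d)                 # hyper-covariance of coefficients
beta_task = rng.multivariate_normal(np.zeros(d), Sigma_beta)   # new test task
X = rng.normal(size=(n, d))
y = X @ beta_task + 0.5 * rng.normal(size=n)

b_gen = generalized_ridge(X, y, np.linalg.inv(Sigma_beta), lam=0.25)
b_std = generalized_ridge(X, y, np.eye(d), lam=0.25)   # ordinary ridge baseline
# Estimation errors on the new task; the generalized weighting is better on average.
print(np.linalg.norm(b_gen - beta_task), np.linalg.norm(b_std - beta_task))
```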
Multivariate Gaussian Approximation for Random Forest via Region-based Stabilization
Shi, Zhaoyang, Bhattacharjee, Chinmoy, Balasubramanian, Krishnakumar, Polonik, Wolfgang
We derive Gaussian approximation bounds for random forest predictions based on a set of training points given by a Poisson process, under fairly mild regularity assumptions on the data-generating process. Our approach is based on the key observation that the random forest predictions satisfy a certain geometric property called region-based stabilization. In the process of developing our results for the random forest, we also establish a probabilistic result, which might be of independent interest, on multivariate Gaussian approximation bounds for general functionals of Poisson processes that are region-based stabilizing. This general result makes use of the Malliavin-Stein method and is potentially applicable to various related statistical problems.
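For concreteness, the sketch below simulates the sampling setting described above: training points given by a Poisson process on the unit square and a generic random forest (scikit-learn's RandomForestRegressor, an illustrative stand-in) whose predictions at fixed query points fluctuate approximately like a Gaussian across replications; none of the Malliavin-Stein analysis is reproduced.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Illustrative setting only: a Poisson number of uniform training points, a
# generic random forest prediction at fixed query points, and a Monte Carlo
# look at the (approximately Gaussian) fluctuations of those predictions.
rng = np.random.default_rng(8)
f = lambda X: np.sin(2 * np.pi * X[:, 0]) + X[:, 1] ** 2   # toy regression function
queries = np.array([[0.25, 0.25], [0.75, 0.75]])

preds = []
for rep in range(200):
    n = rng.poisson(500)                                   # Poisson process, intensity 500
    X = rng.uniform(size=(n, 2))
    y = f(X) + 0.1 * rng.normal(size=n)
    forest = RandomForestRegressor(n_estimators=25, random_state=rep).fit(X, y)
    preds.append(forest.predict(queries))

preds = np.array(preds)
print(preds.mean(axis=0), preds.std(axis=0))               # roughly Gaussian fluctuations
```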
Nonsmooth Nonparametric Regression via Fractional Laplacian Eigenmaps
Shi, Zhaoyang, Balasubramanian, Krishnakumar, Polonik, Wolfgang
Laplacian-based nonparametric regression is a widely used approach in machine learning that leverages the Laplacian Eigenmaps algorithm to perform regression tasks without relying on explicit parametric models. The nonparametric nature of the approach makes it flexible and adaptable to the data-generating process without imposing strict assumptions about the functional form of the relationship between the response and the covariates. Existing theoretical studies of this approach are restricted to establishing minimax rates of convergence and adaptivity properties when the true regression function lies in Sobolev spaces; see Section 1.1 for details. Such spaces are inherently smooth in nature and exclude important function classes in nonparametric statistics, such as piecewise constant or polynomial functions, bump functions and other such nonsmooth function classes. In this work, using the framework of fractional Laplacians, we propose a novel approach called Principal Component Regression using Fractional Laplacian Eigenmaps (PCR-FLE) for nonsmooth nonparametric regression. The PCR-FLE algorithm generalizes the PCR-LE algorithm by Green et al. (2023) and the PCR-WLE algorithm by Shi et al. (2024), and is designed to naturally handle the case when the true regression function lies in an L