AITopics | Shetty, Abhishek

Collaborating Authors

Shetty, Abhishek

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question "What is artificial intelligence?" and have only a minute, here is the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Smoothed Analysis of Sequential Probability Assignment

Bhatt, Alankrita, Haghtalab, Nika, Shetty, Abhishek

arXiv.org Artificial Intelligence, Mar-8-2023

We initiate the study of smoothed analysis for the sequential probability assignment problem with contexts. We study information-theoretically optimal minimax rates as well as a framework for algorithmic reduction involving the maximum likelihood estimator (MLE) oracle. Our approach establishes a general-purpose reduction from minimax rates for sequential probability assignment for smoothed adversaries to minimax rates for transductive learning. This leads to optimal (logarithmic) fast rates for parametric classes and classes with finite VC dimension. On the algorithmic front, we develop an algorithm that efficiently taps into the MLE oracle for general classes of functions. We show that under general conditions this algorithmic approach yields sublinear regret.
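For intuition about "logarithmic fast rates for parametric classes", the simplest parametric case is context-free: predict the next bit of a binary sequence under log loss and compare against the best fixed Bernoulli parameter chosen in hindsight. The sketch below uses the classical Krichevsky-Trofimov (add-1/2) forecaster, a swapped-in textbook method rather than the paper's algorithm, whose regret is known to be at most 0.5·ln T + ln 2 nats. It ignores contexts and smoothed adversaries entirely; the function names are hypothetical.

```python
import math
import random
from typing import Sequence

def kt_log_loss(bits: Sequence[int]) -> float:
    """Cumulative log loss (nats) of the Krichevsky-Trofimov (add-1/2) forecaster."""
    ones = zeros = 0
    loss = 0.0
    for b in bits:
        p_one = (ones + 0.5) / (ones + zeros + 1.0)  # predicted P(next bit = 1)
        loss -= math.log(p_one if b == 1 else 1.0 - p_one)
        if b == 1:
            ones += 1
        else:
            zeros += 1
    return loss

def best_bernoulli_log_loss(bits: Sequence[int]) -> float:
    """Log loss of the best fixed Bernoulli(theta) chosen in hindsight."""
    n = len(bits)
    a = sum(bits)
    return -sum(c * math.log(c / n) for c in (a, n - a) if c > 0)

def kt_regret(bits: Sequence[int]) -> float:
    """Regret of the forecaster against the best Bernoulli in hindsight."""
    return kt_log_loss(bits) - best_bernoulli_log_loss(bits)

T = 1000
rng = random.Random(0)
sequences = {
    "all zeros": [0] * T,
    "alternating": [t % 2 for t in range(T)],
    "pseudo-random": [rng.getrandbits(1) for _ in range(T)],
}
bound = 0.5 * math.log(T) + math.log(2)
for name, bits in sequences.items():
    print(f"{name}: regret = {kt_regret(bits):.3f} nats (bound {bound:.3f})")
```

The regret grows like ln T rather than √T because the forecaster is a Bayesian mixture over a one-parameter class; the paper's contribution is obtaining rates of this type with contexts and smoothed adversaries, which this toy does not attempt.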

artificial intelligence, machine learning, probability assignment, …

arXiv.org Artificial Intelligence

2303.04845

Genre: Research Report (0.64)

Industry: Education (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.49)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)


The One-Inclusion Graph Algorithm is not Always Optimal

Aden-Ali, Ishaq, Cherapanamjeri, Yeshwanth, Shetty, Abhishek, Zhivotovskiy, Nikita

arXiv.org Artificial Intelligence, Dec-19-2022

The one-inclusion graph algorithm of Haussler, Littlestone, and Warmuth achieves an optimal in-expectation risk bound in the standard PAC classification setup. In one of the first COLT open problems, Warmuth conjectured that this prediction strategy always implies an optimal high probability bound on the risk, and hence is also an optimal PAC algorithm. We refute this conjecture in the strongest sense: for any practically interesting Vapnik-Chervonenkis class, we provide an in-expectation optimal one-inclusion graph algorithm whose high probability risk bound cannot go beyond that implied by Markov's inequality. Our construction of these poorly performing one-inclusion graph algorithms uses Varshamov-Tenengolts error correcting codes. Our negative result has several implications. First, it shows that the same poor high-probability performance is inherited by several recent prediction strategies based on generalizations of the one-inclusion graph algorithm. Second, our analysis shows yet another statistical problem that enjoys an estimator that is provably optimal in expectation via a leave-one-out argument, but fails in the high-probability regime. This discrepancy occurs despite the boundedness of the binary loss for which arguments based on concentration inequalities often provide sharp high probability risk bounds.
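As a back-of-the-envelope illustration of what "cannot go beyond that implied by Markov's inequality" costs, the sketch below compares two risk levels. It assumes the Haussler-Littlestone-Warmuth in-expectation guarantee E[risk] ≤ d/(n+1) for VC dimension d and n samples, and it sets the absolute constant in the optimal high-probability rate (d + ln(1/δ))/n to 1, which is only a placeholder. Both function names are hypothetical.

```python
import math

def markov_high_prob_bound(d: int, n: int, delta: float) -> float:
    """Risk level exceeded with probability <= delta, via Markov on E[risk] <= d/(n+1)."""
    return d / ((n + 1) * delta)

def optimal_high_prob_rate(d: int, n: int, delta: float) -> float:
    """Order of the optimal realizable PAC rate; the constant 1 is a placeholder."""
    return (d + math.log(1.0 / delta)) / n

d, n = 10, 999
for delta in (0.1, 0.01, 0.001):
    print(f"delta={delta}: Markov {markov_high_prob_bound(d, n, delta):.4f}, "
          f"optimal order {optimal_high_prob_rate(d, n, delta):.4f}")
```

The Markov level scales like 1/δ while the optimal rate scales like ln(1/δ); the abstract's negative result says some in-expectation optimal one-inclusion graph algorithms are stuck with the former.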

algorithm, artificial intelligence, machine learning, …

arXiv.org Artificial Intelligence

2212.09270

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.95)


Distribution Compression in Near-linear Time

Shetty, Abhishek, Dwivedi, Raaz, Mackey, Lester

arXiv.org Machine Learning, Nov-16-2021

In distribution compression, one aims to accurately summarize a probability distribution $\mathbb{P}$ using a small number of representative points. Near-optimal thinning procedures achieve this goal by sampling $n$ points from a Markov chain and identifying $\sqrt{n}$ points with $\widetilde{\mathcal{O}}(1/\sqrt{n})$ discrepancy to $\mathbb{P}$. Unfortunately, these algorithms suffer from quadratic or super-quadratic runtime in the sample size $n$. To address this deficiency, we introduce Compress++, a simple meta-procedure for speeding up any thinning algorithm while suffering at most a factor of $4$ in error. When combined with the quadratic-time kernel halving and kernel thinning algorithms of Dwivedi and Mackey (2021), Compress++ delivers $\sqrt{n}$ points with $\mathcal{O}(\sqrt{\log n/n})$ integration error and better-than-Monte-Carlo maximum mean discrepancy in $\mathcal{O}(n \log^3 n)$ time and $\mathcal{O}( \sqrt{n} \log^2 n )$ space. Moreover, Compress++ enjoys the same near-linear runtime given any quadratic-time input and reduces the runtime of super-quadratic algorithms by a square-root factor. In our benchmarks with high-dimensional Monte Carlo samples and Markov chains targeting challenging differential equation posteriors, Compress++ matches or nearly matches the accuracy of its input algorithm in orders of magnitude less time.
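The recursive structure behind Compress++ can be sketched as follows, as we read it: Compress returns its input unchanged once it has 4^g points; otherwise it splits the input into four equal parts, compresses each recursively, concatenates the results, and halves them, so the output has 2^g·√n points. Compress++ then thins that down to √n points. This is a minimal sketch assuming the input size is n = 4^k with k ≥ g. The halving routine `sort_halve` is a hypothetical 1-D stand-in (sort, then keep one point from each adjacent pair), not the kernel halving or kernel thinning of Dwivedi and Mackey (2021), so it carries none of the discrepancy guarantees quoted above; `compress` and `compress_pp` are likewise our names.

```python
import random
from typing import Callable, List

Halve = Callable[[List[float]], List[float]]

def sort_halve(points: List[float]) -> List[float]:
    """Hypothetical 1-D stand-in for a halving routine: sort, then keep one
    point from each adjacent pair (alternating low/high to limit bias)."""
    s = sorted(points)
    return [s[2 * i + (i % 2)] for i in range(len(s) // 2)]

def compress(points: List[float], g: int, halve: Halve) -> List[float]:
    """Recursive Compress: returns 2**g * sqrt(len(points)) points."""
    if len(points) == 4 ** g:
        return list(points)
    q = len(points) // 4
    children = [compress(points[i * q:(i + 1) * q], g, halve) for i in range(4)]
    merged = [x for child in children for x in child]
    return halve(merged)

def compress_pp(points: List[float], g: int, halve: Halve) -> List[float]:
    """Compress++ sketch: Compress, then g more halvings down to sqrt(n) points."""
    n, k = len(points), 0
    while 4 ** k < n:
        k += 1
    if 4 ** k != n or k < g:
        raise ValueError("expects len(points) == 4**k with k >= g")
    coreset = compress(list(points), g, halve)  # 2**g * sqrt(n) points
    for _ in range(g):                          # final thinning step
        coreset = halve(coreset)
    return coreset

rng = random.Random(0)
pts = [rng.gauss(0.0, 1.0) for _ in range(256)]
coreset = compress_pp(pts, g=2, halve=sort_halve)
print(len(coreset))  # 16, i.e. sqrt(256)
```

The recursion only controls runtime: the expensive halving routine is run on inputs of size about 2^g·√n at the top level and on smaller inputs below it, which is where the near-linear runtime claimed in the abstract comes from. Any accuracy guarantee has to come from plugging in a genuine kernel halving routine in place of `sort_halve`.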

artificial intelligence, health & medicine, machine learning, …

arXiv.org Machine Learning

2111.07941

Country:

North America > United States > Virginia (0.14)
North America > United States > Massachusetts (0.14)

Genre: Research Report (0.50)

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)


Smoothed Analysis with Adaptive Adversaries

Haghtalab, Nika, Roughgarden, Tim, Shetty, Abhishek

arXiv.org Machine Learning, Feb-16-2021

We prove novel algorithmic guarantees for several online problems in the smoothed analysis model. In this model, at each time an adversary chooses an input distribution with density function bounded above by $\tfrac{1}{\sigma}$ times that of the uniform distribution; nature then samples an input from this distribution. Crucially, our results hold for adaptive adversaries that can choose an input distribution based on the decisions of the algorithm and the realizations of the inputs in the previous time steps. This paper presents a general technique for proving smoothed algorithmic guarantees against adaptive adversaries, in effect reducing the setting of adaptive adversaries to the simpler case of oblivious adversaries. We apply this technique to prove strong smoothed guarantees for three problems:

- Online learning: We consider the online prediction problem, where instances are generated from an adaptive sequence of $\sigma$-smooth distributions and the hypothesis class has VC dimension $d$. We bound the regret by $\tilde{O}\big(\sqrt{T d\ln(1/\sigma)} + d\sqrt{\ln(T/\sigma)}\big)$. This answers open questions of [RST11, Hag18].
- Online discrepancy minimization: We consider the online Komlós problem, where the input is generated from an adaptive sequence of $\sigma$-smooth and isotropic distributions on the $\ell_2$ unit ball. We bound the $\ell_\infty$ norm of the discrepancy vector by $\tilde{O}\big(\ln^2\!\big( \frac{nT}{\sigma}\big) \big)$.
- Dispersion in online optimization: We consider online optimization of piecewise Lipschitz functions where functions with $\ell$ discontinuities are chosen by a smoothed adaptive adversary and show that the resulting sequence is $\big( {\sigma}/{\sqrt{T\ell}}, \tilde O\big(\sqrt{T\ell} \big)\big)$-dispersed. This matches the parameters of [BDV18] for oblivious adversaries, up to log factors.
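The σ-smoothness condition above is easy to state on a finite domain. The sketch below is a discrete analogue under our own assumptions: the base measure is the uniform distribution on N points (the paper works with more general domains), and `is_sigma_smooth` is a hypothetical helper. An adaptive adversary may pick a different distribution every round based on the history; the check below applies to whichever distribution it picks in a given round.

```python
from typing import Sequence

def is_sigma_smooth(p: Sequence[float], sigma: float, tol: float = 1e-12) -> bool:
    """True if every p[x] <= (1/sigma) * (1/N), where N = len(p) is the domain size,
    i.e. p's density is at most 1/sigma times that of the uniform distribution."""
    if not 0.0 < sigma <= 1.0:
        raise ValueError("sigma must lie in (0, 1]")
    bound = 1.0 / (sigma * len(p))
    return all(px <= bound * (1.0 + tol) for px in p)

N = 100
quarter = [1.0 / 25] * 25 + [0.0] * (N - 25)  # uniform on a quarter of the domain
point_mass = [1.0] + [0.0] * (N - 1)

print(is_sigma_smooth(quarter, 0.25))  # True
print(is_sigma_smooth(quarter, 0.3))   # False
```

Smaller σ is a weaker constraint: σ = 1 forces the distribution to be exactly uniform, while a point mass on N points is only 1/N-smooth. A uniform distribution over a σ fraction of the domain is the most concentrated σ-smooth choice available to the adversary.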

adversary, artificial intelligence, machine learning, …

arXiv.org Machine Learning

2102.08446

Country: North America > United States > California (0.28)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.66)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.46)


Smoothed Analysis of Online and Differentially Private Learning

Haghtalab, Nika, Roughgarden, Tim, Shetty, Abhishek

arXiv.org Machine Learning, Jun-17-2020

Robustness to changes in the data and protecting the privacy of data are two of the main challenges faced by machine learning and have led to the design of online and differentially private learning algorithms. While offline PAC learnability is characterized by the finiteness of VC dimension, online and differentially private learnability are both characterized by the finiteness of the Littlestone dimension [Alon et al., 2019, Ben-David et al., 2009, Bun et al., 2020]. This latter characterization is often interpreted as an impossibility result for achieving robustness and privacy on worst-case instances, especially in classification, where even simple hypothesis classes such as 1-dimensional thresholds have constant VC dimension but infinite Littlestone dimension. Impossibility results for worst-case adversaries do not invalidate the original goals of robust and private learning with respect to practically relevant hypothesis classes; rather, they indicate that a new model is required to provide rigorous guidance on the design of online and differentially private learning algorithms. In this work, we go beyond worst-case analysis and design online learning algorithms and differentially private learning algorithms that are as good as their offline and non-private PAC learning counterparts in a realistic semi-random model of data. Inspired by smoothed analysis [Spielman and Teng, 2004], we introduce frameworks for online and differentially private learning in which adversarially chosen inputs are perturbed slightly by nature (reflecting, e.g., measurement errors or uncertainty). Equivalently, we consider an adversary restricted to choose an input distribution that is not overly concentrated, with the realized input then drawn from the adversary's chosen distribution. Our goal is to design algorithms with good expected regret and error bounds, where the expectation is over nature's perturbations (and any random coin flips of the algorithm).
Our positive results show, in a precise sense, that the known lower bounds for worst-case online and differentially private learnability are fundamentally brittle.
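The claim that 1-dimensional thresholds have infinite Littlestone dimension can be made concrete. For the class h_θ(x) = 1[x ≥ θ] on [0, 1], a worst-case adversary runs binary search over the thresholds still consistent with the history and labels each query against the learner's prediction, which forces a mistake in every round against any deterministic learner. The sketch below (function and type names are ours) records the forced mistakes and returns a threshold consistent with every label.

```python
from typing import Callable, List, Tuple

History = List[Tuple[float, int]]
Learner = Callable[[float, History], int]

def force_mistakes(learner: Learner, T: int) -> Tuple[History, int, float]:
    """Worst-case adversary for thresholds h_theta(x) = 1[x >= theta] on [0, 1]."""
    lo, hi = 0.0, 1.0  # every theta in (lo, hi] is consistent with the history
    history: History = []
    mistakes = 0
    for _ in range(T):
        x = (lo + hi) / 2          # query the midpoint of the version space
        y_hat = learner(x, history)
        y = 1 - y_hat              # label the point against the prediction
        mistakes += int(y != y_hat)
        if y == 1:
            hi = x                 # x >= theta, so theta <= x
        else:
            lo = x                 # x < theta
        history.append((x, y))
    return history, mistakes, hi   # theta = hi agrees with every label

hist, m, theta = force_mistakes(lambda x, h: 1 if x >= 0.5 else 0, T=30)
print(m)  # 30
```

After T rounds the queries crowd into an interval of width 2^-T. As intuition only (not the paper's proof), and assuming the uniform distribution on [0, 1] as base measure, a σ-smooth adversary can place at most w/σ probability on an interval of width w, so it cannot reliably aim its inputs inside the shrinking version space once w is well below σ.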

adversary, artificial intelligence, machine learning, …

arXiv.org Machine Learning

2006.10129

Genre: Research Report > New Finding (0.34)

Industry: Information Technology > Security & Privacy (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (1.00)


Effect of Activation Functions on the Training of Overparametrized Neural Nets

Panigrahi, Abhishek, Shetty, Abhishek, Goyal, Navin

arXiv.org Machine Learning, Aug-16-2019

It is well-known that overparametrized neural networks trained using gradient-based methods quickly achieve small training error with appropriate hyperparameter settings. Recent papers have proved this statement theoretically for highly overparametrized networks under reasonable assumptions. The limiting case when the network size approaches infinity has also been considered. These results either assume that the activation function is ReLU or they crucially depend on the minimum eigenvalue of a certain Gram matrix depending on the data, random initialization and the activation function. In the latter case, existing works only prove that this minimum eigenvalue is non-zero and do not provide quantitative bounds. On the empirical side, a contemporary line of investigations has proposed a number of alternative activation functions which tend to perform better than ReLU at least in some settings but no clear understanding has emerged. This state of affairs underscores the importance of theoretically understanding the impact of activation functions on training. In the present paper, we provide theoretical results about the effect of activation function on the training of highly overparametrized 2-layer neural networks. We show that for smooth activations, such as tanh and swish, the minimum eigenvalue can be exponentially small depending on the span of the dataset implying that the training can be very slow. In contrast, for activations with a "kink," such as ReLU, SELU, ELU, all eigenvalues are large under minimal assumptions on the data. Several new ideas are involved. Finally, we corroborate our results empirically.
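To make the Gram-matrix quantity concrete, here is a NumPy sketch under assumptions of our own. It uses a two-layer network with first-layer weights w ~ N(0, I), unit-norm inputs, and the Gram matrix H_ij = E_w[φ'(w·x_i) φ'(w·x_j)] (x_i·x_j) familiar from neural-tangent-style analyses; the paper's exact matrix and scaling may differ. For ReLU the expectation has the closed form (x_i·x_j)(π - arccos(x_i·x_j))/(2π). `mc_gram` is a hypothetical Monte Carlo estimator that accepts any activation derivative φ', so smooth activations such as tanh can be plugged in for comparison.

```python
import numpy as np

def relu_gram(X: np.ndarray) -> np.ndarray:
    """Closed-form Gram matrix for ReLU; rows of X must have unit norm."""
    dots = np.clip(X @ X.T, -1.0, 1.0)
    # P(w.x_i > 0 and w.x_j > 0) = (pi - angle_ij) / (2 pi) for w ~ N(0, I)
    return dots * (np.pi - np.arccos(dots)) / (2.0 * np.pi)

def mc_gram(act_deriv, X: np.ndarray, m: int = 100_000, seed: int = 0) -> np.ndarray:
    """Monte Carlo estimate of E_w[act'(w.x_i) act'(w.x_j)] * (x_i . x_j)."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((m, X.shape[1]))  # m random first-layer weight vectors
    D = act_deriv(W @ X.T)                    # shape (m, n): act' at each pre-activation
    return (D.T @ D / m) * (X @ X.T)

rng = np.random.default_rng(1)
X = rng.standard_normal((5, 3))
X /= np.linalg.norm(X, axis=1, keepdims=True)  # unit-norm inputs

H = relu_gram(X)
H_mc = mc_gram(lambda z: (z > 0).astype(float), X)
print("min eigenvalue (ReLU, closed form):", np.linalg.eigvalsh(H).min())
```

Calling `mc_gram(lambda z: 1.0 - np.tanh(z) ** 2, X)` gives the tanh analogue. Comparing minimum eigenvalues as the data's span shrinks is one way to probe the abstract's claim empirically, though this sketch does not reproduce the paper's bounds.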

artificial intelligence, neural network, …

arXiv.org Machine Learning

1908.05660

Country:

Europe (1.00)
North America > United States > California (0.27)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.48)


Non-Gaussian Component Analysis using Entropy Methods

Goyal, Navin, Shetty, Abhishek

arXiv.org Machine Learning, Aug-15-2018

Non-Gaussian component analysis (NGCA) is a problem in multidimensional data analysis. Since its formulation in 2006, NGCA has attracted considerable attention in statistics and machine learning. In this problem, we have a random variable $X$ in $n$-dimensional Euclidean space. There is an unknown subspace $U$ of the $n$-dimensional Euclidean space such that the orthogonal projection of $X$ onto $U$ is standard multidimensional Gaussian and the orthogonal projection of $X$ onto $V$, the orthogonal complement of $U$, is non-Gaussian, in the sense that all its one-dimensional marginals are different from the Gaussian in a certain metric defined in terms of moments. The NGCA problem is to approximate the non-Gaussian subspace $V$ given samples of $X$. Vectors in $V$ correspond to "interesting" directions, whereas vectors in $U$ correspond to the directions where data is very noisy. The most interesting applications of the NGCA model are in the case when the magnitude of the noise is comparable to that of the true signal, a setting in which traditional noise reduction techniques such as PCA don't apply directly. NGCA is also related to dimensionality reduction and to other data analysis problems such as ICA. NGCA-like problems have been studied in statistics for a long time using techniques such as projection pursuit. We give an algorithm that runs in time polynomial in the dimension $n$ and has an inverse polynomial dependence on the error parameter measuring the angle distance between the non-Gaussian subspace and the subspace output by the algorithm. Our algorithm is based on relative entropy as the contrast function and fits under the projection pursuit framework. The techniques we develop for analyzing our algorithm may be of use for other related problems.
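The projection pursuit idea can be illustrated in two dimensions: scan unit directions u, score each 1-D projection u·X by how non-Gaussian it looks, and return the best-scoring direction as an estimate of a vector in $V$. The paper's contrast function is relative entropy; the sketch below swaps in absolute excess kurtosis, a classical projection-pursuit index, because it fits in one line. It assumes whitened data (identity covariance), uses a brute-force angle grid that only makes sense for n = 2, and all names are hypothetical.

```python
import numpy as np

def excess_kurtosis(y: np.ndarray) -> float:
    """Sample excess kurtosis; 0 for a Gaussian, -1.2 for a uniform distribution."""
    y = y - y.mean()
    return float(np.mean(y ** 4) / np.mean(y ** 2) ** 2 - 3.0)

def kurtosis_pursuit_2d(X: np.ndarray, num_angles: int = 360) -> np.ndarray:
    """Return the unit direction whose projection has the largest |excess kurtosis|."""
    best_u, best_score = None, -np.inf
    for phi in np.linspace(0.0, np.pi, num_angles, endpoint=False):  # u and -u are equivalent
        u = np.array([np.cos(phi), np.sin(phi)])
        score = abs(excess_kurtosis(X @ u))
        if score > best_score:
            best_u, best_score = u, score
    return best_u

rng = np.random.default_rng(0)
n_samples = 20_000
s_nongauss = rng.uniform(-np.sqrt(3), np.sqrt(3), n_samples)  # unit-variance, non-Gaussian
s_gauss = rng.standard_normal(n_samples)                      # Gaussian "noise" component
alpha = 0.7
R = np.array([[np.cos(alpha), -np.sin(alpha)], [np.sin(alpha), np.cos(alpha)]])
X = np.column_stack([s_nongauss, s_gauss]) @ R.T  # rotated, still whitened
v_true = R[:, 0]                                  # spans the non-Gaussian subspace V
u_hat = kurtosis_pursuit_2d(X)
print("|cos(angle between u_hat and V)|:", abs(u_hat @ v_true))
```

For independent unit-variance components, the excess kurtosis of a projection at angle Δ from $V$ is -1.2·cos⁴Δ here, so the score peaks at Δ = 0. Moment-based indices like this one are exactly the kind of contrast the abstract's relative-entropy approach replaces.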

artificial intelligence, optimization problem, random variable, …

arXiv.org Machine Learning

1807.04936

Country:

North America > United States (0.45)
North America > Canada (0.27)

Genre:

Research Report (0.49)
Workflow (0.45)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.66)
