Awasthi, Pranjal
Efficient Algorithms for Learning Depth-2 Neural Networks with General ReLU Activations
Awasthi, Pranjal, Tang, Alex, Vijayaraghavan, Aravindan
We present polynomial-time and sample-efficient algorithms for learning an unknown depth-2 feedforward neural network with general ReLU activations, under mild non-degeneracy assumptions. In particular, we consider learning an unknown network of the form $f(x) = {a}^{\mathsf{T}}\sigma({W}^\mathsf{T}x+b)$, where $x$ is drawn from the Gaussian distribution, and $\sigma(t) := \max(t,0)$ is the ReLU activation. Prior works for learning networks with ReLU activations assume that the bias $b$ is zero. To handle the presence of the bias terms, our proposed algorithm robustly decomposes multiple higher-order tensors arising from the Hermite expansion of the function $f(x)$. Using these ideas we also establish identifiability of the network parameters under minimal assumptions.
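As a minimal sketch of this setting (not the paper's algorithm; the dimensions, sample size, and the particular moment computed below are illustrative assumptions), the following snippet draws Gaussian inputs, evaluates a depth-2 ReLU network with nonzero bias, and estimates the first-order moment $\mathbb{E}[f(x)x]$, the kind of Hermite-expansion quantity whose higher-order analogues the tensor-decomposition approach works with.

```python
import numpy as np

# Minimal sketch of the learning setting (not the paper's algorithm):
# draw Gaussian inputs, evaluate an unknown depth-2 ReLU network, and
# estimate the first-order moment E[f(x) x] of the kind that Hermite-based
# tensor methods build on. All dimensions below are arbitrary choices.
rng = np.random.default_rng(0)
d, k, n = 10, 4, 200_000          # input dim, hidden width, sample size

W = rng.standard_normal((d, k))   # unknown weight matrix
b = rng.standard_normal(k)        # unknown (nonzero) bias vector
a = rng.standard_normal(k)        # unknown output weights

def f(x):
    """Depth-2 network f(x) = a^T relu(W^T x + b)."""
    return np.maximum(x @ W + b, 0.0) @ a

X = rng.standard_normal((n, d))   # x ~ N(0, I_d)
y = f(X)

# Empirical estimate of E[f(x) x]; for Gaussian inputs this moment is a
# linear combination of the columns of W, and higher-order analogues
# (tensors) help disentangle the individual columns.
m1 = (y[:, None] * X).mean(axis=0)
print("estimated E[f(x) x]:", np.round(m1, 3))
```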
On the benefits of maximum likelihood estimation for Regression and Forecasting
Awasthi, Pranjal, Das, Abhimanyu, Sen, Rajat, Suresh, Ananda Theertha
We advocate for a practical Maximum Likelihood Estimation (MLE) approach for regression and forecasting, as an alternative to the typical approach of Empirical Risk Minimization (ERM) for a specific target metric. This approach is better suited to capture inductive biases such as prior domain knowledge in datasets, and can output post-hoc estimators at inference time that optimize different types of target metrics. We present theoretical results showing that our approach is always competitive with any estimator for the target metric under some general conditions, and that in many practical settings (such as Poisson regression) it can be substantially superior to ERM. We demonstrate empirically that our method, instantiated with a well-designed general-purpose mixture likelihood family, obtains superior performance over ERM for a variety of tasks across time-series forecasting and regression datasets with different data distributions.
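A minimal sketch of the MLE-then-post-hoc-estimator idea, assuming a Poisson likelihood with a log link and synthetic data (these modeling choices are illustrative, not taken from the paper): fit the likelihood by maximum likelihood, then read off different point predictions at inference time depending on the target metric (the mean for squared error, the median for absolute error).

```python
import numpy as np
from scipy.stats import poisson

# Toy sketch (assumed setup, not the paper's method): fit a Poisson likelihood
# lambda(x) = exp(w.x) by maximum likelihood, then derive metric-specific point
# predictions from the fitted distribution at inference time.
rng = np.random.default_rng(1)
n, d = 5_000, 3
X = rng.standard_normal((n, d))
w_true = np.array([0.5, -0.3, 0.8])
y = rng.poisson(np.exp(X @ w_true))

# Poisson MLE by gradient ascent on the mean log-likelihood.
w = np.zeros(d)
for _ in range(500):
    lam = np.exp(X @ w)
    grad = X.T @ (y - lam) / n        # gradient of the mean log-likelihood
    w += 0.1 * grad

lam_hat = np.exp(X @ w)
pred_for_mse = lam_hat                # the mean minimizes squared error
pred_for_mae = poisson.median(lam_hat)  # the median minimizes absolute error
print("fitted w:", np.round(w, 2))
```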
Measuring Model Fairness under Noisy Covariates: A Theoretical Perspective
Prost, Flavien, Awasthi, Pranjal, Blumm, Nick, Kumthekar, Aditee, Potter, Trevor, Wei, Li, Wang, Xuezhi, Chi, Ed H., Chen, Jilin, Beutel, Alex
In this work we study the problem of measuring the fairness of a machine learning model under noisy information. Focusing on group fairness metrics, we investigate the particular but common situation in which the evaluation requires controlling for the confounding effect of covariate variables. In a practical setting, we might not be able to jointly observe the covariate and group information, and a standard workaround is then to use proxies for one or more of these variables. Prior works have demonstrated the challenges of using a proxy for sensitive attributes, and strong independence assumptions are needed to provide guarantees on the accuracy of the noisy estimates. In contrast, in this work we study using a proxy for the covariate variable and present a theoretical analysis that aims to characterize weaker conditions under which accurate fairness evaluation is possible. Furthermore, our theory identifies potential sources of error and decouples them into two interpretable parts $\gamma$ and $\epsilon$. The first part $\gamma$ depends solely on the performance of the proxy such as precision and recall, whereas the second part $\epsilon$ captures correlations between all the variables of interest. We show that in many scenarios the error in the estimates is dominated by $\gamma$ via a linear dependence, whereas the dependence on the correlations $\epsilon$ only constitutes a lower-order term. As a result we expand the understanding of scenarios where measuring model fairness via proxies can be an effective approach. Finally, we compare, via simulations, the theoretical upper bounds to the distribution of simulated estimation errors and show that assuming some structure on the data, even weak, is key to significantly improving both the theoretical guarantees and the empirical results.
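The following toy simulation (with an assumed data-generating process, not the paper's setup) illustrates the basic measurement problem: a covariate-controlled group gap is computed once with the true covariate and once with a noisy proxy, and the difference between the two estimates is the kind of error that the $\gamma$/$\epsilon$ decomposition bounds.

```python
import numpy as np

# Small simulation in the spirit of the setup (all distributional choices here
# are assumptions): estimate a covariate-controlled group gap using the true
# covariate C and again using a noisy proxy, and compare the two estimates.
rng = np.random.default_rng(2)
n = 200_000
group = rng.integers(0, 2, n)                    # sensitive group A
cov = rng.integers(0, 2, n)                      # covariate C to control for
# model predictions depend on both the group and the covariate
pred = (rng.random(n) < 0.4 + 0.2 * cov + 0.1 * group).astype(int)
# proxy for the covariate: correct with probability 0.9
proxy = np.where(rng.random(n) < 0.9, cov, 1 - cov)

def controlled_gap(c):
    """Mean prediction gap between groups, averaged over covariate strata."""
    gaps = []
    for v in (0, 1):
        m = c == v
        gaps.append(pred[m & (group == 1)].mean() - pred[m & (group == 0)].mean())
    return np.mean(gaps)

print("gap using true covariate :", round(controlled_gap(cov), 3))
print("gap using noisy proxy    :", round(controlled_gap(proxy), 3))
```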
A Finer Calibration Analysis for Adversarial Robustness
Awasthi, Pranjal, Mao, Anqi, Mohri, Mehryar, Zhong, Yutao
We present a more general analysis of H-calibration for adversarially robust classification. By adopting a finer definition of calibration, we can cover settings beyond the restricted hypothesis sets studied in previous work. In particular, our results hold for most common hypothesis sets used in machine learning. We both fix some previous calibration results (Bao et al., 2020) and generalize others (Awasthi et al., 2021). Moreover, our calibration results, combined with the previous study of consistency by Awasthi et al. (2021), also lead to more general H-consistency results covering common hypothesis sets.
Calibration and Consistency of Adversarial Surrogate Losses
Awasthi, Pranjal, Frank, Natalie, Mao, Anqi, Mohri, Mehryar, Zhong, Yutao
Adversarial robustness is an increasingly critical property of classifiers in applications. The design of robust algorithms relies on surrogate losses since the optimization of the adversarial loss with most hypothesis sets is NP-hard. But which surrogate losses should be used and when do they benefit from theoretical guarantees? We present an extensive study of this question, including a detailed analysis of the H-calibration and H-consistency of adversarial surrogate losses. We show that, under some general assumptions, convex loss functions, or the supremum-based convex losses often used in applications, are not H-calibrated for important hypothesis sets such as generalized linear models or one-layer neural networks. We then give a characterization of H-calibration and prove that some surrogate losses are indeed H-calibrated for the adversarial loss, with these hypothesis sets. Next, we show that H-calibration is not sufficient to guarantee consistency and prove that, in the absence of any distributional assumption, no continuous surrogate loss is consistent in the adversarial setting. This, in particular, proves that a claim presented in a COLT 2020 publication is inaccurate. (Calibration results there are correct modulo subtle definition differences, but the consistency claim does not hold.) Next, we identify natural conditions under which some surrogate losses that we describe in detail are H-consistent for hypothesis sets such as generalized linear models and one-layer neural networks. We also report a series of empirical results with simulated data, which show that many H-calibrated surrogate losses are indeed not H-consistent, and validate our theoretical assumptions.
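To make the supremum-based surrogate concrete, the sketch below uses an assumed setting (a linear hypothesis, an $\ell_\infty$ perturbation ball, and the hinge loss, none of which are prescribed by the paper): for $h(x) = w \cdot x$ and a nonincreasing margin loss $\phi$, the worst-case loss over the ball has the closed form $\phi(y\,w \cdot x - \gamma \|w\|_1)$.

```python
import numpy as np

# Illustration under assumed choices (linear hypothesis, l_inf ball, hinge loss):
# for h(x) = w.x and a nonincreasing margin loss phi, the supremum-based
# adversarial surrogate has the closed form
#   sup_{||x'-x||_inf <= gamma} phi(y * h(x')) = phi(y * w.x - gamma * ||w||_1).
def hinge(margin):
    return np.maximum(0.0, 1.0 - margin)

def adversarial_hinge(w, x, y, gamma):
    """Worst-case hinge loss over the l_inf ball of radius gamma around x."""
    worst_margin = y * (w @ x) - gamma * np.abs(w).sum()
    return hinge(worst_margin)

w = np.array([1.0, -2.0, 0.5])
x = np.array([0.3, -0.1, 0.8])
y = 1
print("clean hinge      :", hinge(y * (w @ x)))
print("adversarial hinge:", adversarial_hinge(w, x, y, gamma=0.1))
```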
A Multiclass Boosting Framework for Achieving Fast and Provable Adversarial Robustness
Abernethy, Jacob, Awasthi, Pranjal, Kale, Satyen
Alongside the well-publicized accomplishments of deep neural networks, an apparent bug has emerged in their success on tasks such as object recognition: with deep models trained using vanilla methods, input images can be slightly corrupted in order to modify output predictions, even when these corruptions are practically invisible. This apparent lack of robustness has led researchers to propose methods that can help prevent an adversary from having such capabilities. The state-of-the-art approaches incorporate the robustness requirement into the loss function, and the training process involves taking stochastic gradient descent steps not on the original inputs but on adversarially corrupted ones. In this paper we propose a multiclass boosting framework to ensure adversarial robustness. Boosting algorithms are generally well-suited for adversarial scenarios, as they were classically designed to satisfy a minimax guarantee. We provide a theoretical foundation for this methodology and describe conditions under which robustness can be achieved given a weak training oracle. We show empirically that adversarially robust multiclass boosting not only outperforms the state-of-the-art methods but also does so at a fraction of the training time.
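The snippet below is a schematic, binary-label boosting loop under assumed details (the weak oracle, the reweighting rule, and the closed-form worst-case margin for a linear scorer are all illustrative choices, not the paper's algorithm): examples that remain vulnerable under the current ensemble are upweighted, and a weak training oracle is called on the reweighted data.

```python
import numpy as np

# Schematic boosting loop for robustness (a sketch under assumed details, not
# the paper's algorithm): upweight examples that the current ensemble still
# misclassifies under a worst-case l_inf perturbation, and hand the reweighted
# data to a weak training oracle. Binary labels are used for brevity.
rng = np.random.default_rng(3)
n, d, gamma, rounds = 1_000, 5, 0.05, 10
X = rng.standard_normal((n, d))
y = np.sign(X[:, 0] + 0.3 * X[:, 1])             # toy labels in {-1, +1}

def weak_oracle(X, y, weights):
    """Toy weak learner: weighted least-squares linear scorer."""
    sw = np.sqrt(weights)
    w, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return w

def robust_margin(w, X, y, gamma):
    """Worst-case margin of a linear scorer over an l_inf ball (closed form)."""
    return y * (X @ w) - gamma * np.abs(w).sum()

weights = np.full(n, 1.0 / n)
ensemble = []
for _ in range(rounds):
    ensemble.append(weak_oracle(X, y, weights))
    avg_w = np.mean(ensemble, axis=0)
    margins = robust_margin(avg_w, X, y, gamma)
    weights = np.exp(-margins)                    # focus on non-robust points
    weights /= weights.sum()

robust_error = (robust_margin(np.mean(ensemble, axis=0), X, y, gamma) <= 0).mean()
print("robust error of the averaged ensemble:", round(robust_error, 3))
```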
Evaluating Fairness of Machine Learning Models Under Uncertain and Incomplete Information
Awasthi, Pranjal, Beutel, Alex, Kleindessner, Matthaeus, Morgenstern, Jamie, Wang, Xuezhi
Training and evaluation of fair classifiers is a challenging problem. This is partly due to the fact that most fairness metrics of interest depend on both the sensitive attribute information and label information of the data points. In many scenarios it is not possible to collect large datasets with such information. An alternate approach that is commonly used is to separately train an attribute classifier on data with sensitive attribute information, and then use it later in the ML pipeline to evaluate the bias of a given classifier. While such decoupling helps alleviate the problem of demographic scarcity, it raises several natural questions: how should the attribute classifier be trained, and how should one use a given attribute classifier for accurate bias estimation? In this work we study these questions from both theoretical and empirical perspectives. We first experimentally demonstrate that the test accuracy of the attribute classifier is not always correlated with its effectiveness in bias estimation for a downstream model. To further investigate this phenomenon, we analyze an idealized theoretical model and characterize the structure of the optimal classifier. Our analysis has surprising and counter-intuitive implications: in certain regimes, one might want to distribute the error of the attribute classifier as unevenly as possible among the different subgroups. Based on our analysis we develop heuristics for both training and using attribute classifiers for bias estimation in the data-scarce regime. We empirically demonstrate the effectiveness of our approach on real and simulated data.
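As a toy illustration (with an assumed data-generating process rather than the paper's experiments), the simulation below shows why test accuracy alone is not the right yardstick: two attribute classifiers with the same overall accuracy, but with errors allocated differently across subgroups, yield noticeably different estimates of a downstream model's demographic-parity gap.

```python
import numpy as np

# Toy simulation (assumed data-generating process, not the paper's experiments):
# two attribute classifiers with the same overall accuracy, but with errors
# distributed differently across subgroups, give different estimates of a
# downstream model's demographic-parity gap.
rng = np.random.default_rng(4)
n = 500_000
attr = (rng.random(n) < 0.3).astype(int)                  # true sensitive attribute
pred = (rng.random(n) < 0.5 + 0.2 * attr).astype(int)     # downstream model output

def estimate_gap(attr_hat):
    return pred[attr_hat == 1].mean() - pred[attr_hat == 0].mean()

def noisy_attr(p_flip_0, p_flip_1):
    """Flip the predicted attribute with group-dependent error rates."""
    flip = np.where(attr == 1, rng.random(n) < p_flip_1, rng.random(n) < p_flip_0)
    return np.where(flip, 1 - attr, attr)

true_gap = estimate_gap(attr)
# both proxies have ~90% overall accuracy, but allocate errors differently
gap_a = estimate_gap(noisy_attr(p_flip_0=0.10, p_flip_1=0.10))
gap_b = estimate_gap(noisy_attr(p_flip_0=0.143, p_flip_1=0.0))
print("true gap:", round(true_gap, 3), " est A:", round(gap_a, 3), " est B:", round(gap_b, 3))
```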
Beyond Individual and Group Fairness
Awasthi, Pranjal, Cortes, Corinna, Mansour, Yishay, Mohri, Mehryar
Learning algorithms trained on large amounts of data are increasingly adopted in applications with significant individual and social consequences such as selecting loan applicants, filtering resumes of job applicants, estimating the likelihood for a defendant to commit future crimes, or deciding where to deploy police officers. Analyzing the risk of bias in these systems is therefore crucial. In fact, that is also critical for seemingly less socially consequential applications such as ads placement, recommendation systems, speech recognition, and many other common applications of machine learning. Such biases can appear due to the way the training data has been collected, due to an improper choice of the loss function optimized, or as a result of some other algorithmic choices.
Adversarial robustness via robust low rank representations
Awasthi, Pranjal, Jain, Himanshu, Rawat, Ankit Singh, Vijayaraghavan, Aravindan
Adversarial robustness measures the susceptibility of a classifier to imperceptible perturbations made to the inputs at test time. In this work we highlight the benefits of natural low rank representations that often exist for real data such as images, for training neural networks with certified robustness guarantees. Our first contribution is for certified robustness to perturbations measured in $\ell_2$ norm. We exploit low rank data representations to provide improved guarantees over state-of-the-art randomized smoothing-based approaches on standard benchmark datasets such as CIFAR-10 and CIFAR-100. Our second contribution is for the more challenging setting of certified robustness to perturbations measured in $\ell_\infty$ norm. We demonstrate empirically that natural low rank representations have inherent robustness properties that can be leveraged to provide significantly better guarantees for certified robustness to $\ell_\infty$ perturbations in those representations. Our certificate of $\ell_\infty$ robustness relies on a natural quantity involving the $\infty \to 2$ matrix operator norm associated with the representation, to translate robustness guarantees from $\ell_2$ to $\ell_\infty$ perturbations. A key technical ingredient for our certification guarantees is a fast algorithm with provable guarantees based on the multiplicative weights update method to provide upper bounds on the above matrix norm. Our algorithmic guarantees improve upon the state of the art for this problem, and may be of independent interest.
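The norm-translation step can be sketched as follows, assuming a projection $P$ onto the representation and a crude, easily computable upper bound on $\|P\|_{\infty \to 2}$ (the paper's multiplicative-weights method gives much tighter bounds): since $\|P(x'-x)\|_2 \le \|P\|_{\infty \to 2}\, \|x'-x\|_\infty$, an $\ell_2$ certificate of radius $r$ in representation space yields an $\ell_\infty$ certificate of radius $r / \|P\|_{\infty \to 2}$ in input space.

```python
import numpy as np

# Sketch of the norm-translation step (the crude bound below is an assumption
# for illustration; the paper bounds the same norm much more tightly with a
# multiplicative-weights method). If a classifier on the representation z = P x
# is certifiably robust to l_2 perturbations of z of radius r, then it is robust
# to l_inf input perturbations of radius r / ||P||_{inf->2}, since
# ||P(x' - x)||_2 <= ||P||_{inf->2} * ||x' - x||_inf.
rng = np.random.default_rng(5)
d, k = 784, 32                                  # input dim, representation dim (assumed)
P = rng.standard_normal((k, d)) / np.sqrt(d)    # assumed projection onto the representation

def inf_to_2_upper_bound(P):
    """Two easy upper bounds on ||P||_{inf->2}; return the smaller one."""
    col_sum = np.linalg.norm(P, axis=0).sum()              # sum of column l_2 norms
    spectral = np.sqrt(P.shape[1]) * np.linalg.norm(P, 2)  # sqrt(d) * ||P||_2
    return min(col_sum, spectral)

r_l2 = 0.5                                      # assumed certified l_2 radius in representation space
r_linf = r_l2 / inf_to_2_upper_bound(P)
print("certified l_inf radius in input space:", r_linf)
```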
On the Rademacher Complexity of Linear Hypothesis Sets
Awasthi, Pranjal, Frank, Natalie, Mohri, Mehryar
Linear predictors form a rich class of hypotheses used in a variety of learning algorithms. We present a tight analysis of the empirical Rademacher complexity of the family of linear hypothesis classes with weight vectors bounded in $\ell_p$-norm for any $p \geq 1$. This provides a tight analysis of generalization using these hypothesis sets and helps derive sharp data-dependent learning guarantees. We give both upper and lower bounds on the Rademacher complexity of these families and show that our bounds improve upon or match existing bounds, which are known only for $1 \leq p \leq 2$.
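By Hölder duality, the empirical Rademacher complexity of the class $\{x \mapsto \langle w, x\rangle : \|w\|_p \le W\}$ on a sample $x_1,\dots,x_m$ equals $\frac{W}{m}\,\mathbb{E}_\sigma \big\|\sum_i \sigma_i x_i\big\|_q$ with $1/p + 1/q = 1$; the sketch below estimates this quantity by Monte Carlo over random sign vectors (the sample and the bound $W$ are arbitrary illustrative choices).

```python
import numpy as np

# Monte Carlo estimate of the empirical Rademacher complexity of the linear
# class {x -> <w, x> : ||w||_p <= W}. By Hoelder duality,
#   R_S = (W / m) * E_sigma ||sum_i sigma_i x_i||_q,  with 1/p + 1/q = 1,
# so it suffices to average the dual norm of random signed sums of the sample.
# The sample X and the bound W below are arbitrary choices for illustration.
rng = np.random.default_rng(6)
m, d, W = 500, 20, 1.0
X = rng.standard_normal((m, d))

def empirical_rademacher(X, p, W=1.0, trials=2_000):
    q = np.inf if p == 1 else p / (p - 1)              # dual exponent
    sigma = rng.choice([-1.0, 1.0], size=(trials, X.shape[0]))
    sums = sigma @ X                                    # shape (trials, d)
    return W * np.linalg.norm(sums, ord=q, axis=1).mean() / X.shape[0]

for p in (1, 1.5, 2, 4):
    print(f"p = {p}: R_S ~ {empirical_rademacher(X, p):.4f}")
```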