A., Prashanth L.
Risk-sensitive Bandits: Arm Mixture Optimality and Regret-efficient Algorithms
Tatlı, Meltem, Mukherjee, Arpan, A., Prashanth L., Shanmugam, Karthikeyan, Tajer, Ali
This paper introduces a general framework for risk-sensitive bandits that integrates various risk-sensitive objectives by adopting a rich class of distortion riskmetrics. The introduced framework subsumes the various existing risk-sensitive models. An important and hitherto unknown observation is that, for a wide range of riskmetrics, the optimal bandit policy involves selecting a mixture of arms. This is in sharp contrast to the convention in multi-armed bandit algorithms that there is generally a solitary arm that maximizes the utility, whether purely reward-centric or risk-sensitive. This creates a major departure from the principles for designing bandit algorithms, since there are uncountably many mixture possibilities. The contributions of the paper are as follows: (i) it formalizes a general framework for risk-sensitive bandits, (ii) it identifies standard risk-sensitive bandit models for which solitary arm selection is not optimal, and (iii) it designs regret-efficient algorithms whose sampling strategies can accurately track the optimal arm mixtures (when a mixture is optimal) or the solitary arms (when a solitary arm is optimal). The algorithms are shown to achieve a regret that scales according to $O((\log T/T)^{\nu})$, where $T$ is the horizon and $\nu>0$ is a riskmetric-specific constant.
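To make the arm-mixture phenomenon concrete, the toy sketch below (with hypothetical arm distributions, support, and distortion function; it is not the paper's algorithm) evaluates a distortion riskmetric over all mixtures of two arms on a weight grid. With the concave distortion chosen here, an interior mixture weight beats both pure arms.

```python
# Toy illustration (not the paper's algorithm): the distortion riskmetric
# rho_g(X) = int_0^inf g(P(X > x)) dx, evaluated for every mixture of two
# hypothetical arms over a grid of mixture weights.
import numpy as np

def distortion_riskmetric(values, probs, g):
    """rho_g for a discrete non-negative random variable with the given support/probabilities."""
    order = np.argsort(values)
    v, p = np.asarray(values, float)[order], np.asarray(probs, float)[order]
    tail = 1.0 - np.cumsum(p) + p                  # tail[i] = P(X >= v[i])
    prev = np.concatenate(([0.0], v[:-1]))
    return float(np.sum((v - prev) * g(tail)))

support = np.array([0.0, 0.5, 1.0])                # common reward support (assumption)
arm1 = np.array([0.0, 1.0, 0.0])                   # a "safe" arm: reward 0.5 surely
arm2 = np.array([0.6, 0.0, 0.4])                   # a "risky" arm: reward 0 or 1
g = np.sqrt                                        # example concave distortion (assumption)

weights = np.linspace(0.0, 1.0, 101)
scores = [distortion_riskmetric(support, w * arm1 + (1 - w) * arm2, g) for w in weights]
w_star = weights[int(np.argmax(scores))]
print(f"optimal weight on arm 1: {w_star:.2f} (interior, so a mixture beats both pure arms)")
```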
VaR and CVaR Estimation in a Markov Cost Process: Lower and Upper Bounds
Bhat, Sanjay, A., Prashanth L., Thoppe, Gugan
We tackle the problem of estimating the Value-at-Risk (VaR) and the Conditional Value-at-Risk (CVaR) of the infinite-horizon discounted cost within a Markov cost process. First, we derive a minimax lower bound of $\Omega(1/\sqrt{n})$ that holds both in an expected and in a probabilistic sense. Then, using a finite-horizon truncation scheme, we derive an upper bound for the error in CVaR estimation, which matches our lower bound up to constant factors. Finally, we discuss an extension of our estimation scheme that covers more general risk measures satisfying a certain continuity criterion, e.g., spectral risk measures and utility-based shortfall risk. To the best of our knowledge, our work is the first to provide lower and upper bounds on the estimation error for any risk measure within Markovian settings. We remark that our lower bounds also extend to the mean of the infinite-horizon discounted cost. Even in that case, our result of $\Omega(1/\sqrt{n})$ improves upon the existing result of $\Omega(1/n)$ [13].
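As a rough illustration of the shape of such an estimation scheme (not the paper's analysis), the sketch below simulates an assumed toy two-state Markov cost process, truncates each trajectory at a finite horizon $m$, and applies empirical quantile and tail-average estimators of VaR and CVaR to the $n$ sampled discounted costs; the chain, costs, and constants are all assumptions.

```python
# Minimal sketch: truncated discounted costs from a toy Markov cost process,
# followed by empirical VaR/CVaR estimators on the sampled costs.
import numpy as np

rng = np.random.default_rng(0)
P = np.array([[0.9, 0.1], [0.3, 0.7]])   # hypothetical 2-state transition matrix
c = np.array([0.0, 1.0])                 # per-state cost
gamma, m, n, alpha = 0.9, 100, 2000, 0.95

def truncated_discounted_cost(s0=0):
    s, total = s0, 0.0
    for t in range(m):
        total += (gamma ** t) * c[s]
        s = rng.choice(2, p=P[s])
    return total

costs = np.array([truncated_discounted_cost() for _ in range(n)])
var_hat = np.quantile(costs, alpha)               # empirical VaR_alpha (quantile estimator)
cvar_hat = costs[costs >= var_hat].mean()         # empirical CVaR_alpha (tail average)
print(f"VaR ~ {var_hat:.3f}, CVaR ~ {cvar_hat:.3f}")
```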
Finite time analysis of temporal difference learning with linear function approximation: Tail averaging and regularisation
Patil, Gandharv, A., Prashanth L., Nagaraj, Dheeraj, Precup, Doina
We study the finite-time behaviour of the popular temporal difference (TD) learning algorithm when combined with tail-averaging. We derive finite-time bounds on the parameter error of the tail-averaged TD iterate under a step-size choice that does not require information about the eigenvalues of the matrix underlying the projected TD fixed point. Our analysis shows that tail-averaged TD converges at the optimal $O\left(1/t\right)$ rate, both in expectation and with high probability. In addition, our bounds exhibit a sharper rate of decay for the initial error (bias), which is an improvement over averaging all iterates. We also propose and analyse a variant of TD that incorporates regularisation. From our analysis, we conclude that the regularised version of TD is useful for problems with ill-conditioned features.
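A minimal sketch of tail-averaged TD(0) with linear function approximation on a hypothetical random-walk reward process follows; the constant step size and the choice to average the last half of the iterates are illustrative assumptions rather than the paper's exact prescription.

```python
# Sketch: TD(0) with linear features, followed by tail-averaging of the iterates.
import numpy as np

rng = np.random.default_rng(1)
n_states, d, gamma, alpha, T = 5, 3, 0.9, 0.02, 20_000
phi = rng.standard_normal((n_states, d))            # fixed random feature matrix
theta = np.zeros(d)
iterates = []

s = 0
for t in range(T):
    s_next = (s + rng.choice([-1, 1])) % n_states   # toy random-walk Markov chain
    r = 1.0 if s_next == 0 else 0.0                 # toy reward on visiting state 0
    td_error = r + gamma * phi[s_next] @ theta - phi[s] @ theta
    theta = theta + alpha * td_error * phi[s]       # TD(0) update with linear features
    iterates.append(theta.copy())
    s = s_next

theta_tail = np.mean(iterates[T // 2:], axis=0)     # tail-averaged iterate (last half)
print(theta_tail)
```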
A Gradient Smoothed Functional Algorithm with Truncated Cauchy Random Perturbations for Stochastic Optimization
Mondal, Akash, A., Prashanth L., Bhatnagar, Shalabh
In this paper, we present a stochastic gradient algorithm for minimizing a smooth objective function that is an expectation over noisy cost samples, where only the latter are observed for any given parameter. Our algorithm employs a gradient estimation scheme with random perturbations, which are formed using the truncated Cauchy distribution from the $\delta$-sphere. We analyze the bias and variance of the proposed gradient estimator. Our algorithm is found to be particularly useful in the case when the objective function is non-convex and the parameter dimension is high. From an asymptotic convergence analysis, we establish that our algorithm converges almost surely to the set of stationary points of the objective function, and we obtain the asymptotic convergence rate. We also show that our algorithm avoids unstable equilibria, implying convergence to local minima. Further, we perform a non-asymptotic convergence analysis of our algorithm. In particular, we establish a non-asymptotic bound for finding an $\epsilon$-stationary point of the non-convex objective function. Finally, we demonstrate numerically through simulations that our algorithm outperforms GSF, SPSA, and RDSA by a significant margin over a few non-convex settings, and we further validate its performance over convex (noisy) objectives.
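The sketch below illustrates the general shape of such a scheme: a two-point random-perturbation gradient surrogate whose direction is built from truncated Cauchy draws placed on a sphere of radius $\delta$, plugged into a stochastic gradient loop. The truncation bound, scaling, step sizes, and toy objective are assumptions, and the paper's exact estimator may differ.

```python
# Illustrative sketch (not the paper's exact estimator): a two-point gradient
# surrogate with truncated-Cauchy perturbation directions, inside plain SGD.
import numpy as np

rng = np.random.default_rng(2)

def truncated_cauchy(d, bound=5.0):
    """iid standard Cauchy coordinates, redrawn until all lie in [-bound, bound]."""
    z = rng.standard_cauchy(d)
    while np.any(np.abs(z) > bound):
        z = rng.standard_cauchy(d)
    return z

def noisy_f(x):
    return np.sum(x ** 2) + 0.01 * rng.standard_normal()   # toy noisy objective

d, delta, lr, iters = 10, 0.05, 0.1, 2000
x = rng.standard_normal(d)
for _ in range(iters):
    u = truncated_cauchy(d)
    u = u / np.linalg.norm(u)                               # direction on the unit sphere
    g_hat = (noisy_f(x + delta * u) - noisy_f(x - delta * u)) / (2 * delta) * u
    x = x - lr * g_hat                                      # stochastic gradient step
print(np.linalg.norm(x))                                    # much smaller than the starting norm
```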
A Cubic-regularized Policy Newton Algorithm for Reinforcement Learning
Maniyar, Mizhaan Prajit, Mondal, Akash, A., Prashanth L., Bhatnagar, Shalabh
We consider the problem of control in the setting of reinforcement learning (RL), where model information is not available. Policy gradient algorithms are a popular solution approach for this problem and are usually shown to converge to a stationary point of the value function. In this paper, we propose two policy Newton algorithms that incorporate cubic regularization. Both algorithms employ the likelihood ratio method to form estimates of the gradient and Hessian of the value function using sample trajectories. The first algorithm requires an exact solution of the cubic regularized problem in each iteration, while the second algorithm employs an efficient gradient descent-based approximation to the cubic regularized problem. We establish convergence of our proposed algorithms to a second-order stationary point (SOSP) of the value function, which results in the avoidance of traps in the form of saddle points. In particular, the sample complexity of our algorithms to find an $\epsilon$-SOSP is $O(\epsilon^{-3.5})$, which is an improvement over the state-of-the-art sample complexity of $O(\epsilon^{-4.5})$.
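Each iteration of such a method must (approximately) solve a cubic-regularized model. The sketch below shows that subproblem with a plain gradient-descent inner loop, using stand-in values for the gradient and Hessian estimates (in the paper these come from likelihood-ratio estimates over sample trajectories); the constants are illustrative assumptions.

```python
# Sketch of the cubic-regularized Newton subproblem, solved approximately by
# gradient descent: minimize  g^T s + 0.5 s^T H s + (M/6) ||s||^3  over s.
import numpy as np

def cubic_subproblem_gd(g, H, M, steps=500, eta=0.01):
    s = np.zeros_like(g)
    for _ in range(steps):
        grad = g + H @ s + 0.5 * M * np.linalg.norm(s) * s   # gradient of the cubic model
        s = s - eta * grad
    return s

rng = np.random.default_rng(3)
d = 5
g = rng.standard_normal(d)            # stand-in gradient estimate
A = rng.standard_normal((d, d))
H = 0.5 * (A + A.T)                   # stand-in (possibly indefinite) Hessian estimate
step = cubic_subproblem_gd(g, H, M=10.0)
print(step)
```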
A Survey of Risk-Aware Multi-Armed Bandits
Tan, Vincent Y. F., A., Prashanth L., Jagannathan, Krishna
In many applications, the mean of the reward distribution does not satisfactorily capture the merits of a drug or a portfolio. In such applications, risk plays a crucial role, and a risk-aware performance measure is preferable, so as to capture losses in the case of adverse events. This survey aims to consolidate and summarise the existing research on risk measures, specifically in the context of multi-armed bandits. We review various risk measures of interest and comment on their properties. Next, we review existing concentration inequalities for various risk measures. Then, we proceed to defining risk-aware bandit problems and consider algorithms for the regret minimization and best-arm identification settings.
Online Estimation and Optimization of Utility-Based Shortfall Risk
Menon, Arvind S., A., Prashanth L., Jagannathan, Krishna
In several financial applications, it is necessary to understand risk sensitivity while maximizing returns. Several risk measures have been studied in the literature, e.g., mean-variance, Value at Risk (VaR), Conditional Value at Risk (CVaR), distorted risk measures, and prospect theory. In [2], the authors consider four properties as desirable for a risk measure, namely positive homogeneity, translation invariance, sub-additivity, and monotonicity. They define a risk measure as being coherent if it possesses the aforementioned properties. In a related development, in [19], the authors chose to relax the sub-additivity and positive homogeneity requirements of a coherent risk measure, and instead impose a convexity condition on the underlying risk measure.
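For the utility-based shortfall risk (UBSR) itself, a sample-based estimate can be computed by a one-dimensional root search. The sketch below assumes the convention that the UBSR of a loss $X$ is the root in $t$ of $E[\ell(X-t)]=\lambda$ for a nondecreasing loss $\ell$, and uses an exponential loss and Gaussian samples purely for illustration.

```python
# Minimal sketch: sample-based UBSR estimate via bisection on the empirical average.
import numpy as np

def ubsr_estimate(samples, ell, lam, lo=-20.0, hi=20.0, tol=1e-6):
    """Bisection on t for (1/n) sum ell(x_i - t) = lam (the average is nonincreasing in t)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if np.mean(ell(samples - mid)) > lam:
            lo = mid      # empirical average too large -> increase t
        else:
            hi = mid
    return 0.5 * (lo + hi)

rng = np.random.default_rng(4)
losses = rng.normal(loc=1.0, scale=2.0, size=10_000)   # hypothetical loss samples
ell = lambda u: np.exp(u)                               # exponential loss (assumption)
print(ubsr_estimate(losses, ell, lam=1.0))              # ~ mu + sigma^2/2 = 3 for this setup
```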
(Bandit) Convex Optimization with Biased Noisy Gradient Oracles
Hu, Xiaowei, A., Prashanth L., György, András, Szepesvári, Csaba
Algorithms for bandit convex optimization and online learning often rely on constructing noisy gradient estimates, which are then used in appropriately adjusted first-order algorithms, replacing actual gradients. Depending on the properties of the function to be optimized and the nature of ``noise'' in the bandit feedback, the bias and variance of gradient estimates exhibit various tradeoffs. In this paper, we propose a novel framework that replaces the specific gradient estimation methods with an abstract oracle. With the help of the new framework, we unify previous works, reproducing their results in a clean and concise fashion, while, perhaps more importantly, the framework also allows us to formally show that to achieve the optimal root-$n$ rate, either the algorithms that use existing gradient estimators or the proof techniques used to analyze them have to go beyond what exists today.
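One concrete oracle that fits this abstract framework is the classical two-point sphere-smoothing estimator sketched below (an illustration, not the paper's construction): shrinking the perturbation size $\delta$ reduces the smoothing bias but inflates the effect of evaluation noise through the $1/\delta$ factor, which is the kind of bias-variance tradeoff being formalized. The test function and noise level are assumptions.

```python
# Sketch: two-point sphere-smoothing gradient oracle from noisy function values,
# and its accuracy at a fixed point for several perturbation sizes delta.
import numpy as np

rng = np.random.default_rng(5)
d, sigma = 10, 0.1
noisy_f = lambda x: 0.25 * np.sum(x ** 4) + sigma * rng.standard_normal()  # bandit feedback

def gradient_oracle(x, delta):
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)                       # uniform direction on the unit sphere
    return d * (noisy_f(x + delta * u) - noisy_f(x - delta * u)) / (2 * delta) * u

x0 = np.ones(d)                                  # true gradient here is x0**3 = all ones
for delta in [1.0, 0.1, 0.01]:
    avg = np.mean([gradient_oracle(x0, delta) for _ in range(20_000)], axis=0)
    print(delta, np.round(np.linalg.norm(avg - x0 ** 3), 3))   # large delta: bias; small delta: noise
```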
Risk-Sensitive Reinforcement Learning: A Constrained Optimization Viewpoint
A., Prashanth L., Fu, Michael
The classic objective in a reinforcement learning (RL) problem is to find a policy that minimizes, in expectation, a long-run objective such as the infinite-horizon discounted or long-run average cost. In many practical applications, optimizing the expected value alone is not sufficient, and it may be necessary to include a risk measure in the optimization process, either as the objective or as a constraint. Various risk measures have been proposed in the literature, e.g., mean-variance tradeoff, exponential utility, percentile performance, value at risk, conditional value at risk, prospect theory and its later enhancement, cumulative prospect theory. In this article, we focus on the combination of risk criteria and reinforcement learning in a constrained optimization framework, i.e., a setting where the goal is to find a policy that optimizes the usual objective of infinite-horizon discounted/average cost, while ensuring that an explicit risk constraint is satisfied. We introduce the risk-constrained RL framework, cover popular risk measures based on variance, conditional value-at-risk and cumulative prospect theory, and present a template for a risk-sensitive RL algorithm. We survey some of our recent work on this topic, covering problems encompassing discounted cost, average cost, and stochastic shortest path settings, together with the aforementioned risk measures in a constrained framework. This non-exhaustive survey is aimed at giving a flavor of the challenges involved in solving a risk-sensitive RL problem, and outlining some potential future research directions.
Concentration bounds for empirical conditional value-at-risk: The unbounded case
Kolla, Ravi Kumar, A., Prashanth L., Bhat, Sanjay P., Jagannathan, Krishna
In several real-world applications involving decision making under uncertainty, the traditional expected value objective may not be suitable, as it may be necessary to control losses in the case of a rare but extreme event. Conditional Value-at-Risk (CVaR) is a popular risk measure for modeling the aforementioned objective. We consider the problem of estimating CVaR from i.i.d. samples of an unbounded random variable, which is either sub-Gaussian or sub-exponential. We derive a novel one-sided concentration bound for a natural sample-based CVaR estimator in this setting. Our bound relies on a concentration result for a quantile-based estimator for Value-at-Risk (VaR), which may be of independent interest.
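For reference, the natural sample-based estimators in question are sketched below on hypothetical Gaussian (hence sub-Gaussian) samples: VaR as an empirical quantile and CVaR via the exceedance form $\widehat{v} + \frac{1}{n(1-\alpha)}\sum_i (X_i - \widehat{v})^{+}$.

```python
# Minimal sketch: quantile-based VaR estimator and the corresponding CVaR estimator.
import numpy as np

def var_cvar_estimates(x, alpha):
    var_hat = np.quantile(x, alpha)                                    # empirical VaR_alpha
    cvar_hat = var_hat + np.mean(np.maximum(x - var_hat, 0.0)) / (1 - alpha)
    return var_hat, cvar_hat

rng = np.random.default_rng(6)
samples = rng.normal(size=50_000)          # hypothetical sub-Gaussian (Gaussian) losses
print(var_cvar_estimates(samples, alpha=0.95))
```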