AITopics | Gajane, Pratik

Collaborating Authors

Gajane, Pratik

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Investigating Gender Fairness in Machine Learning-driven Personalized Care for Chronic Pain

Gajane, Pratik, Newman, Sean, Pechenizkiy, Mykola, Piette, John D.

arXiv.org Artificial IntelligenceJun-14-2024

Chronic pain significantly diminishes the quality of life for millions worldwide. While psychoeducation and therapy can improve pain outcomes, many individuals experiencing pain lack access to evidence-based treatments or fail to complete the necessary number of sessions to achieve benefit. Reinforcement learning (RL) shows potential in tailoring personalized pain management interventions according to patients' individual needs while ensuring the efficient use of scarce clinical resources. However, clinicians, patients, and healthcare decision-makers are concerned that RL solutions could exacerbate disparities associated with patient characteristics like race or gender. In this article, we study gender fairness in personalized pain care recommendations using a real-world application of reinforcement learning (Piette et al., 2022a). Here, adhering to gender fairness translates to minimal or no disparity in the utility received by subpopulations as defined by gender. We investigate whether the selection of relevant patient information (referred to as features) used to assist decision-making affects gender fairness. Our experiments, conducted using real-world data (Piette, 2022), indicate that included features can impact gender fairness. Moreover, we propose an RL solution, NestedRecommendation, that demonstrates the ability: i) to adaptively learn to select the features that optimize for utility and fairness, and ii) to accelerate feature selection and in turn, improve pain care recommendations from early on, by leveraging clinicians' domain expertise.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

2402.19226

Country:

Europe (0.68)
North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Research Report > Strength High (0.67)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Therapeutic Area > Musculoskeletal (1.00)
Health & Medicine > Consumer Health (1.00)
Government > Regional Government > North America Government > United States Government (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Provably Efficient Exploration in Constrained Reinforcement Learning:Posterior Sampling Is All You Need

Provodin, Danil, Gajane, Pratik, Pechenizkiy, Mykola, Kaptein, Maurits

arXiv.org Artificial IntelligenceSep-27-2023

We present a new algorithm based on posterior sampling for learning in constrained Markov decision processes (CMDP) in the infinite-horizon undiscounted setting. The algorithm achieves near-optimal regret bounds while being advantageous empirically compared to the existing algorithms. Our main theoretical result is a Bayesian regret bound for each cost component of \tilde{O} (HS \sqrt{AT}) for any communicating CMDP with S states, A actions, and bound on the hitting time H. This regret bound matches the lower bound in order of time horizon T and is the best-known regret bound for communicating CMDPs in the infinite-horizon undiscounted setting. Empirical results show that, despite its simplicity, our posterior sampling algorithm outperforms the existing algorithms for constrained reinforcement learning.

artificial intelligence, constrained reinforcement learning, machine learning, (2 more...)

arXiv.org Artificial Intelligence

2309.15737

Genre: Research Report (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.60)

Add feedback

Multi-Armed Bandits with Generalized Temporally-Partitioned Rewards

Broek, Ronald C. van den, Litjens, Rik, Sagis, Tobias, Siecker, Luc, Verbeeke, Nina, Gajane, Pratik

arXiv.org Artificial IntelligenceMar-1-2023

Decision-making problems of sequential nature, where decisions made in the past may have an impact on the future, are used to model many practically important applications. In some real-world applications, feedback about a decision is delayed and may arrive via partial rewards that are observed with different delays. Motivated by such scenarios, we propose a novel problem formulation called multi-armed bandits with generalized temporally-partitioned rewards. To formalize how feedback about a decision is partitioned across several time steps, we introduce $\beta$-spread property. We derive a lower bound on the performance of any uniformly efficient algorithm for the considered problem. Moreover, we provide an algorithm called TP-UCB-FR-G and prove an upper bound on its performance measure. In some scenarios, our upper bound improves upon the state of the art. We provide experimental results validating the proposed algorithm and our theoretical results.

artificial intelligence, data mining, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2303.0062

Genre:

Research Report (1.00)
Instructional Material > Online (0.46)

Industry:

Education > Educational Setting > Online (1.00)
Education > Educational Technology > Educational Software > Computer Based Training (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.94)
Information Technology > Enterprise Applications > Human Resources > Learning Management (0.93)
Information Technology > Data Science > Data Mining > Big Data (0.85)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.67)

Add feedback

Curiosity-driven Exploration in Sparse-reward Multi-agent Reinforcement Learning

Li, Jiong, Gajane, Pratik

arXiv.org Artificial IntelligenceFeb-21-2023

Sparsity of rewards while applying a deep reinforcement learning method negatively affects its sample-efficiency. A viable solution to deal with the sparsity of rewards is to learn via intrinsic motivation which advocates for adding an intrinsic reward to the reward function to encourage the agent to explore the environment and expand the sample space. Though intrinsic motivation methods are widely used to improve data-efficient learning in the reinforcement learning model, they also suffer from the so-called detachment problem. In this article, we discuss the limitations of intrinsic curiosity module in sparse-reward multi-agent reinforcement learning and propose a method called I-Go-Explore that combines the intrinsic curiosity module with the Go-Explore framework to alleviate the detachment problem.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

2302.10825

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Local Differential Privacy for Sequential Decision Making in a Changing Environment

Gajane, Pratik

arXiv.org Artificial IntelligenceJan-2-2023

We study the problem of preserving privacy while still providing high utility in sequential decision making scenarios in a changing environment. We consider abruptly changing environment: the environment remains constant during periods and it changes at unknown time instants. To formulate this problem, we propose a variant of multi-armed bandits called non-stationary stochastic corrupt bandits. We construct an algorithm called SW-KLUCB-CF and prove an upper bound on its utility using the performance measure of regret. The proven regret upper bound for SW-KLUCB-CF is near-optimal in the number of time steps and matches the best known bound for analogous problems in terms of the number of time steps and the number of changes. Moreover, we present a provably optimal mechanism which can guarantee the desired level of local differential privacy while providing high utility.

data mining, machine learning, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

2301.00561

Country:

North America > United States (0.28)
Europe > Netherlands (0.28)

Genre: Research Report (0.40)

Industry: Information Technology > Security & Privacy (0.94)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.85)
Information Technology > Data Science > Data Mining > Big Data (0.67)

Add feedback

Generalizing distribution of partial rewards for multi-armed bandits with temporally-partitioned rewards

Broek, Ronald C. van den, Litjens, Rik, Sagis, Tobias, Siecker, Luc, Verbeeke, Nina, Gajane, Pratik

arXiv.org Artificial IntelligenceNov-13-2022

We investigate the Multi-Armed Bandit problem with Temporally-Partitioned Rewards (TP-MAB) setting in this paper. In the TP-MAB setting, an agent will receive subsets of the reward over multiple rounds rather than the entire reward for the arm all at once. In this paper, we introduce a general formulation of how an arm's cumulative reward is distributed across several rounds, called Beta-spread property. Such a generalization is needed to be able to handle partitioned rewards in which the maximum reward per round is not distributed uniformly across rounds. We derive a lower bound on the TP-MAB problem under the assumption that Beta-spread holds. Moreover, we provide an algorithm TP-UCB-FR-G, which uses the Beta-spread property to improve the regret upper bound in some scenarios. By generalizing how the cumulative reward is distributed, this setting is applicable in a broader range of applications.

artificial intelligence, big data, data mining, (18 more...)

arXiv.org Artificial Intelligence

2211.06883

Country: Europe > Netherlands (0.16)

Genre:

Research Report (0.50)
Instructional Material > Online (0.46)

Industry:

Education > Educational Setting > Online (1.00)
Education > Educational Technology > Educational Software > Computer Based Training (0.93)

Technology:

Information Technology > Enterprise Applications > Human Resources > Learning Management (1.00)
Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence (1.00)

Add feedback

The Impact of Batch Learning in Stochastic Linear Bandits

Provodin, Danil, Gajane, Pratik, Pechenizkiy, Mykola, Kaptein, Maurits

arXiv.org Artificial IntelligenceSep-1-2022

We consider a special case of bandit problems, named batched bandits, in which an agent observes batches of responses over a certain time period. Unlike previous work, we consider a more practically relevant batch-centric scenario of batch learning. That is to say, we provide a policy-agnostic regret analysis and demonstrate upper and lower bounds for the regret of a candidate policy. Our main theoretical results show that the impact of batch learning is a multiplicative factor of batch size relative to the regret of online behavior. Primarily, we study two settings of the stochastic linear bandits: bandits with finitely and infinitely many arms. While the regret bounds are the same for both settings, the former setting results hold under milder assumptions. Also, we provide a more robust result for the 2-armed bandit problem as an important insight. Finally, we demonstrate the consistency of theoretical results by conducting empirical experiments and reflect on optimal batch size choice.

artificial intelligence, data mining, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2202.06657

Country: Europe > Netherlands (0.28)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.88)

Add feedback

The Impact of Batch Learning in Stochastic Bandits

Provodin, Danil, Gajane, Pratik, Pechenizkiy, Mykola, Kaptein, Maurits

arXiv.org Machine LearningNov-3-2021

We consider a special case of bandit problems, namely batched bandits. Motivated by natural restrictions of recommender systems and e-commerce platforms, we assume that a learning agent observes responses batched in groups over a certain time period. Unlike previous work, we consider a more practically relevant batch-centric scenario of batch learning. We provide a policy-agnostic regret analysis and demonstrate upper and lower bounds for the regret of a candidate policy. Our main theoretical results show that the impact of batch learning can be measured in terms of online behavior. Finally, we demonstrate the consistency of theoretical results by conducting empirical experiments and reflect on the optimal batch size choice.

data mining, information technology services, machine learning, (19 more...)

arXiv.org Machine Learning

2111.02071

Country: Europe > Netherlands > North Brabant (0.14)

Genre: Research Report > New Finding (0.48)

Industry: Information Technology > Services (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.48)

Add feedback

Autonomous exploration for navigating in non-stationary CMPs

Gajane, Pratik, Ortner, Ronald, Auer, Peter, Szepesvari, Csaba

arXiv.org Machine LearningOct-18-2019

We consider a setting in which the objective is to learn to navigate in a controlled Markov process (CMP) where transition probabilities may abruptly change. For this setting, we propose a performance measure called exploration steps which counts the time steps at which the learner lacks sufficient knowledge to navigate its environment efficiently. We devise a learning meta-algorithm, MNM, and prove an upper bound on the exploration steps in terms of the number of changes.

artificial intelligence, exploration step, machine learning, (18 more...)

arXiv.org Machine Learning

1910.08446

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Add feedback

Variational Regret Bounds for Reinforcement Learning

Gajane, Pratik, Ortner, Ronald, Auer, Peter

arXiv.org Machine LearningMay-23-2019

For this For reinforcement learning in MDP with changes in reward problem setting, we propose an algorithm and function and transition probabilities, we provide provide performance guarantees for the regret an algorithm, UCRL with Restarts, a version of UCRL evaluated against the optimal non-stationary [Jaksch et al., 2010], which restarts according to a schedule policy. The upper bound on the regret is given dependent on the variation in the MDP (defined in terms of the total variation in the MDP. in Section 2 below). We derive a high-probability upper This is the first variational regret bound for the bound on the cumulative regret of our algorithm of general reinforcement learning setting.

artificial intelligence, probability, reinforcement learning, (18 more...)

arXiv.org Machine Learning

1905.05857

Country:

North America > United States (0.20)
Europe > Austria (0.14)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.81)

Add feedback