AITopics | Varakantham, Pradeep

Collaborating Authors

Varakantham, Pradeep

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

On Generalization Across Environments In Multi-Objective Reinforcement Learning

Teoh, Jayden, Varakantham, Pradeep, Vamplew, Peter

arXiv.org Artificial IntelligenceMar-12-2025

Real-world sequential decision-making tasks often require balancing trade-offs between multiple conflicting objectives, making Multi-Objective Reinforcement Learning (MORL) an increasingly prominent field of research. Despite recent advances, existing MORL literature has narrowly focused on performance within static environments, neglecting the importance of generalizing across diverse settings. Conversely, existing research on generalization in RL has always assumed scalar rewards, overlooking the inherent multi-objectivity of real-world problems. Generalization in the multi-objective context is fundamentally more challenging, as it requires learning a Pareto set of policies addressing varying preferences across multiple objectives. In this paper, we formalize the concept of generalization in MORL and how it can be evaluated. We then contribute a novel benchmark featuring diverse multi-objective domains with parameterized environment configurations to facilitate future studies in this area. Our baseline evaluations of state-of-the-art MORL algorithms on this benchmark reveals limited generalization capabilities, suggesting significant room for improvement. Our empirical findings also expose limitations in the expressivity of scalar rewards, emphasizing the need for multi-objective specifications to achieve effective generalization. We further analyzed the algorithmic complexities within current MORL approaches that could impede the transfer in performance from the single- to multiple-environment settings. This work fills a critical gap and lays the groundwork for future research that brings together two key areas in reinforcement learning: solving multi-objective decision-making problems and generalizing across diverse environments. We make our code available at https://github.com/JaydenTeoh/MORL-Generalization.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

2503.00799

Country:

Europe (0.28)
North America > United States > Massachusetts (0.14)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Optimizing Ride-Pooling Operations with Extended Pickup and Drop-Off Flexibility

Jiang, Hao, Xu, Yixing, Varakantham, Pradeep

arXiv.org Artificial IntelligenceMar-11-2025

The Ride-Pool Matching Problem (RMP) is central to on-demand ride-pooling services, where vehicles must be matched with multiple requests while adhering to service constraints such as pickup delays, detour limits, and vehicle capacity. Most existing RMP solutions assume passengers are picked up and dropped off at their original locations, neglecting the potential for passengers to walk to nearby spots to meet vehicles. This assumption restricts the optimization potential in ride-pooling operations. In this paper, we propose a novel matching method that incorporates extended pickup and drop-off areas for passengers. We first design a tree-based approach to efficiently generate feasible matches between passengers and vehicles. Next, we optimize vehicle routes to cover all designated pickup and drop-off locations while minimizing total travel distance. Finally, we employ dynamic assignment strategies to achieve optimal matching outcomes. Experiments on city-scale taxi datasets demonstrate that our method improves the number of served requests by up to 13\% and average travel distance by up to 21\% compared to leading existing solutions, underscoring the potential of leveraging passenger mobility to significantly enhance ride-pooling service efficiency.

artificial intelligence, machine learning, vehicle, (19 more...)

arXiv.org Artificial Intelligence

2503.08472

Country: North America > United States > California (0.14)

Genre: Research Report (0.64)

Industry:

Transportation > Passenger (1.00)
Transportation > Ground > Road (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.46)

Add feedback

Improving Environment Novelty Quantification for Effective Unsupervised Environment Design

Teoh, Jayden, Li, Wenjun, Varakantham, Pradeep

arXiv.org Machine LearningFeb-8-2025

Unsupervised Environment Design (UED) formalizes the problem of autocurricula through interactive training between a teacher agent and a student agent. The teacher generates new training environments with high learning potential, curating an adaptive curriculum that strengthens the student's ability to handle unseen scenarios. Existing UED methods mainly rely on regret, a metric that measures the difference between the agent's optimal and actual performance, to guide curriculum design. Regret-driven methods generate curricula that progressively increase environment complexity for the student but overlook environment novelty -- a critical element for enhancing an agent's generalizability. Measuring environment novelty is especially challenging due to the underspecified nature of environment parameters in UED, and existing approaches face significant limitations. To address this, this paper introduces the Coverage-based Evaluation of Novelty In Environment (CENIE) framework. CENIE proposes a scalable, domain-agnostic, and curriculum-aware approach to quantifying environment novelty by leveraging the student's state-action space coverage from previous curriculum experiences. We then propose an implementation of CENIE that models this coverage and measures environment novelty using Gaussian Mixture Models. By integrating both regret and novelty as complementary objectives for curriculum design, CENIE facilitates effective exploration across the state-action space while progressively increasing curriculum complexity. Empirical evaluations demonstrate that augmenting existing regret-based UED algorithms with CENIE achieves state-of-the-art performance across multiple benchmarks, underscoring the effectiveness of novelty-driven autocurricula for robust generalization.

artificial intelligence, data mining, machine learning, (17 more...)

arXiv.org Machine Learning

2502.05726

Country:

Europe (1.00)
Asia > Middle East (0.14)

Genre: Research Report > New Finding (0.92)

Industry:

Education > Curriculum (0.68)
Leisure & Entertainment > Sports > Motorsports (0.46)
Education > Educational Technology > Educational Software (0.34)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

On Learning Informative Trajectory Embeddings for Imitation, Classification and Regression

Ge, Zichang, Chen, Changyu, Sinha, Arunesh, Varakantham, Pradeep

arXiv.org Artificial IntelligenceJan-16-2025

In real-world sequential decision making tasks like autonomous driving, robotics, and healthcare, learning from observed state-action trajectories is critical for tasks like imitation, classification, and clustering. For example, self-driving cars must replicate human driving behaviors, while robots and healthcare systems benefit from modeling decision sequences, whether or not they come from expert data. Existing trajectory encoding methods often focus on specific tasks or rely on reward signals, limiting their ability to generalize across domains and tasks. Inspired by the success of embedding models like CLIP and BERT in static domains, we propose a novel method for embedding state-action trajectories into a latent space that captures the skills and competencies in the dynamic underlying decision-making processes. This method operates without the need for reward labels, enabling better generalization across diverse domains and tasks. Our contributions are threefold: (1) We introduce a trajectory embedding approach that captures multiple abilities from state-action data. (2) The learned embeddings exhibit strong representational power across downstream tasks, including imitation, classification, clustering, and regression. (3) The embeddings demonstrate unique properties, such as controlling agent behaviors in IQ-Learn and an additive structure in the latent space. Experimental results confirm that our method outperforms traditional approaches, offering more flexible and powerful trajectory representations for various applications. Our code is available at https://github.com/Erasmo1015/vte.

machine learning, reinforcement learning, trajectory, (17 more...)

arXiv.org Artificial Intelligence

2501.09327

Country: North America > United States (0.68)

Genre:

Research Report > Promising Solution (0.34)
Research Report > New Finding (0.34)

Industry:

Health & Medicine (0.74)
Transportation > Ground > Road (0.68)
Automobiles & Trucks (0.68)
Information Technology > Robotics & Automation (0.54)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Offline Safe Reinforcement Learning Using Trajectory Classification

Gong, Ze, Kumar, Akshat, Varakantham, Pradeep

arXiv.org Artificial IntelligenceDec-19-2024

Offline safe reinforcement learning (RL) has emerged as a promising approach for learning safe behaviors without engaging in risky online interactions with the environment. Most existing methods in offline safe RL rely on cost constraints at each time step (derived from global cost constraints) and this can result in either overly conservative policies or violation of safety constraints. In this paper, we propose to learn a policy that generates desirable trajectories and avoids undesirable trajectories. To be specific, we first partition the pre-collected dataset of state-action trajectories into desirable and undesirable subsets. Intuitively, the desirable set contains high reward and safe trajectories, and undesirable set contains unsafe trajectories and low-reward safe trajectories. Second, we learn a policy that generates desirable trajectories and avoids undesirable trajectories, where (un)desirability scores are provided by a classifier learnt from the dataset of desirable and undesirable trajectories. This approach bypasses the computational complexity and stability issues of a min-max objective that is employed in existing methods. Theoretically, we also show our approach's strong connections to existing learning paradigms involving human feedback. Finally, we extensively evaluate our method using the DSRL benchmark for offline safe RL. Empirically, our method outperforms competitive baselines, achieving higher rewards and better constraint satisfaction across a wide variety of benchmark tasks.

machine learning, reinforcement learning, trajectory, (15 more...)

arXiv.org Artificial Intelligence

2412.15429

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Semantic Loss Guided Data Efficient Supervised Fine Tuning for Safe Responses in LLMs

Lu, Yuxiao, Sinha, Arunesh, Varakantham, Pradeep

arXiv.org Artificial IntelligenceDec-11-2024

Large Language Models (LLMs) generating unsafe responses to toxic prompts is a significant issue in their applications. While various efforts aim to address this safety concern, previous approaches often demand substantial human data collection or rely on the less dependable option of using another LLM to generate corrective data. In this paper, we aim to take this problem and overcome limitations of requiring significant high-quality human data. Our method requires only a small set of unsafe responses to toxic prompts, easily obtained from the unsafe LLM itself. By employing a semantic cost combined with a negative Earth Mover Distance (EMD) loss, we guide the LLM away from generating unsafe responses. Additionally, we propose a novel lower bound for EMD loss, enabling more efficient optimization. Our results demonstrate superior performance and data efficiency compared to baselines, and we further examine the nuanced effects of over-alignment and potential degradation of language capabilities when using contrastive data.

large language model, machine learning, toxic prompt, (16 more...)

arXiv.org Artificial Intelligence

2412.06843

Genre: Research Report > New Finding (0.86)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

IRL for Restless Multi-Armed Bandits with Applications in Maternal and Child Health

Jain, Gauri, Varakantham, Pradeep, Xu, Haifeng, Taneja, Aparna, Doshi, Prashant, Tambe, Milind

arXiv.org Artificial IntelligenceDec-11-2024

Public health practitioners often have the goal of monitoring patients and maximizing patients' time spent in "favorable" or healthy states while being constrained to using limited resources. Restless multi-armed bandits (RMAB) are an effective model to solve this problem as they are helpful to allocate limited resources among many agents under resource constraints, where patients behave differently depending on whether they are intervened on or not. However, RMABs assume the reward function is known. This is unrealistic in many public health settings because patients face unique challenges and it is impossible for a human to know who is most deserving of any intervention at such a large scale. To address this shortcoming, this paper is the first to present the use of inverse reinforcement learning (IRL) to learn desired rewards for RMABs, and we demonstrate improved outcomes in a maternal and child health telehealth program. First we allow public health experts to specify their goals at an aggregate or population level and propose an algorithm to design expert trajectories at scale based on those goals. Second, our algorithm WHIRL uses gradient updates to optimize the objective, allowing for efficient and accurate learning of RMAB rewards. Third, we compare with existing baselines and outperform those in terms of run-time and accuracy. Finally, we evaluate and show the usefulness of WHIRL on thousands on beneficiaries from a real-world maternal and child health setting in India. We publicly release our code here: https://github.com/Gjain234/WHIRL.

data mining, machine learning, reinforcement learning, (21 more...)

arXiv.org Artificial Intelligence

doi: 10.1007/978-981-96-0128-8_15

2412.08463

Country:

Asia > India (0.34)
North America > United States (0.28)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Pediatrics/Neonatology (1.00)
Health & Medicine > Public Health (1.00)
Health & Medicine > Health Care Technology > Telehealth (1.00)

Technology:

Information Technology > Data Science > Data Mining > Big Data (0.70)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.55)
(2 more...)

Add feedback

UNIQ: Offline Inverse Q-learning for Avoiding Undesirable Demonstrations

Hoang, Huy, Mai, Tien, Varakantham, Pradeep

arXiv.org Artificial IntelligenceOct-10-2024

We address the problem of offline learning a policy that avoids undesirable demonstrations. Unlike conventional offline imitation learning approaches that aim to imitate expert or near-optimal demonstrations, our setting involves avoiding undesirable behavior (specified using undesirable demonstrations). To tackle this problem, unlike standard imitation learning where the aim is to minimize the distance between learning policy and expert demonstrations, we formulate the learning task as maximizing a statistical distance, in the space of state-action stationary distributions, between the learning policy and the undesirable policy. This significantly different approach results in a novel training objective that necessitates a new algorithm to address it. Our algorithm, UNIQ, tackles these challenges by building on the inverse Q-learning framework, framing the learning problem as a cooperative (non-adversarial) task. We then demonstrate how to efficiently leverage unlabeled data for practical training. Our method is evaluated on standard benchmark environments, where it consistently outperforms state-of-the-art baselines. The code implementation can be accessed at: https://github.com/hmhuy0/UNIQ. Reinforcement learning (RL) is a powerful framework for learning to maximize expected returns and has achieved remarkable success across various domains. However, applying reinforcement learning to real-world problems is challenging due to difficulties in designing reward functions and the requirement for extensive online interactions with the environment. While some approaches have addressed these challenges, they often rely on costly datasets, requiring either accurate labeling or clean, consistent data, which is often impractical. Imitation learning (Abbeel & Ng, 2004; Ziebart et al., 2008; Kelly et al., 2019) offers a more feasible alternative, enabling agents to learn directly from expert demonstrations without the need for explicit reward signals.

demonstration, machine learning, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

2410.08307

Genre: Research Report > New Finding (0.67)

Industry: Education (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Preserving the Privacy of Reward Functions in MDPs through Deception

Chirra, Shashank Reddy, Varakantham, Pradeep, Paruchuri, Praveen

arXiv.org Artificial IntelligenceJul-13-2024

Preserving the privacy of preferences (or rewards) of a sequential decision-making agent when decisions are observable is crucial in many physical and cybersecurity domains. For instance, in wildlife monitoring, agents must allocate patrolling resources without revealing animal locations to poachers. This paper addresses privacy preservation in planning over a sequence of actions in MDPs, where the reward function represents the preference structure to be protected. Observers can use Inverse RL (IRL) to learn these preferences, making this a challenging task. Current research on differential privacy in reward functions fails to ensure guarantee on the minimum expected reward and offers theoretical guarantees that are inadequate against IRL-based observers. To bridge this gap, we propose a novel approach rooted in the theory of deception. Deception includes two models: dissimulation (hiding the truth) and simulation (showing the wrong). Our first contribution theoretically demonstrates significant privacy leaks in existing dissimulation-based methods. Our second contribution is a novel RL-based planning algorithm that uses simulation to effectively address these privacy concerns while ensuring a guarantee on the expected reward. Experiments on multiple benchmark problems show that our approach outperforms previous methods in preserving reward function privacy.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

2407.09809

Country: Asia (0.14)

Genre: Research Report > Promising Solution (0.48)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military > Cyberwarfare (0.34)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

EduQate: Generating Adaptive Curricula through RMABs in Education Settings

Tio, Sidney, Li, Dexun, Varakantham, Pradeep

arXiv.org Artificial IntelligenceJun-20-2024

There has been significant interest in the development of personalized and adaptive educational tools that cater to a student's individual learning progress. A crucial aspect in developing such tools is in exploring how mastery can be achieved across a diverse yet related range of content in an efficient manner. While Reinforcement Learning and Multi-armed Bandits have shown promise in educational settings, existing works often assume the independence of learning content, neglecting the prevalent interdependencies between such content. In response, we introduce Education Network Restless Multi-armed Bandits (EdNetRMABs), utilizing a network to represent the relationships between interdependent arms. Subsequently, we propose EduQate, a method employing interdependency-aware Q-learning to make informed decisions on arm selection at each time step. We establish the optimality guarantee of EduQate and demonstrate its efficacy compared to baseline policies, using students modeled from both synthetic and real-world data.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

2406.14122

Country: North America > United States (0.14)

Genre:

Instructional Material (1.00)
Research Report > Experimental Study (0.93)
Research Report > New Finding (0.68)

Industry:

Health & Medicine (1.00)
Education > Educational Technology (0.88)
Education > Educational Setting (0.66)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback