Varakantham, Pradeep
Facilitating human-wildlife cohabitation through conflict prediction
Ghosh, Susobhan, Varakantham, Pradeep, Bhatkhande, Aniket, Ahmad, Tamanna, Andheria, Anish, Li, Wenjun, Taneja, Aparna, Thakkar, Divy, Tambe, Milind
With increasing world population and expanded use of forests as cohabited regions, interactions and conflicts with wildlife are increasing, leading to large-scale loss of lives (animal and human) and livelihoods (economic). While community knowledge is valuable, forest officials and conservation organisations can greatly benefit from predictive analysis of human-wildlife conflict, leading to targeted interventions that can potentially help save lives and livelihoods. However, the problem of prediction is a complex socio-technical problem in the context of limited data in low-resource regions. Identifying the "right" features to make accurate predictions of conflicts at the required spatial granularity using a sparse conflict training dataset is the key challenge that we address in this paper. Specifically, we do an illustrative case study on human-wildlife conflicts in the Bramhapuri Forest Division in Chandrapur, Maharashtra, India. Most existing work has considered human-wildlife conflicts in protected areas and to the best of our knowledge, this is the first effort at prediction of human-wildlife conflicts in unprotected areas and using those predictions for deploying interventions on the ground.
Field Study in Deploying Restless Multi-Armed Bandits: Assisting Non-Profits in Improving Maternal and Child Health
Mate, Aditya, Madaan, Lovish, Taneja, Aparna, Madhiwalla, Neha, Verma, Shresth, Singh, Gargi, Hegde, Aparna, Varakantham, Pradeep, Tambe, Milind
The widespread availability of cell phones has enabled non-profits to deliver critical health information to their beneficiaries in a timely manner. This paper describes our work to assist non-profits that employ automated messaging programs to deliver timely preventive care information to beneficiaries (new and expecting mothers) during pregnancy and after delivery. Unfortunately, a key challenge in such information delivery programs is that a significant fraction of beneficiaries drop out of the program. Yet, non-profits often have limited health-worker resources (time) to place crucial service calls for live interaction with beneficiaries to prevent such engagement drops. To assist non-profits in optimizing this limited resource, we developed a Restless Multi-Armed Bandits (RMABs) system. One key technical contribution in this system is a novel clustering method of offline historical data to infer unknown RMAB parameters. Our second major contribution is evaluation of our RMAB system in collaboration with an NGO, via a real-world service quality improvement study. The study compared strategies for optimizing service calls to 23003 participants over a period of 7 weeks to reduce engagement drops. We show that the RMAB group provides statistically significant improvement over other comparison groups, reducing engagement drops by ~30%. To the best of our knowledge, this is the first study demonstrating the utility of RMABs in real-world public health settings. We are transitioning our RMAB system to the NGO for real-world use.
CLAIM: Curriculum Learning Policy for Influence Maximization in Unknown Social Networks
Li, Dexun, Lowalekar, Meghna, Varakantham, Pradeep
Influence maximization is the problem of finding a small subset of nodes in a network that can maximize the diffusion of information. Recently, it has also found application in HIV prevention, substance abuse prevention, micro-finance adoption, etc., where the goal is to identify the set of peer leaders in a real-world physical social network who can disseminate information to a large group of people. Unlike online social networks, real-world networks are not completely known, and collecting information about the network is costly as it involves surveying multiple people. In this paper, we focus on this problem of network discovery for influence maximization. The existing work in this direction proposes a reinforcement learning framework. As environment interactions in real-world settings are costly, it is important for the reinforcement learning algorithms to require as few environment interactions as possible, i.e., to be sample efficient. In this work, we propose CLAIM (Curriculum LeArning Policy for Influence Maximization) to improve the sample efficiency of RL methods. We conduct experiments on real-world datasets and show that our approach can outperform the current best approach.
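To make the underlying problem concrete, the following is a minimal sketch of classic greedy seed selection under an independent-cascade diffusion model. It is not the CLAIM method or the reinforcement learning framework mentioned in the abstract, and it assumes the network is fully known, which the paper explicitly does not. The function names, the adjacency-dict graph representation, and the activation probability p are all illustrative assumptions.

```python
import random


def simulate_cascade(graph, seeds, p, rng):
    """Run one independent-cascade diffusion; return the set of activated nodes."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for node in frontier:
            for neighbor in graph.get(node, ()):
                # Each newly active node gets one chance to activate each neighbor.
                if neighbor not in active and rng.random() < p:
                    active.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return active


def expected_spread(graph, seeds, p, runs, rng):
    """Monte Carlo estimate of the expected number of activated nodes."""
    total = sum(len(simulate_cascade(graph, seeds, p, rng)) for _ in range(runs))
    return total / runs


def greedy_seeds(graph, k, p=0.1, runs=200, seed=0):
    """Pick k seeds, each time adding the node with the best estimated spread."""
    rng = random.Random(seed)
    chosen = []
    for _ in range(k):
        candidates = [n for n in graph if n not in chosen]
        best = max(
            candidates,
            key=lambda n: expected_spread(graph, chosen + [n], p, runs, rng),
        )
        chosen.append(best)
    return chosen


if __name__ == "__main__":
    # Hypothetical toy network given as a directed adjacency dict.
    toy = {"a": ["b", "c"], "b": [], "c": [], "d": ["e"], "e": []}
    print(greedy_seeds(toy, k=2, p=1.0, runs=1))
```

With p=1.0 the diffusion is deterministic, so the toy example selects the two nodes that together reach the whole network.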
Learn to Intervene: An Adaptive Learning Policy for Restless Bandits in Application to Preventive Healthcare
Biswas, Arpita, Aggarwal, Gaurav, Varakantham, Pradeep, Tambe, Milind
In many public health settings, it is important for patients to adhere to health programs, such as taking medications and periodic health checks. Unfortunately, beneficiaries may gradually disengage from such programs, which is detrimental to their health. A concrete example of gradual disengagement has been observed by an organization that carries out a free automated call-based program for spreading preventive care information among pregnant women. Many women stop picking up calls after being enrolled for a few months. To avoid such disengagements, it is important to provide timely interventions. Such interventions are often expensive and can be provided to only a small fraction of the beneficiaries. We model this scenario as a restless multi-armed bandit (RMAB) problem, where each beneficiary is assumed to transition from one state to another depending on the intervention. Moreover, since the transition probabilities are unknown a priori, we propose a Whittle index based Q-Learning mechanism and show that it converges to the optimal solution. Our method improves over existing learning-based methods for RMABs on multiple benchmarks from literature and also on the maternal healthcare dataset.
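The RMAB model described above can be illustrated with a small sketch: each beneficiary is an arm with two states, the transition probabilities depend on whether an intervention is given, and only a fixed budget of arms can receive an intervention per round. The sketch ranks arms by a myopic one-step gain, which is a simple stand-in and not the Whittle index based Q-Learning mechanism proposed in the paper. It also assumes the transition probabilities are known, whereas the paper treats them as unknown. All names and numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Arm:
    """One beneficiary; state 1 = engaged, state 0 = disengaged."""
    state: int
    # p_passive[s]: probability of being engaged next step from state s, no intervention.
    p_passive: Tuple[float, float]
    # p_active[s]: the same probability when an intervention is delivered.
    p_active: Tuple[float, float]


def myopic_index(arm: Arm) -> float:
    """One-step gain in engagement probability from intervening now."""
    return arm.p_active[arm.state] - arm.p_passive[arm.state]


def select_arms(arms: List[Arm], budget: int) -> List[int]:
    """Return the indices of the `budget` arms with the largest myopic index."""
    ranked = sorted(range(len(arms)), key=lambda i: myopic_index(arms[i]), reverse=True)
    return sorted(ranked[:budget])


if __name__ == "__main__":
    arms = [
        Arm(state=1, p_passive=(0.2, 0.9), p_active=(0.3, 0.95)),
        Arm(state=0, p_passive=(0.2, 0.9), p_active=(0.7, 0.95)),
        Arm(state=0, p_passive=(0.1, 0.8), p_active=(0.4, 0.9)),
    ]
    print(select_arms(arms, budget=2))
```

A Whittle index policy has the same outer shape (compute a per-arm index from the current state, then intervene on the top arms within budget) but replaces the myopic gain with an index that accounts for long-run value.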
Competitive Ratios for Online Multi-capacity Ridesharing
Lowalekar, Meghna, Varakantham, Pradeep, Jaillet, Patrick
In multi-capacity ridesharing, multiple requests (e.g., customers, food items, parcels) with different origin and destination pairs travel in one resource. In recent years, online multi-capacity ridesharing services (i.e., where assignments are made online) like Uber-pool, foodpanda, and on-demand shuttles have become hugely popular in transportation, food delivery, logistics and other domains. This is because multi-capacity ridesharing services benefit all parties involved: the customers (due to lower costs), the drivers (due to higher revenues) and the matching platforms (due to higher revenues per vehicle/resource). Most importantly, these services can also help reduce carbon emissions (due to fewer vehicles on roads). Online multi-capacity ridesharing is extremely challenging as the underlying matching graph is no longer bipartite (as in the unit-capacity case) but a tripartite graph with resources (e.g., taxis, cars), requests and request groups (combinations of requests that can travel together). The desired matching between resources and request groups is constrained by the edges between requests and request groups in this tripartite graph (i.e., a request can be part of at most one request group in the final assignment). While there have been myopic heuristic approaches employed for solving the online multi-capacity ridesharing problem, they do not provide any guarantees on the solution quality. To that end, this paper presents the first approach with bounds on the competitive ratio for online multi-capacity ridesharing (when resources rejoin the system at their initial location/depot after serving a group of requests).
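The constraint described above (each resource is matched to at most one request group, and each request appears in at most one chosen group) can be checked with a few lines of code. This is only a feasibility check under assumed data structures, not the paper's algorithm or its competitive-ratio analysis. The dict-based assignment format and the capacity parameter are hypothetical.

```python
def is_feasible(assignment, capacity):
    """Check a resource -> request-group assignment.

    `assignment` maps each resource id to one request group (a tuple of
    request ids), so each resource has at most one group by construction.
    """
    served = set()
    for resource, group in assignment.items():
        if len(group) > capacity:
            return False  # group does not fit in the resource
        for request in group:
            if request in served:
                return False  # request appears in two chosen groups
            served.add(request)
    return True


if __name__ == "__main__":
    ok = {"v1": ("r1", "r2"), "v2": ("r3",)}
    clash = {"v1": ("r1", "r2"), "v2": ("r2", "r3")}
    print(is_feasible(ok, capacity=2), is_feasible(clash, capacity=2))
```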
Zone pAth Construction (ZAC) based Approaches for Effective Real-Time Ridesharing
Lowalekar, Meghna, Varakantham, Pradeep, Jaillet, Patrick
Real-time ridesharing systems such as UberPool, Lyft Line, GrabShare have become hugely popular as they reduce the costs for customers, improve per trip revenue for drivers and reduce traffic on the roads by grouping customers with similar itineraries. The key challenge in these systems is to group the "right" requests to travel together in the "right" available vehicles in real-time, so that the objective (e.g., requests served, revenue or delay) is optimized. This challenge has been addressed in existing work by: (i) generating as many relevant feasible (with respect to the available delay for customers) combinations of requests as possible in real-time; and then (ii) optimizing assignment of the feasible request combinations to vehicles. Since the number of request combinations increases exponentially with the increase in vehicle capacity and number of requests, unfortunately, such approaches have to employ ad hoc heuristics to identify a subset of request combinations for assignment. Our key contribution is in developing approaches that employ zone (abstraction of individual locations) paths instead of request combinations. Zone paths allow for generation of significantly more "relevant" combinations (in comparison to ad hoc heuristics) in real-time than competing approaches due to two reasons: (i) Each zone path can typically represent multiple request combinations; (ii) Zone paths are generated using a combination of offline and online methods. Specifically, we contribute both myopic (ridesharing assignment focussed on current requests only) and non-myopic (ridesharing assignment considers impact on expected future requests) approaches that employ zone paths. In our experimental results, we demonstrate that our myopic approach outperforms (with respect to both objective and runtime) the current best myopic approach for ridesharing on both real-world and synthetic datasets.
TuSeRACT: Turn-Sample-Based Real-Time Traffic Signal Control
Dhamija, Srishti, Varakantham, Pradeep
Real-time traffic signal control systems can effectively reduce urban traffic congestion but can also become significant contributors to congestion if poorly timed. Real-time traffic signal control is typically challenging owing to constantly changing traffic demand patterns, very limited planning time and various sources of uncertainty in the real world (due to vehicle detection or unobserved vehicle turn movements, for instance). SURTRAC (Scalable URban TRAffic Control) is a recently developed traffic signal control approach which computes delay-minimising and coordinated (across neighbouring traffic lights) schedules of oncoming vehicle clusters in real time. To ensure real-time responsiveness in the presence of turn-induced uncertainty, SURTRAC computes schedules which minimize the delay for the expected turn movements as opposed to minimizing the expected delay under turn-induced uncertainty. Furthermore, expected outgoing traffic clusters are communicated to downstream intersections. These approximations ensure real-time tractability, but degrade solution quality in the presence of turn-induced uncertainty. To address this limitation, we introduce TuSeRACT (Turn-Sample-based Real-time trAffic signal ConTrol), a distributed sample-based scheduling approach to traffic signal control. Unlike SURTRAC, TuSeRACT computes schedules that minimize expected delay over sampled turn movements of observed traffic, and communicates samples of traffic outflows to neighbouring intersections. We formulate this sample-based scheduling problem as a constraint program, and empirically evaluate our approach on synthetic traffic networks. We demonstrate that our approach results in substantially lower average vehicle waiting times as compared to SURTRAC when turn-induced uncertainty is present.
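The distinction drawn above between minimizing the delay for the expected turn movements and minimizing the expected delay over sampled turn movements can be illustrated numerically. The sketch below uses a hypothetical convex delay model (quadratic in queue length) and a hypothetical turn probability; neither comes from the paper, and the code is not the TuSeRACT constraint program. It only shows that the two quantities differ when delay is nonlinear in the number of turning vehicles.

```python
import random


def queue_delay(num_vehicles):
    """Hypothetical convex delay model: delay grows quadratically with queue length."""
    return num_vehicles ** 2


def sample_left_turns(cluster_size, turn_prob, rng):
    """Sample how many vehicles in a cluster turn left."""
    return sum(1 for _ in range(cluster_size) if rng.random() < turn_prob)


def compare_objectives(cluster_size=10, turn_prob=0.3, num_samples=1000, seed=0):
    """Return (delay at the mean turn count, mean delay over sampled turn counts)."""
    rng = random.Random(seed)
    samples = [sample_left_turns(cluster_size, turn_prob, rng) for _ in range(num_samples)]
    mean_turns = sum(samples) / num_samples
    delay_of_expected = queue_delay(mean_turns)
    expected_delay = sum(queue_delay(s) for s in samples) / num_samples
    return delay_of_expected, expected_delay


if __name__ == "__main__":
    delay_of_expected, expected_delay = compare_objectives()
    print(delay_of_expected, expected_delay)
```

For any convex delay model, the delay evaluated at the mean turn count never exceeds the mean delay over the samples (Jensen's inequality), so a schedule tuned only to the expected movements can underestimate the delay it will actually incur.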
Resource Constrained Deep Reinforcement Learning
Bhatia, Abhinav, Varakantham, Pradeep, Kumar, Akshat
In urban environments, supply resources have to be constantly matched to the "right" locations (where customer demand is present) so as to improve quality of life. For instance, ambulances have to be matched to base stations regularly so as to reduce response time for emergency incidents in EMS (Emergency Management Systems); vehicles (cars, bikes, scooters etc.) have to be matched to docking stations so as to reduce lost demand in shared mobility systems. Such problem domains are challenging owing to the demand uncertainty, combinatorial action spaces (due to allocation) and constraints on allocation of resources (e.g., total resources, minimum and maximum number of resources at locations and regions). Existing systems typically employ myopic and greedy optimization approaches to optimize allocation of supply resources to locations. Such approaches are typically unable to handle surges or variances in demand patterns well. Recent research has demonstrated the ability of Deep RL methods to adapt well to highly uncertain environments. However, existing Deep RL methods are unable to handle combinatorial action spaces and constraints on allocation of resources. To that end, we have developed three approaches on top of the well-known actor-critic approach, DDPG (Deep Deterministic Policy Gradient), that are able to handle constraints on resource allocation. More importantly, we demonstrate that they are able to outperform leading approaches on simulators validated on semi-real and real data sets.
Entropy Controlled Non-Stationarity for Improving Performance of Independent Learners in Anonymous MARL Settings
Verma, Tanvi, Varakantham, Pradeep, Lau, Hoong Chuin
With the advent of sequential matching (of supply and demand) systems (Uber, Lyft, Grab for taxis; UberEats, Deliveroo, etc. for food; Amazon Prime, Lazada, etc. for groceries) across many online and offline services, individuals (taxi drivers, delivery boys, delivery van drivers, etc.) earn more by being at the "right" place at the "right" time. We focus on learning techniques for providing guidance (on right locations to be at right times) to individuals in the presence of other "learning" individuals. Interactions between individuals are anonymous, i.e., the outcome of an interaction (competing for demand) is independent of the identity of the agents, and therefore we refer to these as Anonymous MARL settings. Existing research of relevance is on independent learning using Reinforcement Learning (RL) or on Multi-Agent Reinforcement Learning (MARL). The number of individuals in aggregation systems is extremely large and individuals have their own selfish interest (of maximising revenue). Therefore, traditional MARL approaches are either not scalable or rely on assumptions of a common objective or action coordination that are not viable. In this paper, we focus on improving performance of independent reinforcement learners, specifically the popular Deep Q-Networks (DQN) and Advantage Actor Critic (A2C) approaches, by exploiting anonymity. Specifically, we control non-stationarity introduced by other agents using entropy of agent density distribution. We demonstrate a significant improvement in revenue for individuals and for all agents together with our learners on a generic experimental set up for aggregation systems and a real-world taxi dataset.
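The quantity named above, the entropy of the agent density distribution, can be computed as the Shannon entropy of the fraction of agents in each zone. The sketch below shows only that computation; how the entropy enters the DQN or A2C learners is not specified in this abstract and is not attempted here. The per-zone count list and the function name are illustrative assumptions.

```python
import math


def density_entropy(agent_counts):
    """Shannon entropy (in nats) of the distribution of agents over zones."""
    total = sum(agent_counts)
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in agent_counts:
        if count > 0:
            p = count / total
            entropy -= p * math.log(p)
    return entropy


if __name__ == "__main__":
    # Hypothetical agent counts per zone.
    print(density_entropy([5, 5, 5, 5]))   # agents spread evenly over four zones
    print(density_entropy([20, 0, 0, 0]))  # all agents in a single zone
```

Entropy is highest when agents are spread evenly across zones and zero when all agents sit in a single zone.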
Upping the Game of Taxi Driving in the Age of Uber
Jha, Shashi Shekhar (Singapore Management University) | Cheng, Shih-Fen (Singapore Management University) | Lowalekar, Meghna (Singapore Management University) | Wong, Nicholas (Singapore Management University) | Rajendram, Rishikeshan (Singapore Management University) | Tran, Trong Khiem (Singapore Management University) | Varakantham, Pradeep (Singapore Management University) | Trong, Nghia Truong (Singapore Management University) | Rahman, Firmansyah Bin Abd (Singapore Management University)
In most cities, taxis play an important role in providing point-to-point transportation service. If the taxi service is reliable, responsive, and cost-effective, past studies show that taxi-like services can be a viable choice in replacing a significant number of private cars. However, making taxi services efficient is extremely challenging, mainly due to the fact that taxi drivers are self-interested and they operate with only local information. Although past research has demonstrated how recommendation systems could potentially help taxi drivers in improving their performance, most of these efforts are not feasible in practice. This is mostly due to the lack of both comprehensive data coverage and an efficient recommendation engine that can scale to tens of thousands of drivers. In this paper, we propose a comprehensive working platform called the Driver Guidance System (DGS). With real-time citywide taxi data provided by our collaborator in Singapore, we demonstrate how we can combine real-time data analytics and large-scale optimization to create a guidance system that can potentially benefit tens of thousands of taxi drivers. Via a realistic agent-based simulation, we demonstrate that drivers following DGS can significantly improve their performance over ordinary drivers, regardless of the adoption ratios. We have completed the design and implementation of the system and have recently entered the field trial phase.