Tambe, Milind


Melding the Data-Decisions Pipeline: Decision-Focused Learning for Combinatorial Optimization

arXiv.org Machine Learning

Creating impact in real-world settings requires artificial intelligence techniques to span the full pipeline from data, to predictive models, to decisions. These components are typically approached separately: a machine learning model is first trained via a measure of predictive accuracy, and then its predictions are used as input into an optimization algorithm which produces a decision. However, the loss function used to train the model may easily be misaligned with the end goal, which is to make the best decisions possible. Hand-tuning the loss function to align with optimization is a difficult and error-prone process (which is often skipped entirely). We focus on combinatorial optimization problems and introduce a general framework for decision-focused learning, where the machine learning model is directly trained in conjunction with the optimization algorithm to produce high-quality decisions. Technically, our contribution is a means of integrating discrete optimization problems into deep learning or other predictive models, which are typically trained via gradient descent. The main idea is to use a continuous relaxation of the discrete problem to propagate gradients through the optimization procedure. We instantiate this framework for two broad classes of combinatorial problems: linear programs and submodular maximization. Experimental results across a variety of domains show that decision-focused learning often leads to improved optimization performance compared to traditional methods. We find that standard measures of accuracy are not a reliable proxy for a predictive model's utility in optimization, and our method's ability to specify the true goal as the model's training objective yields substantial dividends across a range of decision problems.


Decentralized dynamic task allocation for UAVs with limited communication range

arXiv.org Artificial Intelligence

We present the Limited-range Online Routing Problem (LORP), which involves a team of Unmanned Aerial Vehicles (UAVs) with limited communication range that must autonomously coordinate to service task requests. We first show a general approach to cast this dynamic problem as a sequence of decentralized task allocation problems. Then we present two solutions both based on modeling the allocation task as a Markov Random Field to subsequently assess decisions by means of the decentralized Max-Sum algorithm. Our first solution assumes independence between requests, whereas our second solution also considers the UAVs' workloads. A thorough empirical evaluation shows that our workloadbased solution consistently outperforms current state-of-the-art methods in a wide range of scenarios, lowering the average service time up to 16%. In the bestcase scenario there is no gap between our decentralized solution and centralized techniques. In the worst-case scenario we manage to reduce by 25% the gap between current decentralized and centralized techniques. Thus, our solution becomes the method of choice for our problem. Keywords: task allocation, unmanned aerial vehicles, max-sum, decentralized 1. Introduction Unmanned Aerial Vehicles (UAVs) are an attractive technology for largearea surveillance [1]. Today, there are readily available UAVs that are reasonably cheap, have many sensing abilities, exhibit a long endurance and can communicate using radios. UAVs have traditionally been controlled either remotely or by following externally-designed flight plans. Requiring human operators for each UAV implies a large, specialized and expensive human workforce. Likewise, letting UAVs follow externally prepared plans introduces a single point of failure (the planner) and requires UAVs with expensive (satellite) radios to maintain continuous communication with a central station. These constraints are acceptable in some application domains, other applications require more flexible techniques. For instance, consider a force of park rangers tasked with the surveillance of a large natural park. Upon reception of an emergency notification, the rangers must assess the situation as quickly as possible.


Evaluation of Predictive Models for Wildlife Poaching Activity through Controlled Field Test in Uganda

AAAI Conferences

Worldwide, conservation agencies employ rangers to protect conservation areas from poachers. However, agencies lack the manpower to have rangers effectively patrol these vast areas frequently. While past work modeled poachers’ behavior so as to aid rangers in planning future patrols, those models’ predictions were not validated by extensive field tests. We conducted two rounds of field tests in Uganda’s Queen Elizabeth Protected Area to evaluate our proposed spatio-temporal model that predicts poaching threat levels. In the first round, a one-month field test was conducted to test the predictive power of the model and in the second round an eight-month test was conducted to evaluate the selectiveness power of the model. To our knowledge, this is the first time that a predictive model is evaluated through such an extensive field test in this domain. These field tests will be extended to another park in Uganda, Murchison Fall Protected Area. Once such models are evaluated in the field, they can be used to generate efficient and feasible patrol routes for the park rangers.


SPOT Poachers in Action: Augmenting Conservation Drones With Automatic Detection in Near Real Time

AAAI Conferences

The unrelenting threat of poaching has led to increased development of new technologies to combat it. One such example is the use of long wave thermal infrared cameras mounted on unmanned aerial vehicles (UAVs or drones) to spot poachers at night and report them to park rangers before they are able to harm animals. However, monitoring the live video stream from these conservation UAVs all night is an arduous task. Therefore, we build SPOT (Systematic POacher deTector), a novel application that augments conservation drones with the ability to automatically detect poachers and animals in near real time. SPOT illustrates the feasibility of building upon state-of-the-art AI techniques, such as Faster RCNN, to address the challenges of automatically detecting animals and poachers in infrared images. This paper reports (i) the design and architecture of SPOT, (ii) a series of efforts towards more robust and faster processing to make SPOT usable in the field and provide detections in near real time, and (iii) evaluation of SPOT based on both historical videos and a real-world test run by the end users in the field. The promising results from the test in the field have led to a plan for larger-scale deployment in a national park in Botswana. While SPOT is developed for conservation drones, its design and novel techniques have wider application for automated detection from UAV videos.


Influence Maximization for Social Network Based Substance Abuse Prevention

AAAI Conferences

Substance use and abuse is a significant public health problem in the United States. Group-based intervention programs offer a promising means of reducing substance abuse. While effective, inappropriate intervention groups can result in an increase in deviant behaviors among participants, a process known as deviancy training. In this paper, we present GUIDE, an AI-based decision aid that leverages social network information to optimize the structure of the intervention groups.


Strategic Coordination of Human Patrollers and Mobile Sensors With Signaling for Security Games

AAAI Conferences

Traditional security games concern the optimal randomized allocation of human patrollers, who can directly catch attackers or interdict attacks. Motivated by the emerging application of utilizing mobile sensors (e.g., UAVs) for patrolling, in this paper we propose the novel Sensor-Empowered security Game (SEG) model which captures the joint allocation of human patrollers and mobile sensors. Sensors differ from patrollers in that they cannot directly interdict attacks, but they can notify nearby patrollers (if any). Moreover, SEGs incorporate mobile sensors' natural functionality of strategic signaling. On the technical side, we first prove that solving SEGs is NP-hard even in zero-sum cases. We then develop a scalable algorithm SEGer based on the branch-and-price framework with two key novelties: (1) a novel MILP formulation for the slave; (2) an efficient relaxation of the problem for pruning. To further accelerate SEGer, we design a faster combinatorial algorithm for the slave problem, which is provably a constant-approximation to the slave problem in zero-sum cases and serves as a useful heuristic for general-sum SEGs. Our experiments demonstrate the significant benefit of utilizing mobile sensors.


Preventing Infectious Disease in Dynamic Populations Under Uncertainty

AAAI Conferences

Treatable infectious diseases are a critical challenge for public health. Outreach campaigns can encourage undiagnosed patients to seek treatment but must be carefully targeted to make the most efficient use of limited resources. We present an algorithm to optimally allocate limited outreach resources among demographic groups in the population. The algorithm uses a novel multiagent model of disease spread which both captures the underlying population dynamics and is amenable to optimization. Our algorithm extends, with provable guarantees, to a stochastic setting where we have only a distribution over parameters such as the contact pattern between agents. We evaluate our algorithm on two instances where this distribution is inferred from real world data: tuberculosis in India and gonorrhea in the United States. Our algorithm produces a policy which is predicted to avert an average of least 8,000 person-years of tuberculosis and 20,000 person-years of gonorrhea annually compared to current policy.


Policy Learning for Continuous Space Security Games Using Neural Networks

AAAI Conferences

A wealth of algorithms centered around (integer) linear programming have been proposed to compute equilibrium strategies in security games with discrete states and actions. However, in practice many domains possess continuous state and action spaces. In this paper, we consider a continuous space security game model with infinite-size action sets for players and present a novel deep learning based approach to extend the existing toolkit for solving security games. Specifically, we present (i) OptGradFP, a novel and general algorithm that searches for the optimal defender strategy in a parameterized continuous search space, and can also be used to learn policies over multiple game states simultaneously; (ii) OptGradFP-NN, a convolutional neural network based implementation of OptGradFP for continuous space security games. We demonstrate the potential to predict good defender strategies via experiments and analysis of OptGradFP and OptGradFP-NN on discrete and continuous game settings.


Maximizing Influence in an Unknown Social Network

AAAI Conferences

In many real world applications of influence maximization, practitioners intervene in a population whose social structure is initially unknown. This poses a multiagent systems challenge to act under uncertainty about how the agents are connected. We formalize this problem by introducing exploratory influence maximization, in which an algorithm queries individual network nodes (agents) to learn their links. The goal is to locate a seed set nearly as influential as the global optimum using very few queries. We show that this problem is intractable for general graphs. However, real world networks typically have community structure, where nodes are arranged in densely connected subgroups. We present the ARISEN algorithm, which leverages community structure to find an influential seed set. Experiments on real world networks of homeless youth, village populations in India, and others demonstrate ARISEN's strong empirical performance. To formally demonstrate how ARISEN exploits community structure, we prove an approximation guarantee for ARISEN on graphs drawn from the Stochastic Block Model.


Mitigating the Curse of Correlation in Security Games by Entropy Maximization

arXiv.org Artificial Intelligence

In Stackelberg security games, a defender seeks to randomly allocate limited security resources to protect critical targets from an attack. In this paper, we study a fundamental, yet underexplored, phenomenon in security games, which we term the \emph{Curse of Correlation} (CoC). Specifically, we observe that there are inevitable correlations among the protection status of different targets. Such correlation is a crucial concern, especially in \emph{spatio-temporal} domains like conservation area patrolling, where attackers can surveil patrollers at certain areas and then infer their patrolling routes using such correlations. To mitigate this issue, we propose to design entropy-maximizing defending strategies for spatio-temporal security games, which frequently suffer from CoC. We prove that the problem is \#P-hard in general. However, it admits efficient algorithms in well-motivated special settings. Our experiments show significant advantages of max-entropy algorithms over previous algorithms. A scalable implementation of our algorithm is currently under pre-deployment testing for integration into FAMS software to improve the scheduling of US federal air marshals.