Zilberstein, Shlomo


Privacy-Preserving Policy Iteration for Decentralized POMDPs

AAAI Conferences

We propose the first privacy-preserving approach to address the privacy issues that arise in multi-agent planning problems modeled as a Dec-POMDP. Our solution is a distributed message-passing algorithm based on trials, where the agents' policies are optimized using the cross-entropy method. In our algorithm, the agents' private information is protected using a public-key homomorphic cryptosystem. We prove the correctness of our algorithm and analyze its complexity in terms of message passing and encryption/decryption operations. Furthermore, we analyze several privacy aspects of our algorithm and show that it can preserve the agent privacy of non-neighbors, model privacy, and decision privacy. Our experimental results on several common Dec-POMDP benchmark problems confirm the effectiveness of our approach.
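
A minimal sketch of the additive homomorphism that such a public-key cryptosystem provides, assuming a textbook Paillier scheme with toy parameters (the abstract does not fix a particular cryptosystem, and the primes, messages, and helper names below are illustrative only): two encrypted quantities can be combined into an encryption of their sum without either value being revealed to the party doing the combining.

    # Toy textbook Paillier sketch (illustrative parameters; real keys use large primes).
    # Shows the additive homomorphism E(a) * E(b) mod n^2 = E(a + b mod n), which lets
    # encrypted quantities be aggregated without decrypting the individual values.
    import math, random

    def keygen(p=1789, q=2003):
        n = p * q
        lam = math.lcm(p - 1, q - 1)
        mu = pow(lam, -1, n)                       # inverse of lambda mod n (valid for g = n + 1)
        return (n, n + 1), (lam, mu)               # public key (n, g), private key (lambda, mu)

    def encrypt(pub, m):
        n, g = pub
        r = random.randrange(1, n)
        while math.gcd(r, n) != 1:
            r = random.randrange(1, n)
        return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)

    def decrypt(pub, priv, c):
        n, _ = pub
        lam, mu = priv
        x = pow(c, lam, n * n)
        return ((x - 1) // n) * mu % n             # L(x) = (x - 1) / n, then multiply by mu

    pub, priv = keygen()
    c1, c2 = encrypt(pub, 37), encrypt(pub, 5)
    assert decrypt(pub, priv, (c1 * c2) % (pub[0] ** 2)) == 42   # sum computed under encryption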


Does the Human's Representation Matter for Unsupervised Activity Recognition?

AAAI Conferences

Unsupervised activity recognition introduces the opportunity for more robust interaction experiences with machines because the human is not limited to acting only with respect to a training dataset. Many approaches currently use latent variable models that have been well studied and developed by the natural language research communities. However, these models are typically used as-is, or with minor tweaks, on datasets that present an analogy between sensor reading sequences and text documents. Words have well-defined semantics, so the learned clusters can be interpreted and verified; this is often not the case for sensor readings. For example, novel data from new human activities must be classified using the learned clusters, so how does one confirm that new activities are being correctly processed by a robot for interaction? We present several ways that motion capture information can be represented for use in these methods, and then illustrate how the choice of representation has the potential to produce variations in the learned clusters.
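
A minimal sketch of how the choice of representation feeds into such latent variable models, assuming a bag-of-words pipeline with made-up data, window sizes, and cluster counts (none of these come from the paper): the same motion-capture stream, represented either as joint angles or as frame-to-frame velocities, is quantized into discrete "words" and clustered with a topic model, and the resulting activity clusters can differ purely because of the representation.

    # Two representations of the same (synthetic) motion-capture stream, quantized
    # into discrete "words" and clustered with a topic model, in the spirit of
    # treating sensor sequences like text documents.  All sizes are illustrative.
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.decomposition import LatentDirichletAllocation

    rng = np.random.default_rng(0)
    frames = rng.normal(size=(600, 12))            # stand-in for 12 joint angles over time

    def bag_of_words(features, vocab_size=20, doc_len=50):
        words = KMeans(n_clusters=vocab_size, n_init=10, random_state=0).fit_predict(features)
        docs = words[:len(words) // doc_len * doc_len].reshape(-1, doc_len)
        return np.array([np.bincount(d, minlength=vocab_size) for d in docs])

    angles = bag_of_words(frames)                        # representation 1: joint angles
    velocities = bag_of_words(np.diff(frames, axis=0))   # representation 2: angle velocities

    for name, counts in [("angles", angles), ("velocities", velocities)]:
        lda = LatentDirichletAllocation(n_components=4, random_state=0).fit(counts)
        print(name, lda.transform(counts).argmax(axis=1))   # activity cluster per window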


Robust Optimization for Tree-Structured Stochastic Network Design

AAAI Conferences

Stochastic network design is a general framework for optimizing network connectivity. It has several applications in computational sustainability, including spatial conservation planning, pre-disaster network preparation, and river network optimization. Previous work commonly assumes that network parameters (e.g., the probability of species colonization) are precisely known, which is unrealistic in real-world settings. We therefore address the robust river network design problem, where the goal is to optimize river connectivity for fish movement by removing barriers. We assume that fish passability probabilities are known only imprecisely, but lie within interval bounds. We then develop a planning approach that computes policies with either a high robust ratio or low regret. Empirically, our approach scales well to large river networks. We also provide insights into the solutions generated by our robust approach, which has a significantly higher robust ratio than the baseline solution based on mean parameter estimates.
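
A minimal sketch of the robust-ratio idea on a made-up two-barrier river, assuming the standard accessible-habitat objective and approximating the interval uncertainty by its corner scenarios (the instance, habitat values, and bounds below are illustrative, not the paper's):

    # Toy robust barrier-removal instance on a path-shaped river (illustrative
    # numbers only): habitat above each barrier is reachable with the product of
    # the passabilities of the barriers below it, and removing a barrier sets its
    # passability to 1.  Interval uncertainty is approximated by corner scenarios.
    from itertools import combinations, product

    habitat = {"b1": 3.0, "b2": 5.0}                  # habitat directly above each barrier
    order = ["b1", "b2"]                              # b1 is downstream of b2
    intervals = {"b1": (0.2, 0.6), "b2": (0.1, 0.5)}  # imprecise passability bounds
    budget = 1                                        # number of barriers we may remove

    def habitat_value(removed, passability):
        total, reach = 0.0, 1.0
        for b in order:                               # walk upstream, accumulating passability
            reach *= 1.0 if b in removed else passability[b]
            total += reach * habitat[b]
        return total

    policies = [set(c) for k in range(budget + 1) for c in combinations(order, k)]
    scenarios = [dict(zip(order, corner))
                 for corner in product(*(intervals[b] for b in order))]

    def robust_ratio(policy):
        # worst case over scenarios of (this policy's value) / (best value in that scenario)
        return min(habitat_value(policy, s) / max(habitat_value(q, s) for q in policies)
                   for s in scenarios)

    best = max(policies, key=robust_ratio)
    print(best, round(robust_ratio(best), 3))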


Fast SSP Solvers Using Short-Sighted Labeling

AAAI Conferences

State-of-the-art methods for solving SSPs often work by limiting planning to restricted regions of the state space. The resulting problems can then be solved quickly, and the process is repeated during execution when states outside the restricted region are encountered. Typically, these approaches focus on states that are within some distance measure of the start state (e.g., number of actions or probability of being reached). However, these short-sighted approaches make it difficult to propagate information from states that are closer to a goal than to the start state, thus missing opportunities to improve planning. We present an alternative approach in which short-sightedness is used only to determine whether a state should be labeled as solved or not, but otherwise the set of states that can be accounted for during planning is unrestricted. Based on this idea, we propose the FLARES algorithm and show that it performs consistently well on a wide range of benchmark problems.
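
A minimal sketch of the short-sighted labeling idea, in a simplified form rather than the paper's exact procedure: a state is labeled solved once every state reachable from it within a small horizon under the current greedy policy has a low Bellman residual. Here V is a dictionary of value estimates, transition(s, a) yields (successor, probability) pairs, and all names are illustrative.

    # Simplified sketch of short-sighted labeling (not the exact FLARES procedure):
    # a state is labeled solved once every state reachable from it within `horizon`
    # steps under the current greedy policy has a small Bellman residual.
    from collections import deque

    def bellman_backup(s, V, actions, transition, cost):
        # min_a [ cost(s,a) + sum_{s'} P(s'|s,a) V(s') ] for an SSP (costs, no discount)
        return min(cost(s, a) + sum(p * V[t] for t, p in transition(s, a))
                   for a in actions(s))

    def short_sighted_solved(s0, V, actions, transition, cost, goal, horizon, eps=1e-3):
        frontier, seen = deque([(s0, 0)]), {s0}
        while frontier:
            s, d = frontier.popleft()
            if goal(s):
                continue
            if abs(V[s] - bellman_backup(s, V, actions, transition, cost)) > eps:
                return False                          # residual still large near s0
            if d < horizon:
                greedy = min(actions(s), key=lambda a: cost(s, a) +
                             sum(p * V[t] for t, p in transition(s, a)))
                for t, p in transition(s, greedy):    # only expand greedy-policy successors
                    if p > 0 and t not in seen:
                        seen.add(t)
                        frontier.append((t, d + 1))
        return True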


Integration of Planning with Recognition for Responsive Interaction Using Classical Planners

AAAI Conferences

Interaction between multiple agents requires some form of coordination and a level of mutual awareness. When computers and robots interact with people, they need to recognize human plans and react appropriately. Plan and goal recognition techniques have focused on identifying an agent's task given a sufficiently long action sequence. However, by the time the plan and/or goal is recognized, it may be too late to compute an interactive response. We propose an integration of planning with probabilistic recognition in which each method uses intermediate results from the other as a guiding heuristic, both for recognizing the plan/goal in progress and for computing the interactive response. We show that, like the underlying recognition method, these interaction problems can be compiled into classical planning problems and solved using off-the-shelf methods. In addition to the methodology, this paper introduces problem categories for different forms of interaction, an evaluation metric for the benefits of interaction, and extensions to the recognition algorithm that make its intermediate results more practical while the plan is in progress.
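
A minimal sketch of cost-based probabilistic goal recognition in the plan-recognition-as-planning style, assuming the usual cost-difference likelihood and made-up plan costs (whether the paper uses exactly this likelihood is not stated in the abstract): for each candidate goal, the cost of the cheapest plan that complies with the observations is compared with the cheapest plan that avoids them, and the difference is turned into a posterior.

    # Cost-based probabilistic goal recognition in the plan-recognition-as-planning
    # style: compare, for each candidate goal, the cheapest plan that complies with
    # the observations against the cheapest plan that avoids them (the costs below
    # are made-up stand-ins for two classical-planner calls per goal).
    import math

    def goal_posterior(costs, beta=1.0, prior=None):
        """costs: {goal: (cheapest_cost_with_observations, cheapest_cost_without)}"""
        prior = prior or {g: 1.0 / len(costs) for g in costs}
        posterior = {}
        for g, (c_obs, c_not) in costs.items():
            delta = c_obs - c_not          # how much complying with the observations costs under g
            likelihood = math.exp(-beta * delta) / (1.0 + math.exp(-beta * delta))
            posterior[g] = likelihood * prior[g]
        z = sum(posterior.values())
        return {g: v / z for g, v in posterior.items()}

    # Observations fit goal_a's cheapest plan exactly, but force a detour for goal_b.
    print(goal_posterior({"goal_a": (10, 10), "goal_b": (14, 9)}))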


Redesigning Stochastic Environments for Maximized Utility

AAAI Conferences

We present the Utility Maximizing Design (UMD) model for optimally redesigning stochastic environments to maximize performance. This model is well suited to contemporary applications that involve designing environments where robots and humans coexist and cooperate, e.g., a vacuum cleaning robot. We discuss two special cases of the UMD model. The first is the equi-reward UMD (ER-UMD), in which the agents and the system share a utility function, as with the vacuum cleaning robot. The second is the goal recognition design (GRD) setting, discussed in the literature, in which system and agent utilities are independent. To find the set of optimal modifications to apply to a UMD model, we present a generic method based on heuristic search. After specifying the conditions for optimality in the general case, we present an admissible heuristic for the ER-UMD case. We also present a novel compilation that embeds the redesign process into a planning problem, allowing the use of any off-the-shelf solver to find the best way to modify an environment when a design budget is specified. Our evaluation shows the feasibility of the approach using standard benchmarks from the probabilistic planning competition.
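
A minimal sketch of searching over environment modifications within a design budget, assuming placeholder callables solve_env (a probabilistic planner returning the agent's expected utility in an environment), upper_bound (an admissible bound on the value reachable with the remaining budget, such as the ER-UMD heuristic), and apply_mod (a model-editing routine); these names are illustrative stand-ins rather than the paper's implementation.

    # Best-first search over modification sequences within a design budget
    # (illustrative stand-in for the redesign search; solve_env, upper_bound and
    # apply_mod are placeholders supplied by the caller).
    import heapq, itertools

    def best_design(env, modifications, budget, solve_env, upper_bound, apply_mod):
        tie = itertools.count()                   # tie-breaker so environments are never compared
        best_value, best_mods = solve_env(env), ()
        frontier = [(-upper_bound(env, budget), next(tie), (), env)]
        while frontier:
            neg_bound, _, mods, current = heapq.heappop(frontier)
            if -neg_bound <= best_value:          # an admissible bound cannot beat the incumbent
                break
            value = solve_env(current)            # agent's expected utility in the modified environment
            if value > best_value:
                best_value, best_mods = value, mods
            if len(mods) < budget:
                for name, mod in modifications.items():
                    child = apply_mod(current, mod)
                    heapq.heappush(frontier, (-upper_bound(child, budget - len(mods) - 1),
                                              next(tie), mods + (name,), child))
        return best_mods, best_value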


Safety in AI-HRI: Challenges Complementing User Experience Quality

AAAI Conferences

Contemporary research in human-robot interaction (HRI) predominantly focuses on the user's experience while controlling a robot. However, with the increased deployment of artificial intelligence (AI) techniques, robots are quickly becoming more autonomous in both academic and industrial experimental settings. In addition to improving the user's interactive experience with AI-operated robots through personalization, dialogue, emotions, and dynamic behavior, there is also a growing need to consider the safety of the interaction. The AI may not account for a user's less likely responses, making it possible for an unaware user to be injured if they collide with the robot. Issues of trust and acceptance may also come into play if users cannot always understand the robot's thought process, creating a potential for emotional harm. We identify challenges that will need to be addressed for safe AI-HRI and provide an overview of approaches to consider for them, many stemming from contemporary research.


Dual Formulations for Optimizing Dec-POMDP Controllers

AAAI Conferences

The decentralized POMDP (Dec-POMDP) is an expressive model for multi-agent planning. Finite-state controllers (FSCs), often used to represent policies for infinite-horizon problems, offer a compact, simple-to-execute policy representation. We exploit novel connections between optimizing decentralized FSCs and the dual linear program for MDPs. Consequently, we describe a dual mixed integer linear program (MIP) for optimizing deterministic FSCs. We exploit the Dec-POMDP structure to devise a compact MIP and formulate constraints that result in policies executable in partially observable decentralized settings. We show analytically that the dual formulation can also be exploited within the expectation maximization (EM) framework to optimize stochastic FSCs. The resulting EM algorithm can be implemented by solving a sequence of linear programs, without requiring expensive message passing over the Dec-POMDP DBN. We also present an efficient technique for policy improvement based on a weighted entropy measure. Compared with state-of-the-art FSC methods, our approach offers over an order-of-magnitude speedup while producing similar or better solutions.
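
A minimal sketch of the dual (occupancy-measure) linear program for a single-agent discounted MDP, the object whose structure the paper builds on; the two-state MDP, rewards, and discount factor below are made up for illustration.

    # Dual (occupancy-measure) LP for a toy two-state, two-action discounted MDP.
    # Variables x[s, a] are discounted state-action visitation frequencies.
    import numpy as np
    from scipy.optimize import linprog

    gamma = 0.95
    R = np.array([[1.0, 0.0],          # reward r(s, a)
                  [0.0, 2.0]])
    P = np.array([[[0.9, 0.1],         # P[s, a, s'] transition probabilities
                   [0.2, 0.8]],
                  [[0.5, 0.5],
                   [0.1, 0.9]]])
    alpha = np.array([1.0, 0.0])       # start-state distribution

    S, A = R.shape
    # Flow constraints: sum_a x[s',a] - gamma * sum_{s,a} P(s'|s,a) x[s,a] = alpha(s')
    A_eq = np.zeros((S, S * A))
    for sp in range(S):
        for s in range(S):
            for a in range(A):
                A_eq[sp, s * A + a] = (s == sp) - gamma * P[s, a, sp]

    res = linprog(c=-R.reshape(-1), A_eq=A_eq, b_eq=alpha, bounds=[(0, None)] * (S * A))
    x = res.x.reshape(S, A)
    print(x.argmax(axis=1), -res.fun)  # greedy policy decoded from x, and the optimal value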


A POMDP Formulation of Proactive Learning

AAAI Conferences

We cast the Proactive Learning (PAL) problem, i.e., Active Learning (AL) with multiple reluctant, fallible, cost-varying oracles, as a Partially Observable Markov Decision Process (POMDP). The agent selects an oracle at each time step to label a data point, while maintaining a belief over the true underlying correctness of its current dataset's labels. The goal is to minimize labeling costs while accounting for the value of obtaining correct labels, thus maximizing the accuracy of the resulting classifier. We prove three properties showing that our formulation leads to a structured and bounded-size set of belief points, which enables point-based methods to solve the POMDP effectively. We compare our method with the three original algorithms proposed by Donmez and Carbonell and with a simple baseline. We demonstrate that our approach matches or improves upon the original approach in five different oracle scenarios, each on two datasets. Finally, our algorithm provides a general, well-defined mathematical foundation to build upon.
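
A minimal sketch of the belief bookkeeping such a formulation requires, assuming a simplified model in which the agent tracks only the probability that a stored label is correct and each oracle has a known accuracy and cost (the paper's POMDP is richer; all numbers and names here are illustrative):

    # Simplified belief bookkeeping: track the probability that a stored label is
    # correct and update it after each answer from a fallible oracle with known
    # accuracy; the myopic query value assumes we keep the more likely label.
    def update_belief(p_correct, accuracy, answer_agrees):
        """Posterior probability that the stored label is correct after one answer."""
        if answer_agrees:                          # oracle repeats the stored label
            num = p_correct * accuracy
            den = num + (1 - p_correct) * (1 - accuracy)
        else:                                      # oracle contradicts the stored label
            num = p_correct * (1 - accuracy)
            den = num + (1 - p_correct) * accuracy
        return num / den

    def query_value(p_correct, accuracy, cost, value_of_correct_label):
        """Expected gain in label correctness from one query, minus the oracle's cost."""
        p_agree = p_correct * accuracy + (1 - p_correct) * (1 - accuracy)
        post = [update_belief(p_correct, accuracy, ans) for ans in (True, False)]
        expected = (p_agree * max(post[0], 1 - post[0]) +
                    (1 - p_agree) * max(post[1], 1 - post[1]))
        return (expected - max(p_correct, 1 - p_correct)) * value_of_correct_label - cost

    # A cheap noisy oracle versus a costlier reliable one, for an uncertain label.
    print(query_value(0.6, accuracy=0.7, cost=1.0, value_of_correct_label=10.0),
          query_value(0.6, accuracy=0.95, cost=3.0, value_of_correct_label=10.0))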