Collaborating Authors

 Cubuktepe, Murat


Reward-Based Deception with Cognitive Bias

arXiv.org Artificial Intelligence

Deception plays a key role in adversarial and strategic interactions for the purpose of self-defense and survival. This paper introduces a general framework and solution for deception in such settings. Most existing approaches to deception consider obfuscating crucial information from rational adversaries with abundant memory and computational resources. In this paper, we instead consider deceiving adversaries whose rationality is bounded, evaluated in terms of expected rewards; this problem is commonly encountered in many applications, especially those involving human adversaries. Leveraging the cognitive biases of humans in evaluating rewards under stochastic outcomes, we introduce a framework for optimally allocating a limited quantity of resources to defend against human adversaries. We model these cognitive biases using prospect theory from the behavioral psychology literature. We then formulate the resource allocation problem as a signomial program that minimizes the defender's cost in an environment modeled as a Markov decision process. We use police patrol-hour assignment as an illustrative example and provide detailed simulation results based on real-world data.
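To make the prospect-theoretic evaluation concrete, the following Python sketch scores a stochastic outcome the way a boundedly rational adversary might perceive it. It assumes the standard Tversky-Kahneman value and probability-weighting functions with commonly cited fitted parameters; the paper's exact functional forms, and all names and constants below, are illustrative choices rather than its actual implementation.

    # Tversky-Kahneman forms with commonly cited fitted parameters (illustrative).
    ALPHA, LAMBDA, GAMMA = 0.88, 2.25, 0.61

    def value(x):
        # S-shaped value function: concave for gains, steeper (loss-averse) for losses.
        return x ** ALPHA if x >= 0 else -LAMBDA * (-x) ** ALPHA

    def weight(p):
        # Inverse-S probability weighting: small probabilities are overweighted.
        return p ** GAMMA / (p ** GAMMA + (1 - p) ** GAMMA) ** (1 / GAMMA)

    def pt_value(outcomes, probs):
        # Prospect-theoretic evaluation of a stochastic outcome (a lottery).
        return sum(weight(p) * value(x) for x, p in zip(outcomes, probs))

    lottery = ([10.0, -5.0], [0.5, 0.5])
    print("perceived value:", round(pt_value(*lottery), 3))   # about -0.71
    print("expected value: ", 0.5 * 10.0 + 0.5 * (-5.0))      # +2.5

With these parameters the lottery's perceived value is negative even though its expected value is +2.5, because losses loom larger than gains; this gap between perceived and expected reward is what a defender's resource allocation can exploit.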


The Partially Observable Games We Play for Cyber Deception

arXiv.org Artificial Intelligence

Progressively intricate cyber infiltration mechanisms have made conventional means of defense, such as firewalls and malware detectors, ineffective. These sophisticated infiltration mechanisms can study the defender's behavior, identify security gaps, and adapt their actions accordingly. To tackle these security challenges, cyber infrastructures require active defense techniques that incorporate cyber deception, in which the defender (deceiver) implements a strategy to mislead the infiltrator. To this end, we use a two-player partially observable stochastic game (POSG) framework in which the deceiver fully observes the states of the POSG while the infiltrator observes them only partially. The deception problem is then to compute a strategy for the deceiver that minimizes the expected cost of deception against all strategies of the infiltrator. We first show that the underlying problem is a robust mixed-integer linear program, which is intractable to solve in general. Toward a scalable approach, we compute optimal finite-memory strategies for the infiltrator by a reduction to a series of synthesis problems for parametric Markov decision processes. We then use these infiltration strategies to find robust strategies for the deceiver using mixed-integer linear programming. We illustrate the performance of our technique on a POSG model for network security; our experiments demonstrate that the proposed approach handles scenarios considerably larger than those of state-of-the-art methods.
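The robust core of this pipeline can be illustrated in a few lines. The sketch below assumes the infiltrator's finite-memory strategies have already been enumerated (as columns of a small cost matrix) and solves only the resulting zero-sum robust step as a linear program; the paper's actual method solves a mixed-integer linear program over strategies produced by parametric-MDP synthesis, so the matrix, dimensions, and variable names here are purely illustrative.

    import numpy as np
    from scipy.optimize import linprog

    C = np.array([[3.0, 1.0],    # C[i, j]: deception cost when the deceiver
                  [0.0, 4.0],    # plays action i and the infiltrator follows
                  [2.0, 2.0]])   # enumerated strategy j
    m, n = C.shape

    # Variables: deceiver mixed strategy x (m entries) plus worst-case cost t.
    c = np.zeros(m + 1); c[-1] = 1.0                  # minimize t
    A_ub = np.hstack([C.T, -np.ones((n, 1))])         # C^T x <= t for every strategy j
    b_ub = np.zeros(n)
    A_eq = np.hstack([np.ones((1, m)), [[0.0]]])      # x sums to one
    b_eq = [1.0]
    bounds = [(0, None)] * m + [(None, None)]

    res = linprog(c, A_ub, b_ub, A_eq, b_eq, bounds=bounds)
    x, t = res.x[:m], res.x[-1]
    print("deceiver strategy:", x.round(3), "worst-case cost:", round(t, 3))

At the optimum, the deceiver randomizes so that no enumerated infiltrator strategy can push the expected cost above t.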


Entropy Maximization for Markov Decision Processes Under Temporal Logic Constraints

arXiv.org Artificial Intelligence

We study the problem of synthesizing a policy that maximizes the entropy of a Markov decision process (MDP) subject to a temporal logic constraint. Such a policy minimizes the predictability of the paths it generates or, dually, maximizes the continual exploration of different paths in an MDP while ensuring the satisfaction of a temporal logic specification. We first show that the maximum entropy of an MDP can be finite, infinite, or unbounded, and we provide necessary and sufficient conditions distinguishing these three cases. We then present an algorithm to synthesize a policy that maximizes the entropy of an MDP; the algorithm is based on a convex optimization problem and runs in time polynomial in the size of the MDP. We also show that maximizing the entropy of an MDP is equivalent to maximizing the entropy of the paths that reach a certain set of states in the MDP. Finally, we extend the algorithm to MDPs subject to temporal logic specifications. In numerical examples, we demonstrate the proposed method on different motion planning scenarios and illustrate that as a specification imposes more restrictions on the paths, the maximum entropy decreases, which, in turn, increases the predictability of the paths.
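The convex program at the heart of such an algorithm can be sketched on a toy instance. The formulation below maximizes the occupancy-weighted entropy of the induced chain over expected state-action counts using CVXPY's rel_entr atom; the three-state MDP, the variable names, and the exact constraint set are illustrative simplifications, assumed here only to show why the problem is convex, not a reproduction of the paper's full construction.

    import cvxpy as cp

    # Toy MDP: states 0 and 1 are transient, state 2 is the absorbing target.
    # P[s][a] maps each successor state to its probability under action a.
    P = {0: [{1: 1.0}, {2: 1.0}],       # state 0: action 0 -> state 1, action 1 -> goal
         1: [{0: 0.5, 2: 0.5}]}         # state 1: single action, splits between 0 and goal
    init = {0: 1.0, 1: 0.0}             # initial state distribution

    # Expected state-action counts (occupancy measures) as decision variables.
    lam = {(s, a): cp.Variable(nonneg=True) for s in P for a in range(len(P[s]))}

    # Flow balance: outflow of each transient state = initial mass + inflow.
    constraints = []
    for s in P:
        inflow = sum(P[t][a].get(s, 0.0) * lam[t, a]
                     for t in P for a in range(len(P[t])))
        constraints.append(sum(lam[s, a] for a in range(len(P[s]))) == init[s] + inflow)

    # Occupancy-weighted entropy of the induced chain:
    #   sum over s, s' of nu(s,s') * log(nu(s) / nu(s,s')), with nu affine in lam,
    # expressed through the jointly convex atom rel_entr(x, y) = x * log(x / y).
    entropy = 0
    for s in P:
        succs = {t for a in range(len(P[s])) for t in P[s][a]}
        nu = {t: sum(P[s][a].get(t, 0.0) * lam[s, a] for a in range(len(P[s])))
              for t in succs}
        nu_s = sum(nu.values())
        entropy += sum(-cp.rel_entr(nu[t], nu_s) for t in succs)

    cp.Problem(cp.Maximize(entropy), constraints).solve()
    print({k: round(float(v.value), 3) for k, v in lam.items()})

Because rel_entr is jointly convex and the counts nu are affine in lam, the negated sum is concave, so maximizing it subject to the linear flow constraints is a convex program, consistent with the polynomial-time claim in the abstract.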


Synthesis in pMDPs: A Tale of 1001 Parameters

arXiv.org Artificial Intelligence

This paper considers parametric Markov decision processes (pMDPs) whose transitions are equipped with affine functions over a finite set of parameters. The synthesis problem is to find a parameter valuation such that the instantiated pMDP satisfies a specification under all strategies. We show that this problem can be formulated as a quadratically constrained quadratic program (QCQP) and is non-convex in general. To deal with the NP-hardness of such problems, we exploit a convex-concave procedure (CCP) to iteratively obtain local optima. An appropriate interplay between CCP solvers and probabilistic model checkers creates a procedure, realized in the open-source tool PROPhESY, that solves the synthesis problem for models with thousands of parameters.
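A minimal instance makes the quadratic structure visible. In the toy pMDP below (a four-state chain with parameters v and w and no nondeterminism, so the "under all strategies" quantifier is trivial), the reachability probabilities satisfy bilinear constraints, which is exactly what makes the synthesis problem a nonconvex QCQP. The sketch hands this tiny instance to a generic NLP solver purely for illustration; the states, names, and threshold are assumptions, and the CCP-plus-model-checking loop in PROPhESY is what actually scales to thousands of parameters.

    from scipy.optimize import minimize

    # Toy pMDP: s0 -> s1 with probability v, s0 -> goal with 1 - v;
    #           s1 -> goal with probability w, s1 -> bad with 1 - w.
    # Reachability probabilities obey the bilinear (quadratic, nonconvex)
    # constraints x0 = v * x1 + (1 - v) and x1 = w, hence x0 = v*w + 1 - v.

    EPS = 1e-4          # keep parameters in the open interval (0, 1)
    THRESHOLD = 0.8     # specification: reach goal with probability >= 0.8

    def neg_reach(z):
        v, w = z
        return -(v * w + (1.0 - v))     # maximize Pr[reach goal] from s0

    cons = [{"type": "ineq",
             "fun": lambda z: (z[0] * z[1] + 1.0 - z[0]) - THRESHOLD}]
    res = minimize(neg_reach, x0=[0.5, 0.5], bounds=[(EPS, 1 - EPS)] * 2,
                   constraints=cons, method="SLSQP")
    v, w = res.x
    print(f"valuation v={v:.3f}, w={w:.3f}, Pr[reach goal]={v*w + 1 - v:.3f}")

A generic solver handles two parameters easily; the bilinear terms multiply combinatorially with model size, which is why the paper replaces this brute-force step with an iterative convex-concave scheme.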