 Agents


Solving Factored MDPs with Continuous and Discrete Variables

arXiv.org Artificial Intelligence

Although many real-world stochastic planning problems are more naturally formulated by hybrid models with both discrete and continuous variables, current state-of-the-art methods cannot adequately address these problems. We present the first framework that can exploit problem structure for modeling and solving hybrid problems efficiently. We formulate these problems as hybrid Markov decision processes (MDPs with continuous and discrete state and action variables), which we assume can be represented in a factored way using a hybrid dynamic Bayesian network (hybrid DBN). This formulation also allows us to apply our methods to collaborative multiagent settings. We present a new linear program approximation method that exploits the structure of the hybrid MDP and lets us compute approximate value functions more efficiently. In particular, we describe a new factored discretization of continuous variables that avoids the exponential blow-up of traditional approaches. We provide theoretical bounds on the quality of such an approximation and on its scale-up potential. We support our theoretical arguments with experiments on a set of control problems with up to 28-dimensional continuous state space and 22-dimensional action space.


Competitive Benchmarking: Lessons Learned from the Trading Agent Competition

AI Magazine

Over the years, competitions have been important catalysts for progress in artificial intelligence. We describe the goal of the overall Trading Agent Competition and highlight particular competitions. We discuss its significance in the context of today's global market economy as well as AI research, the ways in which it breaks away from limiting assumptions made in prior work, and some of the advances it has engendered over the past ten years. Since its introduction in 2000, TAC has attracted more than 350 entries and brought together researchers from AI and beyond.


Competitive Benchmarking: Lessons Learned from the Trading Agent Competition

AI Magazine

In many real-life domains, such as trading environments, self-interested entities need to operate subject to limited time and information. Additionally, the web has mediated an ever broader range of transactions, urging participants to trade concurrently across multiple markets. Together, these pressures have generated the need for technologies that enable prompt investigation of large volumes of data and rapid evaluation of numerous alternative strategies in the face of constantly changing market conditions (Bichler, Gupta, and Ketter 2010). AI and machine-learning techniques, including neural networks and genetic algorithms, are continuously gaining ground in the support of such trading scenarios. User modeling, price forecasting, market equilibrium prediction, and strategy optimization are typical cases where AI provides reliable solutions. Yet the adoption and deployment of AI practices in real trading environments remain limited, since the proprietary nature of markets precludes open benchmarking, which is critical for further scientific progress.


Learning by Demonstration for a Collaborative Planning Environment

AI Magazine

Learning by demonstration technology has long held the promise to empower non-programmers to customize and extend software. We describe the deployment of a learning by demonstration capability to support user creation of automated procedures in a collaborative planning environment that is used widely by the U.S. Army. This technology, which has been in operational use since the summer of 2010, has helped to reduce user workloads by automating repetitive and time-consuming tasks. The technology has also provided the unexpected benefit of enabling standardization of products and processes.


Optimal Coordinated Planning Amongst Self-Interested Agents with Private State

arXiv.org Artificial Intelligence

Consider a multi-agent system in a dynamic and uncertain environment. Each agent's local decision problem is modeled as a Markov decision process (MDP) and agents must coordinate on a joint action in each period, which provides a reward to each agent and causes local state transitions. A social planner knows the model of every agent's MDP and wants to implement the optimal joint policy, but agents are self-interested and have private local state. We provide an incentive-compatible mechanism for eliciting state information that achieves the optimal joint plan in a Markov perfect equilibrium of the induced stochastic game. In the special case in which local problems are Markov chains and agents compete to take a single action in each period, we leverage Gittins allocation indices to provide an efficient factored algorithm and distribute computation of the optimal policy among the agents. Distributed, optimal coordinated learning in a multi-agent variant of the multi-armed bandit problem is obtained as a special case.


Improved Memory-Bounded Dynamic Programming for Decentralized POMDPs

arXiv.org Artificial Intelligence

Memory-Bounded Dynamic Programming (MBDP) has proved extremely effective in solving decentralized POMDPs with large horizons. We generalize the algorithm and improve its scalability by reducing the complexity with respect to the number of observations from exponential to polynomial. We derive error bounds on solution quality with respect to this new approximation and analyze the convergence behavior. To evaluate the effectiveness of the improvements, we introduce a new, larger benchmark problem. Experimental results show that despite the high complexity of decentralized POMDPs, scalable solution techniques such as MBDP perform surprisingly well.


Optimizing Memory-Bounded Controllers for Decentralized POMDPs

arXiv.org Artificial Intelligence

We present a memory-bounded optimization approach for solving infinite-horizon decentralized POMDPs. Policies for each agent are represented by stochastic finite state controllers. We formulate the problem of optimizing these policies as a nonlinear program, leveraging powerful existing nonlinear optimization techniques for solving the problem. While existing solvers only guarantee locally optimal solutions, we show that our formulation produces higher quality controllers than the state-of-the-art approach. We also incorporate a shared source of randomness in the form of a correlation device to further increase solution quality with only a limited increase in space and time. Our experimental results show that nonlinear optimization can be used to provide high quality, concise solutions to decentralized decision problems under uncertainty.


Identifying reasoning patterns in games

arXiv.org Artificial Intelligence

We present an algorithm that identifies the reasoning patterns of agents in a game, by iteratively examining the graph structure of its Multi-Agent Influence Diagram (MAID) representation. If the decision of an agent participates in no reasoning patterns, then we can effectively ignore that decision for the purpose of calculating a Nash equilibrium for the game. In some cases, this can lead to exponential time savings in the process of equilibrium calculation. Moreover, our algorithm can be used to enumerate the reasoning patterns in a game, which can be useful for constructing more effective computerized agents interacting with humans.


Schedule-Driven Coordination for Real-Time Traffic Network Control

AAAI Conferences

Real-time optimization of the dynamic flow of vehicle traffic through a network of signalized intersections is an important practical problem. In this paper, we take a decentralized, schedule-driven coordination approach to address the challenge of achieving scalable network-wide optimization. To be locally effective, each intersection is controlled independently by an on-line scheduling agent. At each decision point, an agent constructs a schedule that optimizes movement of the observable traffic through the intersection, and uses this schedule to determine the best control action to take over the current look-ahead horizon. Decentralized coordination mechanisms, limited to interaction among direct neighbors to ensure scalability, are then layered on top of these asynchronously operating scheduling agents to promote overall performance. As a basic protocol, each agent queries for newly planned output flows from its upstream neighbors to obtain an optimistic projection of future demand. This projection may incorporate non-local influence from indirect neighbors depending on horizon length. Two additional mechanisms are then introduced to dampen "nervousness" and dynamic instability in the network, by adjusting locally determined schedules to better align with those of neighbors. We present simulation results on two traffic networks of tightly-coupled intersections that demonstrate the ability of our approach to establish traffic flows with lower average vehicle wait times than both a simple isolated control strategy and other contemporary coordinated control strategies that use moving average forecast or traditional offset calculation.


Plan-Based Policy-Learning for Autonomous Feature Tracking

AAAI Conferences

Mapping and tracking biological ocean features, such as harmful algal blooms, is an important problem in the environmental sciences. The problem exhibits a high degree of uncertainty, because of both the dynamic ocean context and the challenges of sensing. Plan-based policy learning has been shown to be a powerful technique for obtaining robust intelligent behaviour in the face of uncertainty. In this paper we apply this technique, in simulation, to the problem of tracking the outer edge of 2D biological features, such as the surfaces of harmful algal blooms. We show that plan-based policy learning leads to highly accurate tracking in simulation, even in situations where the uncertainty governing the shape of the patch cannot be directly modelled. We present simulation results that give confidence that the approach could work in practice. We are now collaborating with ocean scientists at MBARI to perform physical tests at sea.