AITopics

doi: 10.1613/jair.2497

AI Access Foundation

10552

Country:

North America > United States > Rhode Island > Providence County > Providence (0.14)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
Europe > Netherlands > Limburg > Maastricht (0.04)
(11 more...)

Genre: Research Report > New Finding (0.46)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
Information Technology > Architecture > Real Time Systems (1.00)

Goldman, C. V., Zilberstein, S.

Communication-Based Decomposition Mechanisms for Decentralized MDPs

Journal of Artificial Intelligence ResearchMay-23-2008

Multi-agent planning in stochastic environments can be framed formally as a decentralized Markov decision problem. Many real-life distributed problems that arise in manufacturing, multi-robot coordination and information gathering scenarios can be formalized using this framework. However, finding the optimal solution in the general case is hard, limiting the applicability of recently developed algorithms. This paper provides a practical approach for solving decentralized control problems when communication among the decision makers is possible, but costly. We develop the notion of communication-based mechanism that allows us to decompose a decentralized MDP into multiple single-agent problems. In this framework, referred to as decentralized semi-Markov decision process with direct communication (Dec-SMDP-Com), agents operate separately between communications. We show that finding an optimal mechanism is equivalent to solving optimally a Dec-SMDP-Com. We also provide a heuristic search algorithm that converges on the optimal decomposition. Restricting the decomposition to some specific types of local behaviors reduces significantly the complexity of planning. In particular, we present a polynomial-time algorithm for the case in which individual agents perform goal-oriented behaviors between communications. The paper concludes with an additional tractable algorithm that enables the introduction of human knowledge, thereby reducing the overall problem to finding the best time to communicate. Empirical results show that these approaches provide good approximate solutions.

agent, communication, mechanism, (17 more...)

doi: 10.1613/jair.2466

AI Access Foundation

10547

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
North America > United States > New York > Richmond County > New York City (0.04)
(15 more...)

Genre: Research Report > New Finding (0.48)

Industry: Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.89)

Journal of Artificial Intelligence ResearchMar-11-2008

Global Inference for Sentence Compression: An Integer Linear Programming Approach

Clarke, J., Lapata, M.

Sentence compression holds promise for many applications ranging from summarization to subtitle generation. Our work views sentence compression as an optimization problem and uses integer linear programming (ILP) to infer globally optimal compressions in the presence of linguistically motivated constraints. We show how previous formulations of sentence compression can be recast as ILPs and extend these models with novel global constraints. Experimental results on written and spoken texts demonstrate improvements over state-of-the-art models.

compression, constraint, source sentence, (14 more...)

doi: 10.1613/jair.2433

AI Access Foundation

10537

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Africa > Uganda (0.05)
North America > Nicaragua (0.05)
(23 more...)

Genre: Research Report > New Finding (0.67)

Industry: Education (0.67)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)

Gomes, Carla P., Sabharwal, Ashish, Selman, Bart

Near-Uniform Sampling of Combinatorial Spaces Using XOR Constraints

We propose a new technique for sampling the solutions of combinatorial problems in a near-uniform manner. We focus on problems specified as a Boolean formula, i.e., on SAT instances. Sampling for SAT problems has been shown to have interesting connections with probabilistic reasoning, making practical sampling algorithms for SAT highly desirable. The best current approaches are based on Markov Chain Monte Carlo methods, which have some practical limitations.

constraint, xor constraint, xorsample, (15 more...)

Country:

North America > United States > Massachusetts > Suffolk County > Boston (0.04)
North America > United States > California > Santa Clara County > San Jose (0.04)
North America > United States > Rhode Island > Providence County > Providence (0.04)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.49)

Ghavamzadeh, Mohammad, Engel, Yaakov

Bayesian Policy Gradient Algorithms

Policy gradient methods are reinforcement learning algorithms that adapt a parameterized policy by following a performance gradient estimate. Conventional policy gradient methods use Monte-Carlo techniques to estimate this gradient. Since Monte Carlo methods tend to have high variance, a large number of samples is required, resulting in slow convergence. In this paper, we propose a Bayesian framework that models the policy gradient as a Gaussian process. This reduces the number of samples needed to obtain accurate gradient estimates. Moreover, estimates of the natural gradient as well as a measure of the uncertainty in the gradient estimates are provided at little extra cost.

algorithm, fisher information matrix, gradient, (15 more...)

Country:

North America > United States > California > Alameda County > Hayward (0.04)
North America > Canada > Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.04)
Asia > Middle East > Israel > Jerusalem District > Jerusalem (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.87)

Geramifard, Alborz, Bowling, Michael, Zinkevich, Martin, Sutton, Richard S.

iLSTD: Eligibility Traces and Convergence Analysis

In this paper, we generalize the previous iLSTD algorithm and present three new results: (1) the first convergence proof for an iLSTD algorithm; (2) an extension to incorporate eligibility traces without changing the asymptotic computational complexity; and (3) the first empirical results with an iLSTD algorithm for a problem (mountain car) with feature vectors large enough (n 10, 000) to show substantial computational advantages over LSTD.

algorithm, ilstd, selection mechanism, (13 more...)

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > Canada > Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.04)

Genre: Research Report > New Finding (0.67)

Industry: Leisure & Entertainment (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Gomes, Carla P., Sabharwal, Ashish, Selman, Bart

Near-Uniform Sampling of Combinatorial Spaces Using XOR Constraints

We propose a new technique for sampling the solutions of combinatorial problems in a near-uniform manner. We focus on problems specified as a Boolean formula, i.e., on SAT instances. Sampling for SAT problems has been shown to have interesting connections with probabilistic reasoning, making practical sampling algorithms for SAT highly desirable. The best current approaches are based on Markov Chain Monte Carlo methods, which have some practical limitations.

constraint, xor constraint, xorsample, (15 more...)

Country:

North America > United States > Massachusetts > Suffolk County > Boston (0.04)
North America > United States > California > Santa Clara County > San Jose (0.04)
North America > United States > Rhode Island > Providence County > Providence (0.04)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.49)

Ghavamzadeh, Mohammad, Engel, Yaakov

Bayesian Policy Gradient Algorithms

Policy gradient methods are reinforcement learning algorithms that adapt a parameterized policy by following a performance gradient estimate. Conventional policy gradient methods use Monte-Carlo techniques to estimate this gradient. Since Monte Carlo methods tend to have high variance, a large number of samples is required, resulting in slow convergence. In this paper, we propose a Bayesian framework that models the policy gradient as a Gaussian process. This reduces the number of samples needed to obtain accurate gradient estimates. Moreover, estimates of the natural gradient as well as a measure of the uncertainty in the gradient estimates are provided at little extra cost.

algorithm, fisher information matrix, gradient, (15 more...)

Country:

North America > United States > California > Alameda County > Hayward (0.04)
North America > Canada > Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.04)
Asia > Middle East > Israel > Jerusalem District > Jerusalem (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.87)

Geramifard, Alborz, Bowling, Michael, Zinkevich, Martin, Sutton, Richard S.

iLSTD: Eligibility Traces and Convergence Analysis

In this paper, we generalize the previous iLSTD algorithm and present three new results: (1) the first convergence proof for an iLSTD algorithm; (2) an extension to incorporate eligibility traces without changing the asymptotic computational complexity; and (3) the first empirical results with an iLSTD algorithm for a problem (mountain car) with feature vectors large enough (n 10, 000) to show substantial computational advantages over LSTD.

algorithm, ilstd, selection mechanism, (13 more...)

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > Canada > Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.04)

Genre: Research Report > New Finding (0.67)

Industry: Leisure & Entertainment (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Ghavamzadeh, Mohammad, Engel, Yaakov

Bayesian Policy Gradient Algorithms

Policy gradient methods are reinforcement learning algorithms that adapt a parameterized policyby following a performance gradient estimate. Conventional policy gradient methods use Monte-Carlo techniques to estimate this gradient. Since Monte Carlo methods tend to have high variance, a large number of samples is required, resulting in slow convergence. In this paper, we propose a Bayesian framework that models the policy gradient as a Gaussian process. This reduces the number of samples needed to obtain accurate gradient estimates. Moreover, estimates of the natural gradient as well as a measure of the uncertainty in the gradient estimates are provided at little extra cost.

algorithm, fisher information matrix, gradient, (15 more...)