AITopics

doi: 10.1613/jair.806

1106.0665

Country: North America > United States > California (0.28)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Makino, Takaki, Takei, Shunsuke, Sato, Issei, Mochihashi, Daichi

Restricted Collapsed Draw: Accurate Sampling for Hierarchical Chinese Restaurant Process Hidden Markov Models

arXiv.org Machine Learning, Jun-2-2011

We propose a restricted collapsed draw (RCD) sampler, a general Markov chain Monte Carlo sampler of simultaneous draws from a hierarchical Chinese restaurant process (HCRP) with restriction. Models that require simultaneous draws from a hierarchical Dirichlet process with restriction, such as infinite hidden Markov models (iHMMs), have had difficulty benefiting from the HCRP because of the combinatorial explosion in calculating distributions of coupled draws. By constructing a proposal of seating arrangements (partitioning) and stochastically accepting the proposal with the Metropolis-Hastings algorithm, the RCD sampler achieves accurate sampling for complex combinations of draws while retaining the efficiency of the HCRP representation. Based on the RCD sampler, we developed a series of sophisticated sampling algorithms for iHMMs, including blocked Gibbs sampling, beam sampling, and split-merge sampling, that outperformed conventional iHMM samplers in experiments.
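For readers unfamiliar with the "seating arrangement" vocabulary, the following is a minimal sketch of a plain, single-level Chinese restaurant process. It is not the hierarchical HCRP and not the RCD sampler from the abstract; the function name `crp_partition`, the concentration parameter `alpha`, and the seed are assumptions made for illustration.

```python
import random


def crp_partition(n_customers, alpha, seed=0):
    """Seat customers one at a time under a single-level CRP (illustrative only).

    Each customer joins an existing table with probability proportional to
    its current size, or opens a new table with probability proportional
    to alpha (assumed > 0).
    """
    rng = random.Random(seed)
    table_sizes = []   # table_sizes[t] = number of customers at table t
    seating = []       # seating[i] = table index chosen by customer i
    for _ in range(n_customers):
        # The last weight stands for "open a new table".
        weights = table_sizes + [alpha]
        table = rng.choices(range(len(weights)), weights=weights)[0]
        if table == len(table_sizes):
            table_sizes.append(1)
        else:
            table_sizes[table] += 1
        seating.append(table)
    return seating, table_sizes


seating, table_sizes = crp_partition(50, alpha=1.0)
```

The resulting `seating` list is one partition of the customers; samplers of the kind the abstract describes propose and accept or reject changes to such partitions.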

artificial intelligence, machine learning, sampler, (14 more...)

arXiv.org Machine Learning

1106.0474

Genre: Research Report (0.40)

Industry: Consumer Products & Services > Restaurants (0.85)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

An Application of Reinforcement Learning to Dialogue Strategy Selection in a Spoken Dialogue System for Email

Walker, M. A.

This paper describes a novel method by which a spoken dialogue system can learn to choose an optimal dialogue strategy from its experience interacting with human users. The method is based on a combination of reinforcement learning and performance modeling of spoken dialogue systems. The reinforcement learning component applies Q-learning (Watkins, 1989), while the performance modeling component applies the PARADISE evaluation framework (Walker et al., 1997) to learn the performance function (reward) used in reinforcement learning. We illustrate the method with a spoken dialogue system named ELVIS (EmaiL Voice Interactive System), which supports access to email over the phone. We conduct a set of experiments for training an optimal dialogue strategy on a corpus of 219 dialogues in which human users interact with ELVIS over the phone. We then test that strategy on a corpus of 18 dialogues. We show that ELVIS can learn to optimize its strategy selection for agent initiative, for reading messages, and for summarizing email folders.
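As a rough illustration of the Q-learning component mentioned above, here is a minimal tabular update. The state and action names, the learning rate `alpha`, the discount `gamma`, and the reward value are all hypothetical; in the paper the reward comes from a learned PARADISE performance function, which this sketch does not model.

```python
def q_update(q, state, action, reward, next_state, next_actions,
             alpha=0.5, gamma=0.9):
    """Apply one tabular Q-learning update and return the new Q-value.

    q is a dict mapping (state, action) pairs to estimated values;
    missing entries are treated as 0.0.
    """
    # Value of the best action available in the next state (0.0 if terminal).
    best_next = max((q.get((next_state, a), 0.0) for a in next_actions),
                    default=0.0)
    td_target = reward + gamma * best_next
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (td_target - old)
    return q[(state, action)]


# Hypothetical dialogue states and strategy choices, for illustration only.
q = {}
q_update(q, "greeting", "system_initiative", 1.0,
         "reading", ["summarize", "read_first"])
```

Repeating this update over many logged dialogues moves each entry toward the expected discounted reward of choosing that strategy in that state.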

machine learning, natural language, reinforcement learning, (17 more...)

doi: 10.1613/jair.713

1106.0241

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > Pennsylvania (0.04)
North America > United States > Rhode Island > Providence County > Providence (0.04)
(5 more...)

Genre: Research Report > New Finding (0.46)

Industry: Telecommunications (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Speeding Up the Convergence of Value Iteration in Partially Observable Markov Decision Processes

Zhang, N. L., Zhang, W.

Partially observable Markov decision processes (POMDPs) have recently become popular among many AI researchers because they serve as a natural model for planning under uncertainty. Value iteration is a well-known algorithm for finding optimal policies for POMDPs. It typically takes a large number of iterations to converge. This paper proposes a method for accelerating the convergence of value iteration. The method has been evaluated on an array of benchmark problems and was found to be very effective: It enabled value iteration to converge after only a few iterations on all the test problems.
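To make the notion of "iterations to converge" concrete, the sketch below runs ordinary value iteration on a tiny, fully observable MDP. This is a deliberate simplification: it is neither the POMDP variant nor the acceleration method proposed in the paper, and the two-state model, the discount, and the tolerance are invented for illustration.

```python
def value_iteration(P, R, gamma=0.5, tol=1e-6, max_iter=10_000):
    """Plain MDP value iteration (illustrative, fully observable case).

    P[s][a] is a list of (probability, next_state) pairs and
    R[s][a] is the immediate reward for taking action a in state s.
    Returns the value table and the number of sweeps performed.
    """
    V = {s: 0.0 for s in P}
    for sweep in range(1, max_iter + 1):
        new_V = {
            s: max(
                R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
                for a in P[s]
            )
            for s in P
        }
        # Largest change in any state's value during this sweep.
        delta = max(abs(new_V[s] - V[s]) for s in P)
        V = new_V
        if delta < tol:
            return V, sweep
    return V, max_iter


# Hypothetical two-state model: "b" pays reward 1 per step, "a" pays nothing.
P = {"a": {"stay": [(1.0, "a")], "move": [(1.0, "b")]},
     "b": {"stay": [(1.0, "b")]}}
R = {"a": {"stay": 0.0, "move": 0.0},
     "b": {"stay": 1.0}}
V, sweeps = value_iteration(P, R)
```

Even on this toy model several sweeps are needed before the values stop changing, which is the cost that convergence-acceleration methods aim to reduce.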

artificial intelligence, machine learning, vector, (16 more...)

doi: 10.1613/jair.761

1106.0251

Country:

Asia > China (0.28)
North America > United States (0.28)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Nonapproximability Results for Partially Observable Markov Decision Processes

Goldsmith, J., Lusena, C., Mundhenk, M.

Here "unlikely" means "unless some complexity classes collapse," where the collapses considered are P = NP, P = PSPACE, or P = EXP. Until or unless these collapses are shown to hold, any control-policy designer must choose between such performance guarantees and efficient computation. In this work, we show that uncertainty breeds uncertainty: In a controlled stochastic system with uncertainty (as modeled by a partially observable Markov decision process, for example), plans can be obtained efficiently or with quality guarantees, but rarely both. Planning over stochastic domains with uncertainty is hard (in some cases PSPACE-hard or even undecidable, see Papadimitriou & Tsitsiklis, 1987; Madani, Hanks, & Condon, 1999). Given that it is hard to find an optimal plan or policy, it is natural to try to find one that is "good enough". In the best of all possible worlds, this would mean having an algorithm that is guaranteed to be fast and to produce a policy that is reasonably close to the optimal policy.

artificial intelligence, machine learning, pomdp, (16 more...)

doi: 10.1613/jair.714

1106.0242

Country: North America > United States > California (0.45)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Value-Function Approximations for Partially Observable Markov Decision Processes

Hauskrecht, M.

Partially observable Markov decision processes (POMDPs) provide an elegant mathematical framework for modeling complex decision and planning problems in stochastic domains in which states of the system are observable only indirectly, via a set of imperfect or noisy observations. The modeling advantage of POMDPs, however, comes at a price -- exact methods for solving them are computationally very expensive and thus applicable in practice only to very simple problems. We focus on efficient approximation (heuristic) methods that attempt to alleviate the computational problem and trade off accuracy for speed. We have two objectives here. First, we survey various approximation methods, analyze their properties and relations and provide some new insights into their differences. Second, we present a number of new approximation methods and novel refinements of existing techniques. The theoretical results are supported by experiments on a problem from the agent navigation domain.

artificial intelligence, linear function, machine learning, (15 more...)

doi: 10.1613/jair.678

1106.0234

Genre: Research Report (0.40)

Industry: Health & Medicine (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Argamon-Engelson, S., Dagan, I.

Committee-Based Sample Selection for Probabilistic Classifiers

In many real-world learning tasks, it is expensive to acquire a sufficient number of labeled examples for training. This paper investigates methods for reducing annotation cost by 'sample selection'. In this approach, during training the learning program examines many unlabeled examples and selects for labeling only those that are most informative at each stage. This avoids redundantly labeling examples that contribute little new information. Our work follows on previous research on Query By Committee, extending the committee-based paradigm to the context of probabilistic classification. We describe a family of empirical methods for committee-based sample selection in probabilistic classification models, which evaluate the informativeness of an example by measuring the degree of disagreement between several model variants. These variants (the committee) are drawn randomly from a probability distribution conditioned by the training set labeled so far. The method was applied to the real-world natural language processing task of stochastic part-of-speech tagging. We find that all variants of the method achieve a significant reduction in annotation cost, although their computational efficiency differs. In particular, the simplest variant, a two-member committee with no parameters to tune, gives excellent results. We also show that sample selection yields a significant reduction in the size of the model used by the tagger.
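One common way to quantify disagreement among committee members is the entropy of their votes. The sketch below is a generic illustration under assumptions: the normalization by the log of the committee size, the function name `vote_entropy`, and the part-of-speech labels are chosen here and are not claimed to match the paper's exact formulation.

```python
import math
from collections import Counter


def vote_entropy(votes):
    """Normalized entropy of committee votes for a single example.

    votes holds one predicted label per committee member. The result is
    0.0 when all members agree and 1.0 when every member votes
    differently (normalization by log of committee size is an assumption).
    """
    k = len(votes)
    if k < 2:
        return 0.0
    counts = Counter(votes)
    entropy = -sum((c / k) * math.log(c / k) for c in counts.values())
    return entropy / math.log(k)


# Hypothetical tags proposed by a two-member committee for two words.
agree = vote_entropy(["NN", "NN"])      # members agree on the tag
disagree = vote_entropy(["NN", "VB"])   # members disagree on the tag
```

Under this scheme an example would be sent for labeling when its disagreement score is high, for instance whenever a two-member committee disagrees at all.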

machine learning, natural language, selection, (20 more...)

doi: 10.1613/jair.612

1106.0220

Country:

Asia > Middle East (0.28)
North America > United States (0.28)

Genre: Research Report > New Finding (0.46)

Industry: Education (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)

arXiv.org Artificial Intelligence, May-26-2011

Decision-Theoretic Planning: Structural Assumptions and Computational Leverage

Boutilier, C., Dean, T., Hanks, S.

Planning under uncertainty is a central problem in the study of automated sequential decision making, and has been addressed by researchers in many different fields, including AI planning, decision analysis, operations research, control theory and economics. While the assumptions and perspectives adopted in these areas often differ in substantial ways, many planning problems of interest to researchers in these fields can be modeled as Markov decision processes (MDPs) and analyzed using the techniques of decision theory. This paper presents an overview and synthesis of MDP-related methods, showing how they provide a unifying framework for modeling many classes of planning problems studied in AI. It also describes structural properties of MDPs that, when exhibited by particular classes of problems, can be exploited in the construction of optimal or approximately optimal policies or plans. Planning problems commonly possess structure in the reward and value functions used to describe performance criteria, in the functions used to describe state transitions and observations, and in the relationships among features used to describe states, actions, rewards, and observations. Specialized representations, and algorithms employing these representations, can achieve computational leverage by exploiting these various forms of structure. Certain AI techniques -- in particular those based on the use of structured, intensional representations -- can be viewed in this way. This paper surveys several types of representations for both classical and decision-theoretic planning problems, and planning algorithms that exploit these representations in a number of different ways to ease the computational burden of constructing policies or plans. It focuses primarily on abstraction, aggregation and decomposition techniques based on AI-style representations.

artificial intelligence, machine learning, representation, (20 more...)

doi: 10.1613/jair.575

1105.5460

Country:

Europe (1.00)
North America > Canada (0.92)
North America > United States > California (0.92)
North America > United States > Massachusetts > Middlesex County (0.45)

Genre:

Research Report (1.00)
Overview (1.00)

Industry: Government (0.92)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Gast, Nicolas, Gaujal, Bruno, Boudec, Jean-Yves Le

Mean field for Markov Decision Processes: from Discrete to Continuous Optimization

arXiv.org Artificial Intelligence, May-19-2011

We study the convergence of Markov Decision Processes made of a large number of objects to optimization problems on ordinary differential equations (ODE). We show that the optimal reward of such a Markov Decision Process, satisfying a Bellman equation, converges to the solution of a continuous Hamilton-Jacobi-Bellman (HJB) equation based on the mean field approximation of the Markov Decision Process. We give bounds on the difference of the rewards, and a constructive algorithm for deriving an approximating solution to the Markov Decision Process from a solution of the HJB equations. We illustrate the method on three examples pertaining respectively to investment strategies, population dynamics control, and scheduling in queues. They are used to illustrate and justify the construction of the controlled ODE and to show the gain obtained by solving a continuous HJB equation rather than a large discrete Bellman equation.

artificial intelligence, machine learning, optimization problem, (14 more...)

1004.2342

Country: Europe > France (0.28)

Genre: Research Report (0.63)

Industry: Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

AAAI Conferences, May-18-2011

Sample-Based Planning for Continuous Action Markov Decision Processes

Mansley, Chris (Rutgers University) | Weinstein, Ari (Rutgers University) | Littman, Michael (Rutgers University)

In this paper, we present a new algorithm that integrates recent advances in solving continuous bandit problems with sample-based rollout methods for planning in Markov Decision Processes (MDPs). Our algorithm, Hierarchical Optimistic Optimization applied to Trees (HOOT), addresses planning in continuous-action MDPs. Empirical results show that the performance of our algorithm meets or exceeds that of a similar discrete-action planner by eliminating the problem of manual discretization of the action space.

action space, algorithm, discretization, (15 more...)

AAAI Conferences

Twenty-First International Conference on Automated Planning and Scheduling

Country: North America > United States > New Jersey > Middlesex County > Piscataway (0.04)

Technology:

Information Technology > Data Science > Data Mining > Big Data (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.61)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.47)