 Markov Models


Point-Based Value Iteration for Constrained POMDPs

AAAI Conferences

Constrained partially observable Markov decision processes (CPOMDPs) extend standard POMDPs by allowing constraints on some aspects of the policy to be specified in addition to the optimality objective for the value function. CPOMDPs have many practical advantages over standard POMDPs since they naturally model problems involving limited resources or multiple objectives. In this paper, we show that the optimal policies in CPOMDPs can be randomized, and present exact and approximate dynamic programming methods for computing randomized optimal policies. While the exact method requires solving a minimax quadratically constrained program (QCP) in each dynamic programming update, the approximate method utilizes the point-based value update with a linear program (LP). We show that the randomized policies are significantly better than the deterministic ones. We also demonstrate that the approximate point-based method scales to large problems.
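The claim that optimal constrained policies can be randomized can be seen in a one-step toy problem. The sketch below is not the paper's QCP or LP method; the action names, rewards, costs, and budget are hypothetical values chosen only for illustration. It compares the best single action against the best probability mixture of two actions under an expected-cost budget.

```python
def best_deterministic(actions, budget):
    """Return (name, reward) of the best single action whose cost fits the budget."""
    feasible = [(name, r) for name, (r, c) in actions.items() if c <= budget]
    return max(feasible, key=lambda item: item[1])


def best_mixture(a, b, budget):
    """Best randomization between two actions a=(reward, cost) and b=(reward, cost).

    Returns (p, expected_reward), where p is the probability of playing a.
    Expected reward and expected cost are both linear in p, so the optimum
    lies at an endpoint of the feasible interval for p.
    Raises ValueError if no mixture satisfies the budget.
    """
    (ra, ca), (rb, cb) = a, b
    candidates = [0.0, 1.0]
    if ca != cb:
        candidates.append((budget - cb) / (ca - cb))  # p where cost equals budget
    feasible = [p for p in candidates
                if 0.0 <= p <= 1.0 and p * ca + (1 - p) * cb <= budget + 1e-12]
    p = max(feasible, key=lambda q: q * ra + (1 - q) * rb)
    return p, p * ra + (1 - p) * rb


# Hypothetical toy problem: (reward, cost) per action, expected cost at most 0.5.
actions = {"risky": (1.0, 1.0), "safe": (0.0, 0.0)}
budget = 0.5

det_name, det_reward = best_deterministic(actions, budget)
mix_p, mix_reward = best_mixture(actions["risky"], actions["safe"], budget)
print(det_name, det_reward)   # best deterministic choice
print(mix_p, mix_reward)      # best randomized choice
```

In this toy case the only feasible deterministic action is "safe", while mixing the two actions spends the budget exactly and earns a higher expected reward, which is the intuition behind randomized policies in constrained problems.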


DetH*: Approximate Hierarchical Solution of Large Markov Decision Processes

AAAI Conferences

This paper presents an algorithm for finding approximately optimal policies in very large Markov decision processes by constructing a hierarchical model and then solving it approximately. It exploits factored representations to achieve compactness and efficiency and to discover connectivity properties of the domain. We provide a bound on the quality of the solutions and give asymptotic analysis of the runtimes; in addition we demonstrate performance on a collection of very large domains. Results show that the quality of resulting policies is very good and the total running times, for both creating and solving the hierarchy, are significantly less than for an optimal factored MDP solver.


Sample Efficient On-Line Learning of Optimal Dialogue Policies with Kalman Temporal Differences

AAAI Conferences

Designing dialog policies for voice-enabled interfaces is a tailoring job that is most often left to natural language processing experts. This job is generally redone for every new dialog task because cross-domain transfer is not possible. For this reason, machine learning methods for dialog policy optimization have been investigated during the last 15 years. In particular, reinforcement learning (RL) is now part of the state of the art in this domain. Standard RL methods require testing more or less random changes in the policy on users to assess them as improvements or degradations. This is called on-policy learning. However, it can result in system behaviors that are not acceptable to users. Learning algorithms should ideally infer an optimal strategy by observing interactions generated by a non-optimal but acceptable strategy, that is, by learning off-policy. In this contribution, a sample-efficient, online and off-policy reinforcement learning algorithm is proposed to learn an optimal policy from a few hundred dialogues generated with a very simple handcrafted policy.
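To make the off-policy idea concrete, here is a minimal sketch that learns from a fixed log of dialogue turns produced by some other policy. It uses plain tabular Q-learning as a stand-in, not the Kalman Temporal Differences algorithm of the paper, and the states, actions, and rewards are hypothetical.

```python
from collections import defaultdict


def q_learning_from_logs(transitions, actions, alpha=0.5, gamma=0.9, sweeps=50):
    """Tabular Q-learning over a fixed log of (state, action, reward, next_state).

    next_state is None when the dialogue ends. Each update bootstraps from the
    greedy value of next_state, regardless of which action the logging policy
    took there; that is what makes the method off-policy.
    """
    q = defaultdict(float)  # unseen (state, action) pairs default to 0.0
    for _ in range(sweeps):
        for s, a, r, s_next in transitions:
            target = r
            if s_next is not None:
                target += gamma * max(q[(s_next, b)] for b in actions)
            q[(s, a)] += alpha * (target - q[(s, a)])
    return q


def greedy_policy(q, states, actions):
    """Pick the highest-valued action in each state."""
    return {s: max(actions, key=lambda a: q[(s, a)]) for s in states}


# Hypothetical log generated by a simple handcrafted dialogue policy.
ACTIONS = ["ask", "confirm", "close"]
LOG = [
    ("start", "ask", -1, "have_slot"),
    ("have_slot", "confirm", -1, "have_slot"),
    ("have_slot", "close", 10, None),
    ("start", "close", -10, None),
]

q = q_learning_from_logs(LOG, ACTIONS)
policy = greedy_policy(q, ["start", "have_slot"], ACTIONS)
print(policy)
```

Note that actions never seen in the log keep their default value of 0.0, which can look attractive when all observed values are negative; a practical system would need to handle such unobserved pairs explicitly.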


Collective Semantic Role Labeling for Tweets with Clustering

AAAI Conferences

As tweets have become a comprehensive repository of fresh information, Semantic Role Labeling (SRL) for tweets has aroused great research interest because of its central role in a wide range of tweet-related studies such as fine-grained information extraction, sentiment analysis and summarization. However, the fact that a tweet is often too short and informal to provide sufficient information poses a major challenge. To tackle this challenge, we propose a new method to collectively label similar tweets. The underlying idea is to exploit similar tweets to make up for the lack of information in a tweet. Specifically, similar tweets are first grouped together by clustering. Then for each cluster a two-stage labeling is conducted: in the first stage, one labeler conducts SRL to get statistical information, such as the predicate/argument/role triples that occur frequently, from its highly confident labeling results; in the second stage, another labeler performs SRL with this statistical information to refine the results. Experimental results on a human-annotated dataset show that our approach improves SRL by 3.1% F1.


Semantic Relationship Discovery with Wikipedia Structure

AAAI Conferences

Thanks to the idea of social collaboration, Wikipedia has accumulated a vast amount of semi-structured knowledge in which the link structure reflects, to some extent, human cognition of semantic relationships. In this paper, we propose a novel method, RCRank, to jointly compute concept-concept relatedness and concept-category relatedness, based on the assumption that information carried in concept-concept links and concept-category links can mutually reinforce each other. Unlike previous work, RCRank can not only find semantically related concepts but also interpret their relations by categories. Experimental results on concept recommendation and relation interpretation show that our method substantially outperforms classical methods.


Visual Task Inference Using Hidden Markov Models

AAAI Conferences

It has been known for a long time that the visual task, such as reading, counting, or searching, greatly influences eye movement patterns. Perhaps the best known demonstration of this is the celebrated study of Yarbus showing that different eye movement trajectories emerge depending on the visual task that the viewers are given. The objective of this paper is to develop an inverse Yarbus process whereby we can infer the visual task by observing the measurements of a viewer's eye movements while executing the visual task. The method we are proposing is to use Hidden Markov Models (HMMs) to create a probabilistic framework to infer the viewer's task from eye movements.
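One common way to infer a class with HMMs is to train one model per class and pick the class whose model assigns the observed sequence the highest posterior score. The sketch below illustrates that idea only; the abstract does not specify the model structure, so the two task models, their probabilities, and the binary "short/long saccade" observation alphabet are all hypothetical.

```python
import math


def forward_log_likelihood(obs, start, trans, emit):
    """Return log P(obs | HMM) using the scaled forward algorithm.

    start[i]    : P(first hidden state = i)
    trans[i][j] : P(next state = j | current state = i)
    emit[i][o]  : P(observation symbol o | state i)
    Assumes obs is non-empty and has nonzero probability under the model.
    """
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                     for j in range(n)]
        scale = sum(alpha)           # P(obs[t] | obs[:t])
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik


def infer_task(obs, models, priors):
    """Return the task maximizing log prior + log likelihood (Bayes rule)."""
    return max(models, key=lambda task: math.log(priors[task])
               + forward_log_likelihood(obs, *models[task]))


# Hypothetical models; observation 0 = short saccade, 1 = long saccade.
START = [0.5, 0.5]
TRANS = [[0.9, 0.1], [0.1, 0.9]]
MODELS = {
    "reading": (START, TRANS, [[0.9, 0.1], [0.7, 0.3]]),
    "searching": (START, TRANS, [[0.2, 0.8], [0.4, 0.6]]),
}
PRIORS = {"reading": 0.5, "searching": 0.5}

print(infer_task([0, 0, 0, 1, 0, 0], MODELS, PRIORS))
print(infer_task([1, 1, 0, 1, 1, 1], MODELS, PRIORS))
```

Scaling the forward variables at every step keeps the computation numerically stable on long eye-movement sequences, where the unscaled probabilities would underflow.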


Bayesian Policy Search with Policy Priors

AAAI Conferences

We consider the problem of learning to act in partially observable, continuous-state-and-action worlds where we have abstract prior knowledge about the structure of the optimal policy in the form of a distribution over policies. Using ideas from planning-as-inference reductions and Bayesian unsupervised learning, we cast Markov Chain Monte Carlo as a stochastic, hill-climbing policy search algorithm. Importantly, this algorithm's search bias is directly tied to the prior and its MCMC proposal kernels, which means we can draw on the full Bayesian toolbox to express the search bias, including nonparametric priors and structured, recursive processes like grammars over action sequences. Furthermore, we can reason about uncertainty in the search bias itself by constructing a hierarchical prior and reasoning about latent variables that determine the abstract structure of the policy. This yields an adaptive search algorithm: our algorithm learns to learn a structured policy efficiently. We show how inference over the latent variables in these policy priors enables intra- and intertask transfer of abstract knowledge. We demonstrate the flexibility of this approach by learning meta search biases, by constructing a nonparametric finite state controller to model memory, by discovering motor primitives using a simple grammar over primitive actions, and by combining all three.


Learning Driving Behavior by Timed Syntactic Pattern Recognition

AAAI Conferences

We advocate the use of an explicit time representation in syntactic pattern recognition because it can result in more succinct models and easier learning problems. We apply this approach to the real-world problem of learning models for the driving behavior of truck drivers. We discretize the values of onboard sensors into simple events. The data at our disposal consists of onboard sensor measurements that have been collected from truck round-trips. By applying a simple discretization method, we obtain sequences of timed events. The behavior that is displayed in these sequences is unknown. From this data, we want to learn a model that we can use to monitor the driving behavior in new data, i.e., to use it as a classifier. Our approach is to first learn a timed model from the unlabeled sequences using the ...
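The step from raw sensor measurements to sequences of timed events can be pictured with a small sketch. The abstract does not say which discretization is used, so the threshold binning below, the speed thresholds, and the event labels are assumptions made purely for illustration.

```python
from bisect import bisect_right


def to_timed_events(samples, thresholds, labels):
    """Turn (timestamp, value) samples into (label, delay) timed events.

    thresholds must be sorted and labels must have len(thresholds) + 1 entries.
    An event is emitted whenever the value moves into a different threshold
    bin; delay is the time elapsed since the previous event (or since the
    first sample, for the first event).
    """
    if not samples:
        return []
    events = []
    last_time = samples[0][0]
    last_bin = None
    for t, value in samples:
        b = bisect_right(thresholds, value)  # index of the bin holding value
        if b != last_bin:
            events.append((labels[b], t - last_time))
            last_time, last_bin = t, b
    return events


# Hypothetical speed readings: (seconds, km/h).
SPEED = [(0, 0), (5, 12), (10, 35), (20, 60), (25, 62), (40, 20)]
EVENTS = to_timed_events(SPEED, thresholds=[10, 50],
                         labels=["stopped", "slow", "fast"])
print(EVENTS)
# [('stopped', 0), ('slow', 5), ('fast', 15), ('slow', 20)]
```

Keeping the delay alongside each symbol preserves the timing information explicitly, instead of encoding elapsed time by repeating symbols.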


Strategy Learning for Autonomous Agents in Smart Grid Markets

AAAI Conferences

Distributed electricity producers, such as small wind farms and solar installations, pose several technical and economic challenges in Smart Grid design. One approach to addressing these challenges is through Broker Agents who buy electricity from distributed producers, and also sell electricity to consumers, via a Tariff Market--a new market mechanism where Broker Agents publish concurrent bid and ask prices. We investigate the learning of pricing strategies for an autonomous Broker Agent to profitably participate in a Tariff Market. We employ Markov Decision Processes (MDPs) and reinforcement learning. An important concern with this method is that even simple representations of the problem domain result in very large numbers of states in the MDP formulation because market prices can take nearly arbitrary real values. In this paper, we present the use of derived state space features, computed using statistics on Tariff Market prices and Broker Agent customer portfolios, to obtain a scalable state representation. We also contribute a set of pricing tactics that form building blocks in the learned Broker Agent strategy. We further present a Tariff Market simulation model based on real-world data and anticipated market dynamics. We use this model to obtain experimental results that show the learned strategy performing vastly better than a random strategy and significantly better than two other non-learning strategies.


Agent-Oriented Incremental Team and Activity Recognition

AAAI Conferences

Monitoring team activity is beneficial when human teams cooperate in the enactment of a joint plan. Monitoring allows teams to maintain awareness of each other's progress within the plan and it enables anticipation of information needs. Humans find this difficult, particularly in time-stressed and uncertain environments. In this paper we introduce a probabilistic model, based on Conditional Random Fields, to automatically recognise the composition of teams and the team activities in relation to a plan. The team composition and activities are recognised incrementally by interpreting a stream of spatio-temporal observations.