Collaborating Authors

 Singh, Satinder


Markov Decision Processes with Continuous Side Information

arXiv.org Machine Learning

We consider a reinforcement learning (RL) setting in which the agent interacts with a sequence of episodic MDPs. At the start of each episode the agent has access to some side-information or context that determines the dynamics of the MDP for that episode. Our setting is motivated by applications in healthcare where baseline measurements of a patient at the start of a treatment episode form the context that may provide information about how the patient might respond to treatment decisions. We propose algorithms for learning in such Contextual Markov Decision Processes (CMDPs) under an assumption that the unobserved MDP parameters vary smoothly with the observed context. We also give lower and upper PAC bounds under the smoothness assumption. Because our lower bound has an exponential dependence on the dimension, we consider a tractable linear setting where the context is used to create linear combinations of a finite set of MDPs. For the linear setting, we give a PAC learning algorithm based on KWIK learning techniques.
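
As a rough illustration of the linear setting mentioned in the abstract, the sketch below builds an episode's transition model as a context-weighted combination of a small set of base MDPs. It is not the paper's algorithm; the function name, dimensions, and the simple normalization of the context weights are all assumptions for illustration.

```python
import numpy as np

def contextual_transitions(base_P, context):
    """Combine base MDP transition tensors using context weights (illustrative).

    base_P : array of shape (d, S, A, S) -- d base MDPs over S states and A actions
    context: length-d vector of non-negative weights (the observed side information)
    Returns a single (S, A, S) transition tensor for the episode.
    """
    w = np.asarray(context, dtype=float)
    w = w / w.sum()                       # normalize so the mixture stays a distribution
    return np.tensordot(w, base_P, axes=1)  # convex combination of the base models

# Toy usage: 2 base MDPs, 3 states, 2 actions, context favoring the first model.
rng = np.random.default_rng(0)
base_P = rng.dirichlet(np.ones(3), size=(2, 3, 2))   # shape (2, 3, 2, 3)
P_episode = contextual_transitions(base_P, context=[0.8, 0.2])
assert np.allclose(P_episode.sum(axis=-1), 1.0)      # rows remain valid distributions
```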


Repeated Inverse Reinforcement Learning

arXiv.org Artificial Intelligence

We introduce a novel repeated Inverse Reinforcement Learning problem: the agent has to act on behalf of a human in a sequence of tasks and wishes to minimize the number of tasks in which it surprises the human by acting suboptimally with respect to how the human would have acted. Each time the human is surprised, the agent is provided a demonstration of the desired behavior by the human. We formalize this problem, including how the sequence of tasks is chosen, in a few different ways and provide some foundational results.
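
The interaction protocol described above can be summarized as a simple loop; the skeleton below is only a schematic of that protocol, with all function names (`agent_policy`, `human_is_surprised`, `update_from_demo`, and so on) being hypothetical placeholders rather than anything defined in the paper.

```python
def repeated_irl_loop(tasks, agent_policy, human_policy, human_is_surprised, update_from_demo):
    """Skeleton of the repeated interaction described above (illustrative only).

    tasks             : iterable of task specifications
    agent_policy      : task -> behavior the agent executes
    human_policy      : task -> behavior the human would have executed
    human_is_surprised: (agent_behavior, human_behavior) -> bool
    update_from_demo  : demonstration -> None (update the agent's reward estimate)
    Returns the number of tasks on which the human was surprised.
    """
    surprises = 0
    for task in tasks:
        behavior = agent_policy(task)
        demo = human_policy(task)
        if human_is_surprised(behavior, demo):
            surprises += 1
            update_from_demo(demo)   # the human shows the desired behavior
    return surprises
```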


Approximately-Optimal Queries for Planning in Reward-Uncertain Markov Decision Processes

AAAI Conferences

When planning actions to take on behalf of its human operator, a robot might be uncertain about its operator's reward function. We address the problem of how the robot should formulate an (approximately) optimal query to pose to the operator, given how its uncertainty affects which policies it should plan to pursue. We explain how a robot whose queries ask the operator to choose the best from among k choices can, without loss of optimality, restrict consideration to choices only over alternative policies. Further, we present a method for constructing an approximately-optimal policy query that enjoys a performance bound, where the method need not enumerate all policies. Finally, because queries posed to the operator of a robotic system are often expressed in terms of preferences over trajectories rather than policies, we show how our constructed policy query can be projected into the space of trajectory queries. Our empirical results demonstrate that our projection technique can outperform other known techniques for choosing trajectory queries, particularly when the number of trajectories the operator is asked to compare is small.
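
To make the idea of a k-choice policy query concrete, the sketch below greedily assembles a query set of policies that maximizes the expected value of the operator's best choice, averaged over sampled reward functions. This greedy surrogate is a common heuristic and is only a stand-in; the paper's construction and its performance bound are not reproduced here.

```python
import numpy as np

def greedy_policy_query(values, k):
    """Greedily pick k candidate policies to show the operator (illustrative sketch).

    values : array of shape (n_policies, n_reward_samples); values[i, j] is the
             value of policy i under the j-th sampled reward function.
    Objective: expected value of the best policy in the query set, averaged over
    reward samples (a surrogate, not necessarily the paper's exact criterion).
    """
    n_policies, n_samples = values.shape
    chosen = []
    best_so_far = np.full(n_samples, -np.inf)
    for _ in range(min(k, n_policies)):
        gains = [np.maximum(best_so_far, values[i]).mean() if i not in chosen else -np.inf
                 for i in range(n_policies)]
        i_star = int(np.argmax(gains))
        chosen.append(i_star)
        best_so_far = np.maximum(best_so_far, values[i_star])
    return chosen
```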


Improving Predictive State Representations via Gradient Descent

AAAI Conferences

Predictive state representations (PSRs) model dynamical systems using appropriately chosen predictions about future observations as a representation of the current state. In contrast to the hidden states posited by HMMs or RNNs, PSR states are directly observable in the training data; this gives rise to a moment-matching spectral algorithm for learning PSRs that is computationally efficient and statistically consistent when the model complexity matches that of the true system generating the data. In practice, however, model mismatch is inevitable and while spectral learning remains appealingly fast and simple it may fail to find optimal models. To address this problem, we investigate the use of gradient methods for improving spectrally-learned PSRs. We show that only a small amount of additional gradient optimization can lead to significant performance gains, and moreover that initializing gradient methods with the spectral learning solution yields better models in significantly less time than starting from scratch.
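
The key practical point of the abstract, initializing gradient optimization from the spectral solution rather than from scratch, can be sketched generically as below. The parameter names, the use of Adam, and the `prediction_loss` callback are assumptions for illustration; the paper's actual objective and update details may differ.

```python
import torch

def refine_spectral_model(params_init, prediction_loss, data, steps=200, lr=1e-2):
    """Refine a spectrally-learned model with gradient descent (generic sketch).

    params_init     : dict of name -> numpy array taken from the spectral solution
    prediction_loss : (params, batch) -> scalar torch tensor measuring prediction error
    data            : iterable of training batches
    The point is to start from the spectral solution and take a modest number of
    gradient steps, rather than optimizing from a random initialization.
    """
    params = {k: torch.tensor(v, dtype=torch.float32, requires_grad=True)
              for k, v in params_init.items()}
    opt = torch.optim.Adam(params.values(), lr=lr)
    for _, batch in zip(range(steps), data):
        opt.zero_grad()
        loss = prediction_loss(params, batch)
        loss.backward()
        opt.step()
    return {k: v.detach().numpy() for k, v in params.items()}
```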


Action-Conditional Video Prediction using Deep Networks in Atari Games

Neural Information Processing Systems

Motivated by vision-based reinforcement learning (RL) problems, in particular Atari games from the recent benchmark Arcade Learning Environment (ALE), we consider spatio-temporal prediction problems where future (image-)frames are dependent on control variables or actions as well as previous frames. While not composed of natural scenes, frames in Atari games are high-dimensional, can involve tens of objects with one or more objects being controlled by the actions directly and many other objects being influenced indirectly, can involve entry and departure of objects, and can involve deep partial observability. We propose and evaluate two deep neural network architectures that consist of encoding, action-conditional transformation, and decoding layers based on convolutional neural networks and recurrent neural networks. Experimental results show that the proposed architectures are able to generate visually-realistic frames that are also useful for control over approximately 100-step action-conditional futures in some games. To the best of our knowledge, this paper is the first to make and evaluate long-term predictions on high-dimensional video conditioned by control inputs.
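
A minimal encode / action-conditionally transform / decode model in the spirit of the abstract is sketched below in PyTorch. The layer sizes, the assumed 84x84 RGB input, and the choice of a multiplicative frame-action interaction are illustrative assumptions; the paper's feedforward and recurrent architectures are substantially larger.

```python
import torch
import torch.nn as nn

class ActionConditionalPredictor(nn.Module):
    """Minimal encode / action-conditionally transform / decode sketch.

    Assumes 84x84 RGB frames and a one-hot action vector; sizes are made up
    for illustration only.
    """
    def __init__(self, n_actions, hidden=256):
        super().__init__()
        self.encoder = nn.Sequential(                      # frame -> feature vector
            nn.Conv2d(3, 32, 8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.Flatten(), nn.LazyLinear(hidden), nn.ReLU())
        self.frame_factor = nn.Linear(hidden, hidden, bias=False)
        self.action_factor = nn.Linear(n_actions, hidden, bias=False)
        self.decoder = nn.Sequential(                      # feature vector -> next frame
            nn.Linear(hidden, 64 * 9 * 9), nn.ReLU(),
            nn.Unflatten(1, (64, 9, 9)),
            nn.ConvTranspose2d(64, 32, 4, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 8, stride=4))

    def forward(self, frame, action_onehot):
        h = self.encoder(frame)
        h = self.frame_factor(h) * self.action_factor(action_onehot)  # action-conditional transform
        return self.decoder(h)
```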


Commitment Semantics for Sequential Decision Making Under Reward Uncertainty

AAAI Conferences

A commitment represents an agent's intention to attempt to bring about some state of the world that is desired by some agent (possibly itself) in the future. Thus, by making a commitment, an agent is agreeing to make sequential decisions that it believes can cause the desired state to arise. In general, though, an agent's actions will have uncertain outcomes, and thus reaching the desired state cannot be guaranteed. For such sequential decision settings with uncertainty, therefore, commitments can only be probabilistic. We argue that standard notions of commitment are insufficient for probabilistic commitments, and propose a new semantics that judges commitment fulfillment not in terms of whether the agent achieved the desired state, but rather in terms of whether the agent made sequential decisions that in expectation would have achieved the desired state with (at least) the promised probability. We have devised various algorithms that operationalize our semantics, to capture problem contexts with probabilistic commitments arising because action outcomes are uncertain, as well as arising because an agent might realize over time that it does not want to fulfill the commitment.
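
One way to read the proposed semantics operationally: judge the agent by the probability with which its decisions would reach the desired state in expectation, not by whether the state was actually reached. The sketch below checks that reachability probability in a small tabular MDP with a deterministic policy; the function and its treatment of goal states as absorbing are illustrative assumptions, not the paper's algorithms.

```python
import numpy as np

def commitment_satisfied(P, policy, start, goal_states, horizon, promised_prob):
    """Check a probabilistic commitment by forward-propagating state probabilities.

    P             : (S, A, S) transition probabilities
    policy        : length-S integer array giving the action taken in each state
    start         : length-S initial state distribution
    goal_states   : indices of states that fulfill the commitment (treated as absorbing)
    promised_prob : probability promised to the other agent
    Returns True if the policy reaches a goal state within `horizon` steps with
    probability at least `promised_prob`.
    """
    S = P.shape[0]
    goal = np.zeros(S, dtype=bool)
    goal[list(goal_states)] = True
    P_pi = P[np.arange(S), policy, :]        # (S, S) transition matrix under the policy
    dist = np.array(start, dtype=float)
    reached = dist[goal].sum()
    dist = np.where(goal, 0.0, dist)         # absorb mass that is already at a goal
    for _ in range(horizon):
        dist = dist @ P_pi
        reached += dist[goal].sum()
        dist = np.where(goal, 0.0, dist)
    return reached >= promised_prob
```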


Spectral Learning of Predictive State Representations with Insufficient Statistics

AAAI Conferences

Predictive state representations (PSRs) are models of dynamical systems that represent state as a vector of predictions about future observable events (tests) conditioned on past observed events (histories). If a practitioner selects finite sets of tests and histories that are known to be sufficient to completely capture the system, an exact PSR can be learned in polynomial time using spectral methods. However, most real-world systems are complex, and in practice computational constraints limit us to small sets of tests and histories which are therefore never truly sufficient. How, then, should we choose these sets? Existing theory offers little guidance here, and yet we show that the choice is highly consequential -- tests and histories selected at random or by a naive rule significantly underperform the best sets. In this paper we approach the problem both theoretically and empirically. While any fixed system can be represented by an infinite number of equivalent but distinct PSRs, we show that in the computationally unconstrained setting, where existing theory guarantees accurate predictions, the PSRs learned by spectral methods always satisfy a particular spectral bound. Adapting this idea, we propose a simple algorithmic technique to search for sets of tests and histories that approximately satisfy the bound while respecting computational limits. Empirically, our method significantly reduces prediction errors compared to standard spectral learning approaches.
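
To show the shape of the search over test and history sets, the sketch below scores candidate submatrices of the empirical test-history matrix by how much spectral mass a rank-k truncation retains. This scoring rule is only a placeholder standing in for the bound derived in the paper; the function name and inputs are hypothetical.

```python
import numpy as np

def score_test_history_sets(P_TH_full, test_sets, history_sets, k):
    """Score candidate (tests, histories) submatrices by a simple spectral criterion.

    P_TH_full : empirical matrix of joint probabilities; rows index candidate tests,
                columns index candidate histories (estimated from data)
    test_sets, history_sets : lists of index lists defining candidate submatrices
    k         : target PSR rank / state dimension
    The retained-spectral-mass score below is a stand-in for the paper's bound.
    """
    scores = []
    for T_idx, H_idx in zip(test_sets, history_sets):
        sub = P_TH_full[np.ix_(T_idx, H_idx)]
        s = np.linalg.svd(sub, compute_uv=False)
        retained = s[:k].sum() / max(s.sum(), 1e-12)
        scores.append(retained)
    return scores
```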


Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning

Neural Information Processing Systems

The combination of modern Reinforcement Learning and Deep Learning approaches holds the promise of making significant progress on challenging applications requiring both rich perception and policy-selection. The Arcade Learning Environment (ALE) provides a set of Atari games that represent a useful benchmark set of such applications. A recent breakthrough in combining model-free reinforcement learning with deep learning, called DQN, yields the best real-time agents thus far. Planning-based approaches achieve far higher scores than the best model-free approaches, but they exploit information that is not available to human players, and they are orders of magnitude slower than needed for real-time play. Our main goal in this work is to build a better real-time Atari game playing agent than DQN. The central idea is to use the slow planning-based agents to provide training data for a deep-learning architecture capable of real-time play. We propose new agents based on this idea and show that they outperform DQN.
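
The simplest variant of the central idea, training a fast policy network to imitate actions chosen by slow offline planners, looks roughly like the behavioral-cloning loop below. The paper studies several ways of using the planner's output, so treat this as an illustrative minimum; the dataset format and hyperparameters are assumptions.

```python
import torch
import torch.nn as nn

def train_realtime_policy(policy_net, planner_dataset, epochs=10, lr=1e-4):
    """Train a fast policy network on data produced by a slow planner (sketch).

    planner_dataset: iterable of (frames, actions) batches, where the action labels
    come from offline Monte-Carlo tree search runs as described above.
    """
    opt = torch.optim.Adam(policy_net.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for frames, actions in planner_dataset:
            opt.zero_grad()
            logits = policy_net(frames)          # (batch, n_actions)
            loss = loss_fn(logits, actions)      # imitate the planner's choice
            loss.backward()
            opt.step()
    return policy_net
```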


Evaluating Trauma Patients: Addressing Missing Covariates with Joint Optimization

AAAI Conferences

Missing values are a common problem when applying classification algorithms to real-world medical data. This is especially true for trauma patients, where the emergent nature of the cases makes it difficult to collect all of the relevant data for each patient. Standard methods for handling missingness first learn a model to estimate missing data values, and subsequently train and evaluate a classifier using data imputed with this model. Recently, several proposed methods have demonstrated the benefits of jointly estimating the imputation model and classifier parameters. However, these methods make assumptions that limit their utility with many real-world medical datasets. For example, the assumption that data elements are missing at random is often invalid. We address this situation by exploring a novel approach for jointly learning the imputation model and classifier. Unlike previous algorithms, our approach makes no assumptions about the missingness of the data, can be used with arbitrary probabilistic data models and classification loss functions, and can be used when both the training and testing data have missing values. We investigate the utility of this approach on the prediction of several patient outcomes in a large national registry of trauma patients, and find that it significantly outperforms standard sequential methods.
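
A minimal sketch of the joint-optimization idea, under simplifying assumptions the paper does not make, is to treat missing entries as free parameters optimized together with a linear classifier's weights. The paper's formulation allows arbitrary probabilistic data models and loss functions; the code below is only the simplest instance of jointly learning imputations and the classifier.

```python
import torch

def jointly_train(X, mask, y, steps=500, lr=1e-2):
    """Jointly learn imputed values and a linear classifier (illustrative sketch).

    X    : (n, d) float tensor with arbitrary placeholder values where data are missing
    mask : (n, d) bool tensor, True where the value is observed
    y    : (n,) tensor of 0/1 labels
    Missing entries are free parameters optimized jointly with the classifier.
    """
    fill = torch.zeros_like(X, requires_grad=True)          # learnable imputations
    w = torch.zeros(X.shape[1], requires_grad=True)
    b = torch.zeros(1, requires_grad=True)
    opt = torch.optim.Adam([fill, w, b], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        X_hat = torch.where(mask, X, fill)                  # observed values stay fixed
        logits = X_hat @ w + b
        loss = torch.nn.functional.binary_cross_entropy_with_logits(logits, y.float())
        loss.backward()
        opt.step()
    return torch.where(mask, X, fill).detach(), w.detach(), b.detach()
```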