Collaborating Authors

 Littman, Michael L.


Towards Behavior-Aware Model Learning from Human-Generated Trajectories

AAAI Conferences

Inverse reinforcement learning algorithms recover an unknown reward function for a Markov decision process (MDP) based on observations of user behaviors that optimize this reward function. Here we consider the complementary problem of learning the unknown transition dynamics of an MDP based on such observations. We describe the behavior-aware modeling (BAM) algorithm, which learns models of transition dynamics from user-generated state-action trajectories. BAM makes assumptions about how users select their actions that are similar to those used in inverse reinforcement learning, and searches for a model that maximizes the probability of the observed actions. The BAM algorithm is based on policy gradient algorithms, essentially reversing the roles of the policy and transition distribution in those algorithms. As a result, BAM is highly flexible and can be applied to continuous state spaces using a wide variety of model representations. In this preliminary work, we discuss why the model learning problem is interesting, describe algorithms to solve this problem, and discuss directions for future work.
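
A rough sketch of the idea follows (a hypothetical tabular rendition, not the authors' implementation): the transition model is parameterized, users are assumed to pick actions Boltzmann-rationally under soft value iteration in that model, and the log-likelihood of the observed actions is ascended. Finite-difference gradients stand in for the policy-gradient machinery the paper adapts, and all constants and data below are toy assumptions.

```python
# Hypothetical sketch of behavior-aware model learning: find transition
# parameters under which the observed user actions look most rational.
import numpy as np

n_s, n_a, gamma, beta = 4, 2, 0.9, 5.0
R = np.random.rand(n_s)                        # assumed-known reward

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def q_values(theta):
    """Soft value iteration under the model implied by theta."""
    T = np.apply_along_axis(softmax, -1, theta)  # (s, a, s') simplex
    Q = np.zeros((n_s, n_a))
    for _ in range(200):
        V = np.log(np.exp(beta * Q).sum(axis=1)) / beta
        Q = T @ (R + gamma * V)
    return Q

def log_lik(theta, data):
    """Log-probability of the observed actions under a Boltzmann user."""
    Q = q_values(theta)
    return sum(beta * Q[s, a] - np.log(np.exp(beta * Q[s]).sum())
               for s, a in data)

data = [(0, 1), (1, 0), (2, 1), (3, 0)]        # toy (state, action) pairs
theta = np.zeros((n_s, n_a, n_s))
eps, lr = 1e-4, 0.5
for _ in range(100):                           # crude finite-difference ascent
    base = log_lik(theta, data)
    grad = np.zeros_like(theta)
    for idx in np.ndindex(*theta.shape):
        bumped = theta.copy()
        bumped[idx] += eps
        grad[idx] = (log_lik(bumped, data) - base) / eps
    theta += lr * grad
print("final log-likelihood:", log_lik(theta, data))
```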


Reinforcement Learning as a Framework for Ethical Decision Making

AAAI Conferences

Emerging AI systems will be making more and more decisions that impact the lives of humans in a significant way. It is essential, then, that these AI systems make decisions that take into account the desires, goals, and preferences of other people, while simultaneously learning about what those preferences are. In this work, we argue that the reinforcement-learning framework achieves the appropriate generality required to theorize about an idealized ethical artificial agent, and offers the proper foundations for grounding specific questions about ethical learning and decision making that can promote further scientific investigation. We define an idealized formalism for an ethical learner, and conduct experiments on two toy ethical dilemmas, demonstrating the soundness and flexibility of our approach. Lastly, we identify several critical challenges for future advancement in the area that can leverage our proposed framework.


A Strategy-Aware Technique for Learning Behaviors from Discrete Human Feedback

AAAI Conferences

This paper introduces two novel algorithms for learning behaviors from human-provided rewards. The primary novelty of these algorithms is that instead of treating the feedback as a numeric reward signal, they interpret feedback as a form of discrete communication that depends on both the behavior the trainer is trying to teach and the teaching strategy used by the trainer. For example, some human trainers use a lack of feedback to indicate whether actions are correct or incorrect, and interpreting this lack of feedback accurately can significantly improve learning speed. Results from user studies show that humans use a variety of training strategies in practice and both algorithms can learn a contextual bandit task faster than algorithms that treat the feedback as numeric. Simulated trainers are also employed to evaluate the algorithms in both contextual bandit and sequential decision-making tasks with similar results.
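
The "lack of feedback is informative" point can be made concrete with a toy Bayesian update (our own simplification, not the paper's algorithms): the trainer-strategy parameters mu_plus and mu_minus below are hypothetical probabilities of staying silent after a correct or incorrect action, so silence shifts the belief over which arm is correct just as explicit reward or punishment does.

```python
# Hypothetical strategy-aware interpretation of discrete feedback in a
# two-armed bandit: feedback is +1 (reward), -1 (punishment), or 0 (silence),
# and silence is treated as evidence rather than as a zero-valued reward.
import numpy as np

mu_plus, mu_minus = 0.8, 0.2  # assumed P(silence | correct), P(silence | incorrect)

def likelihood(feedback, action_is_correct):
    if action_is_correct:
        return {+1: 1 - mu_plus, 0: mu_plus, -1: 0.0}[feedback] + 1e-9
    return {+1: 0.0, 0: mu_minus, -1: 1 - mu_minus}[feedback] + 1e-9

belief = np.array([0.5, 0.5])  # P(arm 0 is the target), P(arm 1 is the target)
for action, feedback in [(0, 0), (1, -1), (0, 0), (0, +1)]:
    for arm in (0, 1):
        belief[arm] *= likelihood(feedback, action_is_correct=(arm == action))
    belief /= belief.sum()
    print(f"after a={action}, f={feedback:+d}: belief={belief.round(3)}")
```

With mu_plus = 0.8, this hypothetical trainer mostly rewards by omission, so two silences after pulling arm 0 already push the belief strongly toward arm 0 being correct.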


The Complexity of Plan Existence and Evaluation in Probabilistic Domains

arXiv.org Artificial Intelligence

We examine the computational complexity of testing and finding small plans in probabilistic planning domains with succinct representations. We find that many problems of interest are complete for a variety of complexity classes: NP, co-NP, PP, NP^PP, co-NP^PP, and PSPACE. Of these, the probabilistic classes PP and NP^PP are likely to be of special interest in the field of uncertainty in artificial intelligence and are deserving of additional study. These results suggest fruitful directions for future algorithmic development.


Incremental Pruning: A Simple, Fast, Exact Method for Partially Observable Markov Decision Processes

arXiv.org Artificial Intelligence

Most exact algorithms for general partially observable Markov decision processes (POMDPs) use a form of dynamic programming in which a piecewise-linear and convex representation of one value function is transformed into another. We examine variations of the "incremental pruning" method for solving this problem and compare them to earlier algorithms from theoretical and empirical perspectives. We find that incremental pruning is presently the most efficient exact method for solving POMDPs.
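
To make the pruning subroutine concrete, here is a hedged sketch of the standard dominated alpha-vector test, posed as a linear program over the belief simplex. Incremental pruning's contribution lies in how cross sums are interleaved with this kind of pruning, which the sketch does not attempt to show; the tie handling here is also simplified.

```python
# Illustrative alpha-vector pruning for a 2-state POMDP (simplified; exact
# solvers additionally handle ties and interleave pruning with cross sums).
import numpy as np
from scipy.optimize import linprog

def dominated(alpha, others):
    """True if no belief prefers `alpha` to every vector in `others`.
    LP: maximize margin d s.t. b.(alpha - a') >= d for all a', b in simplex."""
    n = len(alpha)
    c = np.zeros(n + 1)
    c[-1] = -1.0                                        # linprog minimizes -d
    A_ub = [np.append(o - alpha, 1.0) for o in others]  # b.(a'-alpha)+d <= 0
    A_eq = [np.append(np.ones(n), 0.0)]                 # sum(b) == 1
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(len(others)),
                  A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * n + [(None, None)])
    return res.status == 0 and -res.fun <= 1e-9   # best margin is non-positive

def prune(vectors):
    kept = []
    for i, v in enumerate(vectors):
        others = [w for j, w in enumerate(vectors) if j != i]
        if not others or not dominated(v, others):
            kept.append(v)
    return kept

vs = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([0.4, 0.4])]
print(prune(vs))   # the third vector is dominated at every belief
```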


Incremental Model-based Learners With Formal Learning-Time Guarantees

arXiv.org Artificial Intelligence

Model-based learning algorithms have been shown to use experience efficiently when learning to solve Markov Decision Processes (MDPs) with finite state and action spaces. However, their high computational cost, due to repeatedly solving an internal model, inhibits their use in large-scale problems. We propose a method based on real-time dynamic programming (RTDP) to speed up two model-based algorithms, RMAX and MBIE (model-based interval estimation), resulting in algorithms that are computationally much faster while sacrificing little relative to existing bounds. Specifically, our two new learning algorithms, RTDP-RMAX and RTDP-IE, have considerably smaller computational demands than RMAX and MBIE. We develop a general theoretical framework that allows us to prove that both are efficient learners in a PAC (probably approximately correct) sense. We also present an experimental evaluation of these new algorithms that helps quantify the tradeoff between computational and experience demands.
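
A hedged sketch of the core mechanism, assuming a tabular MDP with rewards in [0, 1] (the threshold, constants, and toy environment are ours): state-action pairs visited fewer than m times keep the optimistic value Vmax as in RMAX, and only the states actually encountered are backed up, RTDP-style, instead of re-solving the whole internal model after every update.

```python
# Hypothetical RTDP-RMAX-flavored loop: optimistic values for unknown
# state-actions, single on-trajectory backups instead of full planning.
from collections import defaultdict

gamma, m = 0.95, 5
Vmax = 1.0 / (1 - gamma)                        # upper bound with rewards in [0, 1]

counts = defaultdict(int)                       # n(s, a)
trans = defaultdict(lambda: defaultdict(int))   # n(s, a, s')
rew = defaultdict(float)                        # summed reward for (s, a)
Q = defaultdict(lambda: Vmax)                   # optimistic initialization

def backup(s, actions):
    """One RTDP-style backup at s using the empirical model."""
    for a in actions:
        n = counts[(s, a)]
        if n < m:
            Q[(s, a)] = Vmax                    # still "unknown": stay optimistic
        else:
            ev = sum(c / n * max(Q[(s2, b)] for b in actions)
                     for s2, c in trans[(s, a)].items())
            Q[(s, a)] = rew[(s, a)] / n + gamma * ev

def step(env, s, actions):
    backup(s, actions)                          # back up only the visited state
    a = max(actions, key=lambda b: Q[(s, b)])   # act greedily w.r.t. optimism
    s2, r = env(s, a)
    counts[(s, a)] += 1
    trans[(s, a)][s2] += 1
    rew[(s, a)] += r
    return s2

def toy_env(s, a):                              # two-state toy chain
    s2 = (s + a) % 2
    return s2, 1.0 if s2 == 1 else 0.0

s = 0
for _ in range(200):
    s = step(toy_env, s, actions=(0, 1))
print({k: round(v, 2) for k, v in Q.items()})
```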


CORL: A Continuous-state Offset-dynamics Reinforcement Learner

arXiv.org Machine Learning

Continuous state spaces and stochastic, switching dynamics characterize a number of rich, real-world domains, such as robot navigation across varying terrain. We describe a reinforcement-learning algorithm for learning in these domains and prove that, for certain environments, the algorithm is probably approximately correct with a sample complexity that scales polynomially with the state-space dimension. Unfortunately, no optimal planning techniques exist in general for such problems; instead, we use fitted value iteration to solve the learned MDP, and include the error due to approximate planning in our bounds. Finally, we report an experiment using a robotic car driving over varying terrain to demonstrate that these dynamics representations adequately capture real-world dynamics and that our algorithm can be used to efficiently solve such problems.
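
The "offset dynamics" structure can be illustrated with a small hypothetical example (mode names and constants are ours, not the paper's experimental setup): within each discrete mode, the next state is the current state plus a mode-specific offset and Gaussian noise, so learning the dynamics reduces to averaging the observed state deltas per mode.

```python
# Toy illustration of offset dynamics: s' = s + offset(mode) + noise,
# so the maximum-likelihood model is one mean delta per mode.
import numpy as np

rng = np.random.default_rng(0)
true_offsets = {"pavement": np.array([1.0, 0.0]),   # hypothetical modes
                "grass": np.array([0.4, 0.1])}

data, s = [], np.zeros(2)
for _ in range(500):                                # simulated transitions
    mode = rng.choice(list(true_offsets))
    s2 = s + true_offsets[mode] + rng.normal(scale=0.05, size=2)
    data.append((mode, s, s2))
    s = s2

est = {mode: np.mean([s2 - s for m, s, s2 in data if m == mode], axis=0)
       for mode in true_offsets}
print(est)   # estimates converge to the true offsets as data grows
```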


Bandit-Based Planning and Learning in Continuous-Action Markov Decision Processes

AAAI Conferences

Recent research leverages results from the continuous-armed bandit literature to create a reinforcement-learning algorithm for continuous state and action spaces. Although the algorithm was initially proposed in a theoretical setting, we provide the first examination of its empirical properties. Through experimentation, we demonstrate the effectiveness of this planning method when coupled with exploration and model learning and show that, in addition to its formal guarantees, the approach is very competitive with other continuous-action reinforcement learners.
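
As a rough stand-in for continuous-armed-bandit action selection (the actual approach is hierarchical and adaptive; this flat, fixed-discretization version is only meant to convey the idea), one can treat cells of a bounded action range as UCB arms scored by Monte-Carlo rollout returns:

```python
# Toy bandit-based action selection over a continuous range [0, 1]:
# each cell is a UCB arm; a pull samples an action inside the cell and
# scores it with a (noisy) rollout-return estimate.
import math, random

def plan_action(rollout_value, budget=200, n_cells=16):
    cells = [(i / n_cells, (i + 1) / n_cells) for i in range(n_cells)]
    counts, totals = [0] * n_cells, [0.0] * n_cells

    def ucb(i, t):
        if counts[i] == 0:
            return float("inf")                  # try every cell once
        return totals[i] / counts[i] + math.sqrt(2 * math.log(t) / counts[i])

    for t in range(1, budget + 1):
        i = max(range(n_cells), key=lambda j: ucb(j, t))
        a = random.uniform(*cells[i])            # sample inside the chosen cell
        totals[i] += rollout_value(a)
        counts[i] += 1
    best = max(range(n_cells), key=lambda i: totals[i] / max(counts[i], 1))
    return sum(cells[best]) / 2                  # center of the best cell

# toy objective whose optimum is near a = 0.7, plus evaluation noise
print(plan_action(lambda a: -(a - 0.7) ** 2 + random.gauss(0, 0.05)))
```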


Exploring compact reinforcement-learning representations with linear regression

arXiv.org Artificial Intelligence

This paper presents a new algorithm for online linear regression whose efficiency guarantees satisfy the requirements of the KWIK (Knows What It Knows) framework. The algorithm improves on the complexity bounds of the current state-of-the-art procedure in this setting. We explore several applications of this algorithm for learning compact reinforcement-learning representations. We show that KWIK linear regression can be used to learn the reward function of a factored MDP and the probabilities of action outcomes in Stochastic STRIPS and Object-Oriented MDPs, none of which had previously been shown to be efficiently learnable in the RL setting. We also combine KWIK linear regression with other KWIK learners to learn larger portions of these models, including experiments on learning factored MDP transition and reward functions together.
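
The KWIK contract itself is easy to sketch (the confidence test below is our own simplification, not the paper's improved algorithm): the learner must either make an accurate prediction or admit "I don't know", signalled here by None whenever the queried direction is poorly covered by past inputs.

```python
# Minimal KWIK-style online linear regression sketch: predict, or abstain.
import numpy as np

class KWIKLinear:
    def __init__(self, dim, reg=0.1, threshold=1.0):
        self.A = reg * np.eye(dim)        # regularized Gram matrix
        self.b = np.zeros(dim)
        self.threshold = threshold        # abstention threshold (assumed)

    def predict(self, x):
        uncertainty = float(x @ np.linalg.solve(self.A, x))
        if uncertainty > self.threshold:
            return None                   # KWIK's "I don't know"
        w = np.linalg.solve(self.A, self.b)
        return float(w @ x)

    def update(self, x, y):               # observe a labeled example
        self.A += np.outer(x, x)
        self.b += y * x

learner = KWIKLinear(dim=2)
x = np.array([1.0, 0.0])
print(learner.predict(x))                 # None: nothing observed yet
learner.update(x, 2.0)
print(learner.predict(x))                 # ~2 (shrunk slightly by regularization)
```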


Learning is planning: near Bayes-optimal reinforcement learning via Monte-Carlo tree search

arXiv.org Artificial Intelligence

Bayes-optimal behavior, while well-defined, is often difficult to achieve. Recent advances in the use of Monte-Carlo tree search (MCTS) have shown that it is possible to act near-optimally in Markov Decision Processes (MDPs) with very large or infinite state spaces. Bayes-optimal behavior in an unknown MDP is equivalent to optimal behavior in the known belief-space MDP, although the size of this belief-space MDP grows exponentially with the amount of history retained, and is potentially infinite. We show how an agent can use one particular MCTS algorithm, Forward Search Sparse Sampling (FSSS), in an efficient way to act nearly Bayes-optimally for all but a polynomial number of steps, assuming that FSSS can be used to act efficiently in any possible underlying MDP.
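
A miniature of the belief-space view (illustrative only; FSSS maintains upper and lower bounds to focus its search, which we replace here with plain depth-limited sparse sampling): the belief is a Dirichlet count table over a small MDP's transitions, and a Bayesian belief update is just an incremented count along the sampled transition.

```python
# Toy depth-limited sparse sampling in the belief-space MDP of a 2-state,
# 2-action problem with known rewards; the Dirichlet counts ARE the state.
import random

N_S, N_A, GAMMA = 2, 2, 0.9
REWARD = [0.0, 1.0]                              # assumed-known reward per state

def sample_next(belief, s, a):
    counts = belief[(s, a)]
    r, acc = random.uniform(0, sum(counts)), 0.0
    for s2, c in enumerate(counts):
        acc += c
        if r <= acc:
            return s2
    return N_S - 1

def value(belief, s, depth, width=4):
    if depth == 0:
        return 0.0
    best = float("-inf")
    for a in range(N_A):
        q = 0.0
        for _ in range(width):                   # sparse sampling per action
            s2 = sample_next(belief, s, a)
            child = {k: list(v) for k, v in belief.items()}
            child[(s, a)][s2] += 1               # belief update = one more count
            q += REWARD[s2] + GAMMA * value(child, s2, depth - 1, width)
        best = max(best, q / width)
    return best

prior = {(s, a): [1.0] * N_S for s in range(N_S) for a in range(N_A)}
print(value(prior, s=0, depth=3))
```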