AITopics | Littman, Michael L.

Collaborating Authors

Littman, Michael L.

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Mitigating Partial Observability in Sequential Decision Processes via the Lambda Discrepancy

Allen, Cameron, Kirtland, Aaron, Tao, Ruo Yu, Lobel, Sam, Scott, Daniel, Petrocelli, Nicholas, Gottesman, Omer, Parr, Ronald, Littman, Michael L., Konidaris, George

arXiv.org Machine LearningJul-9-2024

Reinforcement learning algorithms typically rely on the assumption that the environment dynamics and value function can be expressed in terms of a Markovian state representation. However, when state information is only partially observable, how can an agent learn such a state representation, and how can it detect when it has found one? We introduce a metric that can accomplish both objectives, without requiring access to--or knowledge of--an underlying, unobservable state space. Our metric, the $\lambda$-discrepancy, is the difference between two distinct temporal difference (TD) value estimates, each computed using TD($\lambda$) with a different value of $\lambda$. Since TD($\lambda$=0) makes an implicit Markov assumption and TD($\lambda$=1) does not, a discrepancy between these estimates is a potential indicator of a non-Markovian state representation. Indeed, we prove that the $\lambda$-discrepancy is exactly zero for all Markov decision processes and almost always non-zero for a broad class of partially observable environments. We also demonstrate empirically that, once detected, minimizing the $\lambda$-discrepancy can help with learning a memory function to mitigate the corresponding partial observability. We then train a reinforcement learning agent that simultaneously constructs two recurrent value networks with different $\lambda$ parameters and minimizes the difference between them as an auxiliary loss. The approach scales to challenging partially observable domains, where the resulting agent frequently performs significantly better (and never performs worse) than a baseline recurrent agent with only a single value network.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Machine Learning

2407.07333

Country: North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

Planetarium: A Rigorous Benchmark for Translating Text to Structured Planning Languages

Zuo, Max, Velez, Francisco Piedrahita, Li, Xiaochen, Littman, Michael L., Bach, Stephen H.

arXiv.org Artificial IntelligenceJul-3-2024

Many recent works have explored using language models for planning problems. One line of research focuses on translating natural language descriptions of planning tasks into structured planning languages, such as the planning domain definition language (PDDL). While this approach is promising, accurately measuring the quality of generated PDDL code continues to pose significant challenges. First, generated PDDL code is typically evaluated using planning validators that check whether the problem can be solved with a planner. This method is insufficient because a language model might generate valid PDDL code that does not align with the natural language description of the task. Second, existing evaluation sets often have natural language descriptions of the planning task that closely resemble the ground truth PDDL, reducing the challenge of the task. To bridge this gap, we introduce \benchmarkName, a benchmark designed to evaluate language models' ability to generate PDDL code from natural language descriptions of planning tasks. We begin by creating a PDDL equivalence algorithm that rigorously evaluates the correctness of PDDL code generated by language models by flexibly comparing it against a ground truth PDDL. Then, we present a dataset of $132,037$ text-to-PDDL pairs across 13 different tasks, with varying levels of difficulty. Finally, we evaluate several API-access and open-weight language models that reveal this task's complexity. For example, $87.6\%$ of the PDDL problem descriptions generated by GPT-4o are syntactically parseable, $82.2\%$ are valid, solve-able problems, but only $35.1\%$ are semantically correct, highlighting the need for a more rigorous benchmark for this problem.

machine learning, natural language, planning problem, (21 more...)

arXiv.org Artificial Intelligence

2407.03321

Country:

Europe > United Kingdom > England (0.14)
Europe > Spain (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)

Add feedback

Interactive Learning from Policy-Dependent Human Feedback

MacGlashan, James, Ho, Mark K, Loftin, Robert, Peng, Bei, Wang, Guan, Roberts, David, Taylor, Matthew E., Littman, Michael L.

arXiv.org Artificial IntelligenceJan-28-2023

This paper investigates the problem of interactively learning behaviors communicated by a human teacher using positive and negative feedback. Much previous work on this problem has made the assumption that people provide feedback for decisions that is dependent on the behavior they are teaching and is independent from the learner's current policy. We present empirical results that show this assumption to be false -- whether human trainers give a positive or negative feedback for a decision is influenced by the learner's current policy. Based on this insight, we introduce {\em Convergent Actor-Critic by Humans} (COACH), an algorithm for learning from policy-dependent feedback that converges to a local optimum. Finally, we demonstrate that COACH can successfully learn multiple behaviors on a physical robot.

artificial intelligence, machine learning, reinforcement learning, (19 more...)

arXiv.org Artificial Intelligence

1701.06049

Country: North America > United States (1.00)

Genre:

Research Report (1.00)
Overview (0.86)

Industry: Education > Educational Setting > Online (0.41)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.72)

Add feedback

A Domain-Agnostic Approach for Characterization of Lifelong Learning Systems

Baker, Megan M., New, Alexander, Aguilar-Simon, Mario, Al-Halah, Ziad, Arnold, Sébastien M. R., Ben-Iwhiwhu, Ese, Brna, Andrew P., Brooks, Ethan, Brown, Ryan C., Daniels, Zachary, Daram, Anurag, Delattre, Fabien, Dellana, Ryan, Eaton, Eric, Fu, Haotian, Grauman, Kristen, Hostetler, Jesse, Iqbal, Shariq, Kent, Cassandra, Ketz, Nicholas, Kolouri, Soheil, Konidaris, George, Kudithipudi, Dhireesha, Learned-Miller, Erik, Lee, Seungwon, Littman, Michael L., Madireddy, Sandeep, Mendez, Jorge A., Nguyen, Eric Q., Piatko, Christine D., Pilly, Praveen K., Raghavan, Aswin, Rahman, Abrar, Ramakrishnan, Santhosh Kumar, Ratzlaff, Neale, Soltoggio, Andrea, Stone, Peter, Sur, Indranil, Tang, Zhipeng, Tiwari, Saket, Vedder, Kyle, Wang, Felix, Xu, Zifan, Yanguas-Gil, Angel, Yedidsion, Harel, Yu, Shangqun, Vallabha, Gautam K.

arXiv.org Artificial IntelligenceJan-18-2023

Despite the advancement of machine learning techniques in recent years, state-of-the-art systems lack robustness to "real world" events, where the input distributions and tasks encountered by the deployed systems will not be limited to the original training context, and systems will instead need to adapt to novel distributions and tasks while deployed. This critical gap may be addressed through the development of "Lifelong Learning" systems that are capable of 1) Continuous Learning, 2) Transfer and Adaptation, and 3) Scalability. Unfortunately, efforts to improve these capabilities are typically treated as distinct areas of research that are assessed independently, without regard to the impact of each separate capability on other aspects of the system. We instead propose a holistic approach, using a suite of metrics and an evaluation framework to assess Lifelong Learning in a principled way that is agnostic to specific domains or system techniques. Through five case studies, we show that this suite of metrics can inform the development of varied and complex Lifelong Learning systems. We highlight how the proposed suite of metrics quantifies performance trade-offs present during Lifelong Learning system development - both the widely discussed Stability-Plasticity dilemma and the newly proposed relationship between Sample Efficient and Robust Learning. Further, we make recommendations for the formulation and use of metrics to guide the continuing development of Lifelong Learning systems and assess their progress in the future.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.1016/j.neunet.2023.01.007

2301.07799

Country: North America > United States > California (0.92)

Genre:

Instructional Material (1.00)
Research Report > Experimental Study (0.67)
Research Report > New Finding (0.46)

Industry:

Education > Educational Setting > Continuing Education (1.00)
Government > Regional Government > North America Government > United States Government (0.46)
Education > Educational Technology > Educational Software > Computer Based Training (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.92)

Add feedback

Specifying Behavior Preference with Tiered Reward Functions

Zhou, Zhiyuan, Sowerby, Henry, Littman, Michael L.

arXiv.org Artificial IntelligenceDec-7-2022

Reinforcement-learning agents seek to maximize a reward signal through environmental interactions. As humans, our contribution to the learning process is through designing the reward function. Like programmers, we have a behavior in mind and have to translate it into a formal specification, namely rewards. In this work, we consider the reward-design problem in tasks formulated as reaching desirable states and avoiding undesirable states. To start, we propose a strict partial ordering of the policy space. We prefer policies that reach the good states faster and with higher probability while avoiding the bad states longer. Next, we propose an environment-independent tiered reward structure and show it is guaranteed to induce policies that are Pareto-optimal according to our preference relation. Finally, we empirically evaluate tiered reward functions on several environments and show they induce desired behavior and lead to fast learning.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

2212.03733

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Evaluation Beyond Task Performance: Analyzing Concepts in AlphaZero in Hex

Lovering, Charles, Forde, Jessica Zosa, Konidaris, George, Pavlick, Ellie, Littman, Michael L.

arXiv.org Artificial IntelligenceNov-26-2022

AlphaZero, an approach to reinforcement learning that couples neural networks and Monte Carlo tree search (MCTS), has produced state-of-the-art strategies for traditional board games like chess, Go, shogi, and Hex. While researchers and game commentators have suggested that AlphaZero uses concepts that humans consider important, it is unclear how these concepts are captured in the network. We investigate AlphaZero's internal representations in the game of Hex using two evaluation techniques from natural language processing (NLP): model probing and behavioral tests. In doing so, we introduce new evaluation tools to the RL community, and illustrate how evaluations other than task performance can be used to provide a more complete picture of a model's strengths and weaknesses. Our analyses in the game of Hex reveal interesting patterns and generate some testable hypotheses about how such models learn in general. For example, we find that MCTS discovers concepts before the neural network learns to encode them. We also find that concepts related to short-term end-game planning are best encoded in the final layers of the model, whereas concepts related to long-term planning are encoded in the middle layers of the model.

alphazero, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2211.14673

Country:

Europe (0.46)
North America > United States > Minnesota (0.28)

Genre: Research Report > New Finding (0.46)

Industry: Leisure & Entertainment > Games > Chess (0.49)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Gathering Strength, Gathering Storms: The One Hundred Year Study on Artificial Intelligence (AI100) 2021 Study Panel Report

Littman, Michael L., Ajunwa, Ifeoma, Berger, Guy, Boutilier, Craig, Currie, Morgan, Doshi-Velez, Finale, Hadfield, Gillian, Horowitz, Michael C., Isbell, Charles, Kitano, Hiroaki, Levy, Karen, Lyons, Terah, Mitchell, Melanie, Shah, Julie, Sloman, Steven, Vallor, Shannon, Walsh, Toby

arXiv.org Artificial IntelligenceOct-27-2022

In September 2021, the "One Hundred Year Study on Artificial Intelligence" project (AI100) issued the second report of its planned long-term periodic assessment of artificial intelligence (AI) and its impact on society. It was written by a panel of 17 study authors, each of whom is deeply rooted in AI research, chaired by Michael Littman of Brown University. The report, entitled "Gathering Strength, Gathering Storms," answers a set of 14 questions probing critical areas of AI development addressing the major risks and dangers of AI, its effects on society, its public perception and the future of the field. The report concludes that AI has made a major leap from the lab to people's lives in recent years, which increases the urgency to understand its potential negative effects. The questions were developed by the AI100 Standing Committee, chaired by Peter Stone of the University of Texas at Austin, consisting of a group of AI leaders with expertise in computer science, sociology, ethics, economics, and other disciplines.

artificial intelligence, chatbot, machine learning, (23 more...)

arXiv.org Artificial Intelligence

2210.15767

Country:

Asia (1.00)
North America > United States > California (0.46)
Europe > United Kingdom > England (0.45)
North America > United States > Texas > Travis County > Austin (0.24)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Personal (1.00)
Instructional Material (1.00)

Industry:

Transportation > Ground > Road (1.00)
Media > News (1.00)
Law > Civil Rights & Constitutional Law (1.00)
(28 more...)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Communications > Networks (1.00)
(10 more...)

Add feedback

Deep Q-Network with Proximal Iteration

Asadi, Kavosh, Fakoor, Rasool, Gottesman, Omer, Littman, Michael L., Smola, Alexander J.

arXiv.org Artificial IntelligenceDec-10-2021

We employ Proximal Iteration for value-function optimization in reinforcement learning. Proximal Iteration is a computationally efficient technique that enables us to bias the optimization procedure towards more desirable solutions. As a concrete application of Proximal Iteration in deep reinforcement learning, we endow the objective function of the Deep Q-Network (DQN) agent with a proximal term to ensure that the online-network component of DQN remains in the vicinity of the target network. The resultant agent, which we call DQN with Proximal Iteration, or DQNPro, exhibits significant improvements over the original DQN on the Atari benchmark. Our results accentuate the power of employing sound optimization techniques for deep reinforcement learning.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

arXiv.org Artificial Intelligence

2112.05848

Country:

Europe (0.14)
Asia (0.14)

Genre: Research Report > New Finding (0.34)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

On the Expressivity of Markov Reward

Abel, David, Dabney, Will, Harutyunyan, Anna, Ho, Mark K., Littman, Michael L., Precup, Doina, Singh, Satinder

arXiv.org Artificial IntelligenceNov-1-2021

Reward is the driving force for reinforcement-learning agents. This paper is dedicated to understanding the expressivity of reward as a way to capture tasks that we would want an agent to perform. We frame this study around three new abstract notions of "task" that might be desirable: (1) a set of acceptable behaviors, (2) a partial ordering over behaviors, or (3) a partial ordering over trajectories. Our main results prove that while reward can express many of these tasks, there exist instances of each task type that no Markov reward function can capture. We then provide a set of polynomial-time algorithms that construct a Markov reward function that allows an agent to optimize tasks of each of these three types, and correctly determine when no such reward function exists. We conclude with an empirical study that corroborates and illustrates our theoretical findings.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

arXiv.org Artificial Intelligence

2111.00876

Country: North America > United States (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.93)

Add feedback

Bad-Policy Density: A Measure of Reinforcement Learning Hardness

Abel, David, Allen, Cameron, Arumugam, Dilip, Hershkowitz, D. Ellis, Littman, Michael L., Wong, Lawson L. S.

arXiv.org Artificial IntelligenceOct-7-2021

Reinforcement learning is hard in general. Yet, in many specific environments, learning is easy. What makes learning easy in one environment, but difficult in another? We address this question by proposing a simple measure of reinforcement-learning hardness called the bad-policy density. This quantity measures the fraction of the deterministic stationary policy space that is below a desired threshold in value. We prove that this simple quantity has many properties one would expect of a measure of learning hardness. Further, we prove it is NP-hard to compute the measure in general, but there are paths to polynomial-time approximation. We conclude by summarizing potential directions and uses for this measure.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

2110.03424

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback