Singh, Satinder P.
Reward Design via Online Gradient Ascent
Sorg, Jonathan, Lewis, Richard L., Singh, Satinder P.
Recent work has demonstrated that when artificial agents are limited in their ability to achieve their goals, the agent designer can benefit by making the agent's goals different from the designer's. This gives rise to the optimization problem of designing the artificial agent's goals---in the RL framework, designing the agent's reward function. Existing attempts at solving this optimal reward problem do not leverage experience gained online during the agent's lifetime nor do they take advantage of knowledge about the agent's structure. In this work, we develop a gradient ascent approach with formal convergence guarantees for approximately solving the optimal reward problem online during an agent's lifetime. We show that our method generalizes a standard policy gradient approach, and we demonstrate its ability to improve reward functions in agents with various forms of limitations.
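The abstract describes ascending the gradient of the designer's return with respect to the agent's reward-function parameters, using experience gathered online. The sketch below is only a minimal REINFORCE-style illustration of such an update, assuming the agent's behavior is differentiable in its reward parameters; the function names and the particular gradient estimator are assumptions, not the paper's exact algorithm.

```python
import numpy as np

def reward_parameter_step(theta, episode, grad_log_pi, designer_return, alpha=0.01):
    """One online gradient-ascent step on the agent's reward parameters theta.

    grad_log_pi(theta, s, a) is assumed to return the gradient of
    log pi_theta(a | s), where pi_theta is the policy the (possibly limited)
    agent derives from its internal reward function R_theta.
    designer_return is the objective return the designer observed.
    """
    grad = np.zeros_like(theta)
    for s, a in episode:
        grad += grad_log_pi(theta, s, a)
    # REINFORCE-style estimate of the gradient of the designer's return with
    # respect to the reward parameters; ascend to improve the agent's reward.
    return theta + alpha * designer_return * grad
```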
Simple Local Models for Complex Dynamical Systems
Talvitie, Erik, Singh, Satinder P.
We present a novel mathematical formalism for the idea of a "local model," a model of a potentially complex dynamical system that makes only certain predictions in only certain situations. As a result of its restricted responsibilities, a local model may be far simpler than a complete model of the system. We then show how one might combine several local models to produce a more detailed model. We demonstrate our ability to learn a collection of local models on a large-scale example and do a preliminary empirical comparison of learning a collection of local models and some other model learning methods.
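As a rough illustration of combining local models, the interface below (names are assumptions, not the paper's formalism) lets each model declare the situations it covers and return only the partial predictions it is responsible for; a combiner then merges whatever the applicable models offer.

```python
class LocalModel:
    """Illustrative interface for a 'local model': it only claims to make
    certain predictions in certain situations."""

    def applies(self, history):
        """Return True if this model is responsible for the current situation."""
        raise NotImplementedError

    def predict(self, history, action):
        """Return a dict {observation_feature: probability} covering only the
        predictions this model is willing to make."""
        raise NotImplementedError


def combined_prediction(local_models, history, action):
    """Merge the partial predictions of all applicable local models into one
    (possibly still incomplete) prediction of the next observation."""
    merged = {}
    for m in local_models:
        if m.applies(history):
            merged.update(m.predict(history, action))
    return merged
```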
Off-policy Learning with Options and Recognizers
Precup, Doina, Paduraru, Cosmin, Koop, Anna, Sutton, Richard S., Singh, Satinder P.
We introduce a new algorithm for off-policy temporal-difference learning with function approximation that has lower variance and requires less knowledge of the behavior policy than prior methods. We develop the notion of a recognizer, a filter on actions that distorts the behavior policy to produce a related target policy with low-variance importance-sampling corrections. We also consider target policies that are deviations from the state distribution of the behavior policy, such as potential temporally abstract options, which further reduces variance. This paper introduces recognizers and their potential advantages, then develops a full algorithm for linear function approximation and proves that its updates are in the same direction as on-policy TD updates, which implies asymptotic convergence. Even though our algorithm is based on importance sampling, we prove that it requires absolutely no knowledge of the behavior policy for the case of state-aggregation function approximators.
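Below is a minimal sketch of the importance weight a recognizer induces, written as if the behavior-policy probabilities were available for illustration; the abstract's point is that in special cases (e.g. state aggregation) the normalizing quantity can instead be estimated from data, so the behavior policy itself need not be known.

```python
def recognizer_importance_weight(c, b, s, a, actions):
    """Importance-sampling correction induced by a recognizer c.

    c(s, a) in [0, 1] filters the behavior policy's actions; the target policy
    is the behavior policy re-normalized through the recognizer:
        pi(a | s) = c(s, a) * b(a | s) / mu(s),
        mu(s)     = sum over a' of c(s, a') * b(a' | s),
    so the correction pi(a | s) / b(a | s) simplifies to c(s, a) / mu(s).
    b(a, s) is assumed to give the behavior probability of action a in state s.
    """
    mu = sum(c(s, ap) * b(ap, s) for ap in actions)
    return c(s, a) / mu
```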
Intrinsically Motivated Reinforcement Learning
Chentanez, Nuttapong, Barto, Andrew G., Singh, Satinder P.
Psychologists call behavior intrinsically motivated when it is engaged in for its own sake rather than as a step toward solving a specific problem of clear practical value. But what we learn during intrinsically motivated behavior is essential for our development as competent autonomous entities able to efficiently solve a wide range of practical problems as they arise. In this paper we present initial results from a computational study of intrinsically motivated reinforcement learning aimed at allowing artificial agents to construct and extend hierarchies of reusable skills that are needed for competent autonomy.
Approximately Efficient Online Mechanism Design
Parkes, David C., Yanovsky, Dimah, Singh, Satinder P.
Online mechanism design (OMD) addresses the problem of sequential decision making in a stochastic environment with multiple self-interested agents. The goal in OMD is to make value-maximizing decisions despite this self-interest. In previous work we presented a Markov decision process (MDP)-based approach to OMD in large-scale problem domains. In practice the underlying MDP needed to solve OMD is too large and hence the mechanism must consider approximations. This raises the possibility that agents may be able to exploit the approximation for selfish gain. We adopt sparse-sampling-based MDP algorithms to implement ε-efficient policies, and retain truth-revelation as an approximate Bayesian-Nash equilibrium. Our approach is empirically illustrated in the context of the dynamic allocation of WiFi connectivity to users in a coffeehouse.
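Sparse-sampling MDP planners of the kind the abstract refers to estimate action values by drawing a fixed number of successor states per action from a generative model, to a fixed depth. The following is a generic, hedged sketch of that idea (in the spirit of Kearns, Mansour & Ng), not the paper's exact planner.

```python
def sparse_sampling_value(sim, state, actions, depth, width, gamma=0.95):
    """Generic sparse-sampling value estimate.

    sim(state, action) -> (reward, next_state) is an assumed generative model
    of the MDP; `width` successors are sampled per action at each level.
    """
    if depth == 0:
        return 0.0
    best = float("-inf")
    for a in actions:
        total = 0.0
        for _ in range(width):  # sample `width` successor states per action
            r, s2 = sim(state, a)
            total += r + gamma * sparse_sampling_value(sim, s2, actions,
                                                       depth - 1, width, gamma)
        best = max(best, total / width)
    return best
```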
A Nonlinear Predictive State Representation
Rudary, Matthew R., Singh, Satinder P.
Predictive state representations (PSRs) use predictions of a set of tests to represent the state of controlled dynamical systems. One reason why this representation is exciting as an alternative to partially observable Markov decision processes (POMDPs) is that PSR models of dynamical systems may be much more compact than POMDP models. Empirical work on PSRs to date has focused on linear PSRs, which have not allowed for compression relative to POMDPs. We introduce a new notion of tests which allows us to define a new type of PSR that is nonlinear in general and allows for exponential compression in some deterministic dynamical systems. These new tests, called e-tests, are related to the tests used by Rivest and Schapire [1] in their work with the diversity representation, but our PSR avoids some of the pitfalls of their representation--in particular, its potential to be exponentially larger than the equivalent POMDP.
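For contrast with the paper's nonlinear e-test construction, the state update of a standard linear PSR can be sketched as follows: the prediction vector for the core tests is updated by a linear projection and renormalization after each action-observation pair. The weight vector and matrix below are assumed to come from a learned or specified linear PSR.

```python
import numpy as np

def linear_psr_update(p, m_ao, M_ao):
    """Linear PSR state update after taking action a and observing o.

    p     : current prediction vector, p[i] = p(q_i | h) for core tests q_i
    m_ao  : vector with p(ao | h) = m_ao . p
    M_ao  : matrix whose i-th row gives p(ao q_i | h) as a linear function of p

    Each updated prediction is p(q_i | h a o) = p(ao q_i | h) / p(ao | h).
    """
    denom = float(np.dot(m_ao, p))
    return (M_ao @ p) / denom
```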
An MDP-Based Approach to Online Mechanism Design
Parkes, David C., Singh, Satinder P.
Online mechanism design (MD) considers the problem of providing incentives to implement desired system-wide outcomes in systems with self-interested agents that arrive and depart dynamically. Agents can choose to misrepresent their arrival and departure times, in addition to information about their value for different outcomes. We consider the problem of maximizing the total long-term value of the system despite the self-interest of agents. The online MD problem induces a Markov Decision Process (MDP), which when solved can be used to implement optimal policies in a truth-revealing Bayesian-Nash equilibrium.
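As a rough, generic illustration (not necessarily the payment rule used in this paper), truth-revelation in such mechanisms is typically supported by VCG-style payments that charge each agent the externality it imposes on the other agents, computed from the MDP's value functions.

```python
def vcg_style_payment(value_with_agent, agent_value, value_without_agent):
    """Charge an agent the externality it imposes on the others:
    what the others would obtain if this agent had never arrived, minus what
    they obtain under the value-maximizing policy with the agent present.
    All three quantities are (expected) MDP values; the exact accounting in
    the paper may differ, so this is an illustrative assumption.
    """
    others_value_with_agent = value_with_agent - agent_value
    return value_without_agent - others_value_with_agent
```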