On the Convergence of Discounted Policy Gradient Methods
Nota, Chris
Policy gradient methods are a class of reinforcement learning (RL) algorithms that attempt to directly maximize the expected performance of an agent's policy by following the gradient of an objective function (Sutton et al., 2000), typically the expected sum of rewards, using a stochastic estimator generated by interacting with the environment. Unbiased estimators of this gradient can suffer from high variance because the sum of future rewards is itself highly variable. A common approach is to instead consider an exponentially discounted sum of future rewards. This reduces the variance of most estimators but introduces bias (Thomas, 2014). Frequently, the discounted sum of future rewards is estimated by a critic (Konda and Tsitsiklis, 2000). It has been argued that when a critic is used, discounting has the additional benefit of reducing approximation error (Zhang et al., 2020). The "discounted" policy gradient was originally introduced as the gradient of a discounted objective (Sutton et al., 2000). However, it has been shown that the gradient of the discounted objective is not the update direction followed by most discounted policy gradient algorithms (Thomas, 2014; Nota and Thomas, 2019).
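For reference, the quantities at issue can be written as follows. This is a standard formulation in assumed notation (R_t, pi_theta, Q_gamma), not an excerpt from the paper; the final expression is the direction most "discounted" policy gradient implementations follow, which drops the gamma^t weighting and is therefore not the gradient of the discounted objective.

```latex
% Assumed notation: R_t is the reward at time t, \pi_\theta the parameterized
% policy, and Q^{\pi_\theta}_\gamma the discounted action-value function.
\begin{align*}
J(\theta) &= \mathbb{E}\!\left[\sum_{t=0}^{\infty} R_t\right]
  & &\text{(undiscounted objective)} \\
J_\gamma(\theta) &= \mathbb{E}\!\left[\sum_{t=0}^{\infty} \gamma^t R_t\right]
  & &\text{(discounted objective)} \\
\nabla J_\gamma(\theta) &= \mathbb{E}\!\left[\sum_{t=0}^{\infty} \gamma^t\,
  \nabla_\theta \log \pi_\theta(A_t \mid S_t)\, Q^{\pi_\theta}_\gamma(S_t, A_t)\right]
  & &\text{(gradient of } J_\gamma\text{)} \\
\Delta(\theta) &= \mathbb{E}\!\left[\sum_{t=0}^{\infty}
  \nabla_\theta \log \pi_\theta(A_t \mid S_t)\, Q^{\pi_\theta}_\gamma(S_t, A_t)\right]
  & &\text{(direction most algorithms follow)}
\end{align*}
```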
Is the Policy Gradient a Gradient?
Nota, Chris, Thomas, Philip S.
The policy gradient theorem describes the gradient of the expected discounted return with respect to an agent's policy parameters. However, most policy gradient methods do not use the discount factor in the manner originally prescribed, and therefore do not optimize the discounted objective. It has been an open question in RL which objective, if any, they optimize instead. We show that the direction followed by these methods is not the gradient of any objective, and reclassify them as semi-gradient methods with respect to the undiscounted objective. Further, we show that they are not guaranteed to converge to a locally optimal policy, and construct a counterexample where they will converge to the globally pessimal policy with respect to both the discounted and undiscounted objectives.
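A minimal REINFORCE-style sketch of the distinction, using assumed names and notation (nothing here is taken from the paper's code): the first estimate weights each term by gamma^t, matching the gradient of the discounted objective, while the second drops that factor, matching the update most implementations actually apply.

```python
import numpy as np

def discounted_returns(rewards, gamma):
    """G_t = sum_{k >= t} gamma^(k - t) * R_k for each time step t."""
    G, out = 0.0, np.zeros(len(rewards))
    for t in reversed(range(len(rewards))):
        G = rewards[t] + gamma * G
        out[t] = G
    return out

def gradient_estimates(logprob_grads, rewards, gamma):
    """logprob_grads[t] is assumed to be grad_theta log pi_theta(A_t | S_t)."""
    G = discounted_returns(rewards, gamma)
    # Gradient of the discounted objective: each term carries a gamma**t weight.
    true_grad = sum((gamma ** t) * G[t] * g for t, g in enumerate(logprob_grads))
    # Direction most implementations follow: the gamma**t factor is dropped,
    # which is not the gradient of the discounted or the undiscounted objective.
    semi_grad = sum(G[t] * g for t, g in enumerate(logprob_grads))
    return true_grad, semi_grad
```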
Lifelong Learning with a Changing Action Set
Chandak, Yash, Theocharous, Georgios, Nota, Chris, Thomas, Philip S.
In many real-world sequential decision-making problems, the number of available actions (decisions) can vary over time. While problems like catastrophic forgetting, changing transition dynamics, changing reward functions, etc., have been well-studied in the lifelong learning literature, the setting where the action set changes remains unaddressed. In this paper, we present an algorithm that autonomously adapts to an action set whose size changes over time. To tackle this open problem, we break it into two problems that can be solved iteratively: inferring the underlying, unknown structure in the space of actions and optimizing a policy that leverages this structure. We demonstrate the efficiency of this approach on large-scale real-world lifelong learning problems.
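As a toy illustration only (the 1-D environment, function names, and the nearest-effect policy below are invented for exposition, not the paper's method), the two sub-problems can be alternated like this: whenever the action set changes, re-estimate each available action's unknown effect, then act with a policy defined over those estimates rather than over the raw action identities.

```python
import random

GOAL = 10.0

def true_effect(action):
    # Hidden dynamics: the agent only observes noisy outcomes of taking an action.
    return action + random.gauss(0.0, 0.5)

def infer_action_effects(actions, samples=20):
    # Sub-problem 1: infer structure over the currently available actions.
    return {a: sum(true_effect(a) for _ in range(samples)) / samples for a in actions}

def choose_action(position, effects):
    # Sub-problem 2: a policy defined over the inferred structure, so it applies
    # unchanged when new actions appear or old ones disappear.
    return min(effects, key=lambda a: abs(GOAL - (position + effects[a])))

for available in [{-1, 1}, {-1, 1, 3}, {-2, 1, 3, 5}]:   # the action set changes
    effects = infer_action_effects(available)
    position = 0.0
    for _ in range(15):
        position += true_effect(choose_action(position, effects))
    print(sorted(available), round(position, 1))
```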
Reinforcement Learning Without Backpropagation or a Clock
Kostas, James, Nota, Chris, Thomas, Philip S.
Reinforcement learning (RL) algorithms share qualitative similarities with the algorithms implemented by animal brains. However, there remain clear differences between these two types of algorithms. For example, while RL algorithms using artificial neural networks require information to flow backwards through the network via the backpropagation algorithm, there is currently debate about whether this is feasible in biological neural implementations (Werbos and Davis, 2016). Policy gradient coagent networks (PGCNs) are a class of RL algorithms that were introduced to remove this possibly biologically implausible property of RL algorithms: they use artificial neural networks but do not rely on the backpropagation algorithm (Thomas, 2011). Since their introduction, PGCN algorithms have proven to be not only a possible improvement in biological plausibility, but also a practical tool for improving RL agents.
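A minimal, self-contained sketch of the coagent idea under stated assumptions (the toy task, two-unit network, and step size are invented; this is not the published PGCN algorithm): each stochastic unit performs its own local likelihood-ratio update using the shared reward, so no gradient information is backpropagated between units.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Two coagents: a hidden unit that reads the observation, and an output unit
# that reads the hidden unit's stochastic activation. Toy task: the output
# should match the observation bit.
w_hidden = rng.normal(size=2)   # weights on [obs, bias]
w_out = rng.normal(size=2)      # weights on [hidden activation, bias]
alpha = 0.1

for step in range(5000):
    obs = float(rng.integers(2))
    # Each coagent samples its action from its own Bernoulli policy.
    p_h = sigmoid(w_hidden @ np.array([obs, 1.0]))
    h = float(rng.random() < p_h)
    p_a = sigmoid(w_out @ np.array([h, 1.0]))
    a = float(rng.random() < p_a)
    reward = 1.0 if a == obs else 0.0
    # Local likelihood-ratio updates: each coagent uses only its own inputs,
    # its own sampled action, and the shared reward -- no backpropagation.
    w_hidden += alpha * reward * (h - p_h) * np.array([obs, 1.0])
    w_out += alpha * reward * (a - p_a) * np.array([h, 1.0])
```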