AITopics

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.94)

Neural Information Processing SystemsApr-6-2023, 18:06:32 GMT

Approximate Solutions to Optimal Stopping Problems

We propose and analyze an algorithm that approximates solutions to the problem of optimal stopping in a discounted irreducible ape(cid:173) riodic Markov chain. The scheme involves the use of linear com(cid:173) binations of fixed basis functions to approximate a Q-function. The weights of the linear combination are incrementally updated through an iterative process similar to Q-Iearning, involving sim(cid:173) ulation of the underlying Markov chain. Due to space limitations, we only provide an overview of a proof of convergence (with prob(cid:173) ability 1) and bounds on the approximation error. This is the first theoretical result that establishes the soundness of a Q-Iearning(cid:173) like algorithm when combined with arbitrary linear function ap(cid:173) proximators to solve a sequential decision problem.

approximate solution, cid, optimal stopping problem, (4 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.70)

Even-dar, Eyal, Mansour, Yishay

Convergence of Optimistic and Incremental Q-Learning

Neural Information Processing SystemsDec-31-2002

The first is the widely used optimistic Q-learning, which initializes the Q-values to large initial values and then follows a greedy policy with respect to the Q-values. We show that setting the initial value sufficiently large guarantees the converges to an E optimal policy. The second is a new and novel algorithm incremental Q-learning, which gradually promotes the values of actions that are not taken. We show that incremental Q-learning converges, in the limit, to the optimal policy. Our incremental Q-learning algorithm can be viewed as derandomization of the E-greedy Q-learning. 1 Introduction One of the challenges of Reinforcement Learning is learning in an unknown environment.

converge, q-iearning, q-learning, (15 more...)

Country:

Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.05)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
(2 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Even-dar, Eyal, Mansour, Yishay

Convergence of Optimistic and Incremental Q-Learning

Neural Information Processing SystemsDec-31-2002

The first is the widely used optimistic Q-learning, which initializes the Q-values to large initial values and then follows a greedy policy with respect to the Q-values. We show that setting the initial value sufficiently large guarantees the converges to an E optimal policy. The second is a new and novel algorithm incremental Q-learning, which gradually promotes the values of actions that are not taken. We show that incremental Q-learning converges, in the limit, to the optimal policy. Our incremental Q-learning algorithm can be viewed as derandomization of the E-greedy Q-learning. 1 Introduction One of the challenges of Reinforcement Learning is learning in an unknown environment.

converge, q-iearning, q-learning, (15 more...)

Country:

Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.05)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
(2 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Even-dar, Eyal, Mansour, Yishay

Convergence of Optimistic and Incremental Q-Learning

Neural Information Processing SystemsDec-31-2002

The first is the widely used optimistic Q-learning, which initializes the Q-values to large initial values and then follows a greedy policy with respect to the Q-values. We show that setting the initial value sufficiently large guarantees the converges to an E optimal policy. The second is a new and novel algorithm incremental Q-learning,which gradually promotes the values of actions that are not taken. We show that incremental Q-learning converges, in the limit, to the optimal policy. Our incremental Q-learning algorithm canbe viewed as derandomization of the E-greedy Q-learning. 1 Introduction One of the challenges of Reinforcement Learning is learning in an unknown environment.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

Country: Asia > Middle East > Israel (0.15)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Reinforcement Learning with Function Approximation Converges to a Region

Gordon, Geoffrey J.

Many algorithms for approximate reinforcement learning are not known to converge. In fact, there are counterexamples showing that the adjustable weights in some algorithms may oscillate within a region rather than converging to a point. This paper shows that, for two popular algorithms, such oscillation is the worst that can happen: the weights cannot diverge, but instead must converge to a bounded region. The algorithms are SARSA(O) and V(O); the latter algorithm was used in the well-known TD-Gammon program. 1 Introduction Although there are convergent online algorithms (such as TD()') [1]) for learning the parameters of a linear approximation to the value function of a Markov process, no way is known to extend these convergence proofs to the task of online approximation of either the state-value (V*) or the action-value (Q*) function of a general Markov decision process. In fact, there are known counterexamples to many proposed algorithms.

algorithm, sarsa, trajectory, (15 more...)

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Industry: Leisure & Entertainment > Games > Backgammon (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.70)

Andre, David, Russell, Stuart J.

Programmable Reinforcement Learning Agents

We present an expressive agent design language for reinforcement learning that allows the user to constrain the policies considered by the learning process.The language includes standard features such as parameterized subroutines, temporary interrupts, aborts, and memory variables, but also allows for unspecified choices in the agent program. For learning that which isn't specified, we present provably convergent learning algorithms. We demonstrate by example that agent programs written in the language are concise as well as modular. This facilitates state abstraction and the transferability of learned skills. 1 Introduction The field of reinforcement learning has recently adopted the idea that the application of prior knowledge may allow much faster learning and may indeed be essential if realworld environments are to be addressed. For learning behaviors, the most obvious form of prior knowledge provides a partial description of desired behaviors. Several languages for partial descriptions have been proposed, including Hierarchical Abstract Machines (HAMs) [8], semi-Markov options [12], and the MAXQ framework [4]. This paper describes extensions to the HAM language that substantially increase its expressive power, using constructs borrowed from programming languages. Obviously, increasing expressiveness makes it easier for the user to supply whatever prior knowledge is available, and to do so more concisely.

choice point, pham, subroutine, (16 more...)

Country: North America > United States > California > Alameda County > Berkeley (0.04)

Industry: Education (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Reinforcement Learning with Function Approximation Converges to a Region

Gordon, Geoffrey J.

Many algorithms for approximate reinforcement learning are not known to converge. In fact, there are counterexamples showing that the adjustable weights in some algorithms may oscillate within a region rather than converging to a point. This paper shows that, for two popular algorithms, such oscillation is the worst that can happen: the weights cannot diverge, but instead must converge to a bounded region. The algorithms are SARSA(O) and V(O); the latter algorithm was used in the well-known TD-Gammon program. 1 Introduction Although there are convergent online algorithms (such as TD()') [1]) for learning the parameters of a linear approximation to the value function of a Markov process, no way is known to extend these convergence proofs to the task of online approximation of either the state-value (V*) or the action-value (Q*) function of a general Markov decision process. In fact, there are known counterexamples to many proposed algorithms.

algorithm, sarsa, trajectory, (15 more...)

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Industry: Leisure & Entertainment > Games > Backgammon (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.70)

Andre, David, Russell, Stuart J.

Programmable Reinforcement Learning Agents

We present an expressive agent design language for reinforcement learning that allows the user to constrain the policies considered by the learning process.The language includes standard features such as parameterized subroutines, temporary interrupts, aborts, and memory variables, but also allows for unspecified choices in the agent program. For learning that which isn't specified, we present provably convergent learning algorithms. We demonstrate by example that agent programs written in the language are concise as well as modular. This facilitates state abstraction and the transferability of learned skills. 1 Introduction The field of reinforcement learning has recently adopted the idea that the application of prior knowledge may allow much faster learning and may indeed be essential if realworld environments are to be addressed. For learning behaviors, the most obvious form of prior knowledge provides a partial description of desired behaviors. Several languages for partial descriptions have been proposed, including Hierarchical Abstract Machines (HAMs) [8], semi-Markov options [12], and the MAXQ framework [4]. This paper describes extensions to the HAM language that substantially increase its expressive power, using constructs borrowed from programming languages. Obviously, increasing expressiveness makes it easier for the user to supply whatever prior knowledge is available, and to do so more concisely.

choice point, pham, subroutine, (16 more...)

Country: North America > United States > California > Alameda County > Berkeley (0.04)

Industry: Education (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Reinforcement Learning with Function Approximation Converges to a Region

Gordon, Geoffrey J.

Many algorithms for approximate reinforcement learning are not known to converge. In fact, there are counterexamples showing that the adjustable weights in some algorithms may oscillate within a region rather than converging to a point. This paper shows that, for two popular algorithms, such oscillation is the worst that can happen: the weights cannot diverge, but instead must converge to a bounded region. The algorithms are SARSA(O) and V(O); the latter algorithm was used in the well-known TD-Gammon program. 1 Introduction Although there are convergent online algorithms (such as TD()') [1]) for learning the parameters of a linear approximation to the value function of a Markov process, no way is known to extend these convergence proofs to the task of online approximation ofeither the state-value (V*) or the action-value (Q*) function of a general Markov decision process. In fact, there are known counterexamples to many proposed algorithms.For example, fitted value iteration can diverge even for Markov processes [2]; Q-Iearning with linear function approximators can diverge, even when the states are updated according to a fixed update policy [3]; and SARSA(O) can oscillate between multiple policies with different value functions [4].

artificial intelligence, machine learning, reinforcement learning, (16 more...)