Collaborating Authors

 Tsitsiklis, John N.


Analysis of Temporal-Difference Learning with Function Approximation

Neural Information Processing Systems

We present new results about the temporal-difference learning algorithm, as applied to approximating the cost-to-go function of a Markov chain using linear function approximators. The algorithm we analyze performs online updating of a parameter vector during a single endless trajectory of an aperiodic, irreducible, finite-state Markov chain. Results include convergence (with probability 1), a characterization of the limit of convergence, and a bound on the resulting approximation error. In addition to establishing new and stronger results than those previously available, our analysis is based on a new line of reasoning that provides new intuition about the dynamics of temporal-difference learning. Furthermore, we discuss the implications of two counterexamples with regard to the significance of online updating and linearly parameterized function approximators.

1 INTRODUCTION

The problem of predicting the expected long-term future cost (or reward) of a stochastic dynamic system manifests itself in both time-series prediction and control. An example in time-series prediction is that of estimating the net present value of a corporation, as a discounted sum of its future cash flows, based on the current state of its operations. In control, the ability to predict long-term future cost as a function of state enables the ranking of alternative states in order to guide decision-making. Indeed, such predictions constitute the cost-to-go function that is central to dynamic programming and optimal control (Bertsekas, 1995). Temporal-difference learning, originally proposed by Sutton (1988), is a method for approximating long-term future cost as a function of current state.
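As a concrete illustration of the algorithm the abstract describes, the following is a minimal sketch (not the authors' code) of TD(λ) with linear function approximation, updated online along a single trajectory of a finite Markov chain. The function name, array shapes, and the constant step size are assumptions made for illustration; the convergence results discussed in the paper rely on appropriately diminishing step sizes.

```python
import numpy as np

def td_lambda_linear(P, cost, phi, alpha=0.01, gamma=0.95, lam=0.9,
                     n_steps=100_000, seed=0):
    """Illustrative TD(lambda) with linear function approximation,
    run online along a single trajectory of a finite Markov chain.

    P    : (n, n) transition matrix of the chain
    cost : (n,) per-state cost g(i)
    phi  : (n, k) feature matrix; state i is represented by phi[i]
    Returns the learned parameter vector theta (cost-to-go ~ phi @ theta).
    """
    rng = np.random.default_rng(seed)
    n, k = phi.shape
    theta = np.zeros(k)     # parameter vector being updated online
    z = np.zeros(k)         # eligibility trace
    s = rng.integers(n)     # arbitrary start state
    for _ in range(n_steps):
        s_next = rng.choice(n, p=P[s])
        # temporal difference: g(s) + gamma * J(s') - J(s)
        delta = cost[s] + gamma * phi[s_next] @ theta - phi[s] @ theta
        z = gamma * lam * z + phi[s]     # accumulate the trace
        theta += alpha * delta * z       # online parameter update
        s = s_next
    return theta
```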


Stable Linear Approximations to Dynamic Programming for Stochastic Control Problems with Local Transitions

Neural Information Processing Systems

Recently, however, there have been some successful applications of neural networks in a totally different context: that of sequential decision making under uncertainty (stochastic control). Stochastic control problems have been studied extensively in the operations research and control theory literature for a long time, using the methodology of dynamic programming [Bertsekas, 1995]. In dynamic programming, the most important object is the cost-to-go (or value) function, which evaluates the expected future cost.
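For context on the cost-to-go function mentioned above, here is a minimal tabular value-iteration sketch for a finite stochastic control problem. It is illustrative only; the array layout and parameter names are assumptions, and the intractability of this exact computation for large state spaces is what motivates approximation schemes of the kind the paper studies.

```python
import numpy as np

def value_iteration(P, g, gamma=0.95, tol=1e-8):
    """Exact cost-to-go via tabular value iteration.

    P : (m, n, n) array; P[u, i, j] = transition prob. from i to j under control u
    g : (m, n) array;    g[u, i]    = expected stage cost of control u in state i
    Returns J, the optimal cost-to-go vector over the n states.
    """
    m, n, _ = P.shape
    J = np.zeros(n)
    while True:
        # Bellman update: J(i) = min_u [ g(u, i) + gamma * sum_j P(u, i, j) J(j) ]
        Q = g + gamma * np.einsum('uij,j->ui', P, J)
        J_new = Q.min(axis=0)
        if np.max(np.abs(J_new - J)) < tol:
            return J_new
        J = J_new
```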

