AITopics | tesauro

On-line Policy Improvement using Monte-Carlo Search

Tesauro, Gerald, Galperin, Gregory R.

arXiv.org Artificial Intelligence, Jan-9-2025

We present a Monte-Carlo simulation algorithm for real-time policy improvement of an adaptive controller. In the Monte-Carlo simulation, the long-term expected reward of each possible action is statistically measured, using the initial policy to make decisions in each step of the simulation. The action maximizing the measured expected reward is then taken, resulting in an improved policy. Our algorithm is easily parallelizable and has been implemented on the IBM SP1 and SP2 parallel-RISC supercomputers. We have obtained promising initial results in applying this algorithm to the domain of backgammon. Results are reported for a wide variety of initial policies, ranging from a random policy to TD-Gammon, an extremely strong multi-layer neural network. In each case, the Monte-Carlo algorithm gives a substantial reduction, by as much as a factor of 5 or more, in the error rate of the base players. The algorithm is also potentially useful in many other adaptive control applications in which it is possible to simulate the environment.
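The rollout idea in this abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' backgammon setup: the toy race environment (`step`), the always-"fast" `base_policy`, and the trial count are hypothetical. For each candidate action, the sketch simulates many episodes that start with that action and then follow the base policy, and it picks the action with the best average return.

```python
import random

GOAL = 6                      # hypothetical toy task: reach position 6
ACTIONS = ("slow", "fast")

def step(pos, action, rng):
    """Toy simulator (an assumption): every move costs 1 unit of reward."""
    if action == "slow":
        pos += 1              # always advances one square
    elif rng.random() < 0.5:
        pos += 3              # "fast" advances three squares half the time
    return min(pos, GOAL), -1.0

def base_policy(pos):
    """Initial policy to be improved: always gamble on the fast move."""
    return "fast"

def rollout_return(pos, first_action, policy, rng, max_steps=200):
    """Return of one simulated episode: first_action, then the base policy."""
    total, action = 0.0, first_action
    for _ in range(max_steps):
        pos, reward = step(pos, action, rng)
        total += reward
        if pos >= GOAL:
            break
        action = policy(pos)
    return total

def improved_action(pos, policy, rng, n_trials=200):
    """Pick the action whose Monte-Carlo estimate of expected return is best."""
    def estimate(action):
        returns = (rollout_return(pos, action, policy, rng) for _ in range(n_trials))
        return sum(returns) / n_trials
    return max(ACTIONS, key=estimate)

rng = random.Random(0)
choice = improved_action(GOAL - 1, base_policy, rng)
print(choice)
```

One square from the goal, the sure "slow" move finishes in one step, while the base policy's "fast" move needs two steps on average, so the rollout estimate overrides the base policy there. The trials are independent of one another, which is what makes this kind of search easy to parallelize.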

machine learning, monte-carlo player, reinforcement learning, (19 more...)

2501.05407

Genre: Research Report (0.50)

Industry: Leisure & Entertainment > Games > Backgammon (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)

Asymptotic Convergence of Backpropagation: Numerical Experiments

Neural Information Processing Systems, Apr-6-2023, 19:49:24 GMT

We have calculated, both analytically and in simulations, the rate of convergence at long times in the backpropagation learning algorithm for networks with and without hidden units. Our basic finding for units using the standard sigmoid transfer function is 1/t convergence of the error for large t, with at most logarithmic corrections for networks with hidden units. Other transfer functions may lead to a slower polynomial rate of convergence. Our analytic calculations were presented in (Tesauro, He & Ahmad, 1989). Here we focus in more detail on our empirical measurements of the convergence rate in numerical simulations, which confirm our analytic results.
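A toy case shows where a 1/t rate can come from. This is a sketch under simplifying assumptions, not the authors' experiment: a single weight w, one training pattern, target output 1, and squared error E = 0.5 (1 - sigmoid(w))^2. For large w, 1 - sigmoid(w) is roughly e^(-w), so gradient descent gives dw/dt of about e^(-2w), hence e^(2w) of about 2t and E of about 1/(4t). The function name and step counts are hypothetical.

```python
import math

def final_error(steps, lr=1.0, w=0.0):
    """Gradient descent on E(w) = 0.5 * (1 - sigmoid(w))**2 with target 1."""
    for _ in range(steps):
        y = 1.0 / (1.0 + math.exp(-w))
        # -dE/dw = (1 - y) * sigmoid'(w), with sigmoid'(w) = y * (1 - y)
        w += lr * (1.0 - y) * y * (1.0 - y)
    y = 1.0 / (1.0 + math.exp(-w))
    return 0.5 * (1.0 - y) ** 2

e_short = final_error(10_000)
e_long = final_error(100_000)
ratio = e_short / e_long      # close to 10 if the error decays like 1/t
print(e_short, e_long, ratio)
```

Training ten times longer shrinks the error by roughly a factor of ten, which is the signature of 1/t decay rather than exponential convergence.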

asymptotic convergence, backpropagation, numerical experiment, (4 more...)

Country: North America > United States > Ohio > Franklin County > Columbus (0.11)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Backpropagation (0.67)

A Comparison of Contextual and Non-Contextual Preference Ranking for Set Addition Problems

Bertram, Timo, Fürnkranz, Johannes, Müller, Martin

arXiv.org Artificial Intelligence, Jul-9-2021

In this paper, we study the problem of evaluating the addition of elements to a set. This problem is difficult because, in the general case, it cannot be reduced to unconditional preferences between the choices. Therefore, we model preferences based on the context of the decision. We discuss and compare two different Siamese network architectures for this task: a twin network that compares the two sets resulting after the addition, and a triplet network that models the contribution of each candidate to the existing set. We evaluate the two settings on a real-world task: learning human card preferences for deck building in the collectible card game Magic: The Gathering. We show that the triplet approach achieves a better result than the twin network and that both outperform previous results on this task.

evaluation, preference ranking, ranking, (15 more...)

2107.04438

Country:

North America > Puerto Rico > San Juan > San Juan (0.04)
North America > Canada > Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
(2 more...)

Genre: Research Report (0.50)

Industry: Leisure & Entertainment > Games (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)

Standing on the shoulders of giants

#artificialintelligence, Sep-18-2019, 20:16:25 GMT

When you think of AI or machine learning, you may picture AlphaZero or a science-fiction reference such as HAL-9000 from 2001: A Space Odyssey. However, the true forefather, who set the stage for all of this, was the great Arthur Samuel. Samuel was a computer scientist, visionary, and pioneer who wrote the first checkers program for the IBM 701 in the early 1950s. His program, "Samuel's Checkers Program", was first shown to the general public on TV on February 24th, 1956, and the impact was so powerful that IBM stock went up 15 points overnight (a huge jump at that time). This program also helped set the stage for all the modern chess programs we have come to know so well, with features like look-ahead, an evaluation function, and a mini-max search that he would later develop into alpha-beta pruning.

alphazero, deepmind, neural network, (14 more...)

Country:

North America > Canada > Alberta (0.14)
North America > United States > New York > Dutchess County > Poughkeepsie (0.04)

Industry:

Leisure & Entertainment > Games > Chess (0.77)
Leisure & Entertainment > Games > Checkers (0.69)
Leisure & Entertainment > Games > Backgammon (0.50)

Technology:

Information Technology > Artificial Intelligence > Games (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.98)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Why it matters that AI is better than humans at games like Jeopardy - Watson

#artificialintelligence, Aug-20-2017, 00:50:09 GMT

For many people, the first time they ever heard about artificial intelligence and IBM Watson was when it played Jeopardy! While Watson made a few mistakes on its way to victory, it cemented its reputation well enough that many articles about Watson still describe it as the artificial intelligence system that played Jeopardy! But even before Watson competed on Jeopardy!, AI systems also learned games, ranging from tic-tac-toe to chess. You may recall that in 1997, IBM's Deep Blue beat the world's chess champion Garry Kasparov. And since Jeopardy!, AI systems like Watson have continued to learn to play other games, ranging from the ancient game of Go to Texas Hold'em poker.

machine learning, question answering, watson, (17 more...)

Country: North America > United States > Texas (0.25)

Industry: Leisure & Entertainment > Games > Chess (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.71)
Information Technology > Artificial Intelligence > Games > Chess (0.55)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.36)

Analysis of Watson's Strategies for Playing Jeopardy!

Tesauro, G., Gondek, D. C., Lenchner, J., Fan, J., Prager, J. M.

Journal of Artificial Intelligence Research, May-31-2013

Major advances in Question Answering technology were needed for IBM Watson to play Jeopardy! at championship level -- the show requires rapid-fire answers to challenging natural language questions, broad general knowledge, high precision, and accurate confidence estimates. In addition, Jeopardy! features four types of decision making carrying great strategic importance: (1) Daily Double wagering; (2) Final Jeopardy wagering; (3) selecting the next square when in control of the board; (4) deciding whether to attempt to answer, i.e., "buzz in." Using sophisticated strategies for these decisions, that properly account for the game state and future event probabilities, can significantly boost a player's overall chances to win, when compared with simple "rule of thumb" strategies. This article presents our approach to developing Watson's game-playing strategies, comprising development of a faithful simulation model, and then using learning and Monte-Carlo methods within the simulator to optimize Watson's strategic decision-making. After giving a detailed description of each of our game-strategy algorithms, we then focus in particular on validating the accuracy of the simulator's predictions, and documenting performance improvements using our methods. Quantitative performance benefits are shown with respect to both simple heuristic strategies, and actual human contestant performance in historical episodes. We further extend our analysis of human play to derive a number of valuable and counterintuitive examples illustrating how human contestants may improve their performance on the show.

contestant, jeopardy, watson, (17 more...)

doi: 10.1613/jair.3834

AI Access Foundation

10818

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > San Francisco County > San Francisco (0.04)
Europe > Austria > Vienna (0.04)

Genre:

Research Report (0.46)
Contests & Prizes (0.34)

Industry:

Leisure & Entertainment > Sports (1.00)
Leisure & Entertainment > Games > Jeopardy! (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.92)
(2 more...)

TDLeaf(lambda): Combining Temporal Difference Learning with Game-Tree Search

Baxter, Jonathan, Tridgell, Andrew, Weaver, Lex

arXiv.org Artificial Intelligence, Jan-4-1999

In this paper we present TDLeaf(lambda), a variation on the TD(lambda) algorithm that enables it to be used in conjunction with minimax search. We present some experiments in both chess and backgammon which demonstrate its utility and provide comparisons with TD(lambda) and another less radical variant, TD-directed(lambda). In particular, our chess program, "KnightCap," used TDLeaf(lambda) to learn its evaluation function while playing on the Free Internet Chess Server (FICS, fics.onenet.net). It improved from a 1650 rating to a 2100 rating in just 308 games. We discuss some of the reasons for this success and the relationship between our results and Tesauro's results in backgammon.
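The weight update behind this approach can be sketched as follows. The sketch makes assumptions that are not in the abstract: a linear evaluation function (so its gradient is just the feature vector), leaf feature vectors that a minimax search has already produced for each position of one game, and hypothetical names and step sizes. Each leaf's weights are nudged by the lambda-discounted sum of the later temporal differences between successive leaf evaluations, with the game result standing in for the final value.

```python
def evaluate(weights, features):
    """Linear evaluation (an assumption); its gradient w.r.t. weights is `features`."""
    return sum(w * f for w, f in zip(weights, features))

def tdleaf_update(weights, leaf_features, outcome, alpha=0.1, lam=0.7):
    """One TDLeaf(lambda)-style update from a single finished game.

    leaf_features[t] holds the features of the principal-variation leaf found
    by searching from the t-th position; `outcome` is the final game result.
    """
    values = [evaluate(weights, feats) for feats in leaf_features]
    values.append(outcome)                      # value after the last move
    diffs = [values[t + 1] - values[t] for t in range(len(leaf_features))]

    new_weights = list(weights)
    for t, feats in enumerate(leaf_features):
        # lambda-discounted sum of this and all later temporal differences
        signal = sum(lam ** (j - t) * diffs[j] for j in range(t, len(diffs)))
        for i, f in enumerate(feats):
            new_weights[i] += alpha * signal * f
    return new_weights

updated = tdleaf_update([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]],
                        outcome=1.0, alpha=0.5, lam=0.5)
print(updated)
```

The only difference from plain TD(lambda) in this sketch is where the features come from: the leaves of the search tree rather than the positions actually reached on the board.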

artificial intelligence, machine learning, reinforcement learning, (16 more...)

cs/9901001

Country: Oceania > Australia (0.14)

Genre: Research Report (0.84)

Industry: Leisure & Entertainment > Games > Chess (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Games (1.00)

On-line Policy Improvement using Monte-Carlo Search

Tesauro, Gerald, Galperin, Gregory R.

Neural Information Processing Systems, Dec-31-1997

Policy iteration is known to have rapid and robust convergence properties, and for Markov tasks with lookup-table state-space representations, it is guaranteed to converge to the optimal policy. In typical uses of policy iteration, the policy improvement step is an extensive off-line procedure. For example, in dynamic programming, one performs a sweep through all states in the state space. Reinforcement learning provides another approach to policy improvement; recently, several authors have investigated using RL in conjunction with nonlinear function approximators to represent the value functions and/or policies (Tesauro, 1992; Crites and Barto, 1996; Zhang and Dietterich, 1996). These studies are based on following actual state-space trajectories rather than sweeps through the full state space, but are still too slow to compute improved policies in real time.
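For contrast with the real-time setting, here is a minimal sketch of off-line policy iteration on a lookup-table task. The four-state chain, the discount factor, and all names are assumptions chosen for illustration. Each round evaluates the current policy and then sweeps through every state to pick the greedy action, which is the full-state-space sweep the text describes.

```python
N_STATES = 4                  # hypothetical chain; state 3 is terminal
ACTIONS = ("left", "right")
GAMMA = 0.9                   # discount factor (an assumption)

def transition(state, action):
    """Deterministic toy dynamics: each move costs 1."""
    nxt = max(state - 1, 0) if action == "left" else state + 1
    return nxt, -1.0

def q_value(state, action, values):
    nxt, reward = transition(state, action)
    return reward + GAMMA * values[nxt]

def evaluate_policy(policy, sweeps=200):
    """Iterative policy evaluation with a lookup table of state values."""
    values = [0.0] * N_STATES
    for _ in range(sweeps):
        for s in range(N_STATES - 1):
            values[s] = q_value(s, policy[s], values)
    return values

def policy_iteration():
    policy = {s: "left" for s in range(N_STATES - 1)}
    while True:
        values = evaluate_policy(policy)
        # Policy improvement: a sweep through all non-terminal states.
        improved = {s: max(ACTIONS, key=lambda a: q_value(s, a, values))
                    for s in range(N_STATES - 1)}
        if improved == policy:
            return policy, values
        policy = improved

best_policy, best_values = policy_iteration()
print(best_policy, best_values)
```

On four states the sweep is trivial, but its cost grows with the size of the state space, which is why improving the policy only at the states actually visited is attractive for large tasks.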

algorithm, base player, monte-carlo player, (17 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)

Industry: Leisure & Entertainment > Games > Backgammon (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Why did TD-Gammon Work?

Pollack, Jordan B., Blair, Alan D.

Neural Information Processing Systems, Dec-31-1997

Although TD-Gammon is one of the major successes in machine learning, it has not led to similar impressive breakthroughs in temporal difference learning for other applications or even other games. We were able to replicate some of the success of TD-Gammon, developing a competitive evaluation function on a 4000 parameter feed-forward neural network, without using back-propagation, reinforcement or temporal difference learning methods. Instead we apply simple hill-climbing in a relative fitness environment. These results and further analysis suggest that the surprising success of Tesauro's program had more to do with the co-evolutionary structure of the learning task and the dynamics of the backgammon game itself. It took great chutzpah for Gerald Tesauro to start wasting computer cycles on temporal difference learning in the game of Backgammon (Tesauro, 1992). After all, the dream of computers mastering a domain by self-play or "introspection" had been around since the early days of AI, forming part of Samuel's checker player (Samuel, 1959) and used in Donald Michie's MENACE tic-tac-toe learner (Michie, 1961).
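Hill-climbing in a relative fitness environment can be sketched in a few lines. Everything concrete here is an assumption for illustration: the three-weight "player", the hidden `TARGET` that defines playing strength, the noisy single-game contest, the mutation size, and the blending step. The point it shows is that the learner never sees an absolute score; it only learns whether a mutated challenger beat the current champion.

```python
import random

TARGET = (1.0, -1.0, 0.5)     # hypothetical ideal weights, hidden from the learner

def strength(weights):
    """Hidden playing strength (an assumption); higher is better."""
    return -sum((w - t) ** 2 for w, t in zip(weights, TARGET))

def challenger_wins(champion, challenger, rng):
    """Stand-in for one game: the stronger side usually wins, with some luck."""
    edge = strength(challenger) - strength(champion)
    return edge + rng.gauss(0.0, 0.1) > 0.0

def hill_climb(generations=3000, seed=0):
    rng = random.Random(seed)
    champion = [0.0] * len(TARGET)
    for _ in range(generations):
        # Challenger is the champion plus small Gaussian noise on every weight.
        challenger = [w + rng.gauss(0.0, 0.1) for w in champion]
        if challenger_wins(champion, challenger, rng):
            # Move the champion a small step toward the winning challenger.
            champion = [0.95 * c + 0.05 * m for c, m in zip(champion, challenger)]
    return champion

final = hill_climb()
print(strength([0.0] * len(TARGET)), strength(final))
```

No gradients or temporal differences are computed anywhere; improvement comes only from repeated win/loss comparisons against the current champion.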

backgammon, challenger, tesauro, (14 more...)

Country: