Search
Real-time heuristic search
We apply the two-player game assumptions of limited search horizon and commitment to moves in constant time, to single-agent heuristic search problems. We present a variation of minimax lookahead search, and an analog to alpha-beta pruning that significantly improves the efficiency of the algorithm. In addition, we present a new algorithm, called Real-Time-Aโ, for interleaving planning and execution. We prove that the algorithm makes locally optimal decisions and is guaranteed to find a solution. We also present a learning version of this algorithm that improves its performance over successive problem solving trials by learning more accurate heuristic values, and prove that the learned values converge to their exact values along every optimal path.
On optimal game-tree search using rational meta-reasoning
In this paper we outline a general approach to the study of problem-solving, in which search steps are considered decisions in the same sense as actions in the world. Unlike other metrics in the literature, the value of a search step is defined as a real utility rather than as a quasiutility, and can therefore be computed directly from a model of the base-level problem-solver. We develop a formula for the expected value of a search step in a game-playing context using the single-step assumption, namely that a computation step can be evaluated as it was the last to be taken. We prove some metalevel theorems that enable the development of a low-overhead algorithm, MGSS*, that chooses search steps in order of highest estimated utility. Although we show that the single-step assumption is untenable in general, a program implemented for the game of Othello soundly beats an alpha-beta search while expanding significantly fewer nodes, even though both programs use the same evaluation function.
Heuristic search in restricted memory
Chakrabarti, P. P. | Ghose, S. | Acharya, A. | de Sarkar, S. C.
This paper presents heuristic search algorithms which work within memory constraints. These algorithms, MAโ (for ordinary graphs) and MAOโ (for AND/OR graphs) guarantee admissible solutions within specified memory limitations (above the minimum required). The memory versus node expansions tradeoff is analyzed for the worst case. In the case of ordinary graphs, some experiments using the Fifteen Puzzle problem are carried out under various pruning conditions. These parameterized algorithms are found to encompass a wide class of best first search algorithms.
Divide and conquer under global constraints: A solution to the N-queens problem
Configuring N mutually nonattacking queens on an N-by-N chessboard is a classical problem that was first posed over a century ago. Over the past few decades, this problem has become important to computer scientists by serving as the standard example of a globally constrained problem which is solvable using backtracking search methods. A related problem, placing the N queens on a toroidal board, has been discussed in detail by Poyla and Chandra. Their work focused on characterizing the solvable cases and finding solutions which arrange the queens in a regular pattern. This paper describes a new divide-and-conquer algorithm that solves both problems and investigates the relationship between them.
LEARNING BY STATE RECURRENCE DETECTION
Rosen, Bruce E., Goodwin, James M., Vidal, Jacques J.
LEARNING BY ST ATE RECURRENCE DETECfION Bruce E. Rosen, James M. Goodwint, and Jacques J. Vidal University of California, Los Angeles, Ca. 90024 ABSTRACT This research investigates a new technique for unsupervised learning of nonlinear control problems. The approach is applied both to Michie and Chambers BOXES algorithm and to Barto, Sutton and Anderson's extension, the ASE/ACE system, and has significantly improved the convergence rate of stochastically based learning automata. Recurrence learning is a new nonlinear reward-penalty algorithm. It exploits information found during learning trials to reinforce decisions resulting in the recurrence of nonfailing states. Recurrence learning applies positive reinforcement during the exploration of the search space, whereas in the BOXES or ASE algorithms, only negative weight reinforcement is applied, and then only on failure. Simulation results show that the added information from recurrence learning increases the learning rate. Our empirical results show that recurrence learning is faster than both basic failure driven learning and failure prediction methods. Although recurrence learning has only been tested in failure driven experiments, there are goal directed learning applications where detection of recurring oscillations may provide useful information that reduces the learning time by applying negative, instead of positive reinforcement.
LEARNING BY STATE RECURRENCE DETECTION
Rosen, Bruce E., Goodwin, James M., Vidal, Jacques J.
LEARNING BY ST ATE RECURRENCE DETECfION Bruce E. Rosen, James M. Goodwint, and Jacques J. Vidal University of California, Los Angeles, Ca. 90024 ABSTRACT This research investigates a new technique for unsupervised learning of nonlinear control problems. The approach is applied both to Michie and Chambers BOXES algorithm and to Barto, Sutton and Anderson's extension, the ASE/ACE system, and has significantly improved the convergence rate of stochastically based learning automata. Recurrence learning is a new nonlinear reward-penalty algorithm. It exploits information found during learning trials to reinforce decisions resulting in the recurrence of nonfailing states. Recurrence learning applies positive reinforcement during the exploration of the search space, whereas in the BOXES or ASE algorithms, only negative weight reinforcement is applied, and then only on failure. Simulation results show that the added information from recurrence learning increases the learning rate. Our empirical results show that recurrence learning is faster than both basic failure driven learning and failure prediction methods. Although recurrence learning has only been tested in failure driven experiments, there are goal directed learning applications where detection of recurring oscillations may provide useful information that reduces the learning time by applying negative, instead of positive reinforcement.
LEARNING BY STATE RECURRENCE DETECTION
Rosen, Bruce E., Goodwin, James M., Vidal, Jacques J.
The approach is applied both to Michie and Chambers BOXES algorithm and to Barto, Sutton and Anderson's extension, the ASE/ACE system, and has significantly improved the convergence rate of stochastically based learning automata. Recurrencelearning is a new nonlinear reward-penalty algorithm. It exploits information found during learning trials to reinforce decisions resulting in the recurrence of nonfailing states. Recurrence learning applies positive reinforcement during the exploration of the search space, whereas in the BOXES or ASE algorithms, only negative weight reinforcement is applied, and then only on failure. Simulation results show that the added information from recurrence learning increases the learning rate.
Foundations and Grand Challenges of Artificial Intelligence: AAAI Presidential Address
AAAI is a society devoted to supporting the progress in science, technology and applications of AI. I thought I would use this occasion to share with you some of my thoughts on the recent advances in AI, the insights and theoretical foundations that have emerged out of the past thirty years of stable, sustained, systematic explorations in our field, and the grand challenges motivating the research in our field.