AITopics

doi: 10.1613/jair.2628

AI Access Foundation

10580

Country:

North America > Canada > Alberta (0.14)
North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
Europe > United Kingdom > Scotland > City of Edinburgh > Edinburgh (0.04)
Asia > Middle East > UAE > Dubai Emirate > Dubai (0.04)

Industry: Leisure & Entertainment > Games > Computer Games (0.75)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.68)

Ryabko, Daniil, Hutter, Marcus

On the Possibility of Learning in Reactive Environments with Arbitrary Dependence

arXiv.org Artificial Intelligence, Oct-31-2008

We address the problem of reinforcement learning in which observations may exhibit an arbitrary form of stochastic dependence on past observations and actions, i.e. environments more general than (PO)MDPs. The task for an agent is to attain the best possible asymptotic reward where the true generating environment is unknown but belongs to a known countable family of environments. We find some sufficient conditions on the class of environments under which an agent exists which attains the best asymptotic reward for any environment in the class. We analyze how tight these conditions are and how they relate to different probabilistic assumptions known in reinforcement learning and related fields, such as Markov Decision Processes and mixing conditions.

artificial intelligence, machine learning, reinforcement learning, (19 more...)

0810.5636

Country:

Europe > Poland (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
(7 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.50)

Hutter, Marcus, Legg, Shane

Temporal Difference Updating without a Learning Rate

arXiv.org Artificial Intelligence, Oct-31-2008

We derive an equation for temporal difference learning from statistical principles. Specifically, we start with the variational principle and then bootstrap to produce an updating rule for discounted state value estimates. The resulting equation is similar to the standard equation for temporal difference learning with eligibility traces, so-called TD(lambda); however, it lacks the parameter alpha that specifies the learning rate. In place of this free parameter there is now an equation for the learning rate that is specific to each state transition. We experimentally test this new learning rule against TD(lambda) and find that it offers superior performance in various settings. Finally, we make some preliminary investigations into how to extend our new temporal difference algorithm to reinforcement learning. To do this we combine our update equation with both Watkins' Q(lambda) and Sarsa(lambda) and find that it again offers superior performance without a learning rate parameter.
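For orientation, here is a minimal sketch of the standard tabular TD(lambda) baseline that the abstract compares against, with accumulating eligibility traces and a fixed learning rate `alpha`. It is not the paper's learning-rate-free rule, which the abstract does not spell out. The function name `td_lambda`, the transition tuple layout, and the default parameter values are assumptions made for illustration.

```python
def td_lambda(episodes, n_states, alpha=0.1, gamma=0.9, lam=0.8):
    """Standard tabular TD(lambda) with accumulating traces (baseline only).

    episodes: iterable of episodes, each a list of
              (state, reward, next_state, done) transitions (assumed layout).
    Returns the list of estimated state values.
    """
    values = [0.0] * n_states
    for episode in episodes:
        traces = [0.0] * n_states  # eligibility traces reset per episode
        for state, reward, next_state, done in episode:
            # Bootstrapped target; terminal transitions have no successor value.
            target = reward + (0.0 if done else gamma * values[next_state])
            delta = target - values[state]  # TD error
            traces[state] += 1.0            # accumulate trace for visited state
            for s in range(n_states):
                # alpha is the free learning-rate parameter that the paper
                # replaces with a per-transition expression.
                values[s] += alpha * delta * traces[s]
                traces[s] *= gamma * lam    # decay all traces
    return values
```

On a deterministic two-state chain (state 0 leads to state 1 with reward 0, state 1 terminates with reward 1), repeated episodes drive the estimates toward 0.9 and 1.0 when `gamma=0.9`.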

artificial intelligence, machine learning, reinforcement learning, (15 more...)

0810.5631

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

arXiv.org Artificial Intelligence, Oct-21-2008

Quantum reinforcement learning

Dong, Daoyi, Chen, Chunlin, Li, Hanxiong, Tarn, Tzyh-Jong

The key approaches for machine learning, especially learning in unknown probabilistic environments, are new representations and computation mechanisms. In this paper, a novel quantum reinforcement learning (QRL) method is proposed by combining quantum theory and reinforcement learning (RL). Inspired by the state superposition principle and quantum parallelism, a framework for a value-updating algorithm is introduced. The state (action) in traditional RL is identified as the eigen state (eigen action) in QRL. The state (action) set can be represented with a quantum superposition state, and the eigen state (eigen action) can be obtained by randomly observing the simulated quantum state according to the collapse postulate of quantum measurement. The probability of the eigen action is determined by the probability amplitude, which is updated in parallel according to rewards. Some related characteristics of QRL, such as convergence, optimality, and balancing between exploration and exploitation, are also analyzed, which shows that this approach makes a good tradeoff between exploration and exploitation using the probability amplitude and can speed up learning through quantum parallelism. To evaluate the performance and practicability of QRL, several simulated experiments are given, and the results demonstrate the effectiveness and superiority of the QRL algorithm for some complex problems. The present work is also an effective exploration of the application of quantum computation to artificial intelligence.

algorithm, fuzzy logic, upstream oil & gas, (16 more...)

doi: 10.1109/TSMCB.2008.925743

0810.3828

Country:

Asia > China (0.29)
North America > United States (0.14)

Genre: Research Report > New Finding (0.34)

Industry: Energy > Oil & Gas > Upstream (0.55)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.46)

Marivate, Vukosi N., Marwala, Tshilidzi

Social Learning Methods in Board Games

arXiv.org Artificial Intelligence, Oct-20-2008

The training of agents in a social context instead of a self-play environment is investigated. Agents that use reinforcement learning algorithms are trained in social settings. This mimics the way in which players of board games such as Scrabble and chess mentor each other in their clubs. A round-robin tournament and a modified Swiss tournament setting are used for the training. The agents trained using social settings are compared to self-play agents, and results indicate that more robust agents emerge from the social training setting. Games with larger state spaces can benefit from such settings, as a diverse set of agents will have multiple strategies that increase the chances of obtaining more experienced players at the end of training. The Social Learning trained agents exhibit better playing experience than self-play agents. The modified Swiss playing style spawns a larger number of better playing agents as the population size increases.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

0810.3474

Country:

Africa > South Africa > Gauteng > Johannesburg (0.05)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games > Chess (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Szita, István, Lőrincz, András

The many faces of optimism - Extended version

arXiv.org Artificial Intelligence, Oct-19-2008

The exploration-exploitation dilemma has been an intriguing and unsolved problem within the framework of reinforcement learning. "Optimism in the face of uncertainty" and model building play central roles in advanced exploration methods. Here, we integrate several concepts and obtain a fast and simple algorithm. We show that the proposed algorithm finds a near-optimal policy in polynomial time, and give experimental evidence that it is robust and efficient compared to its ascendants.

algorithm, artificial intelligence, upstream oil & gas, (19 more...)

0810.3451

Country: North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Energy > Oil & Gas > Upstream (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Szita, Istvan, Lorincz, Andras

Factored Value Iteration Converges

arXiv.org Artificial Intelligence, Aug-13-2008

In this paper we propose a novel algorithm, factored value iteration (FVI), for the approximate solution of factored Markov decision processes (fMDPs). The traditional approximate value iteration algorithm is modified in two ways. First, the least-squares projection operator is modified so that it does not increase max-norm, and thus preserves convergence. Second, we uniformly sample polynomially many samples from the (exponentially large) state space. This way, the complexity of our algorithm becomes polynomial in the size of the fMDP description length. We prove that the algorithm is convergent. We also derive an upper bound on the difference between our approximate solution and the optimal one, and also on the error introduced by sampling. We analyze various projection operators with respect to their computational complexity and their convergence when combined with approximate value iteration.

artificial intelligence, machine learning, reinforcement learning, (19 more...)

0801.2069

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > New York (0.04)
North America > United States > New Jersey > Mercer County > Princeton (0.04)
Europe > Hungary > Budapest > Budapest (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.46)

Dimitrakakis, Christos, Lagoudakis, Michail G.

Rollout Sampling Approximate Policy Iteration

arXiv.org Artificial Intelligence, Jul-6-2008

Supervised and reinforcement learning are two well-known learning paradigms, which have been researched mostly independently. Recent studies have investigated the use of supervised learning methods for reinforcement learning, either for value function (Lagoudakis and Parr, 2003a; Riedmiller, 2005) or policy representation (Lagoudakis and Parr, 2003b; Fern et al., 2004; Langford and Zadrozny, 2005). Initial results have shown that policies can be approximately represented using either multi-class classifiers or combinations of binary classifiers (Rexakis and Lagoudakis, 2008) and, therefore, it is possible to incorporate classification algorithms within the inner loops of several reinforcement learning algorithms (Lagoudakis and Parr, 2003b; Fern et al., 2004). This viewpoint allows the quantification of the performance of reinforcement learning algorithms in terms of the performance of classification algorithms (Langford and Zadrozny, 2005). While a variety of promising combinations become possible through this synergy, heretofore there have been limited practical and widely-applicable algorithms. Our work builds on the work of Lagoudakis and Parr (2003b), who suggested an approximate policy iteration algorithm for learning a good policy represented as a classifier, avoiding representations of any kind of value function. At each iteration, a new policy/classifier is produced using training data obtained through extensive simulation (rollouts) of the previous policy on a generative model of the process. These rollouts aim at identifying better action choices over a subset of states in order to form a set of data for training the classifier representing the improved policy. A similar algorithm was proposed by Fern et al.
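The rollout step described above can be sketched roughly as follows: estimate each action's return at a state by simulating the previous policy on a generative model, then label the state with the best-looking action to build classifier training data. This is a simplified illustration, not the paper's sampling scheme. The names `rollout_value` and `label_states`, the `model(state, action, rng)` interface returning `(next_state, reward, done)`, and all default parameters are assumptions, and the sketch labels every state rather than only those where one action clearly dominates.

```python
import random


def rollout_value(model, policy, state, action,
                  gamma=0.95, horizon=50, n_rollouts=20, rng=None):
    """Monte Carlo estimate of the return for taking `action` in `state`
    and following `policy` afterwards (hypothetical interface)."""
    if rng is None:
        rng = random.Random(0)
    total = 0.0
    for _ in range(n_rollouts):
        s, a, ret, discount = state, action, 0.0, 1.0
        for _ in range(horizon):
            s, r, done = model(s, a, rng)  # one simulated step
            ret += discount * r
            discount *= gamma
            if done:
                break
            a = policy(s)                  # previous policy picks later actions
        total += ret
    return total / n_rollouts


def label_states(model, policy, states, actions, **rollout_kwargs):
    """Build (state, best_action) pairs for training a policy classifier."""
    data = []
    for s in states:
        estimates = {a: rollout_value(model, policy, s, a, **rollout_kwargs)
                     for a in actions}
        data.append((s, max(estimates, key=estimates.get)))
    return data
```

The resulting pairs would then be passed to any multi-class classifier, whose predictions serve as the next policy.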

artificial intelligence, machine learning, reinforcement learning, (15 more...)

doi: 10.1007/s10994-008-5069-3

0805.2027

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > District of Columbia > Washington (0.04)
Europe > Germany > North Rhine-Westphalia > Cologne Region > Bonn (0.04)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Csaji, B. C., Monostori, L.

Adaptive Stochastic Resource Control: A Machine Learning Approach

Journal of Artificial Intelligence Research, Jun-25-2008

The paper investigates stochastic resource allocation problems with scarce, reusable resources and non-preemptive, time-dependent, interconnected tasks. This approach is a natural generalization of several standard resource management problems, such as scheduling and transportation problems. First, reactive solutions are considered and defined as control policies of suitably reformulated Markov decision processes (MDPs). We argue that this reformulation has several favorable properties, such as having finite state and action spaces and being aperiodic; hence all policies are proper and the space of control policies can be safely restricted. Next, approximate dynamic programming (ADP) methods, such as fitted Q-learning, are suggested for computing an efficient control policy. In order to compactly maintain the cost-to-go function, two representations are studied: hash tables and support vector regression (SVR), particularly nu-SVRs. Several additional improvements, such as the application of limited-lookahead rollout algorithms in the initial phases, action space decomposition, task clustering, and distributed sampling, are investigated, too. Finally, experimental results on both benchmark and industry-related data are presented.

artificial intelligence, machine learning, reinforcement learning, (20 more...)

doi: 10.1613/jair.2548

AI Access Foundation

10553

Country:

North America > United States > Massachusetts (0.28)
Europe > Hungary (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
(2 more...)

Genre: Research Report > New Finding (0.46)

Industry: Energy > Oil & Gas > Upstream (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.69)

A Self-Help Guide For Autonomous Systems

AI Magazine, Jun-15-2008

Humans learn from their mistakes. When things go badly, we notice that something is amiss, figure out what went wrong and why, and attempt to repair the problem. Artificial systems depend on their human designers to program in responses to every eventuality and therefore typically don’t even notice when things go wrong, following their programming over the proverbial, and in some cases literal, cliff. This article describes our past and current work on the Meta-Cognitive Loop, a domain-general approach to giving artificial systems the ability to notice, assess, and repair problems. The goal is to make artificial systems more robust and less dependent on their human designers.

machine learning, natural language, reinforcement learning, (18 more...)

AI Magazine

Country: North America > United States > Maryland > Prince George's County > College Park (0.14)

Genre: Overview (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.64)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.48)