Collaborating Authors: Konen, Wolfgang


Putting the Iterative Training of Decision Trees to the Test on a Real-World Robotic Task

arXiv.org Artificial Intelligence

In previous research, we developed methods to train decision trees (DTs) as agents for reinforcement learning tasks, based on deep reinforcement learning (DRL) networks. The samples from which the DTs are built use the environment's state as features and the corresponding action as label. To solve the nontrivial task of selecting samples that, on the one hand, reflect the DRL agent's capability of choosing the right action but, on the other hand, also cover enough of the state space to generalize well, we developed an algorithm to iteratively train DTs. In this short paper, we apply this algorithm to a real-world implementation of a robotic task for the first time. Real-world tasks pose additional challenges compared to simulations, such as noise and delays. The task consists of a physical pendulum attached to a cart, which moves on a linear track. By movements to the left and to the right, the pendulum is to be swung into the upright position and balanced in the unstable equilibrium. Our results demonstrate the applicability of the algorithm to real-world tasks by generating a DT whose performance matches that of the DRL agent while consisting of fewer parameters. This research could be a starting point for distilling DTs from DRL agents to obtain transparent, lightweight models for real-world reinforcement learning tasks.
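The iterative selection idea can be illustrated with a short sketch. This is a minimal, DAgger-style reading of the loop, not the paper's exact algorithm; `env` (a classic gym-style environment) and `drl_agent.act(state)` are hypothetical placeholders.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def train_dt_iteratively(env, drl_agent, n_iter=10, episodes_per_iter=5):
    X, y = [], []                      # collected (state, expert action) samples
    dt = None
    for _ in range(n_iter):
        for _ in range(episodes_per_iter):
            state, done = env.reset(), False
            while not done:
                # First iteration: follow the DRL expert. Later iterations:
                # follow the current DT, so the samples cover the states the
                # DT actually reaches -- the coverage/fidelity trade-off.
                if dt is None:
                    action = drl_agent.act(state)
                else:
                    action = dt.predict(np.asarray(state).reshape(1, -1))[0]
                X.append(state)
                y.append(drl_agent.act(state))     # label always from the expert
                state, _, done, _ = env.step(action)
        dt = DecisionTreeClassifier(max_depth=8).fit(np.asarray(X), np.asarray(y))
    return dt
```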


Improving Reinforcement Learning Efficiency with Auxiliary Tasks in Non-Visual Environments: A Comparison

arXiv.org Artificial Intelligence

Real-world reinforcement learning (RL) environments, whether in robotics or industrial settings, often involve non-visual observations and require not only efficient but also reliable, and thus interpretable and flexible, RL approaches. To improve efficiency, agents that perform state representation learning with auxiliary tasks have been widely studied for visual observations. For real-world problems, however, dedicated representation learning modules that are decoupled from the RL agent are better suited to meet these requirements. This study compares common auxiliary tasks on top of what is, to the best of our knowledge, the only decoupled representation learning method for low-dimensional non-visual observations. We evaluate potential improvements in sample efficiency and returns for environments ranging from a simple pendulum to a complex simulated robotics task. Our findings show that representation learning with auxiliary tasks provides performance gains only in sufficiently complex environments and that learning environment dynamics is preferable to predicting rewards. These insights can inform future development of interpretable representation learning approaches for non-visual observations and advance the use of RL solutions in real-world scenarios.
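To make the compared auxiliary tasks concrete, here is a minimal sketch of the two heads on top of a decoupled encoder for low-dimensional observations: one predicting the next latent state (dynamics), one predicting the reward. All dimensions and module names are illustrative, not the paper's architecture.

```python
import torch
import torch.nn as nn

obs_dim, act_dim, latent_dim = 3, 1, 16
encoder = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, latent_dim))
dynamics_head = nn.Sequential(nn.Linear(latent_dim + act_dim, 64), nn.ReLU(),
                              nn.Linear(64, latent_dim))
reward_head = nn.Sequential(nn.Linear(latent_dim + act_dim, 64), nn.ReLU(),
                            nn.Linear(64, 1))

def auxiliary_losses(obs, action, next_obs, reward):
    z, z_next = encoder(obs), encoder(next_obs)
    za = torch.cat([z, action], dim=-1)
    # dynamics task: predict the (detached) encoding of the next observation
    loss_dyn = ((dynamics_head(za) - z_next.detach()) ** 2).mean()
    # reward task: regress the scalar reward from latent state and action
    loss_rew = ((reward_head(za).squeeze(-1) - reward) ** 2).mean()
    return loss_dyn, loss_rew
```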


Towards Learning Rubik's Cube with N-tuple-based Reinforcement Learning

arXiv.org Artificial Intelligence

This work describes in detail how to learn and solve the Rubik's cube game (or puzzle) in the General Board Game (GBG) learning and playing framework. We cover the cube sizes 2x2x2 and 3x3x3. We describe in detail the cube's state representation, how to transform it with twists, whole-cube rotations and color transformations, and explain the use of symmetries in Rubik's cube. Next, we discuss different n-tuple representations for the cube, how we train the agents by reinforcement learning, and how we improve the trained agents during evaluation by MCTS wrapping. We present results for agents that learn Rubik's cube from scratch, with and without MCTS wrapping and with and without symmetries, and show that both MCTS wrapping and symmetries increase computational costs but at the same time lead to much better results. We can solve the 2x2x2 cube completely, and the 3x3x3 cube in the majority of cases for cubes scrambled with up to p = 15 twists in the quarter-turn metric (QTM). We cannot yet reliably solve 3x3x3 cubes with more than 15 scrambling twists. Although our computational costs are higher with MCTS wrapping and symmetries than without, they are still considerably lower than in the approaches of McAleer et al. (2018, 2019) and Agostinelli et al. (2019), who provide the best Rubik's cube learning agents so far.
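For readers unfamiliar with n-tuple systems, the following toy sketch shows the core mechanism: each n-tuple indexes a lookup table by the values (e.g. sticker colors) of the cells it covers, and the state value is the sum of the table entries. The tuple layout and table sizes here are invented for illustration and are far smaller than what a real cube agent uses.

```python
import numpy as np

n_colors = 6
tuples = [(0, 1, 2, 3), (4, 5, 6, 7)]          # cell indices covered by each n-tuple
tables = [np.zeros(n_colors ** len(t)) for t in tuples]

def tuple_index(state, cells):
    # interpret the colors at the covered cells as a base-n_colors number
    idx = 0
    for c in cells:
        idx = idx * n_colors + state[c]
    return idx

def value(state):
    return sum(tab[tuple_index(state, t)] for tab, t in zip(tables, tuples))

def td_update(state, target, alpha=0.01):
    # TD-style update: distribute the value error over all active entries
    err = target - value(state)
    for tab, t in zip(tables, tuples):
        tab[tuple_index(state, t)] += alpha * err
```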


Final Adaptation Reinforcement Learning for N-Player Games

arXiv.org Artificial Intelligence

This paper covers n-tuple-based reinforcement learning (RL) algorithms for games. We present new algorithms for TD-, SARSA- and Q-learning which work seamlessly on various games with an arbitrary number of players. This is achieved by taking a player-centered view where each player propagates his/her rewards back to previous rounds. We add a new element called Final Adaptation RL (FARL) to all these algorithms. Our main contribution is that FARL is a vitally important ingredient for achieving success with the player-centered view in various games. We report results on seven board games with 1, 2 and 3 players, including Othello, ConnectFour and Hex. In most cases we find that FARL is important for learning a near-perfect playing strategy. All algorithms are available in the GBG framework on GitHub.
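The player-centered view plus FARL can be summarized in a much-simplified sketch: each player backs up TD targets along his/her own successive states, and a final-adaptation step additionally updates each player's last state with the terminal reward. Here `V` is a dict-like value table (e.g. `collections.defaultdict(float)`), and `episode.moves` / `episode.final_rewards` are hypothetical containers, not the GBG API.

```python
def player_centered_td(episode, V, alpha=0.1, gamma=1.0):
    last_state = {}                              # most recent state seen per player
    for player, state in episode.moves:
        if player in last_state:
            prev = last_state[player]
            # ordinary TD step along this player's own trajectory
            V[prev] += alpha * (gamma * V[state] - V[prev])
        last_state[player] = state
    # FARL step: adapt each player's final state with his/her terminal reward
    for player, reward in episode.final_rewards.items():
        prev = last_state[player]
        V[prev] += alpha * (reward - V[prev])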


General Board Game Playing for Education and Research in Generic AI Game Learning

arXiv.org Artificial Intelligence

We present a new general board game (GBG) playing and learning framework. GBG defines the common interfaces for board games, game states and their AI agents. It allows one to run competitions of different agents on different games. It standardizes those parts of board game playing and learning that would otherwise be tedious and repetitive to code. GBG is suitable for arbitrary 1-, 2-, ..., N-player board games. It makes a generic TD($\lambda$)-n-tuple agent available, for the first time, for arbitrary games. On various games, the TD($\lambda$)-n-tuple agent is found to be superior to other generic agents like MCTS. GBG serves an educational purpose, helping students to get started faster in the area of game learning. It serves a research purpose as well, by collecting a growing set of games and AI agents in order to assess their strengths and generalization capabilities in meaningful competitions. Initial successful educational and research results are reported.
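As a rough, hypothetical illustration of what such standardized interfaces look like (the actual GBG framework is written in Java and considerably richer), consider this reduced sketch:

```python
from abc import ABC, abstractmethod

class GameState(ABC):
    @property
    @abstractmethod
    def current_player(self): ...
    @abstractmethod
    def available_actions(self): ...
    @abstractmethod
    def advance(self, action): ...       # apply a move, switch to next player
    @abstractmethod
    def is_game_over(self): ...

class Agent(ABC):
    @abstractmethod
    def get_next_action(self, state: GameState): ...

def run_match(state: GameState, agents):
    """Generic game loop: works for any 1-, 2-, ..., N-player game."""
    while not state.is_game_over():
        agent = agents[state.current_player]
        state = state.advance(agent.get_next_action(state))
    return state
```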


SACOBRA with Online Whitening for Solving Optimization Problems with High Conditioning

arXiv.org Machine Learning

Real-world optimization problems often have objective functions that are expensive in terms of cost and time. It is desirable to find near-optimal solutions with very few function evaluations. Surrogate-assisted optimizers reduce the required number of function evaluations by replacing the real function with an efficient mathematical model built on a few evaluated points. Problems with a high condition number are a challenge for many surrogate-assisted optimizers, including SACOBRA. To address such problems we propose a new online whitening method operating in the black-box optimization paradigm. We show on a set of high-conditioning functions that online whitening tackles SACOBRA's early-stagnation issue and reduces the optimization error by a factor between 10 and 1e12 compared to plain SACOBRA, though it imposes many extra function evaluations. The covariance matrix adaptation evolution strategy (CMA-ES) reaches even lower errors for very high numbers of function evaluations, whereas SACOBRA performs better in the expensive setting (fewer than 1e3 function evaluations). If we count all parallelizable function evaluations (population evaluation in CMA-ES, online whitening in our approach) as one iteration, then both algorithms have comparable strength even in the long run. This holds for problems with dimension D <= 20.
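The core of online whitening can be sketched as follows: estimate the Hessian of the objective at the current best point by finite differences (this is where the extra function evaluations go), then let the optimizer work on the transformed function g(y) = f(x0 + H^(-1/2) y), whose condition number is much closer to 1. Regularization details and the re-whitening schedule are simplified relative to the paper.

```python
import numpy as np

def estimate_hessian(f, x0, h=1e-3):
    # finite-difference Hessian: costs O(d^2) extra function evaluations
    d = len(x0)
    H = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            e_i, e_j = np.eye(d)[i] * h, np.eye(d)[j] * h
            H[i, j] = (f(x0 + e_i + e_j) - f(x0 + e_i)
                       - f(x0 + e_j) + f(x0)) / h**2
    return (H + H.T) / 2

def whitened(f, x0):
    H = estimate_hessian(f, x0)
    w, U = np.linalg.eigh(H)
    # clip eigenvalues so H^(-1/2) stays well-defined near rank deficiency
    M = U @ np.diag(1.0 / np.sqrt(np.clip(w, 1e-12, None))) @ U.T
    return lambda y: f(x0 + M @ y)     # transformed, better-conditioned objective
```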


Solving the G-problems in less than 500 iterations: Improved efficient constrained optimization by surrogate modeling and adaptive parameter control

arXiv.org Machine Learning

Constrained optimization of high-dimensional numerical problems plays an important role in many scientific and industrial applications. In many industrial applications, function evaluations are severely limited and no analytical information about the objective function and constraint functions is available. For such expensive black-box optimization tasks, the constrained optimization algorithm COBRA was proposed, making use of RBF surrogate modeling for both the objective and the constraint functions. COBRA has shown remarkable success in reliably solving complex benchmark problems in less than 500 function evaluations. Unfortunately, COBRA requires careful adjustment of its parameters in order to do so. In this work we present a new self-adjusting algorithm, SACOBRA, which is based on COBRA and capable of achieving high-quality results with very few function evaluations and no parameter tuning. We show, with the help of performance profiles on a set of benchmark problems (G-problems, MOPTA08), that SACOBRA consistently outperforms any COBRA algorithm with a fixed parameter setting. We analyze the importance of the several new elements in SACOBRA and find that each of them contributes to boosting the overall optimization performance. We discuss the reasons behind this and thereby gain a better understanding of high-quality RBF surrogate modeling.
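A bare-bones illustration of the surrogate step inside a COBRA-like loop, assuming SciPy's `RBFInterpolator`: fit cubic RBF models to the evaluated points for the objective and each constraint, then search the cheap surrogates for the next infill point. SACOBRA's self-adjusting elements (distance requirement cycling, function scaling, repair) are deliberately left out.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator
from scipy.optimize import minimize

def surrogate_step(X, y_obj, y_cons, bounds):
    # one cubic RBF model for the objective, one per constraint g_i(x) <= 0
    s_obj = RBFInterpolator(X, y_obj, kernel='cubic')
    s_cons = [RBFInterpolator(X, yc, kernel='cubic') for yc in y_cons]

    def penalized(x):
        x = x.reshape(1, -1)
        viol = sum(max(0.0, float(sc(x))) for sc in s_cons)
        return float(s_obj(x)) + 1e3 * viol     # surrogate + penalty on violation

    x0 = X[np.argmin(y_obj)]                    # start from best point so far
    res = minimize(penalized, x0, bounds=bounds)
    return res.x      # next point to evaluate on the expensive real functions
```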


Self-configuration from a Machine-Learning Perspective

arXiv.org Machine Learning

The goal of machine learning is to provide solutions which are trained by data or by experience coming from the environment. Many training algorithms exist and some brilliant successes have been achieved. But even in structured environments for machine learning (e.g. data mining or board games), most applications beyond the level of toy problems need careful hand-tuning or human ingenuity (i.e. detection of interesting patterns) or both. We discuss several aspects of how self-configuration can help to alleviate these problems. One aspect is self-configuration by tuning of algorithms, where recent advances have been made in the area of SPO (Sequential Parameter Optimization). Another aspect is self-configuration by pattern detection or feature construction. Forming multiple features (e.g. random boolean functions) and using algorithms (e.g. random forests) which easily digest many features can greatly increase learning speed. However, a full-fledged theory of feature construction is not yet available and forms a current barrier in machine learning. We discuss several ideas for the systematic inclusion of feature construction. This may lead to partly self-configuring machine learning solutions which show robustness, flexibility, and fast learning in potentially changing environments.
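The "many cheap features" idea can be demonstrated in a few lines: generate random boolean conjunctions of the raw input bits and let a random forest pick out the useful ones. The data and the target concept below are made up purely for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(500, 20))       # 500 samples, 20 raw boolean inputs
y = (X[:, 0] & X[:, 3]) | X[:, 7]            # hidden target concept (for the demo)

def random_boolean_features(X, n_features=200):
    # each constructed feature is a random conjunction of two raw bits
    feats = []
    for _ in range(n_features):
        i, j = rng.choice(X.shape[1], size=2, replace=False)
        feats.append(X[:, i] & X[:, j])
    return np.stack(feats, axis=1)

# the forest digests raw and constructed features and selects the useful ones
Z = np.hstack([X, random_boolean_features(X)])
clf = RandomForestClassifier(n_estimators=100).fit(Z, y)
```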


On the numeric stability of the SFA implementation sfa-tk

arXiv.org Machine Learning

Slow feature analysis (SFA) is a method for extracting slowly varying features from a quickly varying multidimensional signal. An open-source Matlab implementation, sfa-tk, makes SFA easy to use. We show here that under certain circumstances, namely when the covariance matrix of the nonlinearly expanded data does not have full rank, this implementation runs into numerical instabilities. We propose a modified algorithm based on singular value decomposition (SVD) which is free of those instabilities even when the rank of the matrix is less than 10% of its size. Furthermore, we show that an alternative way of handling the numerical problems is to inject a small amount of noise into the multidimensional input signal, which can restore a rank-deficient covariance matrix to full rank, however at the price of modifying the original data and the need to tune the noise parameter.
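The SVD-based fix can be sketched in a few lines of Python (the original sfa-tk is Matlab): instead of eigendecomposing a possibly rank-deficient covariance matrix, take the SVD of the centered expanded data and keep only directions whose singular values lie above a tolerance. The threshold is illustrative.

```python
import numpy as np

def svd_whiten(Z, tol=1e-10):
    """Whiten expanded data Z (n samples x d features), robust to rank deficiency."""
    Zc = Z - Z.mean(axis=0)
    U, s, Vt = np.linalg.svd(Zc, full_matrices=False)
    keep = s > tol * s[0]                     # drop (near-)null directions
    # map onto whitened coordinates spanning only the full-rank subspace
    W = Vt[keep].T / (s[keep] / np.sqrt(len(Z) - 1))
    return Zc @ W, W                          # whitened data has covariance ~ I
```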


How slow is slow? SFA detects signals that are slower than the driving force

arXiv.org Machine Learning

Slow feature analysis (SFA) is a method for extracting slowly varying driving forces from quickly varying nonstationary time series. We show here that it is possible for SFA to detect a component which is even slower than the driving force itself (e.g. the envelope of a modulated sine wave). Whether the driving force itself or a slower subcomponent is detected depends on circumstances such as the embedding dimension, the predictability of the time series, or the base frequency. We observe a phase transition from one regime to the other, and it is the purpose of this work to quantify the influence of various parameters on this phase transition. We conclude that what SFA perceives as slow varies, and that the switching from one regime to the other occurs more or less rapidly, perhaps showing some similarity to human perception.
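For concreteness, here is a minimal linear-SFA demo on a modulated sine wave: after time-delay embedding and whitening, the slowest feature is the whitened direction whose temporal difference has the smallest variance. Embedding dimension and frequencies are chosen arbitrarily here; depending on such parameters, the extracted feature follows either the driving force or its slower envelope.

```python
import numpy as np

t = np.linspace(0, 100, 20000)
x = np.sin(0.05 * t) * np.sin(2.0 * t)             # slow envelope * fast carrier

m = 10                                              # embedding dimension
Z = np.stack([x[i:len(x) - m + i] for i in range(m)], axis=1)   # time-delay embedding
Zc = Z - Z.mean(axis=0)

U, s, Vt = np.linalg.svd(Zc, full_matrices=False)
W = Vt.T / (s / np.sqrt(len(Zc) - 1))               # whitening matrix
Y = Zc @ W                                          # whitened embedded signal

dY = np.diff(Y, axis=0)                             # temporal difference
eigvals, eigvecs = np.linalg.eigh(dY.T @ dY / len(dY))
slow = Y @ eigvecs[:, 0]                            # slowest varying feature
```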