valenzano
Linear Combination of Exponential Moving Averages for Wireless Channel Prediction
Formis, Gabriele, Scanzio, Stefano, Cena, Gianluca, Valenzano, Adriano
The ability to predict the behavior of a wireless channel in terms of the frame delivery ratio is quite valuable, and permits, e.g., to optimize the operating parameters of a wireless network at runtime, or to proactively react to the degradation of the channel quality, in order to meet the stringent requirements about dependability and end-to-end latency that typically characterize industrial applications. In this work, prediction models based on the exponential moving average (EMA) are investigated in depth, which are proven to outperform other simple statistical methods and whose performance is nearly as good as artificial neural networks, but with dramatically lower computational requirements. Regarding the innovation and motivation of this work, a new model that we called EMA linear combination (ELC), is introduced, explained, and evaluated experimentally. Its prediction accuracy, tested on some databases acquired from a real setup based on Wi-Fi devices, showed that ELC brings tangible improvements over EMA in any experimental conditions, the only drawback being a slight increase in computational complexity.
Predicting Wireless Channel Quality by means of Moving Averages and Regression Models
Formis, Gabriele, Scanzio, Stefano, Cena, Gianluca, Valenzano, Adriano
The ability to reliably predict the future quality of a wireless channel, as seen by the media access control layer, is a key enabler to improve performance of future industrial networks that do not rely on wires. Knowing in advance how much channel behavior may change can speed up procedures for adaptively selecting the best channel, making the network more deterministic, reliable, and less energy-hungry, possibly improving device roaming capabilities at the same time. To this aim, popular approaches based on moving averages and regression were compared, using multiple key performance indicators, on data captured from a real Wi-Fi setup. Moreover, a simple technique based on a linear combination of outcomes from different techniques was presented and analyzed, to further reduce the prediction error, and some considerations about lower bounds on achievable errors have been reported. We found that the best model is the exponential moving average, which managed to predict the frame delivery ratio with a 2.10\% average error and, at the same time, has lower computational complexity and memory consumption than the other models we analyzed.
Valenzano
The pancake puzzle is a standard benchmark domain used to test search algorithms, and the gap heuristic is the state-of-the-art heuristic function most often used in such tests. In this work, we analyze the accuracy of this heuristic and identify ways to enhance it. We begin by showing that in the worst-case, the amount that the gap heuristic underestimates the optimal cost of a pancake puzzle state can be linear in the number of pancakes in the stack. However, empirical analysis suggests that it is extremely rare that the gap heuristic underestimates the optimal cost by more than two. We then identify several simple methods that can be used to generate large sets of problems on which the gap heuristic underestimates the optimal cost by a larger amount than it typically does on random permutations. In doing so, we provide new pancake puzzle test sets that can be used to evaluate how search algorithms behave when the heuristic is inaccurate. We also formally characterize states according to the size of the heuristic plateaus around them. This characterization allows us to efficiently compute a two-step look ahead of the gap heuristic on any state, which we can use alongside a state's dual to further improve heuristic accuracy. These enhancements substantially improve the performance of an IDA*-based pancake problem solver on both the existing benchmarks and the new ones proposed in this paper.
Reward Machines: Exploiting Reward Function Structure in Reinforcement Learning
Toro Icarte, Rodrigo (University of Toronto and Vector Institute) | Klassen, Toryn Q. (University of Toronto and Vector Institute) | Valenzano, Richard (Element AI) | McIlraith, Sheila A. (University of Toronto and Vector Institute)
Reinforcement learning (RL) methods usually treat reward functions as black boxes. As such, these methods must extensively interact with the environment in order to discover rewards and optimal policies. In most RL applications, however, users have to program the reward function and, hence, there is the opportunity to make the reward function visible – to show the reward function’s code to the RL agent so it can exploit the function’s internal structure to learn optimal policies in a more sample efficient manner. In this paper, we show how to accomplish this idea in two steps. First, we propose reward machines, a type of finite state machine that supports the specification of reward functions while exposing reward function structure. We then describe different methodologies to exploit this structure to support learning, including automated reward shaping, task decomposition, and counterfactual reasoning with off-policy learning. Experiments on tabular and continuous domains, across different tasks and RL agents, show the benefits of exploiting reward structure with respect to sample efficiency and the quality of resultant policies. Finally, by virtue of being a form of finite state machine, reward machines have the expressive power of a regular language and as such support loops, sequences and conditionals, as well as the expression of temporally extended properties typical of linear temporal logic and non-Markovian reward specification.
Reward Machines: Exploiting Reward Function Structure in Reinforcement Learning
Icarte, Rodrigo Toro, Klassen, Toryn Q., Valenzano, Richard, McIlraith, Sheila A.
Reinforcement learning (RL) methods usually treat reward functions as black boxes. As such, these methods must extensively interact with the environment in order to discover rewards and optimal policies. In most RL applications, however, users have to program the reward function and, hence, there is the opportunity to treat reward functions as white boxes instead -- to show the reward function's code to the RL agent so it can exploit its internal structures to learn optimal policies faster. In this paper, we show how to accomplish this idea in two steps. First, we propose reward machines (RMs), a type of finite state machine that supports the specification of reward functions while exposing reward function structure. We then describe different methodologies to exploit such structures, including automated reward shaping, task decomposition, and counterfactual reasoning for data augmentation. Experiments on tabular and continuous domains show the benefits of exploiting reward structure across different tasks and RL agents.