
Collaborating Authors

 Kudenko, Daniel


Sustainable broadcasting in Blockchain Networks with Reinforcement Learning

arXiv.org Artificial Intelligence

Recent estimates put the carbon footprint of Bitcoin and Ethereum at an average of 64 and 26 million tonnes of CO2 per year, respectively. To address this growing problem, several possible approaches have been proposed in the literature: creating alternative blockchain consensus mechanisms, applying redundancy reduction techniques, utilizing renewable energy sources, and employing energy-efficient devices. In this paper, we follow the second avenue and propose an efficient approach based on reinforcement learning that improves the block broadcasting scheme in blockchain networks. The analysis and experimental results confirm that the proposed improvement of the block propagation scheme can handle network dynamics and achieve better results than the default approach. Additionally, our technical integration of the simulator and developed RL environment can be used as a complete solution for further study of new schemes and protocols that use RL or other ML techniques.
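
To make the framing concrete, here is a minimal, illustrative sketch of how block broadcasting can be cast as an RL problem: a node observes how much of its neighbourhood already holds the latest block and chooses a relay fanout, with a reward that trades propagation coverage against redundant transmissions. The class name, state/action definitions, and reward weights are assumptions for illustration, not the paper's simulator or environment.

```python
import random

class BlockRelayEnv:
    """Toy environment casting block broadcasting as an RL problem.

    A node observes the fraction of peers that already have the latest block
    and chooses how many peers to forward it to. The reward trades off
    propagation coverage against redundant transmissions. This is an
    illustrative simplification, not the paper's simulator integration.
    """

    def __init__(self, n_peers=8, seed=0):
        self.n_peers = n_peers
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        # Some peers may already have received the block from elsewhere.
        self.informed = {p: self.rng.random() < 0.3 for p in range(self.n_peers)}
        return self._obs()

    def _obs(self):
        return sum(self.informed.values()) / self.n_peers

    def step(self, fanout):
        """fanout: number of peers to relay the block to (the action)."""
        targets = self.rng.sample(range(self.n_peers), k=min(fanout, self.n_peers))
        new, redundant = 0, 0
        for p in targets:
            if self.informed[p]:
                redundant += 1          # wasted bandwidth / energy
            else:
                self.informed[p] = True
                new += 1                # useful propagation
        reward = new - 0.5 * redundant  # coverage vs. redundancy trade-off
        done = all(self.informed.values())
        return self._obs(), reward, done, {}


env = BlockRelayEnv()
obs = env.reset()
obs, reward, done, _ = env.step(fanout=3)
```

An RL agent trained against such an interface would learn a fanout policy that adapts to how saturated the neighbourhood already is, which is the kind of network-dynamics-aware behaviour the abstract describes.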


Improving the Effectiveness of Potential-Based Reward Shaping in Reinforcement Learning

arXiv.org Artificial Intelligence

Potential-based reward shaping is commonly used to incorporate prior knowledge of how to solve the task into reinforcement learning because it can formally guarantee policy invariance. As such, the optimal policy and the ordering of policies by their returns are not altered by potential-based reward shaping. In this work, we highlight the dependence of effective potential-based reward shaping on the initial Q-values and external rewards, which determine the agent's ability to exploit the shaping rewards to guide its exploration and achieve increased sample efficiency. We formally derive how a simple linear shift of the potential function can be used to improve the effectiveness of reward shaping without changing the encoded preferences in the potential function, and without having to adjust the initial Q-values, which can be challenging and undesirable in deep reinforcement learning. We show the theoretical limitations of continuous potential functions for correctly assigning positive and negative reward shaping values. We verify our theoretical findings empirically on Gridworld domains with sparse and uninformative reward functions, as well as on the Cart Pole and Mountain Car environments, where we demonstrate the application of our results in deep reinforcement learning.
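
For readers less familiar with the formalism, the sketch below shows the standard potential-based shaping term and the effect of a constant offset, the simplest instance of such a shift. The symbols Phi, gamma, and c are generic notation, not necessarily the paper's, and this is only the textbook special case rather than the paper's full derivation.

```latex
% Standard potential-based shaping term (Ng et al.): Phi is the potential,
% gamma the discount factor.
F(s, s') = \gamma \Phi(s') - \Phi(s)

% Shifting the potential by a constant c, \Phi_c(s) = \Phi(s) + c, gives
F_c(s, s') = \gamma\bigl(\Phi(s') + c\bigr) - \bigl(\Phi(s) + c\bigr)
           = F(s, s') + c(\gamma - 1)

% The added term c(\gamma - 1) is constant, so the relative preferences
% encoded by \Phi (and hence policy invariance) are unchanged, while the
% sign and magnitude of the shaping rewards relative to the initial
% Q-values can be controlled by choosing c.
```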


How Real Is Real? A Human Evaluation Framework for Unrestricted Adversarial Examples

arXiv.org Artificial Intelligence

With an ever-increasing reliance on machine learning (ML) models in the real world, adversarial examples threaten the safety of AI-based systems such as autonomous vehicles. In the image domain, they represent maliciously perturbed data points that look benign to humans (i.e., the image modification is not noticeable) but greatly mislead state-of-the-art ML models. Previously, researchers ensured the imperceptibility of their altered data points by restricting perturbations via $\ell_p$ norms. However, recent publications claim that creating natural-looking adversarial examples without such restrictions is also possible. With much more freedom to instill malicious information into data, these unrestricted adversarial examples can potentially overcome traditional defense strategies as they are not constrained by the limitations or patterns these defenses typically recognize and mitigate. This allows attackers to operate outside of expected threat models. However, surveying existing image-based methods, we noticed a need for more human evaluations of the proposed image modifications. Based on existing human-assessment frameworks for image generation quality, we propose SCOOTER - an evaluation framework for unrestricted image-based attacks. It provides researchers with guidelines for conducting statistically significant human experiments, standardized questions, and a ready-to-use implementation. SCOOTER thus allows researchers to analyze how imperceptible their unrestricted attacks truly are.
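
As a minimal sketch of the kind of significance test such human studies rely on, the snippet below runs an exact one-sided binomial test on a hypothetical two-alternative "clean vs. modified" detection task. The function name, sample sizes, and thresholds are illustrative assumptions, not SCOOTER's actual protocol or implementation.

```python
from math import comb

def binomial_p_value(successes, trials, p_null=0.5):
    """Exact one-sided binomial test: probability of observing at least
    `successes` correct identifications under the chance-level null p_null."""
    return sum(comb(trials, k) * p_null**k * (1 - p_null)**(trials - k)
               for k in range(successes, trials + 1))

# Hypothetical study: 60 annotators, 38 correctly spot the modified image.
# A small p-value indicates detection above chance, i.e. the adversarial
# changes are noticeable to humans rather than imperceptible.
p = binomial_p_value(successes=38, trials=60)
print(f"one-sided p-value vs. chance guessing: {p:.4f}")
```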


Graph-based State Representation for Deep Reinforcement Learning

arXiv.org Machine Learning

Deep RL approaches build much of their success on the ability of the deep neural network to generate useful internal representations. Nevertheless, they suffer from high sample complexity, and starting with a good input representation can have a significant impact on performance. In this paper, we exploit the fact that the underlying Markov decision process (MDP) represents a graph, which enables us to incorporate the topological information for effective state representation learning. Motivated by the recent success of node representations for several graph analytical tasks, we specifically investigate the capability of node representation learning methods to effectively encode the topology of the underlying MDP in Deep RL. To this end, we perform a comparative analysis of several models chosen from 4 different classes of representation learning algorithms for policy learning in grid-world navigation tasks, which are representative of a large class of RL problems. We find that all embedding methods outperform the commonly used matrix representation of grid-world environments in all of the studied cases. Moreover, graph-convolution-based methods are outperformed by simpler random-walk-based methods and graph linear autoencoders.
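
The sketch below illustrates the general idea of turning a grid-world's transition graph into low-dimensional state embeddings. It uses a plain truncated SVD of the adjacency matrix as a stand-in for the graph-linear-autoencoder family; the grid size, embedding dimension, and function names are illustrative assumptions, not the specific models compared in the paper.

```python
import numpy as np

def grid_adjacency(width, height):
    """Adjacency matrix of the 4-connected grid-world transition graph."""
    n = width * height
    A = np.zeros((n, n))
    for r in range(height):
        for c in range(width):
            i = r * width + c
            for dr, dc in ((1, 0), (0, 1)):
                rr, cc = r + dr, c + dc
                if rr < height and cc < width:
                    j = rr * width + cc
                    A[i, j] = A[j, i] = 1.0
    return A

# Low-rank factorisation of the adjacency matrix: a simple stand-in for the
# graph linear autoencoder class of embeddings; dimensions are illustrative.
A = grid_adjacency(width=5, height=5)
U, S, _ = np.linalg.svd(A)
embeddings = U[:, :8] * S[:8]          # one 8-d vector per grid-world state

# These vectors would replace the raw one-hot/matrix state fed to the agent.
state_index = 12
print(embeddings[state_index])
```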


Curriculum Learning with a Progression Function

arXiv.org Machine Learning

Curriculum Learning for Reinforcement Learning is an increasingly popular technique that involves training an agent on a defined sequence of intermediate tasks, called a Curriculum, to increase the agent's performance and learning speed. This paper introduces a novel paradigm for automatic curriculum generation based on a progression of task complexity. Different progression functions are introduced, including an autonomous online task progression based on the performance of the agent. The progression function also determines how long the agent should train on each intermediate task, which is an open problem in other task-based curriculum approaches. The benefits and wide applicability of our approach are shown by empirically comparing its performance to two state-of-the-art Curriculum Learning algorithms on a grid world and on a complex simulated navigation domain.
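
The sketch below shows one way an online, performance-driven progression function might look: a scalar complexity parameter is advanced only once the agent's recent success rate on the current intermediate task reaches a target, which also implicitly determines how long the agent trains on each task. Function names, thresholds, and the stubbed evaluation are illustrative assumptions, not the paper's progression functions.

```python
import random

def online_progression(current_complexity, recent_success_rate,
                       target=0.8, step=0.1, max_complexity=1.0):
    """Advance task complexity once performance on the current intermediate
    task reaches `target`; otherwise keep training on the same task."""
    if recent_success_rate >= target:
        return min(current_complexity + step, max_complexity)
    return current_complexity

def evaluate_agent(complexity, rng=random.Random(0)):
    """Stand-in for evaluating the agent on a task of given complexity;
    here the success rate simply degrades with complexity for illustration."""
    return max(0.0, 1.0 - complexity + rng.uniform(-0.05, 0.05))

complexity = 0.1   # e.g. goal distance or obstacle density in a grid world
for episode in range(200):
    success_rate = evaluate_agent(complexity)
    complexity = online_progression(complexity, success_rate)
print("final complexity reached:", complexity)
```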


Artificial Intelligence for Prosthetics - challenge solutions

arXiv.org Machine Learning

In the NeurIPS 2018 Artificial Intelligence for Prosthetics challenge, participants were tasked with building a controller for a musculoskeletal model with a goal of matching a given time-varying velocity vector. Top participants were invited to describe their algorithms. In this work, we describe the challenge and present thirteen solutions that used deep reinforcement learning approaches. Many solutions use similar relaxations and heuristics, such as reward shaping, frame skipping, discretization of the action space, symmetry, and policy blending. However, each team implemented different modifications of the known algorithms by, for example, dividing the task into subtasks, learning low-level control, or by incorporating expert knowledge and using imitation learning.
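
Frame skipping is one of the common relaxations mentioned above; the wrapper below is a minimal sketch of it, assuming a Gym-style environment interface with reset()/step(). The class name and skip value are illustrative and not tied to the challenge's actual osim-rl environment or any team's code.

```python
class FrameSkip:
    """Repeat each chosen action for `skip` simulator steps and accumulate
    the reward. A common relaxation for expensive musculoskeletal
    simulations; assumes a Gym-style env with reset()/step()."""

    def __init__(self, env, skip=4):
        self.env = env
        self.skip = skip

    def reset(self):
        return self.env.reset()

    def step(self, action):
        total_reward, done, info = 0.0, False, {}
        for _ in range(self.skip):
            obs, reward, done, info = self.env.step(action)
            total_reward += reward
            if done:
                break
        return obs, total_reward, done, info
```

Holding each action fixed for several simulator steps cuts the number of policy queries per episode and often stabilises learning, at the cost of coarser control.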


Automated Refactoring of Object-Oriented Code Using Clustering Ensembles

AAAI Conferences

In this paper we approach the problem of automatic refactoring detection for object-oriented systems. An approach based on clustering ensembles is proposed, and several heuristics for existing algorithms and for filtering and combining their results are discussed. An experimental validation of the proposed approach on an open source project is presented. The obtained results illustrate that the introduced approach could be successfully used to improve existing integrated development environments, providing developers with one more tool to reduce the complexity of their projects.
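
One common way to combine the results of several base clusterings is evidence accumulation over a co-association matrix; the sketch below illustrates that idea on hypothetical clusterings of program entities. The input labelings, the 0.5 threshold, and the interpretation as refactoring candidates are illustrative assumptions, not the paper's specific heuristics.

```python
import numpy as np

def co_association(labelings):
    """Evidence-accumulation consensus: entry (i, j) is the fraction of base
    clusterings that place entities i and j in the same cluster."""
    labelings = np.asarray(labelings)      # shape: (n_clusterings, n_entities)
    n = labelings.shape[1]
    M = np.zeros((n, n))
    for labels in labelings:
        M += (labels[:, None] == labels[None, :]).astype(float)
    return M / len(labelings)

# Hypothetical base clusterings of six program entities (e.g. methods),
# produced by different clustering algorithms over code metrics.
base = [
    [0, 0, 1, 1, 2, 2],
    [0, 0, 0, 1, 2, 2],
    [1, 1, 2, 2, 0, 0],
]
M = co_association(base)
# Entities co-clustered in a clear majority of runs are candidates for
# refactorings such as merging classes or moving methods together.
pairs = [(i, j) for i in range(6) for j in range(i + 1, 6) if M[i, j] > 0.5]
print(pairs)
```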


Combining Multiple Correlated Reward and Shaping Signals by Measuring Confidence

AAAI Conferences

Multi-objective problems with correlated objectives are a class of problems that deserve specific attention. In contrast to typical multi-objective problems, they do not require the identification of trade-offs between the objectives, as (near-) optimal solutions for any objective are (near-) optimal for every objective. Intelligently combining the feedback from these objectives, instead of only looking at a single one, can improve optimization. This class of problems is very relevant in reinforcement learning, as any single-objective reinforcement learning problem can be framed as such a multi-objective problem using multiple reward shaping functions. After discussing this problem class, we propose a solution technique for such reinforcement learning problems, called adaptive objective selection. This technique makes a temporal difference learner estimate the Q-function for each objective in parallel, and introduces a way of measuring confidence in these estimates. This confidence metric is then used to choose which objective's estimates to use for action selection. We show significant improvements in performance over other plausible techniques on two problem domains. Finally, we provide an intuitive analysis of the technique's decisions, yielding insights into the nature of the problems being solved.
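
The tabular sketch below illustrates the overall structure: one Q-table per (shaped) objective is updated in parallel, and in each state the action is chosen greedily with respect to the objective whose estimates look most confident. The confidence proxy used here, the gap between the best and second-best action value, is a generic stand-in and not the paper's confidence measure; class and parameter names are illustrative.

```python
import numpy as np

class AdaptiveObjectiveSelection:
    """Learn one tabular Q-function per correlated objective in parallel and,
    per state, act greedily w.r.t. the objective with the most 'confident'
    estimates (confidence proxied by the best-vs-runner-up value gap)."""

    def __init__(self, n_objectives, n_states, n_actions, alpha=0.1, gamma=0.95):
        self.Q = np.zeros((n_objectives, n_states, n_actions))
        self.alpha, self.gamma = alpha, gamma

    def select_action(self, state):
        q_s = self.Q[:, state, :]                        # (objectives, actions)
        sorted_q = np.sort(q_s, axis=1)
        confidence = sorted_q[:, -1] - sorted_q[:, -2]   # best minus runner-up
        best_objective = int(np.argmax(confidence))
        return int(np.argmax(q_s[best_objective]))

    def update(self, state, action, rewards, next_state):
        """`rewards` holds one (shaped) reward signal per objective."""
        for o, r in enumerate(rewards):
            td_target = r + self.gamma * self.Q[o, next_state].max()
            self.Q[o, state, action] += self.alpha * (td_target - self.Q[o, state, action])
```

Because the objectives are correlated, acting on whichever objective is currently best estimated does not sacrifice optimality with respect to the others, which is what makes this selection scheme viable.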


Multiagent Router Throttling: Decentralized Coordinated Response Against DDoS Attacks

AAAI Conferences

Distributed denial of service (DDoS) attacks constitute a rapidly evolving threat in the current Internet. In this paper we introduce Multiagent Router Throttling, a decentralized DDoS response mechanism in which a set of upstream routers independently learn to throttle traffic towards a victim server. We compare our approach against a baseline and a popular throttling technique from the literature, and we show that our proposed approach is more secure, reliable and cost-effective. Furthermore, our approach outperforms the baseline technique and either outperforms or has the same performance as the popular one.
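
As an illustrative sketch of the decentralized setting, the agent below is a single upstream router learning a throttle rate with tabular Q-learning; each participating router would run its own independent copy. The state (a discretised estimate of the victim's load), the action set of throttle fractions, and the reward design are assumptions for illustration, not the paper's exact formulation.

```python
import random

class ThrottlingAgent:
    """One upstream router learning how aggressively to throttle traffic
    towards the victim. Illustrative formulation: the state is a discretised
    estimate of the victim's load, the action is a fraction of traffic to
    drop, and the reward should favour keeping the victim below capacity
    while letting legitimate traffic through."""

    ACTIONS = [0.0, 0.2, 0.4, 0.6, 0.8]   # fraction of traffic dropped

    def __init__(self, n_load_levels=10, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.q = [[0.0] * len(self.ACTIONS) for _ in range(n_load_levels)]
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)

    def act(self, load_level):
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.ACTIONS))
        row = self.q[load_level]
        return row.index(max(row))

    def learn(self, load_level, action, reward, next_level):
        target = reward + self.gamma * max(self.q[next_level])
        self.q[load_level][action] += self.alpha * (target - self.q[load_level][action])
```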