 Undirected Networks


Hierarchically Structured Scheduling and Execution of Tasks in a Multi-Agent Environment

arXiv.org Machine Learning

In a warehouse environment, tasks appear dynamically. Consequently, a task management system that matches them with the workforce too early (e.g., weeks in advance) is necessarily sub-optimal. Also, the rapidly increasing size of the action space of such a system poses a significant problem for traditional schedulers. Reinforcement learning, however, is suited to problems that require making sequential decisions towards a long-term, often remote, goal. In this work, we address a problem with a hierarchical structure: task scheduling by a centralised agent in a dynamic multi-agent warehouse environment, and the execution of one such schedule by decentralised agents with only partial observability thereof. We propose to use deep reinforcement learning to solve both the high-level scheduling problem and the low-level multi-agent problem of schedule execution. Finally, we also consider the case where centralisation is impossible at test time and workers must learn how to cooperate in executing the tasks in an environment with no schedule and only partial observability.


Cascaded Gaps: Towards Gap-Dependent Regret for Risk-Sensitive Reinforcement Learning

arXiv.org Machine Learning

In this paper, we study gap-dependent regret guarantees for risk-sensitive reinforcement learning based on the entropic risk measure. We propose a novel definition of sub-optimality gaps, which we call cascaded gaps, and we discuss their key components that adapt to the underlying structures of the problem. Based on the cascaded gaps, we derive non-asymptotic and logarithmic regret bounds for two model-free algorithms under episodic Markov decision processes. We show that, in appropriate settings, these bounds feature exponential improvement over existing ones that are independent of gaps. We also prove gap-dependent lower bounds, which certify the near optimality of the upper bounds.
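
As a rough illustration of the entropic risk measure mentioned above, the sketch below evaluates (1/beta) * log E[exp(beta * R)] on an empirical sample of returns. The function name `entropic_risk`, the sample values, and the choice of a uniform empirical distribution are assumptions made for illustration; they do not come from the paper.

```python
import math

def entropic_risk(returns, beta):
    """Entropic risk (1/beta) * log(mean(exp(beta * r))) of sampled returns.

    Assumption: returns are equally weighted samples. beta < 0 is
    risk-averse, beta > 0 is risk-seeking, beta == 0 falls back to the mean.
    """
    if beta == 0:
        return sum(returns) / len(returns)
    # Log-sum-exp trick: subtract the largest exponent for numerical stability.
    shift = max(beta * r for r in returns)
    mean_exp = sum(math.exp(beta * r - shift) for r in returns) / len(returns)
    return (shift + math.log(mean_exp)) / beta

# Hypothetical returns: a gamble between 0 and 10 with equal probability.
sample = [0.0, 10.0]
risk_averse = entropic_risk(sample, beta=-1.0)
risk_seeking = entropic_risk(sample, beta=1.0)
```

With these made-up numbers, the risk-averse value falls below the plain mean of 5 and the risk-seeking value rises above it, which is the qualitative behaviour the measure is meant to capture.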


Top resources to learn reinforcement learning in 2022

#artificialintelligence

Richard S. Sutton, a research scientist at DeepMind and computing science professor at the University of Alberta, explains the underlying formal problem (Markov decision processes) and core solution methods, including dynamic programming, Monte Carlo methods, and temporal-difference learning, in this in-depth tutorial.


Hidden Markov Models Simply Explained

#artificialintelligence

In a regular Markov Chain we are able to see the states and their associated transition probabilities. However, in a Hidden Markov Model (HMM), the Markov Chain is hidden, but we can infer its properties from the states we observe. Note: The Hidden Markov Model is not a Markov Chain per se; it is another model in the wider list of Markov Processes/Models. The probabilities associated with the observed states (Happy, Sad) are known as the emission probabilities. Now, let's say my friend wants to infer the weather from my mood.
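
The weather-from-mood inference can be sketched with the forward (filtering) recursion of an HMM. Every number below, along with the names `start`, `trans`, `emit`, and `forward_filter`, is a hypothetical choice for illustration; the article excerpt gives no concrete probabilities.

```python
# Hidden states are the weather; observations are the moods (Happy, Sad).
# All probabilities are made-up illustration values.
states = ["Sunny", "Rainy"]
start = {"Sunny": 0.5, "Rainy": 0.5}
trans = {  # transition probabilities of the hidden Markov Chain
    "Sunny": {"Sunny": 0.8, "Rainy": 0.2},
    "Rainy": {"Sunny": 0.4, "Rainy": 0.6},
}
emit = {  # emission probabilities: P(mood | weather)
    "Sunny": {"Happy": 0.8, "Sad": 0.2},
    "Rainy": {"Happy": 0.4, "Sad": 0.6},
}

def forward_filter(observations):
    """Return P(weather at step t | moods observed up to step t) for each t."""
    beliefs = []
    prior = dict(start)
    for mood in observations:
        # Update: weight the prior by how likely each weather emits this mood.
        unnormalised = {s: prior[s] * emit[s][mood] for s in states}
        total = sum(unnormalised.values())
        posterior = {s: unnormalised[s] / total for s in states}
        beliefs.append(posterior)
        # Predict: push the posterior through the transition probabilities.
        prior = {
            nxt: sum(posterior[cur] * trans[cur][nxt] for cur in states)
            for nxt in states
        }
    return beliefs

beliefs = forward_filter(["Happy", "Sad", "Sad"])
```

Under these assumed numbers, one happy day makes sunny weather more likely, while two sad days in a row shift the belief towards rain.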


Analysis and Assessment of Controllability of an Expressive Deep Learning-Based TTS System

#artificialintelligence

In this paper, we study the controllability of an Expressive TTS system trained on a dataset for continuous control. The dataset is the Blizzard 2013 dataset, based on audiobooks read by a female speaker and containing great variability in style and expressiveness. Controllability is evaluated with both an objective and a subjective experiment. The objective assessment is based on a measure of correlation between acoustic features and the dimensions of the latent space representing expressiveness. The subjective assessment is based on a perceptual experiment in which users are shown an interface for Controllable Expressive TTS and asked to retrieve a synthetic utterance whose expressiveness subjectively corresponds to that of a reference utterance.


Learning Dynamic Mechanisms in Unknown Environments: A Reinforcement Learning Approach

arXiv.org Machine Learning

Dynamic mechanism design studies how mechanism designers should allocate resources among agents in a time-varying environment. We consider the problem where the agents interact with the mechanism designer according to an unknown Markov Decision Process (MDP), where agent rewards and the mechanism designer's state evolve according to an episodic MDP with unknown reward functions and transition kernels. We focus on the online setting with linear function approximation and attempt to recover the dynamic Vickrey-Clarke-Groves (VCG) mechanism over multiple rounds of interaction. A key contribution of our work is incorporating reward-free online Reinforcement Learning (RL) to aid exploration over a rich policy space to estimate prices in the dynamic VCG mechanism. We show that the regret of our proposed method is upper bounded by $\tilde{\mathcal{O}}(T^{2/3})$ and further devise a lower bound to show that our algorithm is efficient, incurring the same $\tilde{\mathcal{O}}(T^{2/3})$ regret as the lower bound, where $T$ is the total number of rounds. Our work establishes the regret guarantee for online RL in solving dynamic mechanism design problems without prior knowledge of the underlying model.
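
For readers unfamiliar with VCG pricing, here is a minimal sketch of the static, one-shot VCG mechanism over a finite set of outcomes. It is not the dynamic, learned mechanism the paper studies; the function name `vcg` and the bidder values are hypothetical. Each agent pays the externality it imposes: the welfare the others could reach without it, minus the welfare the others get at the chosen outcome.

```python
def vcg(valuations):
    """Static VCG over a finite outcome set.

    valuations[i][o] is agent i's reported value for outcome o.
    Returns (chosen_outcome, prices) with one price per agent.
    """
    outcomes = range(len(valuations[0]))

    def welfare(outcome, excluded=None):
        # Total reported value at an outcome, optionally ignoring one agent.
        return sum(v[outcome] for i, v in enumerate(valuations) if i != excluded)

    # Choose the outcome that maximises total reported welfare.
    chosen = max(outcomes, key=welfare)

    prices = []
    for agent in range(len(valuations)):
        best_without_agent = max(welfare(o, excluded=agent) for o in outcomes)
        others_at_chosen = welfare(chosen, excluded=agent)
        prices.append(best_without_agent - others_at_chosen)
    return chosen, prices

# Hypothetical single-item auction: outcome k means "bidder k gets the item".
bids = [
    [10, 0, 0],  # bidder 0 values the item at 10
    [0, 7, 0],   # bidder 1 values the item at 7
    [0, 0, 3],   # bidder 2 values the item at 3
]
winner, payments = vcg(bids)
```

In the single-item case this reduces to a second-price auction: the highest bidder wins and pays the second-highest bid, while the losers pay nothing.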


Missing Value Knockoffs

arXiv.org Machine Learning

Coping with an increasing number of variables, optimizing predictive performance, and selecting among candidate scientific hypotheses are all valid reasons for using a variable selection algorithm. Another reality of today's datasets is missing values. Although methods for handling missing values exist, applying them directly can interfere with the assumptions of variable selection algorithms. In this work, we will discuss how model-x knockoffs (Candes et al. 2017), a new approach in principled variable selection, can be applied to datasets that contain missing values. By principled variable selection we refer to algorithms that aim to identify the Markov Blanket (MB) of a response variable (Tsamardinos and Aliferis 2003) while providing control of false selections. Identifying the MB is by definition optimal, as the MB refers to the smallest subset of variables that is sufficient to describe the conditional distribution of the response variable. Controlling false selections refers to limiting the variables that are selected due to random chance and is especially important in applications where a selected variable corresponds to a scientific discovery. Model-x knockoffs provide a framework for repurposing existing statistical/machine learning feature scorers for MB discovery. When the assumptions of the model-x framework hold, the expected fraction of selections that are conditionally pairwise independent of the response variable is controlled.


Bayesian Deep Learning for Graphs

arXiv.org Machine Learning

The adaptive processing of structured data is a long-standing research topic in machine learning that investigates how to automatically learn a mapping from a structured input to outputs of various kinds. Recently, there has been an increasing interest in the adaptive processing of graphs, which led to the development of different neural network-based methodologies. In this thesis, we take a different route and develop a Bayesian Deep Learning framework for graph learning. The dissertation begins with a review of the principles on which most of the methods in the field are built, followed by a study on graph classification reproducibility issues. We then proceed to bridge the basic ideas of deep learning for graphs with the Bayesian world, by building our deep architectures in an incremental fashion. This framework allows us to consider graphs with discrete and continuous edge features, producing unsupervised embeddings rich enough to reach the state of the art on several classification tasks. Our approach is also amenable to a Bayesian nonparametric extension that automates the choice of almost all of the model's hyper-parameters. Two real-world applications demonstrate the efficacy of deep learning for graphs. The first concerns the prediction of information-theoretic quantities for molecular simulations with supervised neural models. After that, we exploit our Bayesian models to solve a malware-classification task while being robust to intra-procedural code obfuscation techniques. We conclude the dissertation with an attempt to blend the best of the neural and Bayesian worlds together. The resulting hybrid model is able to predict multimodal distributions conditioned on input graphs, with the consequent ability to model stochasticity and uncertainty better than most works. Overall, we aim to provide a Bayesian perspective into the articulated research field of deep learning for graphs.


Mixed-Integer Nonlinear Programming for State-based Non-Intrusive Load Monitoring

arXiv.org Machine Learning

Energy disaggregation, known in the literature as Non-Intrusive Load Monitoring (NILM), is the task of inferring the energy consumption of each appliance given the aggregate signal recorded by a single smart meter. In this paper, we propose a novel two-stage optimization-based approach for energy disaggregation. In the first phase, a small training set consisting of disaggregated power profiles is used to estimate the parameters and the power states by solving a mixed-integer programming problem. Once the model parameters are estimated, the energy disaggregation problem is formulated as a constrained binary quadratic optimization problem. We incorporate penalty terms that exploit prior knowledge on how the disaggregated traces are generated, and appliance-specific constraints characterizing the signature of different types of appliances operating simultaneously. Our approach is compared with existing optimization-based algorithms both on a synthetic dataset and on three real-world datasets. The proposed formulation is computationally efficient, is able to disambiguate loads with similar consumption patterns, and successfully reconstructs the signatures of known appliances despite the presence of unmetered devices, thus overcoming the main drawbacks of the optimization-based methods available in the literature.
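
To make the binary quadratic formulation concrete, the toy sketch below disaggregates a single meter reading by exhaustively searching over on/off states and minimising the squared reconstruction error. It is a deliberately simplified stand-in, not the paper's method: it uses brute force instead of a mixed-integer solver, a single time step, two-state appliances, and no penalty terms or appliance-specific constraints. The function name `disaggregate` and the wattages are hypothetical.

```python
from itertools import product

def disaggregate(aggregate_watts, appliance_watts):
    """Pick on/off states x in {0, 1}^n minimising (aggregate - sum_i x_i * p_i)^2.

    appliance_watts maps appliance name -> assumed power draw when on.
    Brute force is only viable for a handful of appliances.
    """
    names = list(appliance_watts)
    best_error, best_states = None, None
    for on_off in product((0, 1), repeat=len(names)):
        estimate = sum(x * appliance_watts[n] for x, n in zip(on_off, names))
        error = (aggregate_watts - estimate) ** 2
        if best_error is None or error < best_error:
            best_error, best_states = error, dict(zip(names, on_off))
    return best_states

# Hypothetical appliance ratings and one aggregate smart-meter reading.
ratings = {"kettle": 2000, "fridge": 150, "tv": 100}
states = disaggregate(2150, ratings)
```

Here the reading of 2150 W is explained exactly by the kettle and fridge being on; real traces need the temporal penalties and constraints the abstract describes to resolve appliances with similar consumption.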


Natural Language Processing

#artificialintelligence

Originally published on Towards AI. Recommendation systems (RS) are becoming an integral part of our daily lives. This means that we can obtain what we desire either through internet-accessible applications or on social media channels. Traditional views of the recommendation problem treat it as a simple classification or prediction problem; however, recent evidence indicates that it is essentially a sequential problem [1]. It can therefore be formulated as a Markov decision process (MDP), and reinforcement learning (RL) methods can be employed to solve it [1]. RL algorithms play a crucial role because they cope well with dynamic environments and large state and action spaces [4]. Deep Reinforcement Learning (DRL) has enabled RL to be applied to recommendation problems with massive state and action spaces. RL-based and DRL-based methods can be classified by the specific RL algorithm, such as Q-learning, SARSA, or REINFORCE, that is used to optimize the recommendation policy [2].
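
As a minimal sketch of how Q-learning, one of the algorithms named above, could drive a recommender framed as an MDP, the snippet below applies the standard tabular update. The framing (state = last item the user engaged with, action = item to recommend, reward = 1 for a click) and all names and values are assumptions for illustration, not taken from the article.

```python
from collections import defaultdict

def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.9):
    """One tabular Q-learning step:
    Q(s, a) += alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)).
    """
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]

# Hypothetical catalogue; unseen (state, action) pairs start at 0.0.
items = ["book", "film", "song"]
q = defaultdict(float)

# The user last engaged with "book", we recommended "film", and they clicked.
updated = q_learning_update(q, "book", "film", reward=1.0,
                            next_state="film", actions=items)

# Greedy recommendation for a user whose last item was "book".
recommended = max(items, key=lambda a: q[("book", a)])
```

A table like this only scales to small catalogues; the DRL methods the article mentions replace it with a neural network that approximates the Q-values.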