Undirected Networks
Goal Agnostic Planning using Maximum Likelihood Paths in Hypergraph World Models
In this paper, we present a hypergraph--based machine learning algorithm, a datastructure--driven maintenance method, and a planning algorithm based on a probabilistic application of Dijkstra's algorithm. Together, these form a goal agnostic automated planning engine for an autonomous learning agent which incorporates beneficial properties of both classical Machine Learning and traditional Artificial Intelligence. We prove that the algorithm determines optimal solutions within the problem space, mathematically bound learning performance, and supply a mathematical model analyzing system state progression through time yielding explicit predictions for learning curves, goal achievement rates, and response to abstractions and uncertainty. To validate performance, we exhibit results from applying the agent to three archetypal planning problems, including composite hierarchical domains, and highlight empirical findings which illustrate properties elucidated in the analysis.
Embracing advanced AI/ML to help investors achieve success: Vanguard Reinforcement Learning for Financial Goal Planning
Mohammed, Shareefuddin, Bealer, Rusty, Cohen, Jason
In the world of advice and financial planning, there is seldom one right answer. While traditional algorithms have been successful in solving linear problems, its success often depends on choosing the right features from a dataset, which can be a challenge for nuanced financial planning scenarios. Reinforcement learning is a machine learning approach that can be employed with complex data sets where picking the right features can be nearly impossible. In this paper, we will explore the use of machine learning for financial forecasting, predicting economic indicators, and creating a savings strategy. Vanguard ML algorithm for goals-based financial planning is based on deep reinforcement learning that identifies optimal savings rates across multiple goals and sources of income to help clients achieve financial success. Vanguard learning algorithms are trained to identify market indicators and behaviors too complex to capture with formulas and rules, instead, it works to model the financial success trajectory of investors and their investment outcomes as a Markov decision process. We believe that reinforcement learning can be used to create value for advisors and end-investors, creating efficiency, more personalized plans, and data to enable customized solutions.
Relational Neural Markov Random Fields
Chen, Yuqiao, Natarajan, Sriraam, Ruozzi, Nicholas
Statistical Relational Learning (SRL) models have attracted significant attention due to their ability to model complex data while handling uncertainty. However, most of these models have been limited to discrete domains due to their limited potential functions. We introduce Relational Neural Markov Random Fields (RN-MRFs) which allow for handling of complex relational hybrid domains. The key advantage of our model is that it makes minimal data distributional assumptions and can seamlessly allow for human knowledge through potentials or relational rules. We propose a maximum pseudolikelihood estimation-based learning algorithm with importance sampling for training the neural potential parameters. Our empirical evaluations across diverse domains such as image processing and relational object mapping, clearly demonstrate its effectiveness against non-neural counterparts.
Lifting DecPOMDPs for Nanoscale Systems -- A Work in Progress
Braun, Tanya, Fischer, Stefan, Lau, Florian, Möller, Ralf
DNA-based nanonetworks have a wide range of promising use cases, especially in the field of medicine. With a large set of agents, a partially observable stochastic environment, and noisy observations, such nanoscale systems can be modelled as a decentralised, partially observable, Markov decision process (DecPOMDP). As the agent set is a dominating factor, this paper presents (i) lifted DecPOMDPs, partitioning the agent set into sets of indistinguishable agents, reducing the worst-case space required, and (ii) a nanoscale medical system as an application. Future work turns to solving and implementing lifted DecPOMDPs.
An actor-critic algorithm with deep double recurrent agents to solve the job shop scheduling problem
Monaci, Marta, Agasucci, Valerio, Grani, Giorgio
There is a growing interest in integrating machine learning techniques and optimization to solve challenging optimization problems. In this work, we propose a deep reinforcement learning methodology for the job shop scheduling problem (JSSP). The aim is to build up a greedy-like heuristic able to learn on some distribution of JSSP instances, different in the number of jobs and machines. The need for fast scheduling methods is well known, and it arises in many areas, from transportation to healthcare. We model the JSSP as a Markov Decision Process and then we exploit the efficacy of reinforcement learning to solve the problem. We adopt an actor-critic scheme, where the action taken by the agent is influenced by policy considerations on the state-value function. The procedures are adapted to take into account the challenging nature of JSSP, where the state and the action space change not only for every instance but also after each decision. To tackle the variability in the number of jobs and operations in the input, we modeled the agent using two incident LSTM models, a special type of deep neural network. Experiments show the algorithm reaches good solutions in a short time, proving that is possible to generate new greedy heuristics just from learning-based methodologies. Benchmarks have been generated in comparison with the commercial solver CPLEX. As expected, the model can generalize, to some extent, to larger problems or instances originated by a different distribution from the one used in training.
Local Advantage Actor-Critic for Robust Multi-Agent Deep Reinforcement Learning
Xiao, Yuchen, Lyu, Xueguang, Amato, Christopher
Policy gradient methods have become popular in multi-agent reinforcement learning, but they suffer from high variance due to the presence of environmental stochasticity and exploring agents (i.e., non-stationarity), which is potentially worsened by the difficulty in credit assignment. As a result, there is a need for a method that is not only capable of efficiently solving the above two problems but also robust enough to solve a variety of tasks. To this end, we propose a new multi-agent policy gradient method, called Robust Local Advantage (ROLA) Actor-Critic. ROLA allows each agent to learn an individual action-value function as a local critic as well as ameliorating environment non-stationarity via a novel centralized training approach based on a centralized critic. By using this local critic, each agent calculates a baseline to reduce variance on its policy gradient estimation, which results in an expected advantage action-value over other agents' choices that implicitly improves credit assignment. We evaluate ROLA across diverse benchmarks and show its robustness and effectiveness over a number of state-of-the-art multi-agent policy gradient algorithms.
Learning Cooperation and Online Planning Through Simulation and Graph Convolutional Network
Mahmud, Rafid Ameer, Faisal, Fahim, Mahmud, Saaduddin, Khan, Md. Mosaddek
Multi-agent Markov Decision Process (MMDP) has been an effective way of modelling sequential decision making algorithms for multi-agent cooperative environments. A number of algorithms based on centralized and decentralized planning have been developed in this domain. However, dynamically changing environment, coupled with exponential size of the state and joint action space, make it difficult for these algorithms to provide both efficiency and scalability. Recently, Centralized planning algorithm FV-MCTS-MP and decentralized planning algorithm \textit{Alternate maximization with Behavioural Cloning} (ABC) have achieved notable performance in solving MMDPs. However, they are not capable of adapting to dynamically changing environments and accounting for the lack of communication among agents, respectively. Against this background, we introduce a simulation based online planning algorithm, that we call SiCLOP, for multi-agent cooperative environments. Specifically, SiCLOP tailors Monte Carlo Tree Search (MCTS) and uses Coordination Graph (CG) and Graph Neural Network (GCN) to learn cooperation and provides real time solution of a MMDP problem. It also improves scalability through an effective pruning of action space. Additionally, unlike FV-MCTS-MP and ABC, SiCLOP supports transfer learning, which enables learned agents to operate in different environments. We also provide theoretical discussion about the convergence property of our algorithm within the context of multi-agent settings. Finally, our extensive empirical results show that SiCLOP significantly outperforms the state-of-the-art online planning algorithms.
Towards Instance-Optimal Offline Reinforcement Learning with Pessimism
We study the offline reinforcement learning (offline RL) problem, where the goal is to learn a reward-maximizing policy in an unknown Markov Decision Process (MDP) using the data coming from a policy $\mu$. In particular, we consider the sample complexity problems of offline RL for finite-horizon MDPs. Prior works study this problem based on different data-coverage assumptions, and their learning guarantees are expressed by the covering coefficients which lack the explicit characterization of system quantities. In this work, we analyze the Adaptive Pessimistic Value Iteration (APVI) algorithm and derive the suboptimality upper bound that nearly matches \[ O\left(\sum_{h=1}^H\sum_{s_h,a_h}d^{\pi^\star}_h(s_h,a_h)\sqrt{\frac{\mathrm{Var}_{P_{s_h,a_h}}{(V^\star_{h+1}+r_h)}}{d^\mu_h(s_h,a_h)}}\sqrt{\frac{1}{n}}\right). \] In complementary, we also prove a per-instance information-theoretical lower bound under the weak assumption that $d^\mu_h(s_h,a_h)>0$ if $d^{\pi^\star}_h(s_h,a_h)>0$. Different from the previous minimax lower bounds, the per-instance lower bound (via local minimaxity) is a much stronger criterion as it applies to individual instances separately. Here $\pi^\star$ is a optimal policy, $\mu$ is the behavior policy and $d_h^\mu$ is the marginal state-action probability. We call the above equation the intrinsic offline reinforcement learning bound since it directly implies all the existing optimal results: minimax rate under uniform data-coverage assumption, horizon-free setting, single policy concentrability, and the tight problem-dependent results. Later, we extend the result to the assumption-free regime (where we make no assumption on $ \mu$) and obtain the assumption-free intrinsic bound. Due to its generic form, we believe the intrinsic bound could help illuminate what makes a specific problem hard and reveal the fundamental challenges in offline RL.
Physics-guided Deep Markov Models for Learning Nonlinear Dynamical Systems with Uncertainty
Liu, Wei, Lai, Zhilu, Bacsa, Kiran, Chatzi, Eleni
In this paper, we propose a probabilistic physics-guided framework, termed Physics-guided Deep Markov Model (PgDMM). The framework is especially targeted to the inference of the characteristics and latent structure of nonlinear dynamical systems from measurement data, where it is typically intractable to perform exact inference of latent variables. A recently surfaced option pertains to leveraging variational inference to perform approximate inference. In such a scheme, transition and emission functions of the system are parameterized via feed-forward neural networks (deep generative models). However, due to the generalized and highly versatile formulation of neural network functions, the learned latent space is often prone to lack physical interpretation and structured representation. To address this, we bridge physics-based state space models with Deep Markov Models, thus delivering a hybrid modeling framework for unsupervised learning and identification for nonlinear dynamical systems. Specifically, the transition process can be modeled as a physics-based model enhanced with an additive neural network component, which aims to learn the discrepancy between the physics-based model and the actual dynamical system being monitored. The proposed framework takes advantage of the expressive power of deep learning, while retaining the driving physics of the dynamical system by imposing physics-driven restrictions on the side of the latent space. We demonstrate the benefits of such a fusion in terms of achieving improved performance on illustrative simulation examples and experimental case studies of nonlinear systems. Our results indicate that the physics-based models involved in the employed transition and emission functions essentially enforce a more structured and physically interpretable latent space, which is essential to generalization and prediction capabilities.
Crop Rotation Modeling for Deep Learning-Based Parcel Classification from Satellite Time Series
Quinton, Félix, Landrieu, Loic
The Common Agricultural Policy (CAP) is responsible for allocating agricultural subsidies in the European Union, which nears 50 billion euros each year [36]. Consequently, monitoring the crop types for subsidy allocation represents a significant challenge for payment agencies, which have encouraged the development of automated crop classification tools based on machine learning [24]. In particular, The Sentinels for Common Agricultural Policy (Sen4CAP) project [19] aims to provide EU member states with algorithmic solutions and best practice studies on crop monitoring based on satellite data from the Sentinel constellation [10]. Despite the inherent difficulty of differentiating between the complex growth patterns of plants, this task is made possible by the nearly limitless access to data and annotations. Indeed, Sentinel-2 offers multi-spectral observations at a high revisit time of five days on average, which are particularly appropriate for characterizing the complex spectral and temporal characteristics of crop phenology.