Goto

Collaborating Authors

 Undirected Networks


gym-DSSAT: a crop model turned into a Reinforcement Learning environment

arXiv.org Artificial Intelligence

Addressing a real world sequential decision problem with Reinforcement Learning (RL) usually starts with the use of a simulated environment that mimics real conditions. We present a novel open source RL environment for realistic crop management tasks. gym-DSSAT is a gym interface to the Decision Support System for Agrotechnology Transfer (DSSAT), a high fidelity crop simulator. DSSAT has been developped over the last 30 years and is widely recognized by agronomists. gym-DSSAT comes with predefined simulations based on real world maize experiments. The environment is as easy to use as any gym environment. We provide performance baselines using basic RL algorithms. We also briefly outline how the monolithic DSSAT simulator written in Fortran has been turned into a Python RL environment. Our methodology is generic and may be applied to similar simulators. We report on very preliminary experimental results which suggest that RL can help researchers to improve sustainability of fertilization and irrigation practices.


Collaborative Decision Making Using Action Suggestions

arXiv.org Artificial Intelligence

The level of autonomy is increasing in systems spanning multiple domains, but these systems still experience failures. One way to mitigate the risk of failures is to integrate human oversight of the autonomous systems and rely on the human to take control when the autonomy fails. In this work, we formulate a method of collaborative decision making through action suggestions that improves action selection without taking control of the system. Our approach uses each suggestion efficiently by incorporating the implicit information shared through suggestions to modify the agent's belief and achieves better performance with fewer suggestions than naively following the suggested actions. We assume collaborative agents share the same objective and communicate through valid actions. By assuming the suggested action is dependent only on the state, we can incorporate the suggested action as an independent observation of the environment. The assumption of a collaborative environment enables us to use the agent's policy to estimate the distribution over action suggestions. We propose two methods that use suggested actions and demonstrate the approach through simulated experiments. The proposed methodology results in increased performance while also being robust to suboptimal suggestions.


Regime-based Implied Stochastic Volatility Model for Crypto Option Pricing

arXiv.org Artificial Intelligence

The increasing adoption of Digital Assets (DAs), such as Bitcoin (BTC), rises the need for accurate option pricing models. Yet, existing methodologies fail to cope with the volatile nature of the emerging DAs. Many models have been proposed to address the unorthodox market dynamics and frequent disruptions in the microstructure caused by the non-stationarity, and peculiar statistics, in DA markets. However, they are either prone to the curse of dimensionality, as additional complexity is required to employ traditional theories, or they overfit historical patterns that may never repeat. Instead, we leverage recent advances in market regime (MR) clustering with the Implied Stochastic Volatility Model (ISVM). Time-regime clustering is a temporal clustering method, that clusters the historic evolution of a market into different volatility periods accounting for non-stationarity. ISVM can incorporate investor expectations in each of the sentiment-driven periods by using implied volatility (IV) data. In this paper, we applied this integrated time-regime clustering and ISVM method (termed MR-ISVM) to high-frequency data on BTC options at the popular trading platform Deribit. We demonstrate that MR-ISVM contributes to overcome the burden of complex adaption to jumps in higher order characteristics of option pricing models. This allows us to price the market based on the expectations of its participants in an adaptive fashion.


Graph clustering with Boltzmann machines

arXiv.org Machine Learning

Graph clustering is the process of grouping vertices into densely connected sets called clusters. We tailor two mathematical programming formulations from the literature, to this problem. In doing so, we obtain a heuristic approximation to the intra-cluster density maximization problem. We use two variations of a Boltzmann machine heuristic to obtain numerical solutions. For benchmarking purposes, we compare solution quality and computational performances to those obtained using a commercial solver, Gurobi. We also compare clustering quality to the clusters obtained using the popular Louvain modularity maximization method. Our initial results clearly demonstrate the superiority of our problem formulations. They also establish the superiority of the Boltzmann machine over the traditional exact solver. In the case of smaller less complex graphs, Boltzmann machines provide the same solutions as Gurobi, but with solution times that are orders of magnitude lower. In the case of larger and more complex graphs, Gurobi fails to return meaningful results within a reasonable time frame. Finally, we also note that both our clustering formulations, the distance minimization and $K$-medoids, yield clusters of superior quality to those obtained with the Louvain algorithm.


Hamiltonian Monte Carlo for efficient Gaussian sampling: long and random steps

arXiv.org Artificial Intelligence

Hamiltonian Monte Carlo (HMC) is a Markov chain algorithm for sampling from a high-dimensional distribution with density $e^{-f(x)}$, given access to the gradient of $f$. A particular case of interest is that of a $d$-dimensional Gaussian distribution with covariance matrix $\Sigma$, in which case $f(x) = x^\top \Sigma^{-1} x$. We show that HMC can sample from a distribution that is $\varepsilon$-close in total variation distance using $\widetilde{O}(\sqrt{\kappa} d^{1/4} \log(1/\varepsilon))$ gradient queries, where $\kappa$ is the condition number of $\Sigma$. Our algorithm uses long and random integration times for the Hamiltonian dynamics. This contrasts with (and was motivated by) recent results that give an $\widetilde\Omega(\kappa d^{1/2})$ query lower bound for HMC with fixed integration times, even for the Gaussian case.


Multi-Agent Sequential Decision-Making via Communication

arXiv.org Artificial Intelligence

Communication helps agents to obtain information about others so that better coordinated behavior can be learned. Some existing work communicates predicted future trajectory with others, hoping to get clues about what others would do for better coordination. However, circular dependencies sometimes can occur when agents are treated synchronously so it is hard to coordinate decision-making. In this paper, we propose a novel communication scheme, Sequential Communication (SeqComm). SeqComm treats agents asynchronously (the upper-level agents make decisions before the lower-level ones) and has two communication phases. In negotiation phase, agents determine the priority of decision-making by communicating hidden states of observations and comparing the value of intention, which is obtained by modeling the environment dynamics. In launching phase, the upper-level agents take the lead in making decisions and communicate their actions with the lower-level agents. Theoretically, we prove the policies learned by SeqComm are guaranteed to improve monotonically and converge. Empirically, we show that SeqComm outperforms existing methods in various multi-agent cooperative tasks.


Deep Reinforcement Learning for Adaptive Mesh Refinement

arXiv.org Artificial Intelligence

Finite element discretizations of problems in computational physics often rely on adaptive mesh refinement (AMR) to preferentially resolve regions containing important features during simulation. However, these spatial refinement strategies are often heuristic and rely on domain-specific knowledge or trial-and-error. We treat the process of adaptive mesh refinement as a local, sequential decision-making problem under incomplete information, formulating AMR as a partially observable Markov decision process. Using a deep reinforcement learning approach, we train policy networks for AMR strategy directly from numerical simulation. The training process does not require an exact solution or a high-fidelity ground truth to the partial differential equation at hand, nor does it require a pre-computed training dataset. The local nature of our reinforcement learning formulation allows the policy network to be trained inexpensively on much smaller problems than those on which they are deployed. The methodology is not specific to any particular partial differential equation, problem dimension, or numerical discretization, and can flexibly incorporate diverse problem physics. To that end, we apply the approach to a diverse set of partial differential equations, using a variety of high-order discontinuous Galerkin and hybridizable discontinuous Galerkin finite element discretizations. We show that the resultant deep reinforcement learning policies are competitive with common AMR heuristics, generalize well across problem classes, and strike a favorable balance between accuracy and cost such that they often lead to a higher accuracy per problem degree of freedom.


Reward Learning using Structural Motifs in Inverse Reinforcement Learning

arXiv.org Artificial Intelligence

The Inverse Reinforcement Learning (\textit{IRL}) problem has seen rapid evolution in the past few years, with important applications in domains like robotics, cognition, and health. In this work, we explore the inefficacy of current IRL methods in learning an agent's reward function from expert trajectories depicting long-horizon, complex sequential tasks. We hypothesize that imbuing IRL models with structural motifs capturing underlying tasks can enable and enhance their performance. Subsequently, we propose a novel IRL method, SMIRL, that first learns the (approximate) structure of a task as a finite-state-automaton (FSA), then uses the structural motif to solve the IRL problem. We test our model on both discrete grid world and high-dimensional continuous domain environments. We empirically show that our proposed approach successfully learns all four complex tasks, where two foundational IRL baselines fail. Our model also outperforms the baselines in sample efficiency on a simpler toy task. We further show promising test results in a modified continuous domain on tasks with compositional reward functions.


Non-monotonic Resource Utilization in the Bandits with Knapsacks Problem

arXiv.org Artificial Intelligence

Bandits with knapsacks (BwK) is an influential model of sequential decision-making under uncertainty that incorporates resource consumption constraints. In each round, the decision-maker observes an outcome consisting of a reward and a vector of nonnegative resource consumptions, and the budget of each resource is decremented by its consumption. In this paper we introduce a natural generalization of the stochastic BwK problem that allows non-monotonic resource utilization. In each round, the decision-maker observes an outcome consisting of a reward and a vector of resource drifts that can be positive, negative or zero, and the budget of each resource is incremented by its drift. Our main result is a Markov decision process (MDP) policy that has constant regret against a linear programming (LP) relaxation when the decision-maker knows the true outcome distributions. We build upon this to develop a learning algorithm that has logarithmic regret against the same LP relaxation when the decision-maker does not know the true outcome distributions. We also present a reduction from BwK to our model that shows our regret bound matches existing results.


Recurrent model-free RL can be a strong baseline for many POMDPs

AIHub

Our implementation of recurrent model-free RL outperforms the on-policy version (PPO/A2C-GRU), and a recent model-based POMDP algorithm (VRM) on most tasks of a POMDP benchmark where VRM was evaluated in their paper. While algorithms for decision-making typically focus on relatively easy problems where everything is known, most realistic problems involve noise and incomplete information. Complex algorithms have been proposed to tackle these complex problems, but there's a simple approach that (in theory) works on both the easy and the complex problems. We show how to make this simple approach work in practice. Decision-making tasks in the real world are messy, with noise, occlusions, and uncertainty that are typically missing from their canonical problem formulation as a Markov decision process (MDP; Bellman, 1957).