Goto

Collaborating Authors

 Undirected Networks


MarlRank: Multi-agent Reinforced Learning to Rank

arXiv.org Machine Learning

When estimating the relevancy between a query and a document, ranking models largely neglect the mutual information among documents. A common wisdom is that if two documents are similar in terms of the same query, they are more likely to have similar relevance score. To mitigate this problem, in this paper, we propose a multi-agent reinforced ranking model, named MarlRank. In particular, by considering each document as an agent, we formulate the ranking process as a multi-agent Markov Decision Process (MDP), where the mutual interactions among documents are incorporated in the ranking process. To compute the ranking list, each document predicts its relevance to a query considering not only its own query-document features but also its similar documents features and actions. By defining reward as a function of NDCG, we can optimize our model directly on the ranking performance measure. Our experimental results on two LETOR benchmark datasets show that our model has significant performance gains over the state-of-art baselines. We also find that the NDCG shows an overall increasing trend along with the step of interactions, which demonstrates that the mutual information among documents helps improve the ranking performance.


Genetic Algorithm - Explained Applications & Example

#artificialintelligence

What is a genetic algorithm? Bayesian inference ([1] links to particle methods in Bayesian statistics and hidden Markov chain models and [2] a tutorial on genetic particle models) Bioinformatics multiple sequence alignment.[1] SAGA is available on:.[4] Bioinformatics: Motif Discovery.[5] Calculation of bound states and local-density approximations. Code-breaking, using the GA to search large solution spaces of ciphers for the one correct decryption.[8]


Reinforcement Learning for Portfolio Management

arXiv.org Machine Learning

T raditionally, mathematical formulations of dynamical systems in the context of Signal Processing and Control Theory have been a lynchpin of today's Financial Engineering. More recently, advances in sequential decision making, mainly through the concept of Reinforcement Learning, have been instrumental in the development of multistage stochastic optimization, a key component in sequential portfolio optimization (asset allocation) strategies. In this thesis, we develop a comprehensive account of the expressive power, modelling efficiency, and performance advantages of so called trading agents (i.e., Deep Soft Recurrent Q-Network (DSRQN) and Mixture of Score Machines (MSM)), based on both traditional system identification (model-based approach) as well as on context-independent agents (model-free approach). The analysis provides a conclusive support for the ability of model-free reinforcement learning methods to act as universal trading agents, which are not only capable of reducing the computational and memory complexity (owing to their linear scaling with size of the universe), but also serve as generalizing strategies across assets and markets, regardless of the trading universe on which they have been trained. The relatively low volume of daily returns in financial market data is addressed via data augmentation (a generative approach) and a choice of pre-training strategies, both of which are validated against current state-of-the-art models. For rigour, a risk-sensitive framework which includes transaction costs is considered, and its performance advantages are demonstrated in a variety of scenarios, from synthetic time-series (sinusoidal, sawtooth and chirp waves), ii simulated market series (surrogate data based), through to real market data (S&P 500 and EURO STOXX 50). The analysis and simulations confirm the superiority of universal model-free reinforcement learning agents over current portfolio management model in asset allocation strategies, with the achieved performance advantage of as much as 9.2% in annualized cumulative returns and 13.4% in annualized Sharpe Ratio.


Efficiently Breaking the Curse of Horizon: Double Reinforcement Learning in Infinite-Horizon Processes

arXiv.org Machine Learning

Off-policy evaluation (OPE) in reinforcement learning is notoriously difficult in long- and infinite-horizon settings due to diminishing overlap between behavior and target policies. In this paper, we study the role of Markovian, time-invariant, and ergodic structure in efficient OPE. We first derive the efficiency limits for OPE when one assumes each of these structures. This precisely characterizes the curse of horizon: in time-variant processes, OPE is only feasible in the near-on-policy setting, where behavior and target policies are sufficiently similar. But, in ergodic time-invariant Markov decision processes, our bounds show that truly-off-policy evaluation is feasible, even with only just one dependent trajectory, and provide the limits of how well we could hope to do. We develop a new estimator based on Double Reinforcement Learning (DRL) that leverages this structure for OPE. Our DRL estimator simultaneously uses estimated stationary density ratios and $q$-functions and remains efficient when both are estimated at slow, nonparametric rates and remains consistent when either is estimated consistently. We investigate these properties and the performance benefits of leveraging the problem structure for more efficient OPE.


Explicit-Duration Markov Switching Models

arXiv.org Machine Learning

Markov switching models (MSMs) are probabilistic models that employ multiple sets of parameters to describe different dynamic regimes that a time series may exhibit at different periods of time. The switching mechanism between regimes is controlled by unobserved random variables that form a first-order Markov chain. Explicit-duration MSMs contain additional variables that explicitly model the distribution of time spent in each regime. This allows to define duration distributions of any form, but also to impose complex dependence between the observations and to reset the dynamics to initial conditions. Models that focus on the first two properties are most commonly known as hidden semi-Markov models or segment models, whilst models that focus on the third property are most commonly known as changepoint models or reset models. In this monograph, we provide a description of explicit-duration modelling by categorizing the different approaches into three groups, which differ in encoding in the explicit-duration variables different information about regime change/reset boundaries. The approaches are described using the formalism of graphical models, which allows to graphically represent and assess statistical dependence and therefore to easily describe the structure of complex models and derive inference routines. The presentation is intended to be pedagogical, focusing on providing a characterization of the three groups in terms of model structure constraints and inference properties. The monograph is supplemented with a software package that contains most of the models and examples described. The material presented should be useful to both researchers wishing to learn about these models and researchers wishing to develop them further.


The Randomized Midpoint Method for Log-Concave Sampling

arXiv.org Machine Learning

Sampling from log-concave distributions is a well researched problem that has many applications in statistics and machine learning. We study the distributions of the form $p^{*}\propto\exp(-f(x))$, where $f:\mathbb{R}^{d}\rightarrow\mathbb{R}$ has an $L$-Lipschitz gradient and is $m$-strongly convex. In our paper, we propose a Markov chain Monte Carlo (MCMC) algorithm based on the underdamped Langevin diffusion (ULD). It can achieve $\epsilon\cdot D$ error (in 2-Wasserstein distance) in $\tilde{O}\left(\kappa^{7/6}/\epsilon^{1/3}+\kappa/\epsilon^{2/3}\right)$ steps, where $D\overset{\mathrm{def}}{=}\sqrt{\frac{d}{m}}$ is the effective diameter of the problem and $\kappa\overset{\mathrm{def}}{=}\frac{L}{m}$ is the condition number. Our algorithm performs significantly faster than the previously best known algorithm for solving this problem, which requires $\tilde{O}\left(\kappa^{1.5}/\epsilon\right)$ steps. Moreover, our algorithm can be easily parallelized to require only $O(\kappa\log\frac{1}{\epsilon})$ parallel steps. To solve the sampling problem, we propose a new framework to discretize stochastic differential equations. We apply this framework to discretize and simulate ULD, which converges to the target distribution $p^{*}$. The framework can be used to solve not only the log-concave sampling problem, but any problem that involves simulating (stochastic) differential equations.


Reinforcement Learning: a Comparison of UCB Versus Alternative Adaptive Policies

arXiv.org Artificial Intelligence

In this paper we consider the basic version of Reinforcement Learning (RL) that involves computing optimal data driven (adaptive) policies for Markovian decision process with unknown transition probabilities. We provide a brief survey of the state of the art of the area and we compare the performance of the classic UCB policy of \cc{bkmdp97} with a new policy developed herein which we call MDP-Deterministic Minimum Empirical Divergence (MDP-DMED), and a method based on Posterior sampling (MDP-PS).


On Memory Mechanism in Multi-Agent Reinforcement Learning

arXiv.org Artificial Intelligence

Multi-agent reinforcement learning (MARL) extends (single-agent) reinforcement learning (RL) by introducing additional agents and (potentially) partial observability of the environment. Consequently, algorithms for solving MARL problems incorporate various extensions beyond traditional RL methods, such as a learned communication protocol between cooperative agents that enables exchange of private information or adaptive modeling of opponents in competitive settings. One popular algorithmic construct is a memory mechanism such that an agent's decisions can depend not only upon the current state but also upon the history of observed states and actions. In this paper, we study how a memory mechanism can be useful in environments with different properties, such as observability, internality and presence of a communication channel. Using both prior work and new experiments, we show that a memory mechanism is helpful when learning agents need to model other agents and/or when communication is constrained in some way; however we must to be cautious of agents achieving effective memoryfulness through other means.


Reinforcement Learning for Temporal Logic Control Synthesis with Probabilistic Satisfaction Guarantees

arXiv.org Machine Learning

Reinforcement Learning (RL) has emerged as an efficient method of choice for solving complex sequential decision making problems in automatic control, computer science, economics, and biology. In this paper we present a model-free RL algorithm to synthesize control policies that maximize the probability of satisfying high-level control objectives given as Linear Temporal Logic (LTL) formulas. Uncertainty is considered in the workspace properties, the structure of the workspace, and the agent actions, giving rise to a Probabilistically-Labeled Markov Decision Process (PL-MDP) with unknown graph structure and stochastic behaviour, which is even more general case than a fully unknown MDP. We first translate the LTL specification into a Limit Deterministic Buchi Automaton (LDBA), which is then used in an on-the-fly product with the PL-MDP. Thereafter, we define a synchronous reward function based on the acceptance condition of the LDBA. Finally, we show that the RL algorithm delivers a policy that maximizes the satisfaction probability asymptotically. We provide experimental results that showcase the efficiency of the proposed method.


Byzantine-Robust Federated Machine Learning through Adaptive Model Averaging

arXiv.org Machine Learning

Federated learning enables training collaborative machine learning models at scale with many participants whilst preserving the privacy of their datasets. Standard federated learning techniques are vulnerable to Byzantine failures, biased local datasets, and poisoning attacks. In this paper we introduce Adaptive Federated Averaging, a novel algorithm for robust federated learning that is designed to detect failures, attacks, and bad updates provided by participants in a collaborative model. We propose a Hidden Markov Model to model and learn the quality of model updates provided by each participant during training. In contrast to existing robust federated learning schemes, we propose a robust aggregation rule that detects and discards bad or malicious local model updates at each training iteration. This includes a mechanism that blocks unwanted participants, which also increases the computational and communication efficiency. Our experimental evaluation on 4 real datasets show that our algorithm is significantly more robust to faulty, noisy and malicious participants, whilst being computationally more efficient than other state-of-the-art robust federated learning methods such as Multi-KRUM and coordinate-wise median .