Goto

Collaborating Authors

 Markov Models


Sequential Learning over Implicit Feedback for Robust Large-Scale Recommender Systems

arXiv.org Machine Learning

In this paper, we propose a robust sequential learning strategy for training large-scale Recommender Systems (RS) over implicit feedback mainly in the form of clicks. Our approach relies on the minimization of a pairwise ranking loss over blocks of consecutive items constituted by a sequence of non-clicked items followed by a clicked one for each user. Parameter updates are discarded if for a given user the number of sequential blocks is below or above some given thresholds estimated over the distribution of the number of blocks in the training set. This is to prevent from an abnormal number of clicks over some targeted items, mainly due to bots; or very few user interactions. Both scenarios affect the decision of RS and imply a shift over the distribution of items that are shown to the users. We provide a theoretical analysis showing that in the case where the ranking loss is convex, the deviation between the loss with respect to the sequence of weights found by the proposed algorithm and its minimum is bounded. Furthermore, experimental results on five large-scale collections demonstrate the efficiency of the proposed algorithm with respect to the state-of-the-art approaches, both regarding different ranking measures and computation time.


Towards the Next Generation Airline Revenue Management: A Deep Reinforcement Learning Approach to Seat Inventory Control and Overbooking

arXiv.org Artificial Intelligence

Revenue management can enable airline corporations to maximize the revenue generated from each scheduled flight departing in their transportation network by means of finding the optimal policies for differential pricing, seat inventory control and overbooking. As different demand segments in the market have different Willingness-To-Pay (WTP), airlines use differential pricing, booking restrictions, and service amenities to determine different fare classes or products targeted at each of these demand segments. Because seats are limited for each flight, airlines also need to allocate seats for each of these fare classes to prevent lower fare class passengers from displacing higher fare class ones and set overbooking limits in anticipation of cancellations and no-shows such that revenue is maximized. Previous work addresses these problems using optimization techniques or classical Reinforcement Learning methods. This paper focuses on the latter problem - the seat inventory control problem - casting it as a Markov Decision Process to be able to find the optimal policy. Multiple fare classes, concurrent continuous arrival of passengers of different fare classes, overbooking and random cancellations that are independent of class have been considered in the model. We have addressed this problem using Deep Q-Learning with the goal of maximizing the reward for each flight departure. The implementation of this technique allows us to employ large continuous state space but also presents the potential opportunity to test on real time airline data. To generate data and train the agent, a basic air-travel market simulator was developed. The performance of the agent in different simulated market scenarios was compared against theoretically optimal solutions and was found to be nearly close to the expected optimal revenue.


A semi-supervised deep residual network for mode detection in Wi-Fi signals

arXiv.org Machine Learning

Due to their ubiquitous and pervasive nature, Wi-Fi networks have the potential to collect large-scale, low-cost, and disaggregate data on multimodal transportation. In this study, we develop a semi-supervised deep residual network (ResNet) framework to utilize Wi-Fi communications obtained from smartphones for the purpose of transportation mode detection. This framework is evaluated on data collected by Wi-Fi sensors located in a congested urban area in downtown Toronto. To tackle the intrinsic difficulties and costs associated with labelled data collection, we utilize ample amount of easily collected low-cost unlabelled data by implementing the semi-supervised part of the framework. By incorporating a ResNet architecture as the core of the framework, we take advantage of the high-level features not considered in the traditional machine learning frameworks. The proposed framework shows a promising performance on the collected data, with a prediction accuracy of 81.8% for walking, 82.5% for biking and 86.0% for the driving mode.


Shepherding Hordes of Markov Chains

arXiv.org Artificial Intelligence

This paper considers large families of Markov chains (MCs) that are defined over a set of parameters with finite discrete domains. Such families occur in software product lines, planning under partial observability, and sketching of probabilistic programs. Simple questions, like `does at least one family member satisfy a property?', are NP-hard. We tackle two problems: distinguish family members that satisfy a given quantitative property from those that do not, and determine a family member that satisfies the property optimally, i.e., with the highest probability or reward. We show that combining two well-known techniques, MDP model checking and abstraction refinement, mitigates the computational complexity. Experiments on a broad set of benchmarks show that in many situations, our approach is able to handle families of millions of MCs, providing superior scalability compared to existing solutions.


Robust Reinforcement Learning in POMDPs with Incomplete and Noisy Observations

arXiv.org Machine Learning

In real-world scenarios, the observation data for reinforcement learning with continuous control is commonly noisy and part of it may be dynamically missing over time, which violates the assumption of many current methods developed for this. We addressed the issue within the framework of partially observable Markov Decision Process (POMDP) using a model-based method, in which the transition model is estimated from the incomplete and noisy observations using a newly proposed surrogate loss function with local approximation, while the policy and value function is learned with the help of belief imputation. For the latter purpose, a generative model is constructed and is seamlessly incorporated into the belief updating procedure of POMDP, which enables robust execution even under a significant incompleteness and noise. The effectiveness of the proposed method is verified on a collection of benchmark tasks, showing that our approach outperforms several compared methods under various challenging scenarios.


The Value Function Polytope in Reinforcement Learning

arXiv.org Machine Learning

We establish geometric and topological properties of the space of value functions in finite state-action Markov decision processes. Our main contribution is the characterization of the nature of its shape: a general polytope (Aigner et al., 2010). To demonstrate this result, we exhibit several properties of the structural relationship between policies and value functions including the line theorem, which shows that the value functions of policies constrained on all but one state describe a line segment. Finally, we use this novel perspective to introduce visualizations to enhance the understanding of the dynamics of reinforcement learning algorithms.


Heuristics, Answer Set Programming and Markov Decision Process for Solving a Set of Spatial Puzzles

arXiv.org Artificial Intelligence

Spatial puzzles composed of rigid objects, flexible strings and holes offer interesting domains for reasoning about spatial entities that are common in the human daily-life's activities. The goal of this work is to investigate the automated solution of this kind of puzzles adapting an algorithm that combines Answer Set Programming (ASP) with Markov Decision Process (MDP), algorithm oASP(MDP), to use heuristics accelerating the learning process. ASP is applied to represent the domain as an MDP, while a Reinforcement Learning algorithm (Q-Learning) is used to find the optimal policies. In this work, the heuristics were obtained from the solution of relaxed versions of the puzzles. Experiments were performed on deterministic, non-deterministic and non-stationary versions of the puzzles. Results show that the proposed approach can accelerate the learning process, presenting an advantage when compared to the non-heuristic versions of oASP(MDP) and Q-Learning.


Active Perception in Adversarial Scenarios using Maximum Entropy Deep Reinforcement Learning

arXiv.org Artificial Intelligence

We pose an active perception problem where an autonomous agent actively interacts with a second agent with potentially adversarial behaviors. Given the uncertainty in the intent of the other agent, the objective is to collect further evidence to help discriminate potential threats. The main technical challenges are the partial observability of the agent intent, the adversary modeling, and the corresponding uncertainty modeling. Note that an adversary agent may act to mislead the autonomous agent by using a deceptive strategy that is learned from past experiences. We propose an approach that combines belief space planning, generative adversary modeling, and maximum entropy reinforcement learning to obtain a stochastic belief space policy. By accounting for various adversarial behaviors in the simulation framework and minimizing the predictability of the autonomous agent's action, the resulting policy is more robust to unmodeled adversarial strategies. This improved robustness is empirically shown against an adversary that adapts to and exploits the autonomous agent's policy when compared with a standard Chance-Constraint Partially Observable Markov Decision Process robust approach.


NAIL: A General Interactive Fiction Agent

arXiv.org Artificial Intelligence

Interactive Fiction (IF) games are complex textual decision making problems. This paper introduces NAIL, an autonomous agent for general parser-based IF games. NAIL won the 2018 Text Adventure AI Competition, where it was evaluated on twenty unseen games. This paper describes the architecture, development, and insights underpinning NAIL's performance.


Markov Chain-based Cost-Optimal Control Charts for Healthcare Data

arXiv.org Machine Learning

Control charts have traditionally been used in industrial statistics, but are constantly seeing new areas of application, especially in the age of Industry 4.0. This paper introduces a new method, which is suitable for applications in the healthcare sector, especially for monitoring a health-characteristic of a patient. We adapt a Markov chain-based approach and develop a method in which not only the shift size (i.e. the degradation of the patient's health) can be random, but the effect of the repair (i.e. treatment) and time between samplings (i.e. visits) too. This means that we do not use many often-present assumptions which are usually not applicable for medical treatments. The average cost of the protocol, which is determined by the time between samplings and the control limit, can be estimated using the stationary distribution of the Markov chain. Furthermore, we incorporate the standard deviation of the cost into the optimisation procedure, which is often very important from a process control viewpoint. The sensitivity of the optimal parameters and the resulting average cost and cost standard deviation on different parameter values is investigated. We demonstrate the usefulness of the approach for real-life data of patients treated in Hungary: namely the monitoring of cholesterol level of patients with cardiovascular event risk. The results showed that the optimal parameters from our approach can be somewhat different from the original medical parameters.