 Markov Models


LACO: A Latency-Driven Network Slicing Orchestration in Beyond-5G Networks

arXiv.org Machine Learning

Network Slicing is expected to become a game changer in the upcoming 5G networks and beyond, enlarging the telecom business ecosystem through still-unexplored vertical industry profits. This implies that heterogeneous service level agreements (SLAs) must be guaranteed per slice given the multitude of predefined requirements. In this paper, we pioneer a novel radio slicing orchestration solution that simultaneously provides latency and throughput guarantees in a multi-tenancy environment. Leveraging a solid mathematical framework, we exploit the exploration-vs-exploitation paradigm by means of a multi-armed-bandit-based (MAB) orchestrator, LACO, that makes adaptive resource slicing decisions with no prior knowledge of the traffic demand or channel quality statistics. As opposed to traditional MAB methods that are blind to the underlying system, LACO relies on system structure information to expedite decisions. After a preliminary simulation campaign that empirically validates our solution, we provide a robust implementation of LACO using off-the-shelf equipment to fully emulate realistic network conditions: near-optimal results within affordable computational time are measured when LACO is in place.
L. Zanzi, V. Sciancalepore, A. Garcia-Saavedra and X. Costa-Pérez are with NEC Laboratories Europe GmbH, 69115 Heidelberg, Germany.
The quest for new sources of revenue to revitalize the mobile industry has spawned unprecedented hype around the fifth generation of mobile networks (5G) and, in particular, the network slicing concept. A high-level view of the system considered in this paper is described in Figure 1. The figure represents a series of sliceable base stations as a pool of radio resources (coloured cubes in the figure).
The resource allocation process is considered hierarchical: while bundles of radio resources are assigned to different tenants (namely radio slices), each tenant autonomously schedules its bundle of radio resources to each individual user following classic radio scheduling policies. The difference between such operations is subtle but of paramount importance: a slice controller operates at a larger timescale and thus over a coarser granularity [2], [3]. While most prior work on network slicing focuses on average bit-rate guarantees [3], [4], latency considerations have received little attention.
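The exploration-vs-exploitation trade-off behind a MAB orchestrator can be illustrated with a generic UCB1 bandit. This is a minimal sketch, not LACO itself: the abstract states that LACO additionally exploits system structure, which is not modelled here. The arms (imagined as candidate slicing configurations), the Bernoulli "SLA met" reward, and every name below are hypothetical.

```python
import math
import random


def ucb1(reward_fn, n_arms, n_rounds, seed=0):
    """Generic UCB1 bandit: returns pull counts and empirical mean reward per arm."""
    rng = random.Random(seed)
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for t in range(1, n_rounds + 1):
        if t <= n_arms:
            arm = t - 1  # initialisation: play every arm once
        else:
            # pick the arm with the highest optimistic estimate (mean + exploration bonus)
            arm = max(
                range(n_arms),
                key=lambda a: means[a] + math.sqrt(2 * math.log(t) / counts[a]),
            )
        reward = reward_fn(arm, rng)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean update
    return counts, means


# Hypothetical setup: each arm is a candidate slicing configuration and the
# reward is 1 when the (made-up) SLA is met in that decision epoch.
SLA_SUCCESS_PROB = [0.2, 0.5, 0.8]


def bernoulli_reward(arm, rng):
    return 1.0 if rng.random() < SLA_SUCCESS_PROB[arm] else 0.0


counts, means = ucb1(bernoulli_reward, n_arms=3, n_rounds=2000)
```

The learner needs no prior knowledge of the reward statistics: it only observes the outcome of the configuration it tried, which mirrors the "no prior knowledge of the traffic demand or channel quality statistics" setting described above.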


Unsupervised Machine Learning Hidden Markov Models in Python

#artificialintelligence

Created by Lazy Programmer Inc. The Hidden Markov Model, or HMM, is all about learning sequences. A lot of the data that would be very useful for us to model is in sequences. Stock prices are sequences of prices. Language is a sequence of words. Credit scoring involves sequences of borrowing and repaying money, and we can use those sequences to predict whether or not you're going to default.
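As an illustration of how an HMM scores a sequence, the forward algorithm computes the probability of an observation sequence by summing over all hidden state paths. The sketch below is not taken from the course; the two-state model and its probabilities are made-up assumptions, and it uses plain Python lists to stay dependency-free.

```python
def forward_likelihood(pi, A, B, observations):
    """Probability of an observation sequence under a discrete HMM (forward algorithm).

    pi[i]   : probability of starting in hidden state i
    A[i][j] : probability of moving from state i to state j
    B[j][k] : probability of emitting symbol k while in state j
    """
    n_states = len(pi)
    # alpha[j] = P(observations so far, current state = j)
    alpha = [pi[i] * B[i][observations[0]] for i in range(n_states)]
    for obs in observations[1:]:
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(n_states)) * B[j][obs]
            for j in range(n_states)
        ]
    return sum(alpha)


# Hypothetical two-state, two-symbol model (values chosen only for illustration).
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]

likelihood = forward_likelihood(pi, A, B, [0, 1, 0])
```

For long sequences the `alpha` values underflow, so practical implementations typically rescale them at each step or work in log space.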


Real-time and Large-scale Fleet Allocation of Autonomous Taxis: A Case Study in New York Manhattan Island

arXiv.org Artificial Intelligence

Autonomous taxis have become a highly promising transportation mode that helps relieve traffic congestion and avoid road accidents. However, wide deployment of this service is hindered because traditional models fail to allocate the available fleet efficiently: they struggle with the imbalance between supply (autonomous taxis) and demand (trips), poor cooperation among taxis, hard-to-satisfy resource constraints, and the requirements of on-line platforms. To address these problems from a global and more farsighted view, we employ a Constrained Multi-agent Markov Decision Process (CMMDP) to model fleet allocation decisions, which can be easily split into sub-problems formulated as a 'Dynamic assignment problem' combining both immediate rewards and future gains. We also leverage a Column Generation algorithm to guarantee efficiency and optimality at large scale. In extensive experiments, the proposed approach not only achieves remarkable improvements over state-of-the-art benchmarks in terms of individual efficiency (increases of 12.40% in income and 6.54% in utilization) and platform profit (an increase of 4.59%), but also reveals a time-varying fleet adjustment policy that minimizes the platform's operating cost.


A Hybrid PAC Reinforcement Learning Algorithm

arXiv.org Machine Learning

This paper offers a new hybrid probably approximately correct (PAC) reinforcement learning (RL) algorithm for Markov decision processes (MDPs) that intelligently maintains favorable features of its parents. The designed algorithm, referred to as the Dyna-Delayed Q-learning (DDQ) algorithm, combines model-free and model-based learning approaches while outperforming both in most cases. The paper includes a PAC analysis of the DDQ algorithm and a derivation of its sample complexity. Numerical results that support the claim regarding the new algorithm's sample efficiency compared to its parents are showcased in a small grid-world example.


Technical Report: The Policy Graph Improvement Algorithm

arXiv.org Artificial Intelligence

Optimizing a partially observable Markov decision process (POMDP) policy is challenging. The policy graph improvement (PGI) algorithm for POMDPs represents the policy as a fixed-size policy graph and improves the policy monotonically. Because the policy size is fixed, the computation time for each improvement iteration is known in advance. Moreover, the method allows for compact, understandable policies. This report describes the technical details of the PGI [1] and particle-based PGI [2] algorithms for POMDPs in a more accessible way than [1] or [2], allowing practitioners and students to understand and implement the algorithms.


Spatial Privacy Pricing: The Interplay between Privacy, Utility and Price in Geo-Marketplaces

arXiv.org Artificial Intelligence

A geo-marketplace allows users to be paid for their location data. Users concerned about privacy may want to charge more for data that pinpoints their location accurately, but may charge less for data that is more vague. A buyer would prefer to minimize data costs, but may have to spend more to get the necessary level of accuracy. We call this interplay between privacy, utility, and price "spatial privacy pricing". We formalize the issues mathematically with an example problem of a buyer deciding whether or not to open a restaurant by purchasing location data to determine if the potential number of customers is sufficient to open. The problem is expressed as a sequential decision making problem, where the buyer first makes a series of decisions about which data to buy and concludes with a decision about opening the restaurant or not. We present two algorithms to solve this problem, including experiments that show they perform better than baselines.


Learning to Infer User Hidden States for Online Sequential Advertising

arXiv.org Artificial Intelligence

To drive purchases in online advertising, it is of great interest to the advertiser to optimize the sequential advertising strategy, whose performance and interpretability are both important. The lack of interpretability in existing deep reinforcement learning methods makes it difficult to understand, diagnose and further optimize the strategy. In this paper, we propose our Deep Intents Sequential Advertising (DISA) method to address these issues. The key part of interpretability is to understand a consumer's purchase intent, which is, however, unobservable (a hidden state). We model this intent as a latent variable and formulate the problem as a Partially Observable Markov Decision Process (POMDP) where the underlying intents are inferred based on the observable behaviors. Large-scale industrial offline and online experiments demonstrate our method's superior performance over several baselines. The inferred hidden states are analyzed, and the results prove the rationality of our inference.


Change Point Detection by Cross-Entropy Maximization

arXiv.org Machine Learning

Many offline unsupervised change point detection algorithms rely on minimizing a penalized sum of segment-wise costs. We extend this framework by proposing to minimize a sum of discrepancies between segments. In particular, we propose to select the change points so as to maximize the cross-entropy between successive segments, balanced by a penalty for introducing new change points. We propose a dynamic programming algorithm to solve this problem and analyze its complexity. Experiments on two challenging datasets demonstrate the advantages of our method compared to three state-of-the-art approaches.
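The baseline framework mentioned above, minimizing a penalized sum of segment-wise costs, can be sketched with a classic optimal-partitioning dynamic program. This is not the cross-entropy method proposed in the paper: the squared-error segment cost, the function name and the toy signal are assumptions chosen for illustration.

```python
def detect_change_points(signal, penalty):
    """Minimize (sum of segment costs) + penalty * (number of change points).

    Segment cost (an assumption): squared error around the segment mean.
    Returns the sorted indices at which a new segment starts.
    """
    n = len(signal)
    # Prefix sums give each segment cost in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, x in enumerate(signal):
        s1[i + 1] = s1[i] + x
        s2[i + 1] = s2[i] + x * x

    def cost(start, end):  # cost of signal[start:end], start < end
        total = s1[end] - s1[start]
        return (s2[end] - s2[start]) - total * total / (end - start)

    best = [0.0] * (n + 1)
    best[0] = -penalty  # offsets the penalty charged for the first segment
    prev = [0] * (n + 1)
    for end in range(1, n + 1):
        # best[end] = cheapest way to segment signal[:end]
        best[end], prev[end] = min(
            (best[start] + cost(start, end) + penalty, start) for start in range(end)
        )

    # Backtrack through the stored segment starts.
    change_points = []
    end = n
    while prev[end] > 0:
        end = prev[end]
        change_points.append(end)
    return sorted(change_points)


# Toy piecewise-constant signal with changes at indices 20 and 40.
signal = [0.0] * 20 + [5.0] * 20 + [1.0] * 20
change_points = detect_change_points(signal, penalty=10.0)
```

This plain version runs in O(n^2) time; the penalty controls how many change points are worth paying for, which is the same balancing role the abstract assigns to its penalty term.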


Breaking the Sample Size Barrier in Model-Based Reinforcement Learning with a Generative Model

arXiv.org Machine Learning

We investigate the sample efficiency of reinforcement learning in a $\gamma$-discounted infinite-horizon Markov decision process (MDP) with state space $\mathcal{S}$ and action space $\mathcal{A}$, assuming access to a generative model. Despite a number of prior works tackling this problem, a complete picture of the trade-offs between sample complexity and statistical accuracy is yet to be determined. In particular, prior results suffer from a sample size barrier, in the sense that their claimed statistical guarantees hold only when the sample size exceeds at least $\frac{|\mathcal{S}||\mathcal{A}|}{(1-\gamma)^2}$ (up to some log factor). The current paper overcomes this barrier by certifying the minimax optimality of model-based reinforcement learning as soon as the sample size exceeds the order of $\frac{|\mathcal{S}||\mathcal{A}|}{1-\gamma}$ (modulo some log factor). More specifically, a perturbed model-based planning algorithm provably finds an $\varepsilon$-optimal policy with an order of $\frac{|\mathcal{S}||\mathcal{A}| }{(1-\gamma)^3\varepsilon^2}\log\frac{|\mathcal{S}||\mathcal{A}|}{(1-\gamma)\varepsilon}$ samples for any $\varepsilon \in (0, \frac{1}{1-\gamma}]$. Along the way, we derive improved (instance-dependent) guarantees for model-based policy evaluation. To the best of our knowledge, this work provides the first minimax-optimal guarantee in a generative model that accommodates the entire range of sample sizes (beyond which finding a meaningful policy is information theoretically impossible).
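To get a feel for the orders of magnitude in these expressions, the following sketch evaluates them numerically. It drops all constants and the unspecified log factors on the two thresholds, so the outputs are orders of growth rather than actual sample counts; the function names and the example MDP size are hypothetical.

```python
import math


def sample_size_thresholds(n_states, n_actions, gamma):
    """Return (prior barrier, improved threshold), ignoring constants and log factors.

    prior barrier      ~ |S||A| / (1 - gamma)^2
    improved threshold ~ |S||A| / (1 - gamma)
    """
    sa = n_states * n_actions
    return sa / (1 - gamma) ** 2, sa / (1 - gamma)


def sample_complexity_order(n_states, n_actions, gamma, eps):
    """Order of samples for an eps-optimal policy, constants dropped:
    |S||A| / ((1 - gamma)^3 eps^2) * log(|S||A| / ((1 - gamma) eps)).
    """
    sa = n_states * n_actions
    return sa / ((1 - gamma) ** 3 * eps ** 2) * math.log(sa / ((1 - gamma) * eps))


# Hypothetical MDP: 100 states, 10 actions, discount factor 0.9.
prior_barrier, improved_threshold = sample_size_thresholds(100, 10, 0.9)
order_at_eps_1 = sample_complexity_order(100, 10, 0.9, eps=1.0)
```

The ratio between the two thresholds is $\frac{1}{1-\gamma}$, so the gap widens as the discount factor approaches 1, that is, as the effective horizon grows.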


Landscape of Machine Implemented Ethics

arXiv.org Artificial Intelligence

This paper surveys the state-of-the-art in machine ethics, that is, considerations of how to implement ethical behaviour in robots, unmanned autonomous vehicles, or software systems. The emphasis is on covering the breadth of ethical theories being considered by implementors, as well as the implementation techniques being used. There is no consensus on which ethical theory is best suited for any particular domain, nor is there any agreement on which technique is best placed to implement a particular theory. Another unresolved problem in these implementations of ethical theories is how to objectively validate the implementations. The paper discusses the dilemmas being used as validating 'whetstones' and whether any alternative validation mechanism exists. Finally, it speculates that an intermediate step of creating domain-specific ethics might be a possible stepping stone towards creating machines that exhibit ethical behaviour. Computers are increasingly a part of the socio-technical systems around us. Domains such as smartgrids, cloud computing, healthcare, and transport are but some examples where computers are deeply embedded. The speed and complexity of decision-making in these domains have meant that humans are ceding more and more autonomy to these computers (Nallur & Clarke 2018). Autonomy, in machines, can be defined as the effective decision-making power over goals that influences some action in the real world. For instance, smart traffic lights can autonomically change their timings, depending on the flow and density of traffic on the roads.